Posted

I'm trying to retrieve the text

"Error: Cannot locate database"

from the HTML below.

I'm able to obtain 12345678 with:

$lastInputObj = _IETagNameGetCollection($mainFrame, "INPUT", @extended - 1)

MsgBox(0, $lastInputObj.value, $lastInputObj.value)

But I don't know how to retrieve the contents of the last <font color=red> </font> element.

Does anybody have any idea?

<tr><td ALIGN=CENTER VALIGN=CENTER ><font color=red>

Error: Cannot locate database

</font></td></tr></table>

<form action="http://www.google.com" method="post" name="iform" >

<INPUT TYPE=HIDDEN NAME=action VALUE=2>

<INPUT TYPE=HIDDEN NAME=src1 VALUE=12345678></form>

Posted

If you just want to know if the text "Error: Cannot locate database" exists in the html then just use StringInStr.

Posted

But I don't know how to refer to this dynamically generated HTML page...

Posted

You can use _IEBodyReadHTML or _IEBodyReadText.

If StringInStr(_IEBodyReadText($o_object),"Error: Cannot locate database") Then
;Do Stuff...
EndIf
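
If the goal is to read the contents of the last <font> element rather than only test for the text, the same @extended pattern used for the INPUT collection in the first post should carry over. This is a minimal sketch under assumptions: the test page is written into a blank browser window with _IEBodyWriteHTML, and the variable names are illustrative only.

```autoit
#include <IE.au3>

; Assumption: a small stand-in page instead of the real, dynamically generated one.
Local $oIE = _IECreate("about:blank")
_IEBodyWriteHTML($oIE, '<table><tr><td><font color=red>Error: Cannot locate database</font></td></tr></table>')

; Calling without an index returns the whole collection and sets @extended to its size.
Local $oFonts = _IETagNameGetCollection($oIE, "font")
Local $iCount = @extended

If $iCount > 0 Then
    ; Indexes are 0-based, so the last element is at $iCount - 1.
    Local $oLastFont = _IETagNameGetCollection($oIE, "font", $iCount - 1)
    ; Flag 3 strips leading and trailing whitespace, including line breaks.
    ConsoleWrite(StringStripWS(_IEPropertyGet($oLastFont, "innertext"), 3) & @CRLF)
EndIf

_IEQuit($oIE)
```

On the real page, $oIE would be the browser or frame object you already have (for example $mainFrame from the first post).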
  • 2 weeks later...
Posted

So if I want to know the total number of pages, which is 11 in the following text appearing somewhere in the HTML,

Page 1 of 11 </td>

how can I use StringInStr to extract "11"?

_IENavigate ($oIE, "www.google.com")

If StringInStr(_IEBodyReadText($oIE),"Page") Then

; How do I process this specific line?

; And if I want to work on the 3 lines following "Page 1 of 11 </td>", what should I write here?

EndIf

Posted

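; Flag 1 returns an array holding the captured groups of the first match.
; \d+ (instead of \d*) requires at least one digit, so "page of" alone cannot match.
Dim $aResults = StringRegExp(_IEBodyReadText($oIE), "(?i)page\s*(\d+)\s*of\s*(\d+)", 1)
If IsArray($aResults) Then
     Dim $iCurrentPage = $aResults[0] ; first captured number
     Dim $iTotalPages = $aResults[1]  ; second captured number
EndIf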

It's better to narrow things down with the _IE* functions first, e.g. _IETableGetCollection, or _IEPropertyGet for the inner/outer text or HTML of an element, and then apply the string functions to that result. Unless, of course, you're sure that no other "page" can appear in the body text.
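
For the earlier question about working on the 3 lines that follow "Page 1 of 11", one option is to split the body text into lines and walk forward from the matching line. This is a rough sketch on a hard-coded sample string; in practice $sBody would come from _IEBodyReadText, and the line layout of the real page is an assumption.

```autoit
; Assumption: sample text standing in for the result of _IEBodyReadText($oIE).
Local $sBody = "Header" & @CRLF & "Page 1 of 11" & @CRLF & _
        "line A" & @CRLF & "line B" & @CRLF & "line C" & @CRLF & "Footer"

; Flag 1 treats @CRLF as one delimiter instead of splitting on @CR and @LF separately.
; Element [0] of the returned array holds the number of lines.
Local $aLines = StringSplit($sBody, @CRLF, 1)

For $i = 1 To $aLines[0]
    ; With the default flag 0, StringRegExp returns 1 (match) or 0 (no match).
    If StringRegExp($aLines[$i], "(?i)page\s+\d+\s+of\s+\d+") Then
        ; Process up to 3 lines after the matching one, stopping at the end of the text.
        For $j = $i + 1 To $i + 3
            If $j > $aLines[0] Then ExitLoop
            ConsoleWrite($aLines[$j] & @CRLF)
        Next
        ExitLoop
    EndIf
Next
```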

Posted

I agree with you that _IETableGetCollection is the better approach, but the problem is that I want to process specific content, lines, or tables relative to a marker: say, the 2nd table following "Page 1 of 11 </td>". What code should I write?

_IETableGetCollection only lets us retrieve the nth table by explicit index, not by content. Thanks again :P
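
One way to select a table by content is to loop over the table collection, find the index of the table whose text contains the marker, and then address the table two positions later. The sketch below assumes the tables are siblings and not nested (with nested tables an outer table would also contain the marker text), and it uses a made-up test page.

```autoit
#include <IE.au3>

; Assumption: three sibling tables; the first one carries the "Page 1 of 11" marker.
Local $oIE = _IECreate("about:blank")
_IEBodyWriteHTML($oIE, '<table><tr><td>Page 1 of 11</td></tr></table>' & _
        '<table><tr><td>first</td></tr></table>' & _
        '<table><tr><td>second</td></tr></table>')

; Without an index the whole collection is returned and @extended holds the table count.
Local $oTables = _IETableGetCollection($oIE)
Local $iTotal = @extended
Local $iMarker = -1

For $i = 0 To $iTotal - 1
    Local $oTable = _IETableGetCollection($oIE, $i)
    If StringRegExp(_IEPropertyGet($oTable, "innertext"), "(?i)page\s+\d+\s+of\s+\d+") Then
        $iMarker = $i
        ExitLoop
    EndIf
Next

; The 2nd table after the marker table sits at index $iMarker + 2.
If $iMarker >= 0 And $iMarker + 2 < $iTotal Then
    Local $oTarget = _IETableGetCollection($oIE, $iMarker + 2)
    Local $aData = _IETableWriteToArray($oTarget)
    ConsoleWrite($aData[0][0] & @CRLF)
EndIf

_IEQuit($oIE)
```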

Posted

I have 2 questions:

1) The Help file always describes innertext/innerhtml as nearly equivalent to outertext/outerhtml. Could you give an example of how they differ?

2) The following table data fooled my table-writing program into producing multiple rows instead of a single row:

<td class='img'>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR000719' >IPR000719</a>&nbsp; Protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR002290' >IPR002290</a>&nbsp; Serine/threonine protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR005543' >IPR005543</a>&nbsp; PASTA<br/>

</td>

Initially I guessed it was because of the <br/> tags, but that doesn't seem to be it. I don't know what is messing up the process now...

#include <IE.au3>
#include <Array.au3>
#include <File.au3>

$dir = "C:\"
$murl = "http://img.jgi.doe.gov/"

$qurl = "cgi-bin/pub/main.cgi?section=GeneDetail&page=geneDetail&"
$purl = "gene_oid="

$file = FileOpen("IMGID.txt", 0) 

; example IMGID.txt (without semicolons)

; 637094262
; 637094263
; 637094264

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf
   
$sFile = $dir & "IMG_DB.xls"
If FileExists($sFile) Then FileDelete($sFile)
$hFile = FileOpen($sFile, 1) ; 1 = append
$oIE = _IECreate ()
_IELoadWait ($oIE)

$i=0
; Read in lines of text until the EOF is reached
While 1
    $rsnum_entry = FileReadLine($file)
    If @error = -1 Then ExitLoop
    ;MsgBox(0, "Line read:", $rsnum_entry)
    
    _IENavigate ($oIE, $murl & $qurl & $purl & $rsnum_entry)
    _IELoadWait ($oIE)
    
    $oTable = _IETableGetCollection ($oIE, 1)   ;Gene Information
    $aTableData = _IETableWriteToArray ($oTable, True)
    ;_ArrayDisplay($aTableData)

    if ($i = 1) Then 
        $head = 0 
    Else
        $head = 1
    EndIf
    
     ;MsgBox(0, "Value of $i is:", $head)
     ;Exit

    For $y = 1 To UBound($aTableData,2) - 1
        For $x = $head To UBound($aTableData) - 1

            ;If $y > 1 Then FileWrite($hFile, ";")
            ;If $x > 1 Then 
            FileWrite($hFile, @Tab)
            $cell = StringReplace($aTableData[$x][$y],"<br/>","MULTIPLEITEMS")
            ;FileWrite($hFile, $aTableData[$x][$y])
            FileWrite($hFile, $cell)
        Next
        FileWrite($hFile,@crlf)
    Next
    
Wend

FileClose($file)
FileClose($hFile) ; also close the output file that was opened for appending
Posted

See the object page on MSDN. Most, if not all, objects have their own innerText/outerText and innerHTML/outerHTML properties, and their behavior probably doesn't change from object to object, but I didn't read them all. ;]
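
To make the difference concrete, here is a small sketch that reads all four properties of one element. The test markup and the "demo" id are invented for illustration. As far as I know, innerhtml returns only what sits between the element's tags while outerhtml includes the element's own tags, and innertext and outertext return the same string when read (they differ when assigned to, where outertext replaces the whole element).

```autoit
#include <IE.au3>

; Assumption: a tiny test page written into a blank browser window.
Local $oIE = _IECreate("about:blank")
_IEBodyWriteHTML($oIE, '<div id="demo"><b>Hello</b> world</div>')

Local $oDiv = _IEGetObjById($oIE, "demo")

; Markup between the <div> tags only.
ConsoleWrite("innerhtml: " & _IEPropertyGet($oDiv, "innerhtml") & @CRLF)
; The same markup plus the surrounding <div> tags themselves.
ConsoleWrite("outerhtml: " & _IEPropertyGet($oDiv, "outerhtml") & @CRLF)
; Rendered text without any tags.
ConsoleWrite("innertext: " & _IEPropertyGet($oDiv, "innertext") & @CRLF)
ConsoleWrite("outertext: " & _IEPropertyGet($oDiv, "outertext") & @CRLF)

_IEQuit($oIE)
```

Internet Explorer may normalise the markup it returns (tag case, attribute quoting), so the HTML output is not necessarily character-for-character what was written.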

About the data formatting. I guess you want it like this:

Row1: Col1 @TAB Col2 @TAB Col3 @TAB
Row2: Col1 @TAB Col2 @TAB Col3 @TAB

Right? If so, it seems to me that it's already formatted that way; correct me if I'm wrong.

Posted

Thx, Authenticity, huhu, :D

In fact,

<td class='img'>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR000719' >IPR000719</a>&nbsp; Protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR002290' >IPR002290</a>&nbsp; Serine/threonine protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR005543' >IPR005543</a>&nbsp; PASTA<br/>

</td>

does NOT produce

Row1: Col1 @TAB Col2 @TAB Col3 @TAB
Row2: Col1 @TAB Col2 @TAB Col3 @TAB

but

Row1:Col1

Row2:Col2

Row3:Col3

instead... :o

Posted

<br/> messes up the retrieval process: each one wraps the output onto a new line instead of keeping the info on the same line.

It turns out that StringReplace fails to replace <br/>, apparently because it's a tag. So what else can I do to replace this tag?

#include <IE.au3>
#include <Array.au3>
#include <File.au3>

$sFile = "C:\IMG_DB.xls"
$hFile = FileOpen($sFile, 1) ; 1 = append
$oIE = _IECreate ()
_IELoadWait ($oIE)

_IENavigate ($oIE, "http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=637094262")
_IELoadWait ($oIE)
    
$oTable = _IETableGetCollection ($oIE, 1)   
$aTableData = _IETableWriteToArray ($oTable, True)
;_ArrayDisplay($aTableData)

For $y = 1 To UBound($aTableData,2) - 1
    For $x = 1 To UBound($aTableData) - 1

        FileWrite($hFile, @Tab)
        $cell = StringReplace($aTableData[$x][$y],"<br/>","MULTIPLEITEMS")
        FileWrite($hFile, $cell)
    Next
    FileWrite($hFile,@crlf)
Next
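
A likely explanation, offered as an assumption rather than a confirmed diagnosis: _IETableWriteToArray stores each cell's text, not its HTML, so the <br/> tags have already been turned into line breaks and the literal string "<br/>" never appears in $aTableData. Replacing the line breaks instead should keep each cell on one line. The sketch below uses a hard-coded string in place of a real cell value.

```autoit
; Assumption: sample cell text standing in for $aTableData[$x][$y].
Local $sCell = "- IPR000719 Protein kinase" & @CRLF & _
        "- IPR002290 Serine/threonine protein kinase" & @CRLF & _
        "- IPR005543 PASTA" & @CRLF

; Flag 3 trims leading and trailing whitespace, which removes the final line break.
Local $sTrimmed = StringStripWS($sCell, 3)

; Collapse every remaining run of CR/LF characters into a visible separator.
Local $sOneLine = StringRegExpReplace($sTrimmed, "[\r\n]+", " MULTIPLEITEMS ")

ConsoleWrite($sOneLine & @CRLF)
```

In the script above, the same two calls would take the place of the StringReplace line inside the inner loop.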
