
retrieve content delimited by tag


cwem

I'm trying to extract

"Error: Cannot locate database"

from the following HTML.

I can already get 12345678 (the value of the last INPUT element) with:

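$oInputs = _IETagNameGetCollection($mainFrame, "INPUT") ; @extended = number of INPUT elements
$lastInputObj = _IETagNameGetCollection($mainFrame, "INPUT", @extended - 1) ; zero-based, so the last one is count - 1

MsgBox(0, "Last INPUT value", $lastInputObj.value)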

But I don't know how to retrieve the text inside the last <font color=red> ... </font> element.

Does anybody have an idea?

<tr><td ALIGN=CENTER VALIGN=CENTER ><font color=red>

Error: Cannot locate database

</font></td></tr></table>

<form action="http://www.google.com" method="post" name="iform" >

<INPUT TYPE=HIDDEN NAME=action VALUE=2>

<INPUT TYPE=HIDDEN NAME=src1 VALUE=12345678></form>
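A minimal sketch of one way to do this, assuming the page HTML is already available as a string (for example from _IEBodyReadHTML($oIE); here the sample from the question is hard-coded instead). It uses StringRegExp with flag 3 to collect the text of every <font color=red> element and keeps the last one. Regular expressions on HTML are fragile, so treat this as a starting point: IE may re-serialize the markup (uppercase tag names, different quoting), which the (?i) flag and the optional quotes in the pattern are meant to tolerate.

```autoit
; Sample HTML from the question. In a real script this string would come from
; something like _IEBodyReadHTML($oIE) (assumption: $oIE is an existing IE object).
Local $sHTML = '<tr><td ALIGN=CENTER VALIGN=CENTER ><font color=red>' & @CRLF & _
        'Error: Cannot locate database' & @CRLF & _
        '</font></td></tr></table>'

; (?is): case-insensitive, and "." also matches line breaks.
; Flag 3 returns the captured group of every match as a flat array.
Local $aMatches = StringRegExp($sHTML, '(?is)<font\s+color=["'']?red["'']?\s*>(.*?)</font>', 3)

Local $sLast = ""
If IsArray($aMatches) Then
    ; Keep the last match and trim leading/trailing whitespace and line breaks
    $sLast = StringStripWS($aMatches[UBound($aMatches) - 1], 3)
EndIf
ConsoleWrite($sLast & @CRLF) ; Error: Cannot locate database
```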


If you just want to know whether the text "Error: Cannot locate database" exists in the HTML, just use StringInStr.

But I don't know how to get at the contents of this dynamically generated HTML page...


You can use _IEBodyReadHTML or _IEBodyReadText.

If StringInStr(_IEBodyReadText($o_object),"Error: Cannot locate database") Then
;Do Stuff...
EndIf

So if I want to get the total number of pages (11 here) from the following text, which appears somewhere in the HTML,

Page 1 of 11 </td>

how can I use StringInStr to extract "11"?

#include <IE.au3>

$oIE = _IECreate()
_IENavigate($oIE, "www.google.com")

If StringInStr(_IEBodyReadText($oIE), "Page") Then

    ; How do I process this specific line?

    ; And if I want to process the 3 lines following "Page 1 of 11 </td>", what should I write here?

EndIf


; Flag 1 returns an array with the groups of the first match; \d+ requires at least one digit
Dim $aResults = StringRegExp(_IEBodyReadText($oIE), "(?i)page\s*(\d+)\s*of\s*(\d+)", 1)
If IsArray($aResults) Then
     Dim $iCurrentPage = $aResults[0]
     Dim $iTotalPages = $aResults[1]
EndIf

It's better to first narrow things down with the _IE* functions, such as _IETableGetCollection, or _IEPropertyGet for the innertext, outertext, innerhtml or outerhtml, and then apply the string functions to the result. Unless, of course, you're sure that no other "page" can appear in the body text.
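For the second part of the question (doing something with the 3 lines after the "Page x of y" line), here is a minimal sketch that splits the body text into lines, finds the line matching the same pattern as above, and collects up to three lines after it. The sample text and the "line A" to "line D" lines are made up for illustration; in a real script $sText would be _IEBodyReadText($oIE).

```autoit
; Made-up sample standing in for _IEBodyReadText($oIE)
Local $sText = "Results" & @CRLF & "Page 1 of 11" & @CRLF & "line A" & @CRLF & _
        "line B" & @CRLF & "line C" & @CRLF & "line D"

; Normalize CRLF to LF, then split; $aLines[0] holds the number of lines
Local $aLines = StringSplit(StringStripCR($sText), @LF)
Local $sFollowing = "", $iLast

For $i = 1 To $aLines[0]
    ; StringRegExp with the default flag 0 returns 1 on a match, 0 otherwise
    If StringRegExp($aLines[$i], "(?i)page\s*\d+\s*of\s*\d+") Then
        ; Up to three lines after the matching one, without running past the end
        $iLast = $i + 3
        If $iLast > $aLines[0] Then $iLast = $aLines[0]
        For $j = $i + 1 To $iLast
            $sFollowing &= $aLines[$j] & @CRLF ; process each following line here
        Next
        ExitLoop
    EndIf
Next
ConsoleWrite($sFollowing)
```

This only looks at the first matching line; remove the ExitLoop if several "Page" lines need handling.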

I agree that _IETableGetCollection is the better approach, but the problem is that I want to process specific content, lines or tables, for example the 2nd table following "Page 1 of 11 </td>". What code should I write?

_IETableGetCollection only lets me retrieve the nth table by index, not based on its content. Thanks again :P
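One possible approach, sketched under assumptions: walk through all tables with _IETableGetCollection, note the index of the table whose innerText contains the marker text, then fetch the table at that index plus an offset. _FindTableIndexByText is a hypothetical helper written for this sketch, the URL is a placeholder, and "the 2nd table following" is interpreted as index + 2 in document order. With nested tables, the outer tables also contain the marker text; keeping the last match picks the innermost one, but the offset may still need adjusting for a particular page.

```autoit
#include <IE.au3>
#include <Array.au3>

; Hypothetical helper: returns the index of the last table (in document order)
; whose text contains $sMarker, or -1 if none does. For nested tables the
; last match is the innermost table that contains the marker.
Func _FindTableIndexByText($oIE, $sMarker)
    _IETableGetCollection($oIE) ; without an index, @extended = number of tables
    Local $iCount = @extended, $iFound = -1, $oTable
    For $i = 0 To $iCount - 1
        $oTable = _IETableGetCollection($oIE, $i)
        If StringInStr($oTable.innerText, $sMarker) Then $iFound = $i
    Next
    Return $iFound
EndFunc

Local $oIE = _IECreate("http://example.com/results") ; placeholder URL
Local $iIdx = _FindTableIndexByText($oIE, "Page 1 of")
If $iIdx >= 0 Then
    ; Assumed meaning of "2nd table following": two positions later in document order
    Local $oTarget = _IETableGetCollection($oIE, $iIdx + 2)
    If IsObj($oTarget) Then
        Local $aData = _IETableWriteToArray($oTarget, True)
        _ArrayDisplay($aData)
    EndIf
EndIf
```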


I have 2 questions:

1) The Help file always seems to describe innertext/innerhtml as equivalent to outertext/outerhtml. Could you give an example of how they differ?

2) The following table cell tricks my table-writing script into producing multiple rows instead of a single row:

<td class='img'>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR000719' >IPR000719</a>&nbsp; Protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR002290' >IPR002290</a>&nbsp; Serine/threonine protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR005543' >IPR005543</a>&nbsp; PASTA<br/>

</td>

At first I guessed it was because of the <br/> tags, but that doesn't seem to be it, and now I don't know what is messing up the process...

#include <IE.au3>
#include <Array.au3>
#include <File.au3>

$dir = "C:\"
$murl = "http://img.jgi.doe.gov/"

$qurl = "cgi-bin/pub/main.cgi?section=GeneDetail&page=geneDetail&"
$purl = "gene_oid="

$file = FileOpen("IMGID.txt", 0) 

; example IMGID.txt (without semicolons)

; 637094262
; 637094263
; 637094264

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf
   
$sFile = $dir & "IMG_DB.xls"
If FileExists($sFile) Then FileDelete($sFile)
$hFile = FileOpen($sFile, 1) ; 1 = append
$oIE = _IECreate ()
_IELoadWait ($oIE)

$i=0
; Read in lines of text until the EOF is reached
While 1
    $rsnum_entry = FileReadLine($file)
    If @error = -1 Then ExitLoop
    $i += 1 ; count entries, so $i = 1 marks the first one (checked below for $head)
    ;MsgBox(0, "Line read:", $rsnum_entry)
    
    _IENavigate ($oIE, $murl & $qurl & $purl & $rsnum_entry)
    _IELoadWait ($oIE)
    
    $oTable = _IETableGetCollection ($oIE, 1)   ;Gene Information
    $aTableData = _IETableWriteToArray ($oTable, True)
    ;_ArrayDisplay($aTableData)

    if ($i = 1) Then 
        $head = 0 
    Else
        $head = 1
    EndIf
    
     ;MsgBox(0, "Value of $i is:", $head)
     ;Exit

    For $y = 1 To UBound($aTableData,2) - 1
        For $x = $head To UBound($aTableData) - 1

            ;If $y > 1 Then FileWrite($hFile, ";")
            ;If $x > 1 Then 
            FileWrite($hFile, @Tab)
            $cell = StringReplace($aTableData[$x][$y],"<br/>","MULTIPLEITEMS")
            ;FileWrite($hFile, $aTableData[$x][$y])
            FileWrite($hFile, $cell)
        Next
        FileWrite($hFile,@crlf)
    Next
    
Wend

FileClose($file)

See MSDN; this is the object page. Most, if not all, of the objects have their own innerText/outerText/innerHTML/outerHTML properties, and they probably behave the same way from object to object, but I didn't read them all. ;]
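A small illustration, assuming Internet Explorer is available: a throwaway about:blank page gets a <div> written into it with _IEBodyWriteHTML, and _IEPropertyGet then reads the four properties. The id "demo" and the markup are made up for this sketch. Broadly, innerhtml returns only the element's contents, while outerhtml also includes the element's own start and end tags. innertext and outertext return the same text when read; they differ when assigned, where setting outerText replaces the whole element rather than only its contents. IE may normalize the returned markup (for example uppercase tag names), so the exact output can vary.

```autoit
#include <IE.au3>

; Throwaway page with one made-up element to inspect
Local $oIE = _IECreate("about:blank")
_IEBodyWriteHTML($oIE, '<div id="demo"><b>bold</b> text</div>')
Local $oDiv = _IEGetObjById($oIE, "demo")

; innerhtml: markup of the contents only (no <div> tags)
ConsoleWrite("innerhtml: " & _IEPropertyGet($oDiv, "innerhtml") & @CRLF)
; outerhtml: the contents plus the <div ...> start and end tags
ConsoleWrite("outerhtml: " & _IEPropertyGet($oDiv, "outerhtml") & @CRLF)
; innertext / outertext: plain text without tags; identical when read
ConsoleWrite("innertext: " & _IEPropertyGet($oDiv, "innertext") & @CRLF)
ConsoleWrite("outertext: " & _IEPropertyGet($oDiv, "outertext") & @CRLF)

_IEQuit($oIE)
```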

About the data formatting: I guess you want it like this:

Row1: Col1 @TAB Col2 @TAB Col3 @TAB
Row2: Col1 @TAB Col2 @TAB Col3 @TAB

Right? If so, it seems to me that it's already formatted that way; correct me if I'm wrong.


Thanks, Authenticity :D

In fact,

<td class='img'>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR000719' >IPR000719</a>&nbsp; Protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR002290' >IPR002290</a>&nbsp; Serine/threonine protein kinase<br/>

- &nbsp; <a href='http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR005543' >IPR005543</a>&nbsp; PASTA<br/>

</td>

does NOT produce

Row1: Col1 @TAB Col2 @TAB Col3 @TAB
Row2: Col1 @TAB Col2 @TAB Col3 @TAB

but

Row1:Col1

Row2:Col2

Row3:Col3

instead... :o


The <br/> tags mess up the output: each one starts a new line instead of keeping the information on the same line.

It turns out that StringReplace fails to replace <br/> because it's a tag. So what else can I do to replace this tag?

#include <IE.au3>
#include <Array.au3>
#include <File.au3>

$sFile = "C:\IMG_DB.xls"
$hFile = FileOpen($sFile, 1) ; 1 = append
$oIE = _IECreate ()
_IELoadWait ($oIE)

_IENavigate ($oIE, "http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=637094262")
_IELoadWait ($oIE)
    
$oTable = _IETableGetCollection ($oIE, 1)   
$aTableData = _IETableWriteToArray ($oTable, True)
;_ArrayDisplay($aTableData)

For $y = 1 To UBound($aTableData,2) - 1
    For $x = 1 To UBound($aTableData) - 1

        FileWrite($hFile, @Tab)
        $cell = StringReplace($aTableData[$x][$y],"<br/>","MULTIPLEITEMS")
        FileWrite($hFile, $cell)
    Next
    FileWrite($hFile,@crlf)
Next
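A possible explanation, hedged: _IETableWriteToArray appears to store each cell's text (its innerText) rather than its HTML, so by the time the data is in $aTableData each <br/> has already become a line break and there is no literal "<br/>" left for StringReplace to find. If that is the case, replacing the line-break characters instead should keep each cell on one row. A minimal sketch with a made-up cell value; the " | " separator is an arbitrary choice.

```autoit
; Made-up cell text as it might look after _IETableWriteToArray: the <br/> tags
; are assumed to have become line breaks already.
Local $sCell = "- IPR000719 Protein kinase" & @CRLF & _
        "- IPR002290 Serine/threonine protein kinase" & @CRLF & _
        "- IPR005543 PASTA" & @CRLF

; Trim the trailing line break, then turn each remaining run of CR/LF
; characters into a separator so the cell stays on one output row
Local $sFlat = StringRegExpReplace(StringStripWS($sCell, 3), "[\r\n]+", " | ")
ConsoleWrite($sFlat & @CRLF)
```

In the loop above, that would mean replacing the StringReplace call with something like $cell = StringRegExpReplace(StringStripWS($aTableData[$x][$y], 3), "[\r\n]+", " | ").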