andomatic

Find string in HTML and go back 2 links

10 posts in this topic

Hi,

I am trying to automate some work with a web page. I need to search through the page and download a document each time I find a certain text string. The problem is that the text I am searching for is not the link itself; the link I need is two links back in the document. This is true for all occurrences. So if my text appears 3 times in the doc, the flow is: find the first occurrence, back up 2 links, click, save the HTML, go back to the original page, find the next occurrence of my text, and repeat. As one can see from the included code, I am at a loss as to how to navigate the HTML successfully. Any help is most appreciated! My code is below, and I have attached the .htm file that I am trying to search. Thanks very much.

;AutoIt Version: 3.0
;Language:       English
;Platform:       Win x86
;Author:         Andy Folz

;Script Function:
;Scrape data from Pacer

;Globals
;#AutoIt3Wrapper_run_debug_mode=Y
#include-once
#include <IE.au3>
#include <File.au3>
#include <Sound.au3>
#include <Array.au3>


Dim $FromDate
Dim $Body
Dim $TestRes
Dim $UserName
Dim $Password
Dim $oForm
Dim $oLogin
Dim $oPassword
Dim $oSubmit
Dim $oSubmitPDF
Dim $oIE
Dim $aRecords
Dim $aFinal
Dim $res
Dim $oSubmitBK
Dim $oHistoryForm
Dim $sHTML
Dim $sSearchText
Dim $sFlag


;TEST VALUE*** CASE ID      
;TEST VALUE*** SSN          

;Launch IE and hit the site
$oIE = _IECreate ("https://pacer.login.uscourts.gov/cgi-bin/login.pl?court_id=00pcl")
WinSetState("Public Access to Court Electronic Records","",@SW_MAXIMIZE)

$sSearchText = "Chapter 13 Plan"

;Login to the system
$UserName = ""
$Password = ""
$oForm = _IEFormGetCollection ($oIE, 0)
$oLogin = _IEFormElementGetCollection ($oForm, 1)
$oPassword = _IEFormElementGetCollection ($oForm, 2)
$oSubmit = _IEFormElementGetCollection ($oForm, 4)
_IEFormElementSetValue($oLogin, $UserName)
_IEFormElementSetValue($oPassword, $password)
_IEAction ($oSubmit, "click")
_IELoadWait ($oIE)

;Navigate to the BK search
_IELinkClickByText($oIE,"Bankruptcy")

;Build the data set
Dim $aRecords
_FileReadToArray("c:\AutoItSrc\DS\testDS.csv",$aRecords)

For $x = 1 to $aRecords[0]
    ;Remove the white space
    $res = StringStripWS($aRecords[$x],7)
    $aFinal = StringSplit($res,",",2); this is split as $aFinal[0] is the SSN and $aFinal[1] is the Case#
    ;Create a folder to store the result set
    DirCreate("c:\result\" & $aFinal[0]& "_" & $aFinal[1])

    ;Search for the case
    $oBkForm = _IEFormGetCollection ($oIE, 0)
    $oCaseID = _IEFormElementGetCollection ($oBKForm, 5)
    _IEFormElementSetValue($oCaseID, $aFinal[1])

    $oSSN = _IEFormElementGetCollection ($oBKForm, 22)
    _IEFormElementSetValue($oSSN, $aFinal[0])


    ;Submit the form
    $oSubmitBK = _IEFormElementGetCollection ($oBKForm, 24)
    _IEAction ($oSubmitBK, "click")
    _IELoadWait ($oIE)
    Sleep(1000)

    ConsoleWrite("SSN = " & $aFinal[0] & " | " & "CaseID = " & $aFinal[1] & " | " & "# Processed this run: " & $x & @CRLF)
    ; Check for No Cases
    $Body = _IEBodyReadText($oIE)
    ;msgbox(0,"Body Text",$Body)
    $TestRes = StringInStr($Body,"No Records Found")
    
    If $TestRes > 0 Then
        ;No Record Found, so save it and back it up for a new search
        Sleep(100)
        Send("!fa")
        Sleep(500)
        Send("    C:\result\" & $aFinal[0]& "_" & $aFinal[1] & "\" & $aFinal[0]& "_" & $aFinal[1] & "_Summary.htm")
        Sleep(200)
        Send("!s")
        Sleep(1000)
        _IEAction($oIE,"back")
        Sleep(1500)
    Else
        ;Record is OK
        ;Download the HTM file
        _IELinkClickByIndex($oIE, 5)
        _IELinkClickByText($oIE,"History / Documents")

        ; Need to accommodate a different format of the link text...
        If @error Then
            _IELinkClickByText($oIE,"History/Documents")
        EndIf

        _IELoadWait($oIE,2000)
        ;Click to pull the record.
        $oHistoryForm = _IEFormGetCollection ($oIE, 0)
        $oSubmitPDF = _IEFormElementGetCollection($oHistoryForm,4)
        _IEAction ($oSubmitPDF, "click")
        _IELoadWait ($oIE)
        Sleep(1000)
        ;Find all "Chapter 13 Plan" entries in the description and download each PDF
        $sHTML = _IEDocReadHTML($oIE)

        $sFlag = "Set"
        $intCtr = 1
            Do
;~              $res = StringSplit($sHTML,@CRLF)
;~              ;_ArrayDisplay ($res,"Result of HTML Parse")

;~              $iIndex = _ArraySearch($res, $sSearchText, 0, 0, 0, 1)
;~              $iIndex = $iIndex -17
;~              msgbox (0,"res",$iIndex)
;~              If @error Then
;~                  MsgBox(0, "Not Found", '"' & $sSearchText & '" was not found in the array.')
;~              Else
;~                  MsgBox(0, "Found", $iIndex[171])
;~              EndIf


;~              Exit
                $oLinks = _IELinkGetCollection ($oIE)
                $iNumLinks = @extended
                MsgBox(0, "Link Info", $iNumLinks & " links found")
                For $oLink In $oLinks
                    MsgBox(0, "Link Info", $oLink.href)
                Next
            Until $res = 0
            msgbox(0,"res","Done")
            Exit

        Sleep(1000)
        Send("!fa")
        Sleep(500)
        Send("C:\result\" & $aFinal[0]& "_" & $aFinal[1] & "\" & $aFinal[0]& "_" & $aFinal[1] & "_Summary.htm")
        sleep(1000)
        Send("!s")
        Sleep(2000)

        msgbox (0,"History Page","Done")
        Exit

        ;Setup browser for new search, back 3 pgs.
        _IEAction($oIE,"back")
        Sleep(1500)

        _IEAction($oIE,"back")
        Sleep(1500)

        _IEAction($oIE,"back")
        Sleep(1500)
    EndIf

Next
msgbox(0,"Info","Run Complete")
Exit

sample.htm


I'm pretty lousy with IE and don't think I fully understand your question, in particular the part where you look for text and want 'two back' from it.

If what you are searching for is a link, then maybe you should use _IELinkGetCollection(), loop through the links searching for the string, and if it's found, click the link at index - 2.


AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


#3 ·  Posted (edited)

The Document Object model is hierarchical... you can select objects and then drill into them to find what you want.

The table you are interested in is not the first table in the page, but if it were, you would select it like this:

$oTable = _IETableGetCollection($oIE, 0) ; select first table, index 0

Then you can get a collection of the rows with

$oTRs = _IETagnemaGetCollection($oTable, "TR")

Then a collection of the cells in the row with

$oTDs = _IETagnameGetCollection($oTR, "TD")

Then loop through the TD's with a FOR IN NEXT loop and examine the innertext of each with _IEPropertyGet

when you find the text you want, you know that the first cell in the row has the link you want, so you get the link this way

$oTD = _IETagnameGetCollection($oTR, "TD", 0)

$oLink = _IETagnameGetCollection($oTD, "a", 0)

then click on it with

_IEAction($oLink, "click")

that's the roadmap... see what you can do.

Dale

Edited by DaleHohm

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Dale, Thanks very much for this. I am going to take a crack on Monday and will let you know how I make out. I'd toyed with saving the body text to a txt file, getting the value and passing it back. Your solution seems MUCH more elegant and I am excited to give it a try!

Andy


Dale,

I've set up a test using the information you provided, and I am now stuck on only one piece. It seems (to my rather inexperienced self, in any case) that the complete row is being returned rather than cell by cell. I think I can get what I need by taking the first few characters of the row, storing them in a variable, then formatting and passing the text to _IELinkClickByText. But I feel like I am missing something and should not have to do that. It also feels like passing the "a" tag is generating an unrecognized response. Any more light that you can shed would be great. There is a test file attached. To run the test, drop the attached .htm file in the root of C:. Here is my code... Thanks again for all of your help!

; #AutoIt3Wrapper_run_debug_mode=Y
 #include <IE.au3>
Dim $oIE
Dim $sHTML
Dim $sSearchText
Dim $oTable
Dim $oTR
Dim $oTD
Dim $sTblCont
Dim $oLink
;Launch IE
$oIE = _IECreate ()
_IENavigate($oIE, "c:\test.htm")

WinSetState("Test","",@SW_MAXIMIZE)

$sSearchText = "Chapter 13 Plan"

;Find all "Chapter 13 Plan" entries in the description and download each PDF
;Grab a reference to the table...
$oTable = _IETableGetCollection($oIE,0) ; select first table, index 0
$oTR = _IETagnameGetCollection($oTable, "TR")
$oTD = _IETagnameGetCollection($oTR, "TD")


For $oTD in $oTR ;each cell in the current row
    $sTblCont = _IEPropertyGet($oTD, "innertext")
    If StringInStr($sTblCont,$sSearchText) Then
        msgbox(0,"Contents",$sTblCont)
        $oTD = _IETagnameGetCollection($oTR, "TD", 0)
        $oLink = _IETagnameGetCollection($oTD, "a")
        ConsoleWrite ("Link Ref: " & $oLink & @CRLF)
        _IEAction($oLink, "click")
    EndIf
Next

msgbox (0,"Status","Complete")

_IEQuit($oIE)

Exit

test.htm


#6 ·  Posted (edited)

Here's the logic you're missing:

$oTRs = _IETagnemaGetCollection($oTable, "TR")
For $oTR In $oTRs
    $oTDs = _IETagnameGetCollection($oTR, "TD")
    For $oTD In $oTDs
        ...
    Next
Next
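
For anyone following along, here is one way the nested loops above might be filled in. It is a sketch under assumptions, not the code Andy ended up with: _GetMatchingRowLinks is a hypothetical helper name, the table index 0 is a placeholder for the real table on the page, and it assumes the link you want is a plain "a" tag with a usable href in the first cell of the matching row. It collects the hrefs first and navigates afterwards, because object references taken from the page may no longer be valid once the browser has left it.

```autoit
#include <IE.au3>

; Sketch only. _GetMatchingRowLinks is a hypothetical helper, not part of IE.au3.
; Returns an array of hrefs taken from the first cell of every row that has a
; cell containing $sSearchText. $iTableIndex = 0 is an assumption; use the
; index of the table that actually holds the data on your page.
Func _GetMatchingRowLinks($oIE, $sSearchText, $iTableIndex = 0)
    Local $sHrefs = ""
    Local $oTable = _IETableGetCollection($oIE, $iTableIndex)
    If Not IsObj($oTable) Then Return SetError(1, 0, 0)

    Local $oTRs = _IETagNameGetCollection($oTable, "TR")
    For $oTR In $oTRs
        Local $oTDs = _IETagNameGetCollection($oTR, "TD")
        For $oTD In $oTDs
            If StringInStr(_IEPropertyGet($oTD, "innertext"), $sSearchText) Then
                ; The link is expected in the first cell of the same row.
                Local $oFirstTD = _IETagNameGetCollection($oTR, "TD", 0)
                Local $oLink = _IETagNameGetCollection($oFirstTD, "a", 0)
                If IsObj($oLink) Then $sHrefs &= $oLink.href & @LF
                ExitLoop ; one hit per row is enough
            EndIf
        Next
    Next

    If $sHrefs = "" Then Return SetError(2, 0, 0) ; nothing matched
    ; Drop the trailing @LF, then split; flag 2 = no count in element 0.
    Return StringSplit(StringTrimRight($sHrefs, 1), @LF, 2)
EndFunc

; Possible usage, assuming $oIE already shows the page with the table:
Local $aHrefs = _GetMatchingRowLinks($oIE, "Chapter 13 Plan")
If IsArray($aHrefs) Then
    For $sHref In $aHrefs
        _IENavigate($oIE, $sHref) ; waits for the page to load by default
        ; ... save the page here, as in the original script ...
    Next
EndIf
```

If the links on the real page rely on click handlers rather than a plain href, clicking with _IEAction($oLink, "click") as in the roadmap, then going back and re-reading the table, may be the safer route.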

Dale

Edited by DaleHohm


New function Dale?

$oTRs = _IETagnemaGetCollection($oTable, "TR")

I'm guessing you meant

$oTRs = _IETagnameGetCollection($oTable, "TR")

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


Yes, that was a typo I proliferated from my original roadmap...

Dale



Dale,

I cannot thank you enough for this help. You took an unmanageable task and transformed it instantly, and taught me some skills along the way. Thanks so much for your time and help, this works perfectly!

Andy


Super. I hope other newbies are watching... this is the way it is supposed to work. Good job andomatic.

Dale

