andomatic Posted December 3, 2010

Hi,

I am trying to do some work with a web page. The need is to search through the page and download a document each time I find a certain text string. The problem I am having is that the text I am searching for is not the link; the link I need is two links back in the document. This is true for all occurrences. So if my text appears 3 times in the doc: find the first, back up 2 links, click, save the HTML, go back to the original page, find the next occurrence of my text, and repeat.

As one can see from the included code, I am at a loss on successfully navigating the HTML. Any help is most appreciated!

My code is below and I have attached the htm file that I am trying to search.

Thanks very much.

    ;AutoIt Version: 3.0
    ;Language: English
    ;Platform: Win x86
    ;Author: Andy Folz
    ;Script Function:
    ;Scrape data from Pacer
    ;Globals
    ;#AutoIt3Wrapper_run_debug_mode=Y
    #include-once
    #include <IE.au3>
    #include <File.au3>
    #include <Sound.au3>
    #include <Array.au3>

    Dim $FromDate
    Dim $Body
    Dim $TestRes
    Dim $UserName
    Dim $Password
    Dim $oForm
    Dim $oLogin
    Dim $oPassword
    Dim $oSubmit
    Dim $oSubmitPDF
    Dim $oIE
    Dim $aRecords
    Dim $aFinal
    Dim $res
    Dim $oSubmitBK
    Dim $oHistoryForm
    Dim $sHTML
    Dim $sSearchText
    Dim $sFlag

    ;TEST VALUE*** CASE ID
    ;TEST VALUE*** SSN

    ;Launch IE and hit the site
    $oIE = _IECreate ("https://pacer.login.uscourts.gov/cgi-bin/login.pl?court_id=00pcl")
    WinSetState("Public Access to Court Electronic Records","",@SW_MAXIMIZE)
    $sSearchText = "Chapter 13 Plan"

    ;Login to the system
    $UserName = ""
    $Password = ""
    $oForm = _IEFormGetCollection ($oIE, 0)
    $oLogin = _IEFormElementGetCollection ($oForm, 1)
    $oPassword = _IEFormElementGetCollection ($oForm, 2)
    $oSubmit = _IEFormElementGetCollection ($oForm, 4)
    _IEFormElementSetValue($oLogin, $UserName)
    _IEFormElementSetValue($oPassword, $password)
    _IEAction ($oSubmit, "click")
    _IELoadWait ($oIE)

    ;Navigate to the BK search
    _IELinkClickByText($oIE,"Bankruptcy")

    ;Build the data set
    Dim $aRecords
    _FileReadToArray("c:\AutoItSrc\DS\testDS.csv",$aRecords)
    For $x = 1 to $aRecords[0]
        ;Remove the white space
        $res = StringStripWS($aRecords[$x],7)
        $aFinal = StringSplit($res,",",2) ; this is split as $aFinal[0] is the SSN and $aFinal[1] is the Case#
        ;Create a folder to store the result set
        DirCreate("c:\result\" & $aFinal[0]& "_" & $aFinal[1])
        ;Search for the case
        $oBkForm = _IEFormGetCollection ($oIE, 0)
        $oCaseID = _IEFormElementGetCollection ($oBKForm, 5)
        _IEFormElementSetValue($oCaseID, $aFinal[1])
        $oSSN = _IEFormElementGetCollection ($oBKForm, 22)
        _IEFormElementSetValue($oSSN, $aFinal[0])
        ;Submit the form
        $oSubmitBK = _IEFormElementGetCollection ($oBKForm, 24)
        _IEAction ($oSubmitBK, "click")
        _IELoadWait ($oIE)
        Sleep(1000)
        ConsoleWrite("SSN = " & $aFinal[0] & " | " & "CaseID = " & $aFinal[1] & " | " & "# Processed this run: " & $x & @CRLF)
        ; Check for No Cases
        $Body = _IEBodyReadText($oIE)
        ;msgbox(0,"Body Text",$Body)
        $TestRes = StringInStr($Body,"No Records Found")
        If $TestRes > 0 Then
            ;No Record Found, so save it and back it up for a new search
            Sleep(100)
            Send("!fa")
            Sleep(500)
            Send(" C:\result\" & $aFinal[0]& "_" & $aFinal[1] & "\" & $aFinal[0]& "_" & $aFinal[1] & "_Summary.htm")
            Sleep(200)
            Send("!s")
            Sleep(1000)
            _IEAction($oIE,"back")
            Sleep(1500)
        Else
            ;Record is OK
            ;Download the HTM file
            _IELinkClickByIndex($oIE, 5)
            _IELinkClickByText($oIE,"History / Documents")
            ; Need to accommodate different format of link...
            If @error Then
                _IELinkClickByText($oIE,"History/Documents")
            EndIf
            _IELoadWait($oIE,2000)
            ;Click to pull the record.
            $oHistoryForm = _IEFormGetCollection ($oIE, 0)
            $oSubmitPDF = _IEFormElementGetCollection($oHistoryForm,4)
            _IEAction ($oSubmitPDF, "click")
            _IELoadWait ($oIE)
            Sleep(1000)
            ;Find all Chapter 13 entries in the description and download each PDF
            $sHTML = _IEDocReadHTML($oIE)
            $sFlag = "Set"
            $intCtr = 1
            Do
                ;~ $res = StringSplit($sHTML,@CRLF)
                ;~ ;_ArrayDisplay ($res,"Result of HTML Parse")
                ;~ $iIndex = _ArraySearch($res, $sSearchText, 0, 0, 0, 1)
                ;~ $iIndex = $iIndex -17
                ;~ msgbox (0,"res",$iIndex)
                ;~ If @error Then
                ;~ MsgBox(0, "Not Found", '"' & $sSearchText & '" was not found in the array.')
                ;~ Else
                ;~ MsgBox(0, "Found", $iIndex[171])
                ;~ EndIf
                ;~ Exit
                $oLinks = _IELinkGetCollection ($oIE)
                $iNumLinks = @extended
                MsgBox(0, "Link Info", $iNumLinks & " links found")
                For $oLink In $oLinks
                    MsgBox(0, "Link Info", $oLink.href)
                Next
            Until $res = 0
            msgbox(0,"res","Done")
            Exit
            Sleep(1000)
            Send("!fa")
            Sleep(500)
            Send("C:\result\" & $aFinal[0]& "_" & $aFinal[1] & "\" & $aFinal[0]& "_" & $aFinal[1] & "_Summary.htm")
            sleep(1000)
            Send("!s")
            Sleep(2000)
            msgbox (0,"History Page","Done")
            Exit
            ;Setup browser for new search, back 3 pgs.
            _IEAction($oIE,"back")
            Sleep(1500)
            _IEAction($oIE,"back")
            Sleep(1500)
            _IEAction($oIE,"back")
            Sleep(1500)
        EndIf
    Next
    msgbox(0,"Info","Run Complete")
    Exit

Attachment: sample.htm

JohnOne Posted December 4, 2010

I'm pretty lousy with IE and don't think I fully understand your question, in particular the part where you look for text and want 'two back' from it.

If what you are searching for is a link, then maybe you should use _IELinkGetCollection(), then loop through and search for the string in the links. If it's found, then click the link at index - 2.

AutoIt Absolute Beginners Require a serial Pause Script Video Tutorials by Morthawt ipify
Monkey's are, like, natures humans.
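
JohnOne's suggestion could be sketched roughly as follows. This is only a minimal sketch, assuming the search text appears inside a link's own text, which the original question says is not the case on the page at hand. The path c:\test.htm and the search string are placeholders taken from elsewhere in the thread, not a tested setup.

```autoit
#include <IE.au3>

; Placeholder page and search text (assumptions for illustration only).
Local $oIE = _IECreate("c:\test.htm")
Local $sSearchText = "Chapter 13 Plan"

; Called without an index, _IELinkGetCollection returns every link
; and reports the number of links in @extended.
Local $oLinks = _IELinkGetCollection($oIE)
Local $iNumLinks = @extended

For $i = 0 To $iNumLinks - 1
    Local $oLink = _IELinkGetCollection($oIE, $i)
    ; This only matches when the search text is part of a link's text.
    If $i >= 2 And StringInStr(_IEPropertyGet($oLink, "innertext"), $sSearchText) Then
        _IELinkClickByIndex($oIE, $i - 2) ; click the link two positions earlier
        ExitLoop ; the page has changed, so the old indexes no longer apply
    EndIf
Next
```

The loop stops after the first hit because clicking a link loads a new page; handling several hits would mean returning to the original page and resuming the scan from the next index.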

DaleHohm Posted December 4, 2010 (edited)

The Document Object Model is hierarchical... you can select objects and then drill into them to find what you want.

The table you are interested in is not the first table on the page, but if it were, you would select it like this:

    $oTable = _IETableGetCollection($oIE, 0) ; select first table, index 0

Then you can get a collection of the rows with

    $oTRs = _IETagnemaGetCollection($oTable, "TR")

Then a collection of the cells in the row with

    $oTDs = _IETagnameGetCollection($oTR, "TD")

Then loop through the TDs with a For...In...Next loop and examine the innertext of each with _IEPropertyGet.

When you find the text you want, you know that the first cell in the row has the link you want, so you get the link this way:

    $oTD = _IETagnameGetCollection($oTR, "TD", 0)
    $oLink = _IETagnameGetCollection($oTD, "a", 0)

Then click on it with

    _IEAction($oLink, "click")

That's the roadmap... see what you can do.

Dale

Edited December 4, 2010 by DaleHohm

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
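
Since the table of interest is not the first one on the page, one way to find out which index to pass to _IETableGetCollection might look like the sketch below. It is only an illustration: the path c:\test.htm and the search string are placeholders, and the variable $iFound is a name made up for this example.

```autoit
#include <IE.au3>

; Placeholder page and search text (assumptions for illustration only).
Local $oIE = _IECreate("c:\test.htm")
Local $sSearchText = "Chapter 13 Plan"

; Called without an index, _IETableGetCollection returns every table
; and reports the number of tables in @extended.
Local $oTables = _IETableGetCollection($oIE)
Local $iNumTables = @extended
Local $iFound = -1 ; hypothetical name: index of the first matching table

For $i = 0 To $iNumTables - 1
    Local $oTable = _IETableGetCollection($oIE, $i)
    If StringInStr(_IEPropertyGet($oTable, "innertext"), $sSearchText) Then
        $iFound = $i
        ExitLoop
    EndIf
Next

ConsoleWrite("Search text first found in table index: " & $iFound & @CRLF)
```

If the page nests tables, the first match may be an outer layout table that merely contains the one you want, so it is worth checking the reported index against the page source before relying on it.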

andomatic Posted December 5, 2010

Dale,

Thanks very much for this. I am going to take a crack on Monday and will let you know how I make out. I'd toyed with saving the body text to a txt file, getting the value and passing it back. Your solution seems MUCH more elegant and I am excited to give it a try!

Andy

andomatic Posted December 6, 2010

Dale,

I've set up a test using the information you provided and I am now stuck on only one piece. It seems (to my rather inexperienced self, in any case) that the complete row is being returned, rather than cell by cell. I am thinking I can get what I need by taking the first few characters of the row and storing them in a variable, then formatting and passing the text to _IELinkClickByText. But I feel like I am missing something and should not have to do that. It feels like passing the "a" tag is generating an unrecognized response?

Any more light that you can shed would be great. There is a test file attached. To run the test, drop the attached .htm file in the root of C:. Here is my code...

Thanks again for all of your help!

    ; #AutoIt3Wrapper_run_debug_mode=Y
    #include <IE.au3>

    Dim $oIE
    Dim $sHTML
    Dim $sSearchText
    Dim $oTable
    Dim $oTR
    Dim $oTD
    Dim $sTblCont
    Dim $oLink

    ;Launch IE
    $oIE = _IECreate ()
    _IENavigate($oIE, "c:\test.htm")
    WinSetState("Test","",@SW_MAXIMIZE)
    $sSearchText = "Chapter 13 Plan"

    ;Find all Chapter 13 entries in the description and download each PDF
    ;Grab a reference to the table...
    $oTable = _IETableGetCollection($oIE,0) ; select first table, index 0
    $oTR = _IETagnameGetCollection($oTable, "TR")
    $oTD = _IETagnameGetCollection($oTR, "TD")
    For $oTD in $oTR ;each cell in the current row
        $sTblCont = _IEPropertyGet($oTD, "innertext")
        If StringInStr($sTblCont,$sSearchText) Then
            msgbox(0,"Contents",$sTblCont)
            $oTD = _IETagnameGetCollection($oTR, "TD", 0)
            $oLink = _IETagnameGetCollection($oTD, "a")
            ConsoleWrite ("Link Ref: " & $oLink & @CRLF)
            _IEAction($oLink, "click")
        EndIf
    Next
    msgbox (0,"Status","Complete")
    _IEQuit($oIE)
    Exit

Attachment: test.htm

DaleHohm Posted December 6, 2010 (edited)

Here's the logic you're missing:

    $oTRs = _IETagnemaGetCollection($oTable, "TR")
    For $oTR In $oTRs
        $oTDs = _IETagnameGetCollection($oTR, "TD")
        For $oTD In $oTDs
            ...
        Next
    Next

Dale

Edited December 6, 2010 by DaleHohm
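
Putting the nested loops together with the earlier roadmap, a fuller version might look like the sketch below. It is not Dale's code and has not been run against the real page: the path c:\test.htm, the table index in $iTableIndex, and the variable names are assumptions for illustration. It uses the IE.au3 spelling _IETagNameGetCollection. Instead of clicking inside the loop, it collects the link targets first, on the assumption that navigating away would leave the row and cell references pointing at a page that is no longer loaded.

```autoit
#include <IE.au3>

; Placeholder page, search text and table index (assumptions for illustration only).
Local $oIE = _IECreate("c:\test.htm")
Local $sSearchText = "Chapter 13 Plan"
Local $iTableIndex = 0 ; adjust to the table that actually holds the rows

Local $oTable = _IETableGetCollection($oIE, $iTableIndex)
Local $oTRs = _IETagNameGetCollection($oTable, "TR")
Local $sHrefs = "" ; matching link targets, separated by @LF

For $oTR In $oTRs
    Local $oTDs = _IETagNameGetCollection($oTR, "TD")
    For $oTD In $oTDs
        If StringInStr(_IEPropertyGet($oTD, "innertext"), $sSearchText) Then
            ; Per the roadmap, the link sits in the first cell of the same row.
            Local $oFirstTD = _IETagNameGetCollection($oTR, "TD", 0)
            Local $oLink = _IETagNameGetCollection($oFirstTD, "a", 0)
            If IsObj($oLink) Then $sHrefs &= $oLink.href & @LF
            ExitLoop ; one hit per row is enough
        EndIf
    Next
Next

; Visit each target only after the scan has finished.
If $sHrefs <> "" Then
    Local $aHrefs = StringSplit(StringTrimRight($sHrefs, 1), @LF)
    For $i = 1 To $aHrefs[0]
        _IENavigate($oIE, $aHrefs[$i])
        ; ... save the loaded page here ...
    Next
EndIf
```

Collecting the href values up front also removes the need to press "back" and rescan the page after every download, which was the awkward part of the original find, click, save, go back cycle.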

GEOSoft Posted December 6, 2010

New function Dale?

    $oTRs = _IETagnemaGetCollection($oTable, "TR")

I'm guessing you meant

    $oTRs = _IETagnameGetCollection($oTable, "TR")

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***
The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"

DaleHohm Posted December 6, 2010

Yes, that was a typo I propagated from my original roadmap...

Dale

andomatic Posted December 6, 2010

Dale,

I cannot thank you enough for this help. You took an unmanageable task and transformed it instantly, and taught me some skills along the way. Thanks so much for your time and help, this works perfectly!

Andy

DaleHohm Posted December 7, 2010

Super. I hope other newbies are watching... this is the way it is supposed to work. Good job andomatic.

Dale