Imbuter2000 Posted March 18, 2012

I want to extract the titles of the Google search results into an array. I could solve it with a regular expression: I open, for example, http://www.google.it/#q=foobar, and DebugBar shows that the code for the title "foobar2000 - Wikipedia" is:

<a class="l" onmousedown="return rwt(this,'','','','6','AFQjCNG5l1JlHEfLHSE1yqxjOCBlWP5Z4A','','0CFoQFjAF',null,event)" href="http://it.wikipedia.org/wiki/Foobar2000"><em>foobar2000</em> - Wikipedia</a>

so I use code like this:

$bodyhtml = _IEBodyReadHTML($oIE)
$matches = StringRegExp($bodyhtml,'(?s:<a class="l"[^>]* href="[^"]+">((?:<em>|)[^<]*(?:</em>|)[^<]*)</a>)',3) ; I could do it even better...

How can I solve it with the IE UDF in a better way, possibly without using a regular expression?
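For readers following along, here is a minimal, self-contained sketch of the regular-expression approach described in this post. It runs against a hard-coded copy of the sample anchor (with the onmousedown attribute shortened for readability, which is an editorial simplification) instead of a live page, and it uses a looser pattern than the poster's: it captures everything inside the link and then strips any inner tags such as <em>. The variable names are illustrative only.

```autoit
#include <Array.au3>

; Sample anchor from the post; the onmousedown value is shortened here (assumption for readability)
Local $sHtml = '<a class="l" onmousedown="return rwt(this)" href="http://it.wikipedia.org/wiki/Foobar2000"><em>foobar2000</em> - Wikipedia</a>'

; Flag 3 returns an array of all captured groups; (.*?) lazily captures the link's inner HTML
Local $aMatches = StringRegExp($sHtml, '(?s)<a class="l"[^>]* href="[^"]+">(.*?)</a>', 3)
If @error Then Exit ; no class="l" anchors were found

; Remove any tags left inside each title (for example <em> and </em>)
For $i = 0 To UBound($aMatches) - 1
    $aMatches[$i] = StringRegExpReplace($aMatches[$i], '<[^>]+>', '')
Next

_ArrayDisplay($aMatches, "Titles")
```

On a live page the input would come from _IEBodyReadHTML($oIE) as in the post. The HTML that IE returns may differ in case and quoting from the page source, so the pattern would likely need adjusting.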
DaleHohm Posted March 20, 2012

Explain more precisely what you are trying to do.

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded
Better Better? IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
Imbuter2000 (Author) Posted March 20, 2012

DaleHohm said: "Explain more precisely what you are trying to do."

I want to track the position of a given result (title) in the search results page. Google is just an example; I need it for various other sites...
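Once the titles are in an array, tracking the position of a given result comes down to an array search. The sketch below assumes the titles were already collected by some scraping method and uses a hard-coded, hypothetical $aTitles array in their place; _ArraySearch from Array.au3 returns the zero-based index of the match, or -1 when nothing matches.

```autoit
#include <Array.au3>

; Hypothetical titles, standing in for an array filled by whichever scraping method is used
Local $aTitles[3] = ["Foobar - Wikipedia", "foobar2000", "foobar2000 - Wikipedia"]

; Exact match on the whole title (case-insensitive by default)
Local $iIndex = _ArraySearch($aTitles, "foobar2000 - Wikipedia")

If $iIndex = -1 Then
    ConsoleWrite("Title not found on this page" & @CRLF)
Else
    ; Add 1 to turn the zero-based index into a 1-based rank
    ConsoleWrite("Position: " & ($iIndex + 1) & @CRLF)
EndIf
```

Because the search only touches the array, the same step should work for other sites as long as their titles can be collected in page order.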
DaleHohm Posted March 21, 2012

Apparently, you have no clue what "precise" means.

Dale
GMK Posted March 21, 2012 (edited)

#include <Array.au3>
#include <IE.au3>

Local $aTitles[1], $oIE, $oForm, $oQuery, $oLinks, $iCount, $sLinkHTML, $sLinkText

$oIE = _IECreate("http://www.google.it/", 0, 1) ; Load Google
$oForm = _IEFormGetObjByName($oIE, "gbqf") ; Get Form object
$oQuery = _IEFormElementGetObjByName($oForm, "q") ; Get query text box
_IEFormElementSetValue($oQuery, "foobar") ; Populate query text box
_IEFormSubmit($oForm, 1) ; Submit form
$oLinks = _IELinkGetCollection($oIE) ; Get collection of links
If IsObj($oLinks) Then ; Check to make sure collection of links is an object
    $iCount = 0 ; Set count to 0
    For $oLink In $oLinks ; Loop through all links
        $sLinkHTML = _IEPropertyGet($oLink, "innerhtml") ; Get inner html of link
        $sLinkText = _IEPropertyGet($oLink, "innertext") ; Get inner text of link
        If StringInStr($sLinkHTML, "<em>") Then ; Since all the search results contain <EM>, we'll check our links for that
            $iCount += 1 ; Add one to count
            ReDim $aTitles[$iCount] ; Add to array rows
            $aTitles[$iCount - 1] = $sLinkText ; Set array row
        EndIf
    Next
EndIf
_ArrayDisplay($aTitles, "Titles") ; Display array

Edit: This doesn't work so well if <EM> isn't used, but in many cases it does anyway.

Edited March 21, 2012 by GMK
Imbuter2000 (Author) Posted March 22, 2012

GMK said: "This doesn't work so well if <EM> isn't used, but in many cases it does anyway."

I see that Google uses the <em> tag to emphasize the keyword in the search results, so where the keyword is not in the title (in the first 10 pages for "foobar" you'll find some cases) your solution fails to detect the result. Is the regular expression the best solution for this example, then?
GMK Posted March 23, 2012

How about this?

#include <Array.au3>
#include <IE.au3>

Local $aTitles[1][2], $oIE, $oForm, $oQuery, $oLinks, $iCount, $sLinkHREF, $sLinkText

$oIE = _IECreate("http://www.google.it/", 0, 1) ; Load Google
$oForm = _IEFormGetObjByName($oIE, "gbqf") ; Get Form object
$oQuery = _IEFormElementGetObjByName($oForm, "q") ; Get query text box
_IEFormElementSetValue($oQuery, "foobar") ; Populate query text box
_IEFormSubmit($oForm, 1) ; Submit form
$oLinks = _IELinkGetCollection($oIE) ; Get collection of links
If IsObj($oLinks) Then ; Check to make sure collection of links is an object
    $iCount = 0 ; Set count to 0
    For $oLink In $oLinks ; Loop through all links
        $sLinkHREF = $oLink.href ; Get href of link
        $sLinkText = _IEPropertyGet($oLink, "innertext") ; Get inner text of link
        If Not StringInStr($sLinkHREF, "google") And Not StringInStr($sLinkHREF, "javascript") Then ; All non-Google links
            $iCount += 1 ; Add one to count
            ReDim $aTitles[$iCount][2] ; Add to array rows
            $aTitles[$iCount - 1][0] = $sLinkText ; Set text
            $aTitles[$iCount - 1][1] = $sLinkHREF ; Set href
        EndIf
    Next
EndIf
For $i = 1 To 2
    _ArrayDelete($aTitles, 0) ; Delete first two non-Google links (YouTube and Blogger)
Next
_ArrayDisplay($aTitles, "Titles") ; Display array
Imbuter2000 (Author) Posted March 24, 2012

GMK said: "How about this?" (script from March 23, which keeps all non-Google links)

It wrongly includes ad links too, so is the regular expression still the best option for this example?
GMK Posted March 26, 2012

Hmmm... if it works, go with RegExp. But if I find another option, I'll post it here.
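One further option, not proposed by anyone in the thread, is to filter anchors by the class attribute seen in the first post's markup (class="l") instead of by <em> or by href. The sketch below is untested against a live page and rests on two assumptions: that title links still carry class="l", and that the results can be loaded directly from the URL shown (the thread submits the search form instead). It does not rely on the keyword appearing in the title, and ad links should only be excluded if they use a different class.

```autoit
#include <Array.au3>
#include <IE.au3>

Local $aTitles[1], $iCount = 0

; Assumption: this URL loads a results page directly; the thread's scripts submit the search form instead
Local $oIE = _IECreate("http://www.google.it/search?q=foobar")

; Collect every <a> element in the document
Local $oLinks = _IETagNameGetCollection($oIE, "a")
If IsObj($oLinks) Then
    For $oLink In $oLinks
        ; Keep only anchors whose class matches the one seen in the first post (assumed still current)
        If $oLink.className = "l" Then
            $iCount += 1
            ReDim $aTitles[$iCount]
            $aTitles[$iCount - 1] = $oLink.innerText ; innerText drops the <em> markup
        EndIf
    Next
EndIf

_ArrayDisplay($aTitles, "Titles")
```

For other sites the class name would have to be looked up with DebugBar or a similar inspector, much as the regular expression would need a site-specific pattern.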