NassauSky Posted December 21, 2019 (edited)

Hi,

I'm trying to get the latest ChromeDriver from https://chromedriver.storage.googleapis.com/index.html, but the page content is generated by JavaScript after the page loads, so standard scraping isn't working. For example:

    #include <IE.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    $oRows = _IETagNameGetCollection($oIE, "tr")
    ConsoleWrite("oRows:" & $oRows.Length & @CRLF)

This results in the following output:

    oRows:0
    >Exit code: 0 Time: 0.5149

Thanks for any ideas and help 🙂

Edited December 21, 2019 by NassauSky: added resulting output

NassauSky Posted December 21, 2019 (edited)

This is what I have so far. It seems I had to add a sleep before scraping, but this code is as slow as molasses:

    #include <IE.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    Sleep(2000) ; So far sleeping 2 seconds here gets results
    $oRows = _IETagNameGetCollection($oIE, "tr")
    ConsoleWrite("oRows:" & $oRows.Length & @CRLF)
    For $oRow In $oRows
        $innerString = ''
        If StringLen($oRow.innerText) > 0 And $oRow.innerText <> "-" Then
            $aChromeVer = StringSplit($oRow.innerText, ".")
            If StringIsDigit($aChromeVer[1]) And Number($aChromeVer[1]) > 2 Then
                ; ConsoleWrite("->" & $aChromeVer[1] & @CRLF)
            EndIf
            $oDatas = _IETagNameGetCollection($oRow, "td") ; <-- this slowed the process down, looking for the subset of data
        EndIf
    Next
    _IEQuit($oIE)

Extracting the data takes much longer than loading the page. Is there a way to speed this up?

EDIT UPDATE: I found out that the next step, $oDatas = _IETagNameGetCollection($oRow, "td"), was what slowed down the scraping.

Edited December 21, 2019 by NassauSky: removed code not yet necessary to operate

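The fixed Sleep(2000) in the post above could be replaced by polling until the script-generated rows actually exist, so the script waits only as long as it needs to. The following is a minimal sketch, not tested against the live page: _WaitForRows is a hypothetical helper (it is not part of IE.au3), and the 10-second timeout and 100 ms poll interval are arbitrary assumptions.

```autoit
#include <IE.au3>

; Hypothetical helper, not part of IE.au3: polls the document until at least
; one <tr> element exists, or gives up once $iTimeoutMs has elapsed.
; Returns the row collection on success, or 0 on timeout.
Func _WaitForRows($oIE, $iTimeoutMs = 10000, $iPollMs = 100)
    Local $hTimer = TimerInit()
    Local $oRows
    Do
        $oRows = _IETagNameGetCollection($oIE, "tr")
        If IsObj($oRows) And $oRows.length > 0 Then Return $oRows
        Sleep($iPollMs)
    Until TimerDiff($hTimer) > $iTimeoutMs
    Return 0
EndFunc

Local $sUrl = "https://chromedriver.storage.googleapis.com/index.html"
Local $oIE = _IECreate($sUrl)
Local $oRows = _WaitForRows($oIE)

If IsObj($oRows) Then
    ConsoleWrite("oRows:" & $oRows.length & @CRLF)
Else
    ConsoleWrite("No rows appeared before the timeout" & @CRLF)
EndIf

_IEQuit($oIE)
```

Polling only addresses the wait before scraping; the per-row _IETagNameGetCollection($oRow, "td") call identified as the slow step is a separate cost.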
Danp2 Posted December 21, 2019

@NassauSky Did you review the code posted by @CYCho?

P.S. I'm already working on turning this into a helper function that works with Chrome and Firefox.

NassauSky Posted December 21, 2019

@Danp2 Excellent! I don't know how I missed that. Thanks!

Nine Posted December 21, 2019

@NassauSky Or you could have used the table functions, like this:

    #include <IE.au3>
    #include <Array.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    Sleep(1000) ; So far sleeping 1 second here gets results
    $oTable = _IETableGetCollection($oIE, 0)
    $aTable = _IETableWriteToArray($oTable, True)
    _ArrayDisplay($aTable)
    _IEQuit($oIE)

Instant fast.

NassauSky Posted December 21, 2019

Thanks @Nine, that makes sense too.

Subz Posted December 22, 2019 (edited)

You could also just use childNodes.item(index), which is useful for getting attributes.

    #include <IE.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    Sleep(2000) ; So far sleeping 2 seconds here gets results
    $oRows = _IETagNameGetCollection($oIE, "tr")
    For $oRow In $oRows
        If $oRow.childNodes.Length = 5 Then
            If Int($oRow.childNodes.item(1).innerText) > 2 Then
                ConsoleWrite(Int($oRow.childNodes.item(1).innerText) & " - " & $oRow.childNodes.item(1).childNodes.item(0).href & @CRLF)
            EndIf
        EndIf
    Next

Edited December 23, 2019 by Subz

NassauSky Posted December 22, 2019

Not bad, yes, I like it. Thanks @Subz