NassauSky Posted December 21, 2019

Hi,

I'm working on getting the latest ChromeDriver from https://chromedriver.storage.googleapis.com/index.html, but the page content is generated entirely by JavaScript after the page loads, so standard scraping isn't working. For example:

    #include <IE.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    $oRows = _IETagNameGetCollection($oIE, "tr")
    ConsoleWrite("oRows:" & $oRows.Length & @CRLF)

This results in the following output:

    oRows:0
    >Exit code: 0 Time: 0.5149

Thanks for any ideas and help 🙂

Edited December 21, 2019 by NassauSky: Added resulting output
NassauSky (Author) Posted December 21, 2019

So far I have this. It seems I had to add a sleep before scraping, but this code is slow as molasses:

    #include <IE.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    Sleep(2000) ; So far sleeping 2 seconds here gets results
    $oRows = _IETagNameGetCollection($oIE, "tr")
    ConsoleWrite("oRows:" & $oRows.Length & @CRLF)
    For $oRow In $oRows
        $innerString = ''
        If StringLen($oRow.innerText) > 0 And $oRow.innerText <> "-" Then
            $aChromeVer = StringSplit($oRow.innerText, ".")
            If StringIsDigit($aChromeVer[1]) And Number($aChromeVer[1]) > 2 Then
                ; ConsoleWrite("->" & $aChromeVer[1] & @CRLF)
            EndIf
            $oDatas = _IETagNameGetCollection($oRow, "td") ;<--this slowed up the process looking for the subset of data
        EndIf
    Next
    _IEQuit($oIE)

Extracting the data takes much longer than loading the page. Is there a way to speed this up?

EDIT UPDATE: I found out that the next step, $oDatas = _IETagNameGetCollection($oRow, "td"), was what slowed down the scraping.

Edited December 21, 2019 by NassauSky: Removed code not yet necessary to operate
Danp2 Posted December 21, 2019

@NassauSky Did you review the code posted by @CYCho?

P.S. I'm already working on turning this into a helper function that works with Chrome and Firefox.
NassauSky (Author) Posted December 21, 2019

@Danp2 Excellent! Don't know how I missed that. Thanks!
Nine Posted December 21, 2019

@NassauSky Or you could have used the table functions, like this:

    #include <IE.au3>
    #include <Array.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    Sleep(1000) ; So far sleeping 1 second here gets results
    $oTable = _IETableGetCollection($oIE, 0)
    $aTable = _IETableWriteToArray($oTable, True)
    _ArrayDisplay($aTable)
    _IEQuit($oIE)

Nearly instant.
NassauSky (Author) Posted December 21, 2019

Thanks @Nine, that makes sense too.
Subz Posted December 22, 2019

You could also just use childNodes.item(index), which is useful for getting attributes.

    #include <IE.au3>

    $Url = "https://chromedriver.storage.googleapis.com/index.html"
    $oIE = _IECreate($Url)
    Sleep(2000) ; So far sleeping 2 seconds here gets results
    $oRows = _IETagNameGetCollection($oIE, "tr")
    For $oRow In $oRows
        If $oRow.childNodes.Length = 5 Then
            If Int($oRow.childNodes.item(1).innerText) > 2 Then
                ConsoleWrite(Int($oRow.childNodes.item(1).innerText) & " - " & $oRow.childNodes.item(1).childNodes.item(0).href & @CRLF)
            EndIf
        EndIf
    Next

Edited December 23, 2019 by Subz
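All three snippets in this thread rely on a fixed Sleep() of one or two seconds before reading the rows. A possible alternative, sketched below, is to poll until the script-generated rows show up or a timeout expires. _WaitForRows and its timeout and poll-interval parameters are hypothetical names made up for this illustration and are not part of IE.au3. The sketch also assumes that the first appearance of a tr element means the listing is ready to read, which may not hold if the page fills the table incrementally.

```autoit
#include <IE.au3>

; Hypothetical helper (not part of IE.au3): poll for <tr> elements until at
; least one exists or the timeout expires, instead of using a fixed Sleep().
Func _WaitForRows($oIE, $iTimeoutMs = 10000, $iPollMs = 100)
    Local $hTimer = TimerInit()
    Local $oRows
    Do
        $oRows = _IETagNameGetCollection($oIE, "tr")
        ; Return as soon as the page script has produced at least one row
        If IsObj($oRows) And $oRows.length > 0 Then Return $oRows
        Sleep($iPollMs)
    Until TimerDiff($hTimer) > $iTimeoutMs
    ; Timed out: signal failure through @error
    Return SetError(1, 0, 0)
EndFunc

Local $oIE = _IECreate("https://chromedriver.storage.googleapis.com/index.html")
Local $oRows = _WaitForRows($oIE)
If @error Then
    ConsoleWrite("Timed out waiting for rows" & @CRLF)
Else
    ConsoleWrite("oRows:" & $oRows.length & @CRLF)
EndIf
_IEQuit($oIE)
```

The same wait could precede _IETableGetCollection in the table-based version. The 10-second timeout and 100 ms poll interval are arbitrary starting points and not measured values.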