Scrape JavaScript-Rendered Directory



Hi, 

I'm trying to get the latest ChromeDriver from https://chromedriver.storage.googleapis.com/index.html, but the page content is generated entirely by JavaScript after the page loads, so standard scraping isn't working. For example:

#include <IE.au3>

$Url = "https://chromedriver.storage.googleapis.com/index.html"
$oIE = _IECreate($Url)
$oRows = _IETagnameGetCollection($oIE, "tr") 
ConsoleWrite("oRows:" & $oRows.Length & @CRLF)

Results in the following output:

oRows:0
>Exit code: 0    Time: 0.5149

 

Thanks for any ideas and help 🙂

 

 

Edited by NassauSky
Added resulting output

Here's what I have so far. It seems I had to add a sleep before scraping, but this code is slow as molasses.

 

#include <IE.au3>

$Url = "https://chromedriver.storage.googleapis.com/index.html"
$oIE = _IECreate($Url)
Sleep(2000) ; So far sleeping 2 seconds here gets results
$oRows = _IETagnameGetCollection($oIE, "tr")
ConsoleWrite("oRows:" & $oRows.Length & @CRLF)
For $oRow In $oRows
    $sText = $oRow.innerText ; read the COM property once per row
    If StringLen($sText) > 0 And $sText <> "-" Then
        $aChromeVer = StringSplit($sText, ".")
        ; Print the major version when the first segment is a number greater than 2
        If StringIsDigit($aChromeVer[1]) And Number($aChromeVer[1]) > 2 Then
            ConsoleWrite("->" & $aChromeVer[1] & @CRLF)
        EndIf
        $oDatas = _IETagNameGetCollection($oRow, "td") ; <-- looking up each row's cells is what slowed the process down
    EndIf
Next
_IEQuit($oIE)

Extracting the data takes far longer than loading the page. Is there a way to speed this up?

 

EDIT UPDATE: I found out that the next step, $oDatas = _IETagnameGetCollection($oRow, "td"), was what slowed down the scraping.

Edited by NassauSky
Removed code not yet necessary to operate

@NassauSky, or you could use the table functions, like this:

#include <IE.au3>
#include <Array.au3>

$Url = "https://chromedriver.storage.googleapis.com/index.html"
$oIE = _IECreate($Url)
Sleep(1000) ; So far sleeping 1 second here gets results

$oTable = _IETableGetCollection($oIE,0)
$aTable = _IETableWriteToArray($oTable, True)
_ArrayDisplay ($aTable)
_IEQuit($oIE)

Instant fast :)
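
If only the version folders matter, the array from _IETableWriteToArray can be filtered directly instead of walking the DOM a second time. The following is a minimal sketch, not a tested solution: _PrintMajorVersions is a made-up helper name, and it assumes the entry name lands in column index 1 of the array, which is worth confirming with _ArrayDisplay first.

```autoit
; Hypothetical helper (not part of IE.au3 or Array.au3).
; Assumes $aTable is the 2D array returned by _IETableWriteToArray($oTable, True)
; and that the entry name is in column $iNameCol (index 1 is an assumption).
Func _PrintMajorVersions(ByRef $aTable, $iNameCol = 1)
    ; Bail out if the array is not 2D or has too few columns
    If UBound($aTable, 2) <= $iNameCol Then Return SetError(1, 0, 0)
    Local $iCount = 0
    For $i = 0 To UBound($aTable, 1) - 1
        ; Trim surrounding whitespace, then split the name on dots
        Local $aParts = StringSplit(StringStripWS($aTable[$i][$iNameCol], 3), ".")
        ; Same filter as earlier in the thread: numeric first segment greater than 2
        If StringIsDigit($aParts[1]) And Number($aParts[1]) > 2 Then
            ConsoleWrite("->" & $aParts[1] & @CRLF)
            $iCount += 1
        EndIf
    Next
    Return $iCount ; number of entries printed
EndFunc
```

Calling _PrintMajorVersions($aTable) right after _IETableWriteToArray would print one line per matching entry, and @error is set to 1 if the array does not have the expected shape.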


You could also just use childNodes.item(index), which is useful for getting attributes.

#include <IE.au3>

$Url = "https://chromedriver.storage.googleapis.com/index.html"
$oIE = _IECreate($Url)
Sleep(2000) ; So far sleeping 2 seconds here gets results
$oRows = _IETagNameGetCollection($oIE, "tr")
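For $oRow In $oRows
    ; Only handle rows with five child nodes; item(1) is the cell holding the name and its link
    If $oRow.childNodes.Length = 5 Then
        $oNameCell = $oRow.childNodes.item(1)
        $iVersion = Int($oNameCell.innerText)
        If $iVersion > 2 Then
            ConsoleWrite($iVersion & " - " & $oNameCell.childNodes.item(0).href & @CRLF)
        EndIf
    EndIf
Next
_IEQuit($oIE) ; close the browser window when done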
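
The fixed Sleep(1000) and Sleep(2000) values used in this thread are guesses about how long the page's JavaScript needs. One alternative is to poll until rows actually exist. This is only a sketch under assumptions: _WaitForRows is a hypothetical helper name, and the 10 second timeout and 100 ms polling interval are arbitrary choices.

```autoit
#include <IE.au3>

; Hypothetical helper (not part of IE.au3): poll until at least one <tr>
; exists in the document, or give up after $iTimeoutMs milliseconds.
Func _WaitForRows($oIE, $iTimeoutMs = 10000)
    Local $oRows
    Local $hTimer = TimerInit()
    While TimerDiff($hTimer) < $iTimeoutMs
        $oRows = _IETagNameGetCollection($oIE, "tr")
        If Not @error And IsObj($oRows) Then
            If $oRows.length > 0 Then Return $oRows
        EndIf
        Sleep(100) ; short pause between checks
    WEnd
    Return SetError(1, 0, 0) ; timed out without finding any rows
EndFunc

$Url = "https://chromedriver.storage.googleapis.com/index.html"
$oIE = _IECreate($Url)
$oRows = _WaitForRows($oIE)
If @error Then
    ConsoleWrite("No rows appeared before the timeout" & @CRLF)
Else
    ConsoleWrite("oRows:" & $oRows.length & @CRLF)
EndIf
_IEQuit($oIE)
```

Note that the first row appearing does not guarantee the whole listing has finished rendering, so a short extra wait or a check that the row count has stopped growing may still be needed.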

 

Edited by Subz