Scrape a Javascript rendered web page - (Moved)


Moved to the appropriate forum, as the AutoIt Example Scripts forum very clearly states:

Share your cool AutoIt scripts, UDFs and applications with others.


Do not post general support questions here, instead use the AutoIt Help and Support forums.

Moderation Team

Hi Skeletor,

I need to extract JavaScript-rendered data from a web page like this one: https://www.hpba.org/Membership/Organization-Search/Organization-Profile/orgcd/111773

However, with the _IEDocReadHTML or _INetGetSource functions I only get the original page source, without the data that the browser's JavaScript engine generates after it receives the HTML from the web server. Here are a couple of examples:

1)

#include <IE.au3>
#include <MsgBoxConstants.au3>

Local $oIE = _IECreate("https://www.hpba.org/Membership/Organization-Search/Organization-Profile/orgcd/111773")
Local $sHTML = _IEDocReadHTML($oIE)
ConsoleWrite($sHTML)

_IEQuit($oIE)

2)

#include <Inet.au3>

Local $sURL = "https://www.hpba.org/Membership/Organization-Search/Organization-Profile/orgcd/111773"
Local $sXML = StringReplace(_INetGetSource($sURL), @CRLF, "")
ConsoleWrite($sXML & @CRLF)

Is there a way to get this kind of data with AutoIt?

Thanks in advance,

Enrico


Try this:

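Opt("MustDeclareVars", 1)
#include <IE.au3>

; Open the page in Internet Explorer
Local $oIE = _IECreate("https://www.hpba.org/Membership/Organization-Search/Organization-Profile/orgcd/111773")

; Called without an index, _IEFrameGetCollection returns all frames
; and reports how many there are in @extended
Local $cFrames = _IEFrameGetCollection($oIE)
Local $iNumFrames = @extended
Local $sTxt = $iNumFrames & " frames found" & @CRLF & "Error = " & @error & @CRLF
; MsgBox(0, "", $sTxt) ; uncomment to check the frame count

; Take the first frame (index 0) and collect its <dd> elements
Local $oFrame = _IEFrameGetCollection($oIE, 0)
Local $cTags = _IETagNameGetCollection($oFrame, "dd")

Local $sInfo = "", $sTag, $iS
For $oTag In $cTags
  $sTag = $oTag.innerText
  ; If the text contains "; }", keep only what follows its last occurrence
  ; (skipping the marker and one following character)
  $iS = StringInStr($sTag, "; }", 0, -1)
  If $iS Then $sTag = StringMid($sTag, $iS + 4)
  $sInfo &= $sTag & @CRLF
Next
MsgBox(0, "", $sInfo)

_IEQuit($oIE)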
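
If the frame turns out to be empty because the script reads it before the JavaScript has finished rendering, one option might be to poll for the elements instead of reading them once. The following is only a rough sketch under that assumption: `_WaitForTags` is a hypothetical helper (it is not part of IE.au3), and the 10 second timeout and 250 ms polling interval are arbitrary values. It also assumes, like the script above, that the data sits in `dd` elements of the first frame.

```autoit
#include <IE.au3>

Local $oIE = _IECreate("https://www.hpba.org/Membership/Organization-Search/Organization-Profile/orgcd/111773")

; Wait until the first frame contains at least one <dd> element
Local $cTags = _WaitForTags($oIE, "dd")
If @error Then
    ConsoleWrite("No <dd> elements appeared before the timeout." & @CRLF)
Else
    For $oTag In $cTags
        ConsoleWrite($oTag.innerText & @CRLF)
    Next
EndIf

_IEQuit($oIE)

; Hypothetical helper (not part of IE.au3): polls the first frame of $oIE
; until it holds at least one $sTagName element or $iTimeoutMs elapses.
; Returns the element collection, or sets @error to 1 on timeout.
Func _WaitForTags($oIE, $sTagName, $iTimeoutMs = 10000)
    Local $hTimer = TimerInit()
    Local $oFrame, $cFound
    While TimerDiff($hTimer) < $iTimeoutMs
        $oFrame = _IEFrameGetCollection($oIE, 0)
        If Not @error Then
            ; _IETagNameGetCollection reports the element count in @extended
            $cFound = _IETagNameGetCollection($oFrame, $sTagName)
            If Not @error And @extended > 0 Then Return $cFound
        EndIf
        Sleep(250) ; assumed polling interval
    WEnd
    Return SetError(1, 0, 0)
EndFunc
```

Polling with a timeout keeps the script from hanging if the page never renders the expected elements, at the cost of some IE.au3 warnings in the console while the frame is still empty.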

 

Edited by Nine