Floppy

Get web page source, including javascript generated output

4 posts in this topic

Hello, I'm trying to create a Google Keyword Tool scraper with AutoIt. I'm using the following code:

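#include <IE.au3>

; Open the Keyword Tool in a new Internet Explorer window
$oIE = _IECreate("https://adwords.google.com/o/KeywordTool")
; Pause 20 s so the query can be typed and Search clicked by hand
Sleep(20000)
; Read the HTML of the current document
$source = _IEDocReadHTML($oIE)

MsgBox(0, '', $source)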

(The Sleep gives me time to type the query and click Search in the IE window; I'll automate this later.)

The HTML source it outputs doesn't contain the table with the results, although I can see the table in Firebug. Below is a single row I extracted with Firebug:

<tr __gwt_row="19" __gwt_subrow="0" class="sCT"><td class="sBS sDT sES" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1059"><div id="gwt-debug-column-SELECTION-row-19-0"><input type="checkbox" class="sML"></div></div></td><td class="sBS sDT" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1060"><div id="gwt-debug-column-KEYWORD-row-19-1"><span style="white-space:nowrap"><span></span><span><a class="sOL" gwtuirendered="gwt-uid-1089"><b>windows</b> live</a></span><span></span></span></div></div></td><td class="sBS sDT" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1062"><div id="gwt-debug-column-COMPETITION-row-19-2"><div title="0,04">Bassa</div></div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1063"><div id="gwt-debug-column-GLOBAL_MONTHLY_SEARCHES-row-19-3">20.400.000</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1064"><div id="gwt-debug-column-AVERAGE_TARGETED_MONTHLY_SEARCHES-row-19-4">20.400.000</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1065"><div id="gwt-debug-column-SUGGESTED_BID-row-19-5">€&nbsp;0,40</div></div></td><td class="sBS sDT aw-ti-advertiser-specific-cell" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1066"><div id="gwt-debug-column-AD_SHARE-row-19-6">-</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1067"><div id="gwt-debug-column-AVERAGE_MONTHLY_SEARCHES_WITH_AFS-row-19-7">-</div></div></td><td class="sBS sDT aw-ti-advertiser-specific-cell" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1068"><div id="gwt-debug-column-SEARCH_SHARE-row-19-8">-</div></div></td><td class="sBS sDT" align="right"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1069"><div 
id="gwt-debug-column-TARGETED_MONTHLY_SEARCHES-row-19-9"><div style="width: 108px; white-space: nowrap" dir="ltr"><div style="width: 8px;height: 16px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 16px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 16px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 13px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 
8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div><div style="width: 1px;" class="goog-inline-block"></div><div style="width: 8px;height: 10px; background-color: #A4D0BB; vertical-align:bottom;" class="goog-inline-block" title=""></div></div></div></div></td><td class="sBS sDT sOS" align="left"><div style="outline-style:none;" __gwt_cell="cell-gwt-uid-1070"><div id="gwt-debug-column-EXTRACTED_FROM_WEBPAGE-row-19-10">-</div></div></td></tr>

Is there a way to get the full source with AutoIt, including the content generated by JavaScript?

Thank you


Hi,

AutoIt can't execute the JavaScript code itself, so you need to let the browser do it: wait for the page to finish loading, then read the current body source.
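A minimal sketch of that approach, assuming the IE.au3 UDF from the first post. Instead of a fixed Sleep, it polls the live DOM until an element rendered by the page's JavaScript exists, then reads the body HTML. The element id is hypothetical, modelled on the gwt-debug-column-KEYWORD-row-19-1 cell in the pasted row; the 60-second timeout and 500 ms poll interval are arbitrary choices.

```autoit
#include <IE.au3>

; Hypothetical id of a cell in the results table (row 0, keyword column),
; modelled on the row pasted in the first post.
Local $sTargetId = "gwt-debug-column-KEYWORD-row-0-1"
Local $iTimeoutMs = 60000 ; arbitrary: time allowed to type the query and click Search

; _IECreate waits for the initial page load by default; it does not wait
; for content that JavaScript adds afterwards.
Local $oIE = _IECreate("https://adwords.google.com/o/KeywordTool")

_IEErrorNotify(False) ; keep "no match" warnings out of the console while polling

; Poll the live DOM until the target element exists or the timeout expires.
Local $hTimer = TimerInit()
Local $oCell = 0
While TimerDiff($hTimer) < $iTimeoutMs
    $oCell = _IEGetObjById($oIE, $sTargetId)
    If IsObj($oCell) Then ExitLoop
    Sleep(500)
WEnd

If IsObj($oCell) Then
    ; The body HTML is read from the current DOM, so script-added nodes are included
    MsgBox(0, "Body HTML", _IEBodyReadHTML($oIE))
Else
    MsgBox(0, "Timeout", "The results table did not appear in time.")
EndIf
```

Because the loop waits for a result cell rather than a fixed 20 seconds, the query can still be entered by hand; once the search is automated, the same loop can simply follow the click.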

Br, FireFox.


 

OS : Win XP SP2 (32 bits) / Win 7 SP1 (64 bits) / Win 8 (64 bits) | AutoIt version: latest stable / beta.
Hardware : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz / 8 GiB RAM DDR3.

My UDFs : Skype UDF | TrayIconEx UDF | GUI Panel UDF | Excel XML UDF | Is_Pressed_UDF

My Projects : YouTube Multi-downloader | FTP Easy-UP | Lock'n | WinKill | AVICapture | Skype TM | Tap Maker | ShellNew | Scriptner | Const Replacer | FT_Pocket | Chrome theme maker

My Examples : Capture tool | IP Camera | Crosshair | Draw Captured Region | Picture Screensaver | Jscreenfix | Drivetemp | Picture viewer

My Snippets : Basic TCP | Systray_GetIconIndex | Intercept End task | Winpcap various | Advanced HotKeySet | Transparent Edit control

 

The script in the first post loads the page and then gets the source, but the source doesn't contain the content generated by the JavaScript code.

I don't know much about the IE UDF. Take a look at the FF UDF (for Firefox); I'm sure it can do what you are looking for.

Br, FireFox.


 
