redrum

No Access to HTML??

9 posts in this topic

Can someone help please?

I have been using AutoIt for several months successfully on several websites. I now have a website I am trying to get data from where I cannot seem to access the HTML. There are no names or controls on the elements I need. I can read out the HTML and see the data I need within it using the DebugBar. (I can zero in on each of the elements on the web page and see the HTML as normal)

When I call _IEBodyReadHTML and inspect the result, it is totally different from what I see with the DebugBar. (I am making the same _IEBodyReadHTML call on several other websites and it is working fine.) There is a frame that has a name, and I can get an object variable to it using _IEGetObjByName and then get the tag name (Frame) using oOBJECT.tagname. But if I try to get the HTML using oOBJECT.innerhtml or .innertext, I get no text.

I'm at the point where I am beginning to wonder whether a website can inhibit access to its HTML, even though it displays in the DebugBar. Is this possible, or has anyone else run into this problem?

Thanks

Doug


#2 ·  Posted (edited)

probably dynamic HTML

and

maybe you're accessing wrong object or at a wrong time

maybe there's a timer which loads additional code on some event

post some code and website name

Edited by tobject


You are either not getting a reference to the correct frame, or a COM error is being thrown that you do not mention, caused by cross-site scripting limitations. Add _IEErrorHandlerRegister() to your code to see whether you are getting "Access is denied", and make sure to run the script from SciTE with F5 so that the error output appears in the console pane.

Also suggest using the View Source icon in DebugBar toolbar to easily see all of the frames and their source.

BTW, dynamic HTML is not the issue since _IEBodyReadHTML reads the final html markup, not the original source as the IE view source menu item does (unless it is a timing issue as tobject suggests).
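
The error-handler suggestion above could be tried with something like the following minimal sketch. The URL and the frame name "frameName" are placeholders rather than values from this thread, and the exact console output depends on the IE.au3 version in use.

```autoit
#include <IE.au3>

; Register the IE.au3 COM error handler so that COM errors
; (for example "Access is denied") are reported instead of passing silently.
_IEErrorHandlerRegister()

; Placeholder URL and frame name (assumptions): substitute the real page and frame.
Local $oIE = _IECreate("http://www.example.com/")
Local $oFrame = _IEFrameGetObjByName($oIE, "frameName")
If @error Then
    ConsoleWrite("Could not get the frame, @error = " & @error & @CRLF)
Else
    ; _IEBodyReadHTML reads the body HTML of the document inside the frame.
    ConsoleWrite(_IEBodyReadHTML($oFrame) & @CRLF)
EndIf
```

If the frame reference succeeds but reading its HTML still fails, the message reported by the error handler should indicate whether a cross-site restriction is involved.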

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Thanks for comments,

As suggested, I am attaching a test script that illustrates the problem.

I have had no success in reading the HTML from this website, even though DebugBar reads the HTML out just fine, and it contains the data elements that I am looking for.

Attached is a test script with some code that attempts to read out some HTML elements. Some lines are presently commented out as they result in errors, but they are included to illustrate what I have tried that I thought might work.

Any help/suggestions on what the problem is would be greatly appreciated!

Regards,

Doug

NASDAQ html test.au3


I think you (often) need to load the actual page the IFrame embeds to be able to fiddle with its source or objects.

If you don't need anything from the first page and the embedded page always has the same address, you can navigate there directly.

Otherwise you need to get the IFrame's source address from the main page and then download it, or navigate there.
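
One way to look up an IFrame's source address from the main page might be the sketch below. The URL is a placeholder, and the sketch assumes the frames are plain iframe elements whose src attribute is readable from the hosting page.

```autoit
#include <IE.au3>

; Placeholder URL (assumption): substitute the page that contains the IFrame.
Local $oIE = _IECreate("http://www.example.com/")

; Collect every <iframe> element in the main document.
Local $oIFrames = _IETagNameGetCollection($oIE, "iframe")
For $oIFrame In $oIFrames
    ; .src holds the address of the embedded page.
    ConsoleWrite($oIFrame.name & " -> " & $oIFrame.src & @CRLF)
Next
```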

The example below demonstrates how you can get those values when loading the IFrame's page directly.

I'm using InetRead and StringRegExp, but you could do the same with _IE functions. (InetRead is faster, though.)

The StringRegExp patterns in the example are pretty crude and at the very least need error checking, but they show how it could work.

Local $sSource, $aTotalShares, $aInstOwnership
;I got this directly from the IFrame's "src=" value
Local $sUrl = "http://holdings.nasdaq.com/asp/Institutional.asp?CIK=&HolderName=&LinesPerPage=5&PageNum=1&SortBy=&Descending=&strFilter=&site=nasdaq&symbol=AKAM&FormType=INSTITUTIONAL&Selected=AKAM&market=NASDAQ-GS&coname=Akamai+Technologies%2C+Inc%2E&LogoPath=http%3A%2F%2Fcontent%2Enasdaq%2Ecom%2Flogos%2FAKAM%2EGIF&pageName="
;read the IFrame's source page: (you could do this with _IENavigate+_IEBodyReadHTML too)
$sSource = InetRead($sUrl)
$sSource = BinaryToString($sSource) ;Convert to string.
$aTotalShares = StringRegExp($sSource,'(?i)(?s)Total Shares Out Standing.*?"Holdnum">.*?(\d+)',3) ;get the value for Shares Out Standing (needs errorchecking)
$aInstOwnership = StringRegExp($sSource,'(?i)(?s)Institutional Ownership.*?"Holdnum">.*?(\d+)',3) ;get the value for Institutional Ownership (needs errorchecking)
ConsoleWrite("Total Shares Out Standing (millions):" & $aTotalShares[0] & @CRLF) ;display results
ConsoleWrite("Institutional Ownership:" & $aInstOwnership[0] & "%" & @CRLF)
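
Since the post notes that the patterns need error checking, here is one possible way to add it. The helper name _FirstCapture and the sample markup are made up for illustration and are not from the original post. The sample string stands in for the downloaded page source, so the sketch runs without network access.

```autoit
; Hypothetical helper (not part of the original post): returns the first
; capture group of $sPattern found in $sText, or sets @error and returns "".
Func _FirstCapture($sText, $sPattern)
    Local $aMatch = StringRegExp($sText, $sPattern, 3) ; 3 = return array of global matches
    If @error Or Not IsArray($aMatch) Then Return SetError(1, 0, "")
    Return SetError(0, 0, $aMatch[0])
EndFunc

; Made-up sample markup standing in for the page source.
Local $sSample = '<td>Total Shares Out Standing (millions):</td><td class="Holdnum">&nbsp;173</td>'
Local $sShares = _FirstCapture($sSample, '(?i)(?s)Total Shares Out Standing.*?"Holdnum">.*?(\d+)')
If @error Then
    ConsoleWrite("Value not found; the page layout may have changed." & @CRLF)
Else
    ConsoleWrite("Total Shares Out Standing (millions): " & $sShares & @CRLF)
EndIf
```

The same check applies to InetRead, which also sets @error when the download fails, so testing @error right after that call avoids running the patterns against empty data.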


You seem to be aware of the frame, but then you don't drill into it to try to find what you are after.

Take a look at this code:

#include <IE.au3>
#include <Array.au3>

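; Open the page that hosts the frame
$oIE = _IECreate("http://www.nasdaq.com/asp/holdings.asp?symbol=AKAM&selected=AKAM&FormType=Institutional")

; Get a reference to the frame by name, then work inside it
$oFrame = _IEFrameGetObjByName($oIE, "frmMain")
; Table index 5 (zero-based) within the frame holds the data
$oTable = _IETableGetCollection($oFrame, 5)
; True transposes the array so rows and columns match the page layout
$aTable = _IETableWriteToArray($oTable, True)
_ArrayDisplay($aTable)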
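
To find out which table index holds the data, one approach is to list every table in the frame with a short preview of its text. This is only a sketch: it reuses the URL and frame name from the code above and assumes the page is still laid out the same way.

```autoit
#include <IE.au3>

Local $oIE = _IECreate("http://www.nasdaq.com/asp/holdings.asp?symbol=AKAM&selected=AKAM&FormType=Institutional")
Local $oFrame = _IEFrameGetObjByName($oIE, "frmMain")
If @error Then
    ConsoleWrite("Frame not found" & @CRLF)
    Exit
EndIf

; With the default index (-1), _IETableGetCollection returns the whole
; collection and stores the number of tables in @extended.
Local $oTables = _IETableGetCollection($oFrame)
Local $iCount = @extended
Local $oTable, $sPreview
For $i = 0 To $iCount - 1
    $oTable = _IETableGetCollection($oFrame, $i)
    ; Collapse whitespace and keep the first 60 characters as a preview.
    $sPreview = StringLeft(StringStripWS(_IEPropertyGet($oTable, "innertext"), 7), 60)
    ConsoleWrite($i & ": " & $sPreview & @CRLF)
Next
```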

Dale



Many thanks for the replies,

I have drilled down into Frames, but what I have used in prior webpages didn't work here. In studying your response code I am finding things, as usual, that I didn't know.

Regards,

Doug



I tried to use the "View Source icon" based on your suggestion, "Also suggest using the View Source icon in DebugBar toolbar to easily see all of the frames and their source", but cannot find it.

Is this toolbar only available with the Corporate version?

Also, I really appreciate the code you supplied. I didn't realize how _IETableGetCollection and _IETableWriteToArray work. This will really simplify my code compared with what I have been doing.

Thanks,

Doug


No, the View Source icon is in all versions. It is the one to the left of the eyeball icon.

Dale

