redrum, posted August 22, 2010:

Can someone help, please? I have been using AutoIt successfully for several months on several websites. I am now trying to get data from a website where I cannot seem to access the HTML. The elements I need have no names or IDs. With DebugBar I can zero in on each element on the page and see its HTML as normal, including the data I need. But when I call _IEBodyReadHTML and inspect the result, it is totally different from what DebugBar shows. (The same _IEBodyReadHTML call works fine on several other websites.)

There is a frame that has a name. I can get an object variable for it using _IEGetObjByName, and I can read its tag name (Frame) using oOBJECT.tagname. But if I try to get the HTML using oOBJECT.innerhtml or oOBJECT.innertext, I get no text.

I am beginning to wonder whether a website can block access to its HTML even though DebugBar displays it. Is this possible, or has anyone else run into this problem?

Thanks,
Doug
tobject, posted August 22, 2010 (edited):

Probably dynamic HTML, and maybe you're accessing the wrong object or accessing it at the wrong time. Maybe there's a timer that loads additional code on some event. Post some code and the website name.
DaleHohm, posted August 22, 2010:

You are either not getting a reference to the correct frame, or a COM error that you do not mention is being thrown because of cross-site scripting limitations. Add _IEErrorHandlerRegister() to your code to see whether you are getting "Access is denied", and make sure to run from SciTE with F5. I also suggest using the View Source icon in the DebugBar toolbar to easily see all of the frames and their source.

BTW, dynamic HTML is not the issue, since _IEBodyReadHTML reads the final HTML markup, not the original source as the IE View Source menu item does (unless it is a timing issue, as tobject suggests).

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
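For anyone following along, the check Dale describes might look roughly like the sketch below. It registers the IE.au3 error handler so COM errors show up in the SciTE console, then walks the page's frames and reports whether each one can be read. The URL is a placeholder, not the site from this thread, and the output format is only an illustration.

```autoit
#include <IE.au3>

; Send COM errors (for example "Access is denied") to the console.
_IEErrorHandlerRegister()

; Placeholder URL (an assumption); substitute the page under test.
Local $oIE = _IECreate("http://www.example.com/")

; With no index, _IEFrameGetCollection returns the whole collection
; and puts the number of frames in @extended.
Local $oFrames = _IEFrameGetCollection($oIE)
Local $iFrames = @extended
ConsoleWrite($iFrames & " frame(s) found" & @CRLF)

Local $oFrame, $sHTML, $iErr
For $i = 0 To $iFrames - 1
    $oFrame = _IEFrameGetCollection($oIE, $i)
    $sHTML = _IEBodyReadHTML($oFrame)
    $iErr = @error ; capture before any other call resets it
    ConsoleWrite("Frame " & $i & ": @error=" & $iErr & ", HTML length=" & StringLen($sHTML) & @CRLF)
Next
```

A frame that reports a non-zero @error or a zero length is the one to look at more closely; if the console also shows an access-denied COM error, the cross-site limitation Dale mentions is the likely cause.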
redrum, posted August 23, 2010:

Thanks for the comments.

As suggested, I am attaching a test script that illustrates the problem. I have had no success in reading the HTML from this website, even though DebugBar reads the HTML out just fine and it contains the data elements I am looking for.

The attached script contains code that attempts to read out some HTML elements. Some lines are commented out because they result in errors, but I have left them in to show what I tried and thought might work.

Any help or suggestions on what the problem is would be greatly appreciated!

Regards,
Doug

Attachment: NASDAQ html test.au3
Tvern, posted August 23, 2010:

I think you (often) need to load the actual page the IFrame embeds to be able to work with its source or objects. If you don't need anything from the first page and the embedded page always has the same address, you can navigate there directly. Otherwise you need to get the IFrame's source address from the main page and then download it or navigate there.

The example below demonstrates how you can get those values when loading the IFrame's page directly. I'm using InetRead and StringRegExp, but you could do the same with the _IE functions (InetRead is faster, though). The StringRegExp patterns in the example are pretty crude, and at the very least they need error checking, but it shows how it could work.

    Local $sSource, $aTotalShares, $aInstOwnership
    ;I got this directly from the IFrame's "src=" value
    Local $sUrl = "http://holdings.nasdaq.com/asp/Institutional.asp?CIK=&HolderName=&LinesPerPage=5&PageNum=1&SortBy=&Descending=&strFilter=&site=nasdaq&symbol=AKAM&FormType=INSTITUTIONAL&Selected=AKAM&market=NASDAQ-GS&coname=Akamai+Technologies%2C+Inc%2E&LogoPath=http%3A%2F%2Fcontent%2Enasdaq%2Ecom%2Flogos%2FAKAM%2EGIF&pageName="
    ;read the IFrame's source page: (you could do this with _IENavigate+_IEBodyReadHTML too)
    $sSource = InetRead($sUrl)
    $sSource = BinaryToString($sSource) ;Convert to string.
    $aTotalShares = StringRegExp($sSource,'(?i)(?s)Total Shares Out Standing.*?"Holdnum">.*?(\d+)',3) ;get the value for Shares Out Standing (needs error checking)
    $aInstOwnership = StringRegExp($sSource,'(?i)(?s)Institutional Ownership.*?"Holdnum">.*?(\d+)',3) ;get the value for Institutional Ownership (needs error checking)
    ConsoleWrite("Total Shares Out Standing (millions):" & $aTotalShares[0] & @CRLF) ;display results
    ConsoleWrite("Institutional Ownership:" & $aInstOwnership[0] & "%" & @CRLF)
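The error checking Tvern says is missing could be added along these lines. This is only a sketch: _FirstMatch is a hypothetical helper name made up for this example, and $sUrl is a placeholder standing in for the IFrame's "src=" address. It relies on InetRead and StringRegExp setting @error on failure (StringRegExp with flag 3 sets @error when nothing matches), so the script never indexes an array that was not returned.

```autoit
; Placeholder (an assumption); use the IFrame's "src=" address here.
Local $sUrl = "http://www.example.com/frame-page"

Local $dData = InetRead($sUrl)
If @error Then
    ConsoleWrite("Download failed for " & $sUrl & @CRLF)
    Exit 1
EndIf
Local $sSource = BinaryToString($dData)

Local $sShares = _FirstMatch($sSource, '(?i)(?s)Total Shares Out Standing.*?"Holdnum">.*?(\d+)')
If @error Then
    ConsoleWrite("Total Shares Out Standing: not found" & @CRLF)
Else
    ConsoleWrite("Total Shares Out Standing (millions): " & $sShares & @CRLF)
EndIf

; Hypothetical helper: returns the first captured group,
; or "" with @error = 1 when the pattern does not match.
Func _FirstMatch($sText, $sPattern)
    Local $aMatch = StringRegExp($sText, $sPattern, 3)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc
```

The same helper can be called with the Institutional Ownership pattern; keeping the check in one function avoids repeating the @error test after every StringRegExp call.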
DaleHohm, posted August 23, 2010:

You seem to be aware of the frame, but then you don't drill into it to try to find what you are after. Take a look at this code:

    #include <IE.au3>
    #include <Array.au3>

    $oIE = _IECreate ("http://www.nasdaq.com/asp/holdings.asp?symbol=AKAM&selected=AKAM&FormType=Institutional")
    $oFrame = _IEFrameGetObjByName($oIE, "frmMain")
    $oTable = _IETableGetCollection($oFrame, 5)
    $aTable = _IETableWriteToArray($oTable, True)
    _ArrayDisplay($aTable)

Dale
redrum, posted August 23, 2010:

Many thanks for the replies.

I have drilled down into frames before, but what I used on prior web pages didn't work here. In studying your response code I am finding things, as usual, that I didn't know.

Regards,
Doug
redrum, posted August 23, 2010:

I tried to use the View Source icon based on your suggestion ("Also suggest using the View Source icon in DebugBar toolbar to easily see all of the frames and their source"), but I cannot find it. Is this toolbar only available with the Corporate version?

Also, I really appreciate the code you supplied. I didn't realize how _IETableGetCollection and _IETableWriteToArray work. This will really simplify my code compared with what I have been doing.

Thanks,
Doug
DaleHohm, posted August 23, 2010:

No, the View Source icon is in all versions. It is the one to the left of the eyeball icon.

Dale