patlim4152 Posted December 11, 2006
Hey guys, I'm having a problem with the incredible IE.au3 library. The _IEBodyReadHTML function returns different text from what I see when I manually use View Source in IE. I know it is looking at the right window, because that's the only window I have open and the text is similar; certain parts are just coded out. Has anyone encountered this before? I'm not sure whether the website is using some kind of encryption.
mikehunt114 Posted December 11, 2006
Check out the differences between the Body and Doc functions (_IEBodyReadHTML vs. _IEDocReadHTML).
IE Dev Toolbar | MSDN: InternetExplorer Object | MSDN: HTML/DHTML Reference Guide
"It is surprising what a man can do when he has to, and how little most men will do when they don't have to." - Walter Linn
--------------------
Post a reproducer with less than 100 lines of code.
big_daddy (Moderator) Posted December 11, 2006
Dale can explain this better, but I will attempt it myself. When using _IEBodyReadHTML() you get the "generated source", which can differ from the real source due to JavaScript and such. When working with the IE.au3 library, you will want to go with the "generated source" for referencing the DOM.
DaleHohm Posted December 11, 2006 (edited)
In addition to what Mike said... what you see with the View Source command in IE is a snapshot of the page as it was loaded. It is not necessarily what the page contains once it comes to rest, and if scripted changes are made to the page (either externally, by something like AutoIt, or internally, by something like JavaScript), View Source does not reflect them. One of the really nice features of the _IE routines is that they allow you to see and operate on the dynamic HTML.
Dale
Edit: and if you put all three of our answers together, you get one really good one! Pretty cool to see competition to answer IE.au3 questions...
Edited December 11, 2006 by DaleHohm
Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
mikehunt114 Posted December 11, 2006
Both you guys are way better at explaining these things.
patlim4152 (Author) Posted December 11, 2006
Wow, gee. Thanks guys, those were fast replies... So, to get the 'real source', what would I need to use?
mikehunt114 Posted December 11, 2006
patlim4152 wrote: "Wow, Gee. Thanks guys, those were fast replies... Thus, to get the 'real source', I would need to use ____ ?"
I believe the most complete source comes from _IEDocReadHTML. _IEBodyReadHTML doesn't return anything outside the <body> tags. _IEDocReadHTML returns the full source and, in some cases, is more complete than View Source.
patlim4152 (Author) Posted December 11, 2006
Oh... bah, I need to update; I was using a pretty old version of IE.au3 (it served me well!). It didn't have _IEDocReadHTML. Thanks guys!
The Kandie Man Posted December 11, 2006
patlim4152 wrote: "Wow, Gee. Thanks guys, those were fast replies... Thus, to get the 'real source', I would need to use ____ ?"
I believe _INetGetSource($s_URL) would get the 'real source'.
"So man has sown the wind and reaped the world. Perhaps in the next few hours there will no remembrance of the past and no hope for the future that might have been." & _"All the works of man will be consumed in the great fire after which he was created." & _"And if there is a future for man, insensitive as he is, proud and defiant in his pursuit of power, let him resolve to live it lovingly, for he knows well how to do so." & _"Then he may say once more, 'Truly the light is sweet, and what a pleasant thing it is for the eyes to see the sun.'" - The Day the Earth Caught Fire
patlim4152 (Author) Posted December 11, 2006
Hmm... I just tried it, but it didn't work: _IEDocReadHTML didn't give me what I wanted. I think the website is using some script to change the code after it's loaded, because I still see everything I want in View Source. I'll try _INetGetSource($s_URL) next. Gotta find the library first.
patlim4152 (Author) Posted December 11, 2006
Hmm... interesting... _INetGetSource works! But I can't exactly navigate from page to page based on the result I get from it. It's late here; I'll figure something out tomorrow. Thanks for the help, guys.
The Kandie Man Posted December 12, 2006
patlim4152 wrote: "Hmm.... Interesting... INetGetSource works! But I can't exactly navigate from page to page based on the result I get from there. It's late here, will figure something out tomorrow. Thanks for the help guys."
OK, glad I could help. Keep us updated on your progress. We are always glad to help.
DaleHohm Posted December 12, 2006 (edited)
I don't know what is meant by "real" source. _IEDocReadHTML gives you the rendered HTML source, including the <HEAD> section and scripts. _IEBodyReadHTML returns the rendered HTML inside the <BODY></BODY> tags. _INetGetSource returns the original, unrendered HTML source of the page.

Let me demonstrate. Start with this HTML file, tmp.htm, on a server:

<html><body>
This text will change:
<div id="foo">ORIGINAL TEXT</div>
<script language="javascript">
document.getElementById ("foo").innerHTML = "THIS IS DYNAMIC TEXT";
</script>
</body></html>

The file itself contains the words "ORIGINAL TEXT", but as the page is displayed, the JavaScript immediately changes them to "THIS IS DYNAMIC TEXT". We'll use this script, which shows the output of _INetGetSource and the _IE functions:

#include <inet.au3>
ConsoleWrite("***** _INetGetSource *****" & @CR & _INetGetSource("http://localhost/tmp.htm") & @CR & @CR)

#include <IE.au3>
$oIE = _IECreate("http://localhost/tmp.htm")
ConsoleWrite("***** _IEBodyReadText *****" & @CR & _IEBodyReadText($oIE) & @CR & @CR)
ConsoleWrite("***** _IEDocReadHTML *****" & @CR & _IEDocReadHTML($oIE) & @CR & @CR)
ConsoleWrite("***** _IEBodyReadHTML *****" & @CR & _IEBodyReadHTML($oIE) & @CR & @CR)

You'll see that _INetGetSource displays the HTML just as it is stored on the server; this is what you see with the browser's View Source command. [EDIT: to be precise, it does not necessarily display the file as stored on the server, but rather the output after any server-side processing is performed, so if it is an ASP file or a CGI script, you get the HTML output generated by the server's processing of those files.] All of the _IE* commands display the rendered document AFTER it has been updated by the JavaScript. One isn't right and the other wrong; they are just different and used for different purposes.
Output:

***** _INetGetSource *****
<html>
<body>
This text will change:
<div id="foo">ORIGINAL TEXT</div>
<script language="javascript">
document.getElementById ("foo").innerHTML = "THIS IS DYNAMIC TEXT";
</script>
</body>
</html>

***** _IEBodyReadText *****
This text will change: THIS IS DYNAMIC TEXT

***** _IEDocReadHTML *****
<HTML><HEAD></HEAD>
<BODY>This text will change:
<DIV id=foo>THIS IS DYNAMIC TEXT</DIV>
<script language=javascript>
document.getElementById ("foo").innerHTML = "THIS IS DYNAMIC TEXT";
</SCRIPT>
</BODY></HTML>

***** _IEBodyReadHTML *****
This text will change:
<DIV id=foo>THIS IS DYNAMIC TEXT</DIV>
<script language=javascript>
document.getElementById ("foo").innerHTML = "THIS IS DYNAMIC TEXT";
</SCRIPT>

Dale
Edited December 12, 2006 by DaleHohm
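To make the difference between the two views concrete, here is a minimal sketch that pulls out just the contents of the foo div from each one, instead of dumping whole documents. It assumes the same tmp.htm test page is served at http://localhost/tmp.htm; the regular expression is a quick illustration tuned to that one file, not a general HTML parser. Judging by the output above, it should print ORIGINAL TEXT for the raw source and THIS IS DYNAMIC TEXT for the live DOM.

```autoit
#include <IE.au3>
#include <Inet.au3>

; Assumes the tmp.htm test page from the example above is served at this URL.
Local $sURL = "http://localhost/tmp.htm"

; 1) Unrendered source: extract the contents of <div id="foo"> with a regular expression
;    (flag 1 = return an array of the capture-group matches).
Local $sRawText = ""
Local $aMatch = StringRegExp(_INetGetSource($sURL), '(?is)<div id="foo">(.*?)</div>', 1)
If Not @error Then $sRawText = $aMatch[0]

; 2) Rendered DOM: ask IE for the same element after the page's script has run.
Local $sLiveText = ""
Local $oIE = _IECreate($sURL) ; waits for the page to finish loading by default
Local $oDiv = _IEGetObjById($oIE, "foo")
If IsObj($oDiv) Then $sLiveText = $oDiv.innerText
_IEQuit($oIE)

ConsoleWrite("Raw source: " & $sRawText & @CRLF)
ConsoleWrite("Live DOM:   " & $sLiveText & @CRLF)
```

Reading a single element by id through _IEGetObjById is usually easier to work with than searching the full text returned by _IEDocReadHTML, since IE rewrites tag case and attribute quoting when it serializes the DOM.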
mikehunt114 Posted December 12, 2006
Very thorough, Dale... I learned something new today.
patlim4152 (Author) Posted December 16, 2006
Alright guys, thanks for the explanations so far. After a busy week I'm back at it, and I should have explained myself better. What Dale said about there being no 'right' and 'wrong' version makes sense. What I actually want is the _INetGetSource version (basically the same text as View Source), not the _IE*ReadHTML versions. But I believe that using _IE to open the page and then _INetGetSource to retrieve the unedited source would mean two separate calls to the server. Is there any way around this?
An alternative I was considering is retrieving the source directly from IE's View Source. This can 'easily' be done by writing a script that opens View Source, reads the code from the new window, and then closes that window. I'm trying to avoid this method, though. So if anyone knows a better way around this, I'd deeply appreciate it. Thanks.
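For reference, the two-request approach described in the post above might look roughly like this. It is only a sketch: http://www.somewhere.com is a placeholder, and reading LocationURL is one assumed way to pick up the address IE actually ended on. Because _INetGetSource makes its own separate request, the server is not guaranteed to return exactly the same HTML that IE received (for example, if the page depends on session state).

```autoit
#include <IE.au3>
#include <Inet.au3>

; Sketch of the two-request approach; "http://www.somewhere.com" is a placeholder URL.
; Request 1: IE loads and renders the page (_IECreate waits for the load by default).
Local $oIE = _IECreate("http://www.somewhere.com")

; Read the address IE actually ended up at (after any redirects).
Local $sURL = $oIE.LocationURL

; Request 2: fetch the same URL again, this time without rendering or running scripts.
Local $sRawSource = _INetGetSource($sURL)
ConsoleWrite($sRawSource & @CRLF)
```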
DaleHohm Posted December 16, 2006 (edited)
I don't know of any way to do this other than the methods already mentioned.
Dale
Edit: typo
Edited December 16, 2006 by DaleHohm
patlim4152 (Author) Posted December 17, 2006 (edited)
HAH! Thanks for the replies, guys; I thought I should post my solution! I was wondering what would happen if I "caught" the code before the scripts made any changes. So what I did was:

#include <IE.au3>

$o_IE = _IECreate() ; or any existing IE object
; Navigate without waiting for the page to finish loading (third parameter 0)
_IENavigate($o_IE, "http://www.somewhere.com", 0)
; Poll until IE reports READYSTATE_INTERACTIVE (3), then read the HTML right away
While $o_IE.readyState <> 3
    Sleep(100)
WEnd
$text = _IEDocReadHTML($o_IE)

Edited December 17, 2006 by patlim4152
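One caveat with waiting for exactly readyState 3: if a page passes through the interactive state between two polls and goes straight to 4 (complete), the loop never exits. Below is a sketch of the same idea wrapped in a helper with a timeout. The function name _ReadHTMLWhileInteractive and the 10-second default are hypothetical choices, not part of IE.au3, and whether scripts have already changed the page by state 3 depends on when the site's scripts run (inline scripts like the one in Dale's tmp.htm run during parsing).

```autoit
#include <IE.au3>

; Hypothetical helper (not part of IE.au3) wrapping the polling loop from the post above.
; Navigates without waiting, then reads the document HTML once IE reports
; READYSTATE_INTERACTIVE (3). Gives up after $iTimeoutMs milliseconds, because a fast
; page can pass through state 3 between two polls and go straight to 4 (complete).
; Returns the HTML, or "" with @error = 1 on timeout.
Func _ReadHTMLWhileInteractive(ByRef $oIE, $sURL, $iTimeoutMs = 10000)
    _IENavigate($oIE, $sURL, 0) ; 0 = return immediately instead of waiting for the load
    Local $hTimer = TimerInit()
    While $oIE.readyState <> 3
        If TimerDiff($hTimer) > $iTimeoutMs Then Return SetError(1, 0, "")
        Sleep(10) ; short interval to reduce the chance of missing state 3
    WEnd
    Return _IEDocReadHTML($oIE)
EndFunc

; Usage ("http://www.somewhere.com" is the placeholder URL from the post)
Local $o_IE = _IECreate()
Local $text = _ReadHTMLWhileInteractive($o_IE, "http://www.somewhere.com")
If @error Then
    ConsoleWrite("Timed out before the page reached the interactive state" & @CRLF)
Else
    ConsoleWrite($text & @CRLF)
EndIf
```

A timeout returning an error keeps the script from hanging, but the caller still has to decide what to do next, for example falling back to _IELoadWait and _IEDocReadHTML, or to _INetGetSource.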