MonsieurOUXX

IE automation : _IEBodyReadHTML not returning the full source?

MonsieurOUXX

Hi,

the scenario is quite simple :

  • I'm loading a webpage using _IENavigate
  • I know that the right page is loaded because _IEBodyReadText returns a certain string that I'm expecting in the page
  • Problem: _IEBodyReadHTML doesn't return the same source as the source I get in IE by right-clicking on the page and selecting "View Source"

Actually, the source returned by _IEBodyReadHTML doesn't even contain the string found by _IEBodyReadText (even though that string is inside the <body> tags).

I'm very confused. Could it be frames interfering with that?

DaleHohm

The IE functions return the page HTML AFTER client-side processing... View Source shows you the source BEFORE client-side processing. I suggest you investigate with DebugBar: its View Source icon lets you see either version, and it also shows you frames and iframes.

Dale
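
To make the before/after distinction concrete, here is a minimal sketch that fetches the same page both ways and compares the lengths. The URL is a placeholder, not from this thread. Note that `_INetGetSource` issues its own separate HTTP request, so on a page that needs a login or session it may not receive the same response as the browser.

```autoit
#include <IE.au3>
#include <Inet.au3>

; Hypothetical URL, used only for illustration
Local $sUrl = "http://www.example.com/"

; HTML serialized from IE's live DOM (after client-side scripts have run)
Local $oIE = _IECreate($sUrl)
Local $sDomHtml = _IEDocReadHTML($oIE)

; HTML as sent by the server, fetched with a separate request
Local $sRawHtml = _INetGetSource($sUrl)

ConsoleWrite("DOM HTML length: " & StringLen($sDomHtml) & @CRLF)
ConsoleWrite("Raw HTML length: " & StringLen($sRawHtml) & @CRLF)

_IEQuit($oIE)
```

If the two lengths differ noticeably, scripts or IE's own normalization of the markup have changed the document since it was downloaded.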


MonsieurOUXX

DaleHohm said: "The IE functions return the page HTML AFTER client-side processing... View Source shows you the source BEFORE client-side processing. Suggest you investigate with DebugBar"

So there's no way to simply see the basic code of the page? Why does IE *have* to mess things up???

I'll check that with DebugBar, but if it appears that it's the code BEFORE processing, then I'm f***** am I not?

MonsieurOUXX

I have installed DebugBar.

Here are the steps I follow :

  • I open the webpage
  • I make sure I have the DebugBar pane open on the left of my browser
  • I click on the "DOM" tab
  • I expand "Document"
  • I expand "HTML"
  • I click on "BODY". In the lower pane of DebugBar, I can see the code of the body. It starts with:
<BODY onload="LoadDefaults('');">
<FORM id=Form1 name=Form1 action=home2.aspx method=post>
    <DIV>
        <INPUT id=__VIEWSTATE type=hidden value=Vk23RX9e............
        .................
        .................
        .................
        .................+IgmZ76v/6I6 name=__VIEWSTATE>
    </DIV>
    <TABLE id=Table1 cellSpacing=0 cellPadding=0 width="100%" bgColor=#18317b border=0>
        <TBODY>
            .................
            .................
            .................
        </TBODY>
    </TABLE>
    <DIV>
        <INPUT id=__EVENTVALIDATION type=hidden value=ww/UkllZSLM0FiEMMfaU27TV7pDAkZFtPx+MAfDTL0GYxv9u800tsBzkII1XAcOSciEt0dbN6HYPCdL/JB1xVP/NQjuDxiqM name=__EVENTVALIDATION>
    </DIV>
</FORM>
</BODY>

Notes about the code:

  • The "LoadDefaults('');" JavaScript function simply applies some formatting (a dynamic menu) but is not meant to encrypt or hide any data.
  • The first "value" attribute (in the <INPUT> tag) is very long (50 lines). It looks like some encrypted stuff, but the data that I want to read on the page comes AFTER that. It's in the second part of the <FORM>, after the "<TBODY>" tag.

However, when I call the function "_IEBodyReadHTML":

  • it returns ONLY that <FORM> (not the rest of <BODY>)
  • the form has the same structure but does not contain the same data

Here is the result:

<FORM id=Form1 name=Form1 action=home2.aspx method=post>
<DIV><INPUT id=__VIEWSTATE type=hidden value=Vk23RX9e/d/sObo3c/6iAVHfIa1oHe4kro6yIFO3SwzTRoYIHyBdRLJRdq/OKZ/I3dcX8X+fNDWIzaLD7g8TOkmp5AYkKj/12+T8sGDDTMDimLrHXkLBHTwbp5R3LkYhAbUsj7C/KX8= name=__VIEWSTATE>
</DIV>
<TABLE id=Table1 cellSpacing=0 cellPadding=0 width="100%" bgColor=#18317b border=0>
......
......
</TABLE></DIV>
<DIV><INPUT id=__EVENTVALIDATION type=hidden value=tK4+iT12Tg4DtuwG4dDKqLI5C7FMDoN+piNBihNhYTVvIBDdZTeddGw9OP8KQUftQwnKQM49sfXMBpLHAvE3NEasUate3jyJ name=__EVENTVALIDATION>
</DIV>
</FORM>
<TBODY>

Notes:

  • As you can see, the first <INPUT> tag (encrypted) is much shorter.
  • The <TABLE> contains very little data. All the data that I want has been removed.

So, here is the question: Why does "_IEBodyReadHTML" return only the FORM and not the rest of the BODY? (and, if possible, why does it "decrypt" it?)

MonsieurOUXX

OK, so it seems like the code I get is indeed the page *after* the Javascript has been applied.

I have tried the following workarounds :

  • Use _INetGetSource => PROBLEM: the page I want to load is protected by a credentials popup window, and InetGet doesn't seem to have enough parameters to avoid waiting for the download to finish.
  • Automate the "View Source" option in IE => PROBLEM: this works only if the IE window is visible. If I make it invisible, sending the key sequence "Alt+V, c" to the $oIE object doesn't create the expected Notepad window with the source.
  • Iterate through the elements of the page (with $oDoc = _IEDocGetObj($o_IE)) => PROBLEM: I'm not good enough with the DOM keywords to get a working program, and I'm not even sure it would give the raw source of the page anyway.
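
For the third workaround, a minimal sketch of reading the table through the DOM without hand-written DOM traversal. The table id "Table1" comes from the markup posted earlier in this thread; the URL is a placeholder. This reads the live DOM, so it shows the table as it currently stands in the browser, not the raw source.

```autoit
#include <IE.au3>
#include <Array.au3>

; Hypothetical URL, used only for illustration
Local $oIE = _IECreate("http://www.example.com/home2.aspx")

; Look the table up by the id seen in the posted markup
Local $oTable = _IEGetObjById($oIE, "Table1")
If @error Then
    ConsoleWrite("Table1 not found in the top-level document (it may be inside a frame)" & @CRLF)
Else
    ; Copy the table cells into a 2D array; True transposes it so rows and columns match the page
    Local $aCells = _IETableWriteToArray($oTable, True)
    _ArrayDisplay($aCells)
EndIf
```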

I haven't found a solution. I will leave that problem aside for now.

jvanegmond

Looping through all the objects in a page also returns objects created by JavaScript, so what you get is the page after the JavaScript has been applied.

corgano

Sorry to bump an old topic, but I was searching through the forum for help with this EXACT PROBLEM. Has a solution been found? Has any progress been made on this?


guinness

Well, have you at least tried the latest version of AutoIt? Because a lot has changed in the last 2.5 years, and I mean a lot!

You're no stranger here, so when you do have a problem it's always best to provide some code that reproduces the problem, as well as what version of AutoIt you're using and on what system, e.g. Win7.


corgano

Sorry, should have known that.

Running AutoIt v3.3.8.0 on Windows 7, 64-bit. I'm using the _IE commands to make a wrapper for our school's "Desire to learn" service that will make the pager tool easier to use.

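#include <IE.au3>

; Open the login page and locate the login form by name
$oIE = _IECreate("http://dl.cssd.ab.ca")
$oForm = _IEFormGetObjByName($oIE, "processLogonForm")
; setfeild() is my own helper (not shown here); $user and $pass are set elsewhere
setfeild($oForm, "userName", $user)
setfeild($oForm, "password", $pass)
_IEFormSubmit($oForm)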

This works as expected, as does then changing to the page I want to get

_IENavigate($oIE,"https://dl.cssd.ab.ca/d2l/tools/pager/pager.asp?ou=41406")
_IELoadWait($oIE)

Then I want to run some regexp lines on the page to get the names and ID's of my contacts. This is where it stops working. I use chrome's inspect element, and I can see:

Derp

OK, so the data I wanted was in a FRAME on the page. Instead of loading pager.asp, I loaded friends.asp and it showed the proper friends list. It works.

To anyone else looking for help and seeing this thread, use something like inspect element in chrome to check if what you want is in a FRAME, and then instead navigate to THAT page and try to get the page source. I just happened to miss this yesterday :)
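
As an alternative to navigating to the frame's page directly, here is a minimal sketch that lists the frames of an already loaded page and dumps the body HTML of each one. The URL is a placeholder. Frames served from a different domain than the parent page may refuse access, in which case the calls inside the loop set @error and return little or nothing.

```autoit
#include <IE.au3>

; Hypothetical URL, used only for illustration
Local $oIE = _IECreate("http://www.example.com/")

; Without an index, _IEFrameGetCollection returns the whole collection
; and stores the number of frames in @extended
Local $oFrames = _IEFrameGetCollection($oIE)
Local $iCount = @extended
ConsoleWrite("Frames found: " & $iCount & @CRLF)

For $i = 0 To $iCount - 1
    ; With an index, the same function returns a single frame object
    Local $oFrame = _IEFrameGetCollection($oIE, $i)
    ConsoleWrite("Frame " & $i & ": " & _IEPropertyGet($oFrame, "locationurl") & @CRLF)
    ConsoleWrite(_IEBodyReadHTML($oFrame) & @CRLF)
Next
```

The printed frame URLs also show which page to pass to _IENavigate if loading the frame on its own is preferred.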

