bundyxc

Viewing source code

14 posts in this topic

Every time I attempt to view the source code of a remote webpage using _IEDocReadHTML() or _INetGetSource(), it prompts me to download a file. Is there any way of getting around this? I just need to grab a line of the source code of the page that I'm on.


Global $arr[2]

$arr[0]="hip"
$arr[1]="hip"
;^^ hip hip array. ^^




Just realized the problem: the file that I'm trying to read is a .cfm file, so the computer doesn't know what to do with it, and AutoIt doesn't know how to handle it. Is there any other way to view the source, as opposed to downloading the .cfm file (which is pretty much the same as an HTML file)?



It is possible that the webpage you are trying to download asks the client to download an additional file. The libraries that download the page source are also responsible for handling any such follow-up requests. I don't know enough about these libraries, but I do know that you don't have to use them should the need arise.

When dealing with HTTP, I have personally always preferred, like some people prefer to eat their spaghetti with a spoon, to write my own implementation of the protocol, following the published HTTP standards, of course.

I have found that a good starting point is to look at how the existing libraries or tools do the same thing. A packet sniffer is a very useful tool for analyzing this network traffic, and for this I suggest Wireshark. Once you have captured the HTTP packets, it is simple to reproduce them in AutoIt.

Should you have any questions, perhaps arbitrary, perhaps invalid, then do not hesitate to ask. After all, we are here to help.
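
To make the idea of a hand-written HTTP request more concrete, here is a rough sketch using AutoIt's built-in TCP functions. The function names (_BuildGetRequest, _HttpGetRaw, _HttpBody) and the 10-second timeout are made up for illustration. It assumes plain HTTP on port 80 and does nothing about redirects, cookies, logins, or HTTPS, so treat it as a starting point rather than a finished client.

```autoit
; Sketch only: a bare HTTP GET over AutoIt's built-in TCP functions.
; Assumes plain HTTP on port 80; the function names are made up for this example.

; Build the request text. HTTP/1.0 is used so the server closes the
; connection after replying and does not chunk the body.
Func _BuildGetRequest($sHost, $sPath)
    Return "GET " & $sPath & " HTTP/1.0" & @CRLF & _
            "Host: " & $sHost & @CRLF & _
            "Connection: close" & @CRLF & @CRLF
EndFunc   ;==>_BuildGetRequest

; Send the request and collect everything the server sends back.
Func _HttpGetRaw($sHost, $sPath, $iTimeoutMs = 10000)
    TCPStartup()
    Local $iSocket = TCPConnect(TCPNameToIP($sHost), 80)
    If @error Then
        TCPShutdown()
        Return SetError(1, 0, "")
    EndIf

    TCPSend($iSocket, _BuildGetRequest($sHost, $sPath))

    Local $sReply = "", $sChunk, $hTimer = TimerInit()
    While TimerDiff($hTimer) < $iTimeoutMs
        $sChunk = TCPRecv($iSocket, 4096)
        If @error Then ExitLoop ; the server closed the connection
        If $sChunk = "" Then
            Sleep(10) ; nothing waiting yet
        Else
            $sReply &= $sChunk
        EndIf
    WEnd

    TCPCloseSocket($iSocket)
    TCPShutdown()
    Return SetError(0, 0, $sReply)
EndFunc   ;==>_HttpGetRaw

; Strip the response headers: the body starts after the first blank line.
Func _HttpBody($sReply)
    Local $iPos = StringInStr($sReply, @CRLF & @CRLF)
    If $iPos = 0 Then Return SetError(1, 0, "")
    Return StringMid($sReply, $iPos + 4)
EndFunc   ;==>_HttpBody
```

With a placeholder host, a call would look like _HttpBody(_HttpGetRaw("www.example.com", "/")). Comparing the text that _BuildGetRequest() produces with a browser request captured in Wireshark shows which headers the real client sends that this sketch leaves out.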


So, as I'm understanding it, you're suggesting that I capture HTTP packets, and piece them together client-side, rather than try to download the page as a whole? The page runs just like any other html/php/etc page, it just has a .cfm extension. Is there maybe any way to just rename the file, or something? I'm not sure how to get started with picking up HTTP packets.




It's not an HTML element. It's a binary Flash file passed to Adobe Flash to be played. Windows, the browser, and AutoIt have no native way to interpret the file without calling Adobe Flash.

:)


Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


I think I might have miscommunicated something. As far as I'm aware, this isn't Flash. It's a .cfm, which (to my understanding) is ColdFusion. It has HTML source, just like any other page, and I can do everything on the site with Flash turned off.

Here's a sample of the source:

<div id="datetoday" class="clearfix">
    <div id="lyrTime2">Thursday, October 29, 2009  2:08 PM</div>

    <div style="clear:both;padding:0;"></div>
</div>

        <div id="myMenu" class="close">
 </div>
      </div>
      <div id="mainContent" class="clearfix">

        <div id="col1">

          

<div id="userdisplay" class="module">
    <div class="top">
        <div>
            <div>&nbsp;</div>
        </div>
    </div>
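
For grabbing a single line out of source like the sample above, a regular expression is one option once the HTML is in a string, however it was obtained. This is only a sketch: _GetDivText is a hypothetical helper name, the test string is a shortened copy of the sample, and the pattern only copes with a div that has no nested tags and an id free of regex special characters.

```autoit
; Sketch: pull the text of one element out of a page's HTML source.
; $sSource stands in for whatever _IEDocReadHTML() or _INetGetSource() returned.
Local $sSource = '<div id="datetoday" class="clearfix">' & @CRLF & _
        '    <div id="lyrTime2">Thursday, October 29, 2009  2:08 PM</div>' & @CRLF & _
        '</div>'

; _GetDivText is a hypothetical helper. It returns the text between the opening
; tag of the div with the given id and the next closing </div>, provided there
; are no other tags in between. On no match it sets @error and returns "".
Func _GetDivText($sHtml, $sId)
    Local $sPattern = '(?is)<div[^>]*\bid="' & $sId & '"[^>]*>([^<]*)</div>'
    Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; 1 = return array of captured groups
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_GetDivText

ConsoleWrite(_GetDivText($sSource, "lyrTime2") & @CRLF)
```

For anything more deeply nested than this, walking the DOM through IE.au3 is likely to be more robust than a pattern match.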


It's HTML-like, but not HTML. I'm pretty sure the ColdFusion Markup Language (.cfm) is only interpreted server-side by ColdFusion, and in the case of our servers that's primarily to select and present PDF documents and Flash stuff (CBTs and the like). ColdFusion is only installed on the server, not the client, so you can't expect to parse its stuff locally on your client (unless you installed ColdFusion or an Adobe plugin).

I could still be wrong (I'm not the ColdFusion guy), but that's the way it looks from here.

:)



Try using DebugBar to see what the browser sees as the local dynamic source.

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble



Hmm... there's more HTML there than I thought. I had looked into using IE.au3 to automate some CF MX install stuff back when I first started learning it, and IE's View Source showed me a blank page with just "index.cfm" on it. I'm looking at the CF admin pages: ".../CFIDE/Administrator/index.cfm" and ".../CFIDE/Administrator/DataSources/index.cfm". DebugBar does show a full HTML DOM from the client browser, albeit with LOTS of Frames embedded in it.

IE itself doesn't show source for any of it with its own tools.

Guess I was just fooled.

:)



So, it sounds like you're saying that the HTML is there, but AutoIt isn't finding it? Is there a workaround?




I think what Dale is talking about is HTML that is created client-side. Scripts may run client-side and create more HTML; you don't see this when going to View -> Source, as that shows only the HTML sent by the server.

You will have to wait until the page has loaded before you look at the HTML through the DebugBar. From there you can "drill down" to the sections of interest.
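
The same "wait, then look at what the browser holds" approach can be sketched with IE.au3. The window title "My Page Title" is a placeholder, and the id lyrTime2 is borrowed from the source sample posted earlier in the thread. Whether this sidesteps the download prompt on the page in question is untested; it only shows how to read the loaded document instead of fetching the file again.

```autoit
#include <IE.au3>

; Sketch: read the page as the browser currently holds it (the live DOM)
; instead of downloading the file from the server a second time.
; Assumption: the page is already open in an Internet Explorer window whose
; title contains the placeholder text below.
Local $oIE = _IEAttach("My Page Title") ; default mode matches on the document title
If @error Then
    MsgBox(16, "Error", "Could not attach to the browser window.")
    Exit
EndIf

_IELoadWait($oIE) ; wait until the browser reports the page as loaded

; Full HTML of the document as it stands now, including script-generated parts.
Local $sLiveHtml = _IEDocReadHTML($oIE)
ConsoleWrite("Document length: " & StringLen($sLiveHtml) & " characters" & @CRLF)

; Drill down to one element by id and read its visible text.
Local $oDiv = _IEGetObjById($oIE, "lyrTime2")
If IsObj($oDiv) Then
    ConsoleWrite(_IEPropertyGet($oDiv, "innertext") & @CRLF)
Else
    ConsoleWrite("Element not found in the loaded document." & @CRLF)
EndIf
```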


Post your code because code says more than your words can. SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y. Use Opt("MustDeclareVars", 1)
Brett F's Learning To Script with AutoIt V3
Valuater's AutoIt 1-2-3, Class... is now in Session
Contribution: Get SVN Rev Number, Control Handle under mouse, A Presentation using AutoIt, Log ConsoleWrite output in Scite


But things like divs aren't going away. There ARE static portions of the page, and those are the ones I'm trying to target. The page IS fully loaded, and there are NO changes whatsoever. I'm attempting to read the STATIC HTML, that's it. I'm asking if there's a way to do that, or if the UDF doesn't support it.




No, no, they are saying that some of the HTML could be added to the page client-side before the page has completely loaded.

Do you have a link maybe so we can see for ourselves? That'd help a lot and prevent unnecessary discussion. Maybe you can also write a reproducer, or similar.


Sure, I mean.. it's just a MySpace page.

Try going to

http://home.myspace.com/index.cfm?fuseaction=user

Email: myspaceee@slopsbox.com

Password: AutoIt!

It should bring up the User CP of that MySpace account. If you try to load up the source in AutoIt, it comes up with a download prompt for me.

"View Source" in IE8/FF3/etc works, but it tries to download the .cfm file, and then asks what kind of program should be used to open it.


