mootius

How do I capture HTML without loading page?

5 posts in this topic

#1 ·  Posted (edited)

Good Morning!

I am fairly new to AutoIt and have only written a few scripts. I am trying to figure out if there is a function that will allow me to capture HTML without loading the page in my browser.

E.g., I work in real estate and post a lot of ads to Craigslist. I use HTML code regularly in my ads and wrote a script that auto-generates the HTML code for any particular listing. I then adapted the code to build HTML code for hundreds of listings consecutively. Since my company's intranet database only runs in Chrome and Firefox, I opted to use FF.au3 for most of my Firefox browser script. Here's how it works:

1. I run a search in my database.

2. As an example, let's say the search returns 100 listings.

3. I then run my program and copy and paste the link for the search results (when I run a search, my database saves it as a search number, so my link looks something like this: www.realestate.com/search-results/1234). <--- Each search has its own search number.

4. My program takes all of the search results from the link I provide, builds HTML code for Craigslist ads for every listing within that search, and saves them as individual text files. But in order to do this it has to load each individual listing in Firefox to pull the HTML code, which brings me to my question:

Is there a way I can just grab the HTML code the server is sending back to me without actually having Firefox load the page?

E.g., here is a listing link: www.realestate/listing/12345 <-- I just want the HTML code sent back, but don't want to load the page in Firefox.

My script works fine as it is, but I feel like I could really speed things up if I didn't have to wait for page loads for hundreds of listings.

Any help is appreciated. I don't expect anyone to provide me with any code; I'll figure it out myself if need be. Just point me in the right direction, please: maybe a function within AutoIt, or some UDF you know of. I already searched the forums but didn't see anything specific to what I'm asking about.

Thanks!

Edited by mootius

#2 ·  Posted (edited)

Look at the Inet* commands in the help file; maybe InetGet will help you.
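A minimal sketch of that approach, using a placeholder listing URL based on the example in the original post (the real links would come from your search results). InetGet saves the server's response straight to a file, with no browser window involved:

```autoit
; Sketch: fetch the raw HTML of one listing to disk, no browser needed.
; The URL below is a placeholder, not a real address.
Local $sUrl = "http://www.realestate.com/listing/12345"
Local $sFile = @ScriptDir & "\listing.html"

; Options: 1 = force a reload from the server; last parameter 0 = wait
; until the download completes before continuing.
Local $iBytes = InetGet($sUrl, $sFile, 1, 0)
If $iBytes > 0 Then
    Local $sHtml = FileRead($sFile)
    ; ... parse $sHtml and build the Craigslist ad from it ...
Else
    ConsoleWrite("Download failed for " & $sUrl & @CRLF)
EndIf
```

Loop that over each listing link from the search results page and you skip the Firefox page loads entirely.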

Edited by bogQ


#3 ·  Posted

As far as scraping multiple pages of Craigslist-related listings goes, I've done this sort of thing in GreaseMonkey, but the other way around (bringing a series of pages of listings together into a single larger page); I haven't explored AutoIt's possibilities through Firefox for it. As bogQ says, maybe something like _INetGetSource would work. Note, though, that if your intranet limits browsers to Chrome and Firefox by testing the User-Agent, you might run into difficulty, since I believe _INetGetSource utilizes Internet Explorer functionality.
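A short sketch of the _INetGetSource route, again with the placeholder URL from the original post. It pulls the page source straight into a string instead of a file; since it goes through the same Windows networking layer Internet Explorer uses, a server that inspects the User-Agent will not see "Firefox" here:

```autoit
#include <INet.au3>

; Sketch: retrieve a listing's HTML directly into a string.
; The URL is a placeholder from the original post.
Local $sHtml = _INetGetSource("http://www.realestate.com/listing/12345")
If @error Then
    ConsoleWrite("Could not retrieve the page source." & @CRLF)
Else
    ConsoleWrite(StringLen($sHtml) & " bytes of HTML retrieved." & @CRLF)
    ; ... parse $sHtml here ...
EndIf
```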

#4 ·  Posted

Thanks AdmiralAlkex, I've never had to spoof the User-Agent in scripting (I've used Firefox add-ons for it many a time, certainly, but haven't had to do it in a script yet).

mootius, another consideration that crossed my mind relates to "wait for page loads for hundreds of listings": are you certain that none of the data is loaded into the results page using Ajax? Years ago I had to put together an automation to scrape a site where a portion of each page's information was brought in by an Ajax request kicked off when the page was opened. I couldn't download the page directly; I had to run it through an object that would perform the Ajax call, then wait until the page was fully loaded before collecting the document's source. (I was specifically asked to implement that scrape in .NET, and I don't remember whether I used an embedded browser object or automated an instance of Internet Explorer.)
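If the Ajax concern turns out to apply, one AutoIt workaround is an invisible Internet Explorer instance via the IE.au3 UDF: the hidden browser runs the page's scripts, so the finished document includes Ajax-loaded content, without any visible window. A sketch, assuming the same placeholder URL:

```autoit
#include <IE.au3>

; Sketch for the Ajax case: a hidden IE instance renders the page,
; including script-loaded content, then hands back the final HTML.
; The URL is a placeholder.
Local $oIE = _IECreate("http://www.realestate.com/listing/12345", 0, 0) ; 0, 0 = don't attach, invisible
_IELoadWait($oIE)                    ; wait for the document to finish loading
Sleep(1000)                          ; small extra pause for late Ajax responses (adjust as needed)
Local $sHtml = _IEDocReadHTML($oIE)  ; the rendered HTML, Ajax content included
_IEQuit($oIE)                        ; close the hidden instance
```

This is slower than a raw InetGet, but still avoids driving a visible Firefox window, and only the Ajax-dependent pages would need it.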

