mootius

How do I capture HTML without loading page?

5 posts in this topic

#1 ·  Posted (edited)

Good Morning!

I am fairly new to AutoIt and have only written a few scripts. I am trying to figure out if there is a function that will allow me to capture HTML without loading the page in my browser.

E.g., I work in real estate and post a lot of ads to Craigslist. I use HTML code regularly in my ads and wrote a script that auto-generates the HTML code for any particular listing. I then adapted the code to build HTML code for hundreds of listings consecutively. Since my company's intranet database only runs in Chrome and Firefox, I opted to use FF.au3 for most of my Firefox browser script. Here's how it works:

1. I run a search in my database.

2. As an example, let's say the search returns 100 listings.

3. I then run my program and copy and paste the link for the search results (when I run a search, my database saves it as a search number, so my link looks something like this: www.realestate.com/search-results/1234). <--- Each search has its own search number.

4. My program takes all of the search results from the link I provide, builds HTML code for Craigslist ads for every listing within that search, and saves them as individual text files. But in order to do this it has to load each individual listing in Firefox to pull the HTML code, which brings me to my question:

Is there a way I can just grab the HTML code the server is sending back to me without actually having Firefox load the page?

E.g., here is a listing link: www.realestate/listing/12345 <-- I just want the HTML code sent back, but don't want to load the page in Firefox.

My script works fine as it is, but I feel like I could really speed things up if I didn't have to wait for page loads for hundreds of listings.

Any help is appreciated. I don't expect anyone to provide me with any code; I'll figure it out myself if need be. Just point me in the right direction, please: maybe a function within AutoIt, or some UDF you know of. I already searched the forums but didn't see anything specific to what I'm asking about.

Thanks!

Edited by mootius

#2 ·  Posted (edited)

Look at the Inet* commands in the help file; maybe InetGet will help you.
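A minimal sketch of that approach, using a placeholder listing URL based on the example in the original post (the real links would come from your search results). InetGet saves the server's response straight to a file, with no browser window involved:

```autoit
; Sketch: fetch the raw HTML of one listing to disk, no browser needed.
; The URL below is a placeholder, not a real address.
Local $sUrl = "http://www.realestate.com/listing/12345"
Local $sFile = @ScriptDir & "\listing.html"

; Options: 1 = force a reload from the server; last parameter 0 = wait
; until the download completes before continuing.
Local $iBytes = InetGet($sUrl, $sFile, 1, 0)
If $iBytes > 0 Then
    Local $sHtml = FileRead($sFile)
    ; ... parse $sHtml and build the Craigslist ad from it ...
Else
    ConsoleWrite("Download failed for " & $sUrl & @CRLF)
EndIf
```

Loop that over each listing link from the search results page and you skip the Firefox page loads entirely.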

Edited by bogQ


#3 ·  Posted

As far as scraping multiple pages of Craigslist-related listings goes, I've done this sort of thing in GreaseMonkey, but the other way around (bringing a series of pages of listings together into a single larger page); I haven't explored AutoIt's possibilities through Firefox for it. As bogQ says, maybe something like _INetGetSource would work. Note, though, that if your intranet limits browsers to Chrome and Firefox by testing the User-Agent, you might run into difficulty, since I believe _INetGetSource utilizes Internet Explorer functionality.
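A short sketch of the _INetGetSource route, again with the placeholder URL from the original post. It pulls the page source straight into a string instead of a file; since it goes through the same Windows networking layer Internet Explorer uses, a server that inspects the User-Agent will not see "Firefox" here:

```autoit
#include <INet.au3>

; Sketch: retrieve a listing's HTML directly into a string.
; The URL is a placeholder from the original post.
Local $sHtml = _INetGetSource("http://www.realestate.com/listing/12345")
If @error Then
    ConsoleWrite("Could not retrieve the page source." & @CRLF)
Else
    ConsoleWrite(StringLen($sHtml) & " bytes of HTML retrieved." & @CRLF)
    ; ... parse $sHtml here ...
EndIf
```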

#4 ·  Posted

Thanks AdmiralAlkex, I've never had to spoof the User-Agent in scripting (I've used Firefox add-ons for it many a time, certainly, but haven't had to do it in a script yet).

mootius, another consideration that crossed my mind relates to "wait for page loads for hundreds of listings": are you certain that none of the data is loaded into the results page using Ajax? Years ago I had to put together an automation to scrape a site where a portion of each page's information was brought in by an Ajax request kicked off when the page was opened. I couldn't download the page directly; I had to run it through an object that would perform the Ajax call, then wait until the page was fully loaded before collecting the document's source. (I was specifically asked to implement that scrape in .NET, and I don't remember whether I used an embedded browser object or automated an instance of Internet Explorer.)
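If the Ajax concern turns out to apply, one AutoIt workaround is an invisible Internet Explorer instance via the IE.au3 UDF: the hidden browser runs the page's scripts, so the finished document includes Ajax-loaded content, without any visible window. A sketch, assuming the same placeholder URL:

```autoit
#include <IE.au3>

; Sketch for the Ajax case: a hidden IE instance renders the page,
; including script-loaded content, then hands back the final HTML.
; The URL is a placeholder.
Local $oIE = _IECreate("http://www.realestate.com/listing/12345", 0, 0) ; 0, 0 = don't attach, invisible
_IELoadWait($oIE)                    ; wait for the document to finish loading
Sleep(1000)                          ; small extra pause for late Ajax responses (adjust as needed)
Local $sHtml = _IEDocReadHTML($oIE)  ; the rendered HTML, Ajax content included
_IEQuit($oIE)                        ; close the hidden instance
```

This is slower than a raw InetGet, but still avoids driving a visible Firefox window, and only the Ajax-dependent pages would need it.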

