saucedog

Parsing web data

3 posts in this topic

#1 ·  Posted (edited)

Hey...

I'm trying to learn more about parsing data from websites. I'm basically trying to keep track of multiple statistics that are released on different sites daily.

Is AutoIt scripting able to open multiple sites in Firefox (maybe in tabs?) and then push the native "View Page Source" command to somehow spit out the page source into text files? I'll learn how to parse the text files later. Right now I'm trying to find out if this is a feasible project.

I don't know if this is possible... or legal... or even the best way to do something like this. I'm just curious at this point. But I have a knack for researching and figuring stuff out on my own, so I thought I'd ask.

Any help would be greatly appreciated.

Edited by saucedog

Hello saucedog,

First, welcome to the AutoIt Forums!

There are many ways to do what you ask, but _InetGetSource() looks like the best fit for your needs.

_InetGetSource() downloads the source code of the page you need. From there you can pull out the data with a regular expression (StringRegExp()) or _StringBetween(), or just save the source to a text file for later use.

What is great about this function is that it works in the background, freeing up your mouse and keyboard for other things. No browser is required to use it.
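
Something along these lines might be a starting point. It is only a rough sketch: the URL, the output file name, and the choice of the page title as the thing to extract are all placeholders I made up, so swap in whatever your real pages need.

```autoit
#include <Inet.au3>   ; _INetGetSource()
#include <String.au3> ; _StringBetween()

; Hypothetical target page; replace with one of the sites you track.
Local $sUrl = "http://www.example.com/"

; Download the raw page source in the background (no browser involved).
Local $sSource = _INetGetSource($sUrl)
If @error Then
    ConsoleWrite("Could not download " & $sUrl & @CRLF)
    Exit 1
EndIf

; Example extraction: everything between <title> and </title>.
; _StringBetween() returns an array of matches and sets @error if none are found.
Local $aTitle = _StringBetween($sSource, "<title>", "</title>")
If Not @error Then ConsoleWrite("Title: " & $aTitle[0] & @CRLF)

; Save the full source for later parsing (mode 2 = open for writing, erase previous contents).
Local $hFile = FileOpen(@ScriptDir & "\page_source.txt", 2)
If $hFile = -1 Then Exit 2
FileWrite($hFile, $sSource)
FileClose($hFile)
```

To cover several sites, the same steps could be wrapped in a loop over an array of URLs, writing each one to its own file.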

Realm




#3 ·  Posted (edited)

Downloaded page sources don't necessarily contain the data you want, because the results of scripting actions, the contents of frames, etc. are not present in the source. You should look into the _IE* functions to work with the DOMs of the actual target pages.
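
As a rough illustration of that approach, the sketch below loads a page in a hidden browser window and reads the document after it has rendered. Note that the _IE* functions in IE.au3 drive Internet Explorer, not Firefox, so this assumes IE is installed. The URL is a placeholder, and whether the rendered HTML actually contains your statistics depends on the page.

```autoit
#include <IE.au3> ; _IECreate(), _IELoadWait(), _IEDocReadHTML(), _IEBodyReadText(), _IEQuit()

; Hypothetical target page; replace with one of the sites you track.
Local $sUrl = "http://www.example.com/"

; Open the page in an invisible Internet Explorer window
; (second parameter 0 = do not attach to an existing window, third parameter 0 = hidden).
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then
    ConsoleWrite("Could not create the browser object" & @CRLF)
    Exit 1
EndIf

; Make sure the page has finished loading before reading it.
_IELoadWait($oIE)

; Read the HTML of the loaded document, and the visible text of the body.
Local $sHtml = _IEDocReadHTML($oIE)
Local $sText = _IEBodyReadText($oIE)

ConsoleWrite("HTML length: " & StringLen($sHtml) & @CRLF)
ConsoleWrite("Text length: " & StringLen($sText) & @CRLF)

; Close the hidden browser window when done.
_IEQuit($oIE)
```

From here, $sHtml or $sText could be parsed or written to a file in the same way as a downloaded page source.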


Edited by PsaltyDS

