Parsing web data


Hey...

I'm trying to learn more about parsing data from websites. I'm basically trying to keep track of multiple statistics that are released on different sites daily.

Can an AutoIt script open multiple sites in Firefox (maybe in tabs?) and then trigger the native "View Page Source" command to dump each page's source into a text file? I'll learn how to parse the text files later. Right now I'm trying to find out whether this is a feasible project.

I don't know if this is possible... or legal... or even the best way to do something like this. I'm just curious at this point. But I have a knack for researching and figuring stuff out on my own :graduated: So I thought I'd ask.

Any help would be greatly appreciated.

Edited by saucedog

Hello saucedog,

First, Welcome to the AutoIt Forums :graduated:

There are many ways to do what you ask; however, _InetGetSource() seems best suited to your needs.

_InetGetSource() downloads the source code of the page you need. From there you can extract the information you want with an SRE (StringRegExp()) or _StringBetween(), or simply save the source to a text file for later use.

What is great about this function is that it works in the background, leaving your mouse and keyboard free for other tasks. No browser is required.
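
For readers who want to see the same download, extract, save pipeline outside AutoIt, here is a minimal Python sketch. It is an analogue, not AutoIt code: `fetch_source` plays the role of _InetGetSource(), `string_between` mimics _StringBetween() using a non-greedy regular expression (the same idea as an SRE), and `save_source` writes the raw text to a file. All function names, the `pageN.txt` file naming, and any URLs are hypothetical placeholders.

```python
import re
import urllib.request
from pathlib import Path
from typing import List


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Download a page's raw HTML without a browser (analogue of _InetGetSource())."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def string_between(text: str, start: str, end: str) -> List[str]:
    """Return every substring found between `start` and `end` (like _StringBetween())."""
    # re.escape treats the markers literally; (.*?) is non-greedy, so each
    # match stops at the nearest `end` marker. DOTALL lets matches span lines.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)


def save_source(path: Path, text: str) -> None:
    """Save the raw source to a text file for later parsing."""
    path.write_text(text, encoding="utf-8")


def archive_pages(urls: List[str], out_dir: Path) -> None:
    """Download each URL and save its source as page1.txt, page2.txt, ... in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls, start=1):
        save_source(out_dir / f"page{i}.txt", fetch_source(url))
```

For example, `archive_pages(["https://example.com/stats"], Path("daily"))` would write `daily/page1.txt` (the URL is a placeholder), and `string_between(source, "<title>", "</title>")` would then pull the page title out of a saved source.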

Realm

My Contributions:
Unix Timestamp: Calculate Unix time, or seconds since Epoch, accounting for your local timezone and daylight savings time.
RegEdit Jumper: A Small & Simple interface based on Yashied's Reg Jumper Function, for searching Hives in your registry.

A downloaded page source doesn't necessarily contain the data you want, because content produced by scripts, the contents of frames, etc. are not present in the raw source. You should look into the _IE* functions, which work with the DOM of the actual target page as the browser renders it.
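
The point about scripted content can be illustrated with a small Python sketch on a made-up page (the page, the `visitors` id, and the JavaScript function `fetchTodayCount` are all hypothetical). The statistic is written into the page by JavaScript only after a browser runs the script, so extracting it from the raw source, which is what any _InetGetSource()-style download returns, finds only an empty placeholder. Reading the rendered value needs something that executes the scripts and exposes the DOM, which is what driving a real browser provides.

```python
import re
from typing import Optional

# Hypothetical raw source as a plain HTTP download would return it.
# fetchTodayCount is a made-up JavaScript function: the number only appears
# in the page after a browser runs this script.
RAW_SOURCE = """
<html><body>
  <span id="visitors"></span>
  <script>
    document.getElementById("visitors").textContent = fetchTodayCount();
  </script>
</body></html>
"""


def extract_span(source: str, element_id: str) -> Optional[str]:
    """Return the text inside <span id="..."></span> in the raw source, or None."""
    match = re.search(
        r'<span id="' + re.escape(element_id) + r'">(.*?)</span>', source, re.DOTALL
    )
    return match.group(1) if match else None


# The element exists in the source, but its content is empty: the value a
# visitor sees only exists in the DOM after the script has run.
print(repr(extract_span(RAW_SOURCE, "visitors")))  # prints ''
```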

:graduated:

Edited by PsaltyDS
Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law