saucedog Posted November 4, 2010 (edited)
Hey... I'm trying to learn more about parsing data from websites. Basically, I want to keep track of several statistics that are released daily on different sites. Can an AutoIt script open multiple sites in Firefox (maybe in tabs?) and then trigger the native "View Page Source" command to dump each page's source into a text file? I'll learn how to parse the text files later. Right now I'm just trying to find out whether this is a feasible project. I don't know if it's possible... or legal... or even the best way to do something like this. I'm just curious at this point. I have a knack for researching and figuring stuff out on my own, so I thought I'd ask. Any help would be greatly appreciated.
Edited November 4, 2010 by saucedog
Realm Posted November 4, 2010
Hello saucedog, First, welcome to the AutoIt Forums. There are many ways to do what you ask, but _InetGetSource() seems best suited to your needs. _InetGetSource() downloads the source code of the page you want. From there you can extract the information you need with an SRE (StringRegExp()) or _StringBetween(), or just save the source to a text file for later use. What is great about this function is that it works in the background, freeing up your mouse and keyboard for other things. No browser is required to use it. Realm
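As a rough illustration of the approach Realm describes, here is a minimal sketch that downloads one page, saves the source to a text file, and pulls out one value with _StringBetween(). The URL, the output file name, and the choice of the <title> tags as markers are assumptions made up for the example, not anything from the thread.

```autoit
#include <Inet.au3>
#include <String.au3>

; Hypothetical URL and output path, chosen only for illustration.
Local $sUrl = "http://www.example.com/"
Local $sOutFile = @ScriptDir & "\page_source.txt"

; Download the page source as a string; no browser window is opened.
Local $sSource = _INetGetSource($sUrl)
If @error Or $sSource = "" Then
    ConsoleWrite("Download failed for " & $sUrl & @CRLF)
    Exit 1
EndIf

; Save the raw source for later parsing (mode 2 = open for writing, overwrite).
Local $hFile = FileOpen($sOutFile, 2)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sOutFile & @CRLF)
    Exit 1
EndIf
FileWrite($hFile, $sSource)
FileClose($hFile)

; Extract the text between two markers; the <title> tags are just an example.
; _StringBetween() returns an array of matches and sets @error if none are found.
Local $aTitle = _StringBetween($sSource, "<title>", "</title>")
If Not @error Then ConsoleWrite("Title: " & $aTitle[0] & @CRLF)
```

For several sites, the same steps could be wrapped in a loop over an array of URLs, writing each source to its own file.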
PsaltyDS Posted November 4, 2010 (edited)
A downloaded page source doesn't necessarily contain the data you want, because the results of scripting actions, the contents of frames, etc. are not present in the source. You should look into the _IE* functions, which work with the DOMs of the actual target pages.
Edited November 4, 2010 by PsaltyDS
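To show what the _IE* route might look like, here is a minimal sketch that loads a page in a hidden Internet Explorer instance and reads the body from the live DOM, so content added by scripts after load is included. The URL is a placeholder, and the sketch assumes Internet Explorer is installed and scriptable on the machine; content inside frames would need extra handling that is not shown here.

```autoit
#include <IE.au3>

; Hypothetical URL, chosen only for illustration.
Local $sUrl = "http://www.example.com/"

; Parameters: URL, try-attach = 0, visible = 0 (hidden window), wait = 1 (block until loaded).
Local $oIE = _IECreate($sUrl, 0, 0, 1)
If @error Then
    ConsoleWrite("Could not create the browser object for " & $sUrl & @CRLF)
    Exit 1
EndIf

; Read the body as it currently exists in the DOM, not the originally downloaded source.
Local $sHtml = _IEBodyReadHTML($oIE)

; Read the same body as plain text, with the markup stripped.
Local $sText = _IEBodyReadText($oIE)

ConsoleWrite("HTML length: " & StringLen($sHtml) & @CRLF)
ConsoleWrite("Text length: " & StringLen($sText) & @CRLF)

; Close the hidden browser instance when done.
_IEQuit($oIE)
```

The trade-off against _InetGetSource() is that this drives a real browser, so it is slower and depends on Internet Explorer, but it sees the page the way a visitor would.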