iloveyou

Learning about web scraping

12 posts in this topic

I want to learn web scraping. Can you guys link me some sample scripts or any resource you think might be useful? 

Also, I discovered the IE.au3 UDFs. They seem great for web scraping, but they appear to work only with Internet Explorer. Are there any similar UDFs for other browsers?

Thank you. 

 




Have you tried searching the forums and Google? Something tells me you haven't. Most people aren't going to work for free if you aren't going to put in at least a little bit of effort.


False Positive Reporter - Mass email all anti virus vendors with an attachment of your program for fast and easy whitelisting.

PortableApps.com App Creation Wizard  - A simple GUI-based Wizard for creating PortableApps.

SoundBoard - Play any song or sound you want at the press of a hotkey.

My GitHub Page: https://github.com/BetaLeaf

17 hours ago, BetaLeaf said:

Have you tried searching the forums and Google? Something tells me you haven't. Most people aren't going to work for free if you aren't going to put in at least a little bit of effort.

Well, you could always share some information off the top of your head.

1 hour ago, iloveyou said:

Well, you could always share some information off the top of your head.

This information is already at the top of the search results for both the AutoIt forum search and Google. We want to pass on the idea that programmers should teach themselves rather than rely on others for help. I do most of my own programming, but there are occasions when I do need help. At those times, please, come ask the community. We just ask that you show what you have already tried so we can narrow things down and help you find a solution. Most of us here won't help you if you will not first help yourself, and when we see people asking for handouts without doing any searching themselves, we just shake our heads and skip over the post.

 

To quote @BrewManNH's signature: "Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime." - by Chirag Gude

Sure, we could give you the code today, but tomorrow you would be back for more. It's better to rely on yourself and the resources you have available, such as Google, the AutoIt forums, and the AutoIt documentation.



Maybe I am just incompetent at googling and searching this forum, because I can't find a scraper. I have spent a good couple of hours on this, and so far I have learnt to use _IEGetObjById plus the innerText property to get what I want. Now I am stuck: I assumed it would return a string, but the String functions don't seem to work on it.

I don't know what to do at this stage; I might as well go back to Ruby and write the scraper there.


Post the code you have started - it makes it much easier to help if we can see what you're doing.

18 hours ago, MuffinMan said:

Post the code you have started - it makes it much easier to help if we can see what you're doing.

#include <IE.au3>

Local $oIE = _IECreate("http://www.morningstar.com.au/Stocks/UpcomingDividends", 0, 0)
; _IEImgClick($oIE, '/Content/scripts/tablesorter/addons/pager/icons/next.png')
Dim $aTable = _IEGetObjById($oIE, "OverviewTable")
Global $text = $aTable.innerText
Dim $aText = StringSplit($text, ' ')
ConsoleWrite($aText & @CRLF)

Nothing comes up in the console.


#9 ·  Posted (edited)

StringSplit returns an array, so loop over its elements to output the data:

For $i = 1 To $aText[0]
    ConsoleWrite($aText[$i] & @CRLF)
Next

Edited by AutoBert


#10 ·  Posted (edited)

For starters, $aText is an array, so ConsoleWrite is not going to show anything when you pass it the whole array.

Try $aText[1] for the first element, or use _ArrayDisplay to see them all.

$aText[0] holds the total count of elements.

EDIT

Or use what AutoBert (who just beat me :P) provided.
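
To make the two suggestions above concrete, here is a minimal sketch of the array handling. It assumes a made-up sample string in place of the table's innerText so that it runs without a browser; the variable names and the sample data are illustrative only and do not come from the original script.

```autoit
#include <Array.au3>

; Hypothetical sample text standing in for $aTable.innerText
Local $sText = "ABC 0.25 12/05/2016"

; With the default flag, StringSplit returns an array whose element [0]
; holds the number of pieces; the pieces themselves start at index 1.
Local $aText = StringSplit($sText, " ")

; Print each piece on its own line
For $i = 1 To $aText[0]
    ConsoleWrite($aText[$i] & @CRLF)
Next

; Or inspect the whole array in a list view window
_ArrayDisplay($aText, "Split result")
```

If the real innerText separates cells with tabs or line breaks rather than spaces, the delimiter passed to StringSplit would probably need to change (for example to @TAB or @CRLF); that depends on the page and is an assumption here.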

Edited by TheSaint



Thank you guys for your help and patience, but I decided to code the scraper in Ruby since it appears to offer more functionality.


Since you are going to be working with the DOM in either Ruby or AutoIt, there is no difference in functionality.


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

