phatzilla

I want to create an automation script for searching craigslist ads

52 posts in this topic

#1 ·  Posted (edited)

For example, I want to write a script that crawls (is that the right word?) Craigslist at http://toronto.craigslist.org/ele and, every hour or so, scans the ads for any ad that contains "TV" in its text. It then takes every 'hit' and emails me the ad links.

I'd prefer it to run in the background. I just have a huge mental block and can't seem to get started... way too many things conflict in my head. Can anyone lend me some help?

For example, I might have a list of words I want it to search for (e.g. "TV", "VCR", "SONY", etc.), and if any of those words matches an ad's text, it copies that link and emails it to me.

Edited by phatzilla




So odd, I just found that site today. Hope there's a way to do this!



OK, so you want some kind of web spider... hmm... I'll think of something. Give me an hour [I've got pneumonia, so there's nothing to do but computer] and I'll see what I can come up with!



#4 ·  Posted (edited)


As a starting point, open the AutoIt help file and look at the IE UDF:

User Defined Function Reference --> IE Management

_IE_Example()

Edited by Zedna



That's not as cool as what I want to do. What I'm going to write will do this in the background every 10 minutes and flag any number of strings you want. It will parse only the first page, so it won't go back 5 pages; if you haven't been on in a while, you may want to look yourself. Basically it works like this:

the ini file
[prgm]
amt=3
1=VCR
2=TV
3=Computer

results.ini
[Results]
;the prgm will add the data here

When it finds results it will show a tray tip that says 'X results found for search criteria X'.

Sound like what you want?
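
A minimal sketch of how the keyword list above could be loaded, assuming the settings are saved in a file called search.ini next to the script. The file name and the variable names are made up for illustration; the [prgm] section and the amt key follow the layout above.

```autoit
; Sketch only: "search.ini" is a hypothetical file name.
Local $sIni = @ScriptDir & "\search.ini"

; amt holds the number of search terms; fall back to 0 if it is missing.
Local $iCount = Int(IniRead($sIni, "prgm", "amt", "0"))

; Element 0 stores the count, elements 1..n store the terms.
Local $aTerms[$iCount + 1]
$aTerms[0] = $iCount
For $i = 1 To $iCount
    $aTerms[$i] = IniRead($sIni, "prgm", String($i), "")
Next
```

Keeping the count in element 0 means the rest of the script can loop from 1 to $aTerms[0] without reading the ini file again.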




Yeah dude, I think it's a nice idea, but it's so hard for me to get it off the ground. One problem, for example: how would I get it to search many different keywords separately and, when it emails me, make sure it doesn't email the same link twice?


#7 ·  Posted (edited)

As a start point open AutoIt HelpFile and look at IE UDF:

User Defined Function Reference --> IE Management

_IE_Example()

Good starting point.... and to add to that - use Google and search for "whatever phrase" site:toronto.craigslist.org/ele. It really helps narrow down what you're looking for.

You can create a list of search phrases and just load 'em up and start searching. I've never done anything with the IE UDF, so I don't know what's involved in accomplishing your goal, but instead of having it email you, why don't you just copy the link to a file?

Edit: emphasis added.

Edited by Fossil Rock


#8 ·  Posted (edited)


Yeah, even the tray tip isn't necessary, but it sounds awesome lol. For example, if I'm at work, it would be cool if my home computer were constantly searching the pages of ads and emailing me links that match my predefined keywords every x minutes.

Example:

I start the program. It opens "toronto.craigslist.org/ele" and searches for the 3 keywords I mentioned in the original post. It searches the first page and finds, let's say, 5 links (2 with "TV", 2 with "VCR", 1 with "SONY"). It takes the links and emails them to me. It idles for around 20 minutes, then refreshes the page and does the search over again, making sure it never emails the same link twice.

That's the idea.
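
A rough sketch of that loop, assuming _INetGetSource from Inet.au3 is used to fetch the page source rather than a browser window. The emailing and the duplicate check are left out; hits are only written to the console, and the variable names are made up for illustration.

```autoit
#include <Inet.au3>

Local $sUrl = "http://toronto.craigslist.org/ele/"
Local $aTerms[3] = ["TV", "VCR", "SONY"]
Local $sHtml

While 1
    ; Download the first page of ads as one string.
    $sHtml = _INetGetSource($sUrl)
    If Not @error Then
        For $sTerm In $aTerms
            ; StringInStr is case-insensitive by default and returns 0 when there is no match.
            If StringInStr($sHtml, $sTerm) Then ConsoleWrite("Found: " & $sTerm & @CRLF)
        Next
    EndIf
    Sleep(20 * 60 * 1000) ; idle for about 20 minutes
WEnd
```

This only reports that a keyword appears somewhere in the page source, markup included. Picking out the individual ad links still needs the parsing step.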

Edited by phatzilla



I have a feeling I can take care of the emailing portion of the code; it shouldn't be very hard (maybe I'm wrong lol, but I've done SMTP stuff before). The actual 'crawling' is what's hard for me to wrap my head around.



I'll have a finished product by late tomorrow for sure, but hopefully sooner. Depends on whether I go to school or not...

I don't wanna get others sick (I'm not a school skipper, I get straight A's).



#11 ·  Posted (edited)


Haha thanks man i appreciate it.

Would you mind posting your thoughts on how you're going to tackle this briefly? I'd also like to at least give it a shot no matter how much it'll probably end up sucking.

Edited by phatzilla



I love parsing code and doing database work, but I'm a fool with IE automation, so I can get it to parse the page and build a result file, like an HTML file with the item name and the link...

Without adding a link (using the tray tip version), the basic steps are like so:

;Start at the '</h4>' tag

;Delete all source before the '</h4>' tag

;Delete all source after the '<p align="center">' tag

;Search from '.html">' to '</a>'

;StringInStr test using all terms

;Write results

;Delete data before the next '<p align="center">' tag

;Continue loop
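
The search-and-test steps above could be sketched roughly like this, assuming the page has already been cut down to just the table of ads. _ScanTitles is a hypothetical helper name, and writing hits to the console stands in for the real "write results" step.

```autoit
; Hypothetical helper: prints every ad title in $sTable that contains one of $aTerms.
Func _ScanTitles($sTable, $aTerms)
    Local $iPos = 1, $iStart, $iEnd, $sTitle
    While 1
        ; Find the end of the next link's opening tag, searching from $iPos.
        $iStart = StringInStr($sTable, '.html">', 0, 1, $iPos)
        If $iStart = 0 Then ExitLoop
        $iStart += 7 ; skip past '.html">' (7 characters)
        $iEnd = StringInStr($sTable, '</a>', 0, 1, $iStart)
        If $iEnd = 0 Then ExitLoop
        ; The text between the two markers is the ad title.
        $sTitle = StringMid($sTable, $iStart, $iEnd - $iStart)
        For $sTerm In $aTerms
            If StringInStr($sTitle, $sTerm) Then
                ConsoleWrite($sTerm & ": " & $sTitle & @CRLF)
                ExitLoop ; one hit per title is enough
            EndIf
        Next
        $iPos = $iEnd + 4 ; continue after '</a>'
    WEnd
EndFunc

Local $aTerms[3] = ["VCR", "TV", "Computer"]
_ScanTitles('<a href="/ele/1.html">Sony TV</a>', $aTerms)
```

Moving a start position forward does the same job as deleting the data that has already been processed, without copying the string on every pass.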




So then post what you have and somebody will help you ...

Note: Don't miss SMTP Email UDF here



That's what I just did lol.

I did something like this before: it took an online version of my bill and compared it to a library of my contacts. If a number I called matched a contact's number, it put the name in the output file so my bill wasn't just a bunch of numbers; if the number wasn't in my contacts list, it would search the white pages or yellow book (you choose).

I hope I can find it; I'd like to post the source. The final output file was a table like so:

Name (First, Last) | Call Duration | Number | Found With (e.g. contacts list, internet)

So I'm used to this whole pain of parsing...




Oh wow, you can do all of that in AU3? That's cool. What about making sure you don't have the same link twice? So basically what you're saying is that you find the links after parsing the ad page, and you paste the links into a new HTML file? Is that about right?



Could someone write an SMTP script that reads a command-line parameter containing the path of the file and sends that file, so that I can ShellExecute it? I've never used the SMTP UDF. I'll try it out later, but not for this project...




Somewhat. It's far more complex at the stage where it writes a result. This will all be done with StringTrimRight(), StringTrimLeft(), StringLen() and StringInStr().

The way I made that list: I opened the file in FrontPage and looked for a tag before the table that was NOT used earlier in the page, cut everything before it with a string trim, and then did the same at the end. As the program progresses it will get faster. To make sure there are no double links, I can first write each one to an INI file: the info will be the key name and the value will just be something null, so no entry appears twice...

Sound good?
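
A small sketch of that INI trick, assuming the [Results] section of results.ini is used as the "already seen" list. _IsNewLink is a hypothetical helper name, and it stores "1" rather than an empty value so that IniRead can tell a stored key apart from a missing one.

```autoit
; Hypothetical helper: returns True the first time a link is seen, False afterwards.
Func _IsNewLink($sIni, $sLink)
    ; IniRead returns the default ("") when the key does not exist yet.
    If IniRead($sIni, "Results", $sLink, "") <> "" Then Return False
    ; Remember the link by using it as the key name.
    IniWrite($sIni, "Results", $sLink, "1")
    Return True
EndFunc

; Usage: only report a link the first time it turns up.
Local $sLink = "http://toronto.craigslist.org/ele/" ; stand-in for a real ad link
If _IsNewLink(@ScriptDir & "\results.ini", $sLink) Then ConsoleWrite("New: " & $sLink & @CRLF)
```

One caveat: an INI key cannot contain "=", so this only works while the ad links have no query string in them.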



Yeah, if it works that sounds great lol. I'd also love to review your code so I can finally 'get' it :)



Hey, I'm new to the forum and I'm looking forward to being a great help. I'll give you what I've created so far...

#include <String.au3>

$source = "" ; placeholder: this will be a FileRead of the downloaded source... I'll incorporate that later
$temp = StringInStr($source, '</h4>') ; finds where the table of results starts
$source = StringTrimLeft($source, $temp + 4) ; +4 so the 5-character '</h4>' itself is cut too; this removes all data
; before the table
$temp = StringInStr($source, '<p align="center">') ; finds where the table ends
$source = StringTrimRight($source, StringLen($source) - $temp + 1) ; cuts the end marker and everything after it
; Now we have just the table and we're ready to start finding entries with the famous For loop

It's a start, but don't worry, I'm a quick programmer...



See if this whets your whistle with IE.au3.

Make it work invisibly by setting the visibility parameter of _IECreate to False.

Run this to see how easily you can get the link text and hrefs:

#include <IE.au3>
#include <Array.au3>

$oIE = _IECreate("http://toronto.craigslist.org/ele/")

$oPs = _IETagNameGetCollection($oIE, "p")
$cntPs = @extended

Local $aLinkInfo[$cntPs + 1][3]

$aLinkInfo[0][0] = "Index"
$aLinkInfo[0][1] = "Link Text"
$aLinkInfo[0][2] = "href"

$cnt = 1
For $oP in $oPs
    $oLink = _IETagNameGetCollection($oP, "a", 0)
    $aLinkInfo[$cnt][0] = $cnt - 1
    $aLinkInfo[$cnt][1] = $oLink.innerText
    $aLinkInfo[$cnt][2] = $oLink.href
    $cnt +=1
Next

_ArrayDisplay($aLinkInfo, "LinkInfo")
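
As a possible next step, here is a hedged sketch that combines the hidden window with a keyword filter. It assumes the keyword list from the original post and uses _IELinkGetCollection to walk every link on the page instead of the paragraph tags; matching hrefs are only written to the console.

```autoit
#include <IE.au3>

Local $aTerms[3] = ["TV", "VCR", "SONY"]

; Second parameter 0 = do not attach to an existing window, third parameter 0 = keep the window hidden.
Local $oIE = _IECreate("http://toronto.craigslist.org/ele/", 0, 0)
Local $oLinks = _IELinkGetCollection($oIE)

For $oLink In $oLinks
    For $sTerm In $aTerms
        ; Case-insensitive test of the link text against each keyword.
        If StringInStr($oLink.innerText, $sTerm) Then
            ConsoleWrite($oLink.href & @CRLF)
            ExitLoop ; avoid printing the same link once per matching keyword
        EndIf
    Next
Next

_IEQuit($oIE) ; close the hidden browser instance
```

This walks every link on the page, navigation links included, so in practice the filter may need to be limited to the ad listing.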

Dale


