I want to save only the pages that contain certain text instead of every page. Possible with AutoIt?


9 replies to this topic

#1 SupGuvna

SupGuvna

    Seeker

  • Active Members
  • 6 posts

Posted 13 April 2012 - 09:30 AM

I have written tools for pulling a range of pages and dumping their source to a text file.

Example:

    For $i = $Start To $Finish
        $url = "http://DOMAIN.com/Pageid=" & $i
        $source = _INetGetSource($url)
        FileWrite("FILE.txt", $i & @CRLF)
        FileWrite("FILE.txt", $source & @CRLF)
    Next


However, I now want to turn this into a tool that, instead of pulling the entire page source, saves ONLY the URL if a certain line of text is in the source of the page.
As it is set up now, it just dumps the entire source of each page into a single text file, so I use Notepad's find function to locate the pieces of text I want.

I suppose it could be considered a type of crawler. But instead of searching for just one thing, I would like it to search for several phrases or lines. Only if that line of text exists in the page source should it write the page ID number ($i) to the list.

So, can anybody help me with building something like this? It would help me out a lot.
Sorry for the complicated explanation, but I consider this complicated ><







#2 hannes08

hannes08

    my oh my

  • Active Members
  • 930 posts

Posted 13 April 2012 - 10:19 AM

Hello SupGuvna,

you can use StringInStr() function or similar functions to check whether the string you're searching for is in the source.
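
For illustration, a minimal sketch of that check. The sample text is made up and stands in for a downloaded page. StringInStr() returns the 1-based position of the first match, or 0 when the text is not found, so the result can be used directly in an If.

```autoit
; Hypothetical sample text standing in for a downloaded page source.
Local $sSource = "<html><body>HerroThere, welcome!</body></html>"

; StringInStr() returns the position of the first match (1-based),
; or 0 when the substring is not present.
; Matching is case-insensitive by default.
Local $iPos = StringInStr($sSource, "HerroThere")

If $iPos > 0 Then
    ConsoleWrite("Found at position " & $iPos & @CRLF)
Else
    ConsoleWrite("Not found" & @CRLF)
EndIf
```

Because the return value is a position and not the matched text, writing it to a file gives a number such as 10, not the phrase itself.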
Regards, Hannes

#3 SupGuvna

SupGuvna

    Seeker

  • Active Members
  • 6 posts

Posted 13 April 2012 - 09:18 PM

I experimented with it quite a bit, but all I can seem to do is get it to dump the URL to the text file along with "10".

Any ideas?

    $source = _INetGetSource($url)
    StringInStr($source, "HerroThere", 0, 1,0,0)
    Local $result = StringInStr("I am a String", "RING")
    FileWrite("test.txt", $result & @CRLF)

I am not sure whether I am using this properly or what I am doing wrong. I am not exactly an expert when it comes to this ><

#4 EndFunc

EndFunc

    Universalist

  • Active Members
  • 434 posts

Posted 13 April 2012 - 09:32 PM

Try this

    $source = _INetGetSource($url)
    $Str = StringInStr($source, "HerroThere")
    Local $result = StringMid($source, $Str)
    FileWrite("test.txt", $result & @CRLF)

Edited by EndFunc, 13 April 2012 - 09:32 PM.

EndFunc
AutoIt is the shiznit. I love it.

#5 SupGuvna

SupGuvna

    Seeker

  • Active Members
  • 6 posts

Posted 14 April 2012 - 05:58 AM

This works, but I was hoping it would write the variable used instead of the results themselves.

For example, it finds the line HerroThere in page 5784.
Instead of writing the results, I want it to write the page it was found in <3

Understand? Still, this is definitely a big step in the right direction.

#6 SupGuvna

SupGuvna

    Seeker

  • Active Members
  • 6 posts

Posted 14 April 2012 - 06:30 AM

Here is the closest I can get...

8336
8337
HerroThere (Followed by the rest of the page source for some reason)
8338

Though that's by going this route:

    FileWrite("test.txt", $i & @CRLF)
    FileWrite("test.txt", $result & @CRLF)

Is it possible at all to write NOTHING to the text file except the page IDs (via $i) of pages that have the string HerroThere in them?
Sorry for making things complicated x-x

#7 SupGuvna

SupGuvna

    Seeker

  • Active Members
  • 6 posts

Posted 14 April 2012 - 10:50 AM

This site could use an edit button... But anyway, I have gotten a step closer!

    For $i = $Start To $Finish
        $url = "http://www.Domain.com/pageid/" & $i
        $source = _INetGetSource($url)
        $Str = StringInStr($source, "HerroThere", 0)
        $Main = ($i & " " & $Str)
        FileWrite("test.txt", $Main & @CRLF)
    Next


Now the output is down to this!

8336 0
8337 1787
8338 0
8339 0
8340 0

Anybody got a way to push to the final step? <3 Almost there!

#8 Bowmore

Bowmore

    Feinschmecker

  • Active Members
  • 783 posts

Posted 14 April 2012 - 11:30 AM

This should show you how you might achieve what you want.

    ; Inet.au3 provides _INetGetSource()
    #include <Inet.au3>

    Local $sUrl = "http://www.Domain.com/pageid/"
    Local $sFind = "HerroThere"
    Local $sSource = ""
    Local $iFirstPage = 1
    Local $iLastPage = 20

    For $i = $iFirstPage To $iLastPage
        ; Download the source of the page with this ID
        $sSource = _INetGetSource($sUrl & $i)
        ; StringInStr() returns 0 (treated as False) when $sFind is not in the source
        If StringInStr($sSource, $sFind, 0) Then
            FileWriteLine("test.txt", "Found " & $sFind & " on page " & $i & " of " & $sUrl)
        EndIf
    Next
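
The first post also asked about searching for several phrases. Here is a minimal sketch of how the loop above could be extended, assuming the phrases are kept in an array. The second phrase, the output file name pages.txt and the helper name _ContainsAny are made up for illustration. The URL and page range are the same placeholders used in this thread. Only the page ID is written, and only when at least one phrase is found.

```autoit
#include <Inet.au3>

; Placeholder URL from the thread; replace with the real address.
Local $sUrl = "http://www.Domain.com/pageid/"
; Phrases to look for; "AnotherPhrase" is a made-up example.
Local $aFind[2] = ["HerroThere", "AnotherPhrase"]
Local $sSource = ""

For $i = 1 To 20
    $sSource = _INetGetSource($sUrl & $i)
    ; _INetGetSource() sets @error when the download fails, so skip that page.
    If @error Then ContinueLoop
    ; Write only the page ID when at least one phrase is present.
    If _ContainsAny($sSource, $aFind) Then
        FileWriteLine("pages.txt", $i)
    EndIf
Next

; Hypothetical helper: returns True if any phrase in $aPhrases occurs in $sText.
Func _ContainsAny($sText, $aPhrases)
    For $n = 0 To UBound($aPhrases) - 1
        If StringInStr($sText, $aPhrases[$n]) Then Return True
    Next
    Return False
EndFunc
```

Returning as soon as one phrase matches avoids scanning the page for the remaining phrases. If every phrase must be present instead, the helper would return False on the first miss and True after the loop.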

Edited by Bowmore, 14 April 2012 - 11:31 AM.

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook

#9 Melba23

Melba23

    Yes, me!

  • Moderators
  • 15,761 posts

Posted 14 April 2012 - 11:39 AM

SupGuvna,

"This site could use an edit button"

Now you have 5 posts you should see one at bottom right. ;)

M23
StringSize - Automatically size controls to fit text
ExtMsgBox - A user customisable replacement for MsgBox
Toast - Small GUIs which pop out of the Systray
Marquee - Scrolling tickertape GUIs
Scrollbars - Automatically sized scrollbars with a single command
GUIFrame - Subdivide GUIs into many adjustable frames
GUIExtender - Extend and retract multiple sections within a GUI
NoFocusLines - Remove the dotted focus lines from buttons, sliders, radios and checkboxes
ChooseFileFolder - Single and multiple selections from specified path tree structure
Notify - Small notifications on the edge of the display
RecFileListToArray - An alternative to _FileListToArray with user-defined include/exclude masks, maximum recursion level, sorting and displayed path options
GUIListViewEx - Insert, delete, move, drag and sort ListView items

#10 SupGuvna

SupGuvna

    Seeker

  • Active Members
  • 6 posts

Posted 14 April 2012 - 08:17 PM

Bowmore, unfortunately the code you wrote there always results in an error. I played around with it a bit and it is scanning, but nothing is being written to the file.

Melba23, thanks <3


Edit:
I messed around with the code and cleaned it up a bit <3 It works just fine now. Thanks for the lovely education, you guys!

Edited by SupGuvna, 14 April 2012 - 08:27 PM.




