
Stupid question about InetGet and getting all links on the page.


OK, I'm trying to make a script for my unattended Windows CD. It goes to my FTP, downloads the source of the FTP page (the HTML), looks in the $source string for every "<a href=" and "</a>", and then downloads all the .exe files.

I want to do this so that if I want to update a file on my unattended CD, I just update the file on my FTP and it's all done.

Here's what I have so far. Note that I have it pointed at www.google.com just as a test, and I have the "If InetGet($url" and "FileDelete(" lines commented out so I'm not hammering Google's servers.

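HotKeySet("{ESC}", "Terminate") ; Press Esc to quit

$url = "http://www.google.com/"
$source = _data()

; Grab only the FIRST link: trim everything up to and including '<a href="',
; then cut two characters before the next '>' (dropping the closing quote)
$urla = StringTrimLeft($source, StringInStr($source, "<a href=") + 8)
$urla = StringLeft($urla, StringInStr($urla, ">") - 2)
MsgBox(0, "First link", $urla)

; Downloads $url to C:\data.txt and returns its contents, or -1 on failure
Func _data()
    Local $source
    If InetGet($url, "C:\data.txt") Then
        $source = FileRead("C:\data.txt", FileGetSize("C:\data.txt"))
        ;FileDelete("C:\data.txt")
        Return $source
    Else
        Return -1
    EndIf
EndFunc

Func Terminate()
    Exit 0
EndFunc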

The problem I'm having is this:

I want it to get ALL of the "<a href" links. Right now it only shows a message box with the first one. How can I make it search the whole source for every <a href?
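For the question as asked, a regular expression can return every href in one call instead of walking the string by hand. This is a minimal sketch, assuming AutoIt 3.3 or later, where StringRegExp with flag 3 returns an array of every match of the capture group. `_GetAllHrefs` is a hypothetical helper name, and the pattern handles ordinary quoted or unquoted hrefs rather than every possible HTML quirk.

```autoit
; Hypothetical helper: returns every href value found in $sHtml.
; Layout matches StringSplit: count in [0], links in [1]..[n].
Func _GetAllHrefs($sHtml)
    ; (?i)                    case-insensitive, so <A HREF=...> also matches
    ; <a\s[^>]*?href\s*=\s*   the opening <a tag, up to and including href=
    ; ["']?                   optional opening quote
    ; ([^"'\s>]+)             capture the URL itself, up to a quote, space or >
    Local $aMatches = StringRegExp($sHtml, "(?i)<a\s[^>]*?href\s*=\s*[""']?([^""'\s>]+)", 3)
    If @error Then ; @error 1 = no matches, 2 = bad pattern
        Local $aNone[1] = [0]
        Return SetError(1, 0, $aNone)
    EndIf

    ; Shift the 0-based match array into the count-in-[0] layout
    Local $aLinks[UBound($aMatches) + 1]
    $aLinks[0] = UBound($aMatches)
    For $i = 0 To UBound($aMatches) - 1
        $aLinks[$i + 1] = $aMatches[$i]
    Next
    Return $aLinks
EndFunc
```

In the script above, `$aLinks = _GetAllHrefs($source)` followed by `For $i = 1 To $aLinks[0]` would then visit every link rather than only the first one.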



Hello there, I've had to deal with this problem too; it's quite annoying.

If you want to get rid of all the <a href thingys in there, just do the following (or at least, this is what I did).

You can modify it to remove as many things as you want. It's probably inefficient code, but it works for me.

#include <File.au3>

;----------------------Read InetGet data
InetGet("http://www.google.com/", "C:\data.txt")
$googleget = FileOpen("C:\data.txt", 0) ; 0 = read mode

;----------------------Create file to write to with corrections
_FileCreate("C:\_Stripped.htm")
$googlestrip = FileOpen("C:\_Stripped.htm", 2) ; 2 = write mode, erases previous contents

;----------------------Read the source one line at a time until end of file
While 1
    $line = FileReadLine($googleget)
    If @error Then ExitLoop ; -1 = end of file, 1 = file could not be read

;----------------------Replace selected things with nothing
    $rep = StringReplace($line, "<", "")
    $rep1 = StringReplace($rep, ">", "")
    $rep2 = StringReplace($rep1, "=", "")
    FileWriteLine($googlestrip, $rep2)
WEnd

FileClose($googleget)
FileClose($googlestrip)


Hope this helps B)
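The three chained StringReplace calls in the loop above remove "<", ">" and "=" one character at a time. A minimal sketch of the same clean-up with a single StringRegExpReplace call, reading the whole file at once with FileRead instead of line by line, follows; `_StripMarkupChars` and `_StripFile` are hypothetical helper names, and the file paths are whatever you pass in.

```autoit
; Hypothetical helper: removes every "<", ">" and "=" from $sText in one call
Func _StripMarkupChars($sText)
    Return StringRegExpReplace($sText, "[<>=]", "")
EndFunc

; Hypothetical helper: reads $sIn, strips the characters, writes the result to $sOut.
; Returns True on success; sets @error 1 (read failed) or 2 (write failed).
Func _StripFile($sIn, $sOut)
    Local $sText = FileRead($sIn) ; passing a file name reads the whole file
    If @error = 1 Then Return SetError(1, 0, False)
    Local $hOut = FileOpen($sOut, 2) ; 2 = write mode, erases previous contents
    If $hOut = -1 Then Return SetError(2, 0, False)
    FileWrite($hOut, _StripMarkupChars($sText))
    FileClose($hOut)
    Return True
EndFunc
```

For example, `_StripFile("C:\data.txt", "C:\_Stripped.htm")` produces the same stripped file as the loop, without the per-line reads.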


Here are a couple of my UDFs that separate out all the http:// links from HTML source code. Just give it a variable with the source code, and it returns a CRLF-delimited list of the links it finds.

$sBigListOLinks = _GetLinks($source)
$aLinks = StringSplit($sBigListOLinks, @CRLF, 1) ; 1 = treat @CRLF as one delimiter
;Do something with this array of links.... (the list ends with @CRLF, so the last element is empty)

Func _GetLinks($sData)
    Local $iCounter, $iPointer, $iPointer2, $sLinks
    For $iCounter = 1 To StringLen($sData)
        $iPointer = _SearchTarget($sData, "href=" & Chr(34) & "http://", $iCounter, 1) - 8 ; Looks for any href="http://
        If @error = 0 Then
            $iCounter = $iPointer
            $iPointer2 = _SearchTarget($sData, Chr(34), $iPointer, 0) - 1 ; Looks for the next "
            If @error = 0 Then
                $iCounter = $iPointer2
                $sLinks = $sLinks & StringMid($sData, $iPointer, $iPointer2 - $iPointer) & @CRLF
            EndIf
        Else
            ExitLoop ; No more href="http:// left in $sData
        EndIf
    Next ; Loops until it finds all matches in $sData
    If $sLinks = "" Then Return SetError(1, 0, "") ; No links found
    Return $sLinks ; Returns Variable with all found links.
EndFunc

Func _SearchTarget($sString, $sTarget, $iPointerIn, $iAfter)
    Local $iPointerTemp, $iPointerOut
    $iPointerTemp = StringInStr(StringMid($sString, $iPointerIn), $sTarget, 0)
    If $iPointerTemp > 0 Then
        $iPointerOut = $iPointerTemp
        $iPointerOut = $iPointerOut + $iPointerIn ; Adds given optional offset to Pointer location
        If $iAfter = 1 Then
            $iPointerOut = $iPointerOut + StringLen($sTarget) ; Sets Pointer location AFTER Target string
        EndIf
    Else
        Return SetError(1, 0, 0) ;Target not found
    EndIf
    Return $iPointerOut
EndFunc

Hope this helps.


Edited by TrystianSky
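To finish the original goal (download every .exe the page links to), the link array still has to be filtered and each file fetched. This is a minimal sketch, assuming the links arrive in the count-in-[0] layout that StringSplit produces, as in the `$aLinks` line above. `_FilterExeLinks`, `_DownloadAll` and the `C:\Updates` folder are hypothetical names chosen for illustration.

```autoit
; Hypothetical helper: keeps only links ending in ".exe".
; In and out: count in [0], links in [1]..[n].
Func _FilterExeLinks($aLinks)
    Local $aExe[$aLinks[0] + 1]
    Local $iCount = 0
    For $i = 1 To $aLinks[0]
        If StringRight($aLinks[$i], 4) = ".exe" Then ; "=" compares case-insensitively
            $iCount += 1
            $aExe[$iCount] = $aLinks[$i]
        EndIf
    Next
    $aExe[0] = $iCount
    ReDim $aExe[$iCount + 1] ; drop the unused slots
    Return $aExe
EndFunc

; Hypothetical helper: downloads each URL into $sDir, keeping the file name
; from the URL. Returns how many downloads reported success.
Func _DownloadAll($aUrls, $sDir)
    Local $sName, $iOk = 0
    DirCreate($sDir)
    For $i = 1 To $aUrls[0]
        ; File name = everything after the last "/" (occurrence -1 searches from the right)
        $sName = StringTrimLeft($aUrls[$i], StringInStr($aUrls[$i], "/", 0, -1))
        ; Option 1 forces a fresh download instead of a cached copy;
        ; in foreground mode InetGet returns non-zero on success
        If InetGet($aUrls[$i], $sDir & "\" & $sName, 1) Then $iOk += 1
    Next
    Return $iOk
EndFunc
```

Hooked up to _GetLinks, that would be something like `_DownloadAll(_FilterExeLinks(StringSplit(_GetLinks($source), @CRLF, 1)), "C:\Updates")`. The empty last element left by the trailing @CRLF is dropped by the filter, since it does not end in .exe.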

