Stupid question about InetGet and getting all links on the page.

OK, I'm trying to make a script for my unattended Windows CD that goes to my FTP, downloads the source of the FTP page (the HTML), looks in the $source string for every "<a href=" and "</a>", and then downloads all the .exe files.

I want to do this so that if I need to update a file on my unattended CD, I just update the file on my FTP and it's all done.

Here's what I have thus far. Note that I have it pointed at www.google.com just as a test, and I have the "If InetGet($url" and "FileDelete(" lines commented out so I'm not hammering Google's servers.

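HotKeySet("{ESC}", "Terminate")

$url = "http://www.google.com/"
$source = _data()

; Grabs only the FIRST link: drop everything up to and including '<a href="' (9 characters),
; then keep what comes before the '">' that closes the tag
$urla = StringTrimLeft($source, StringInStr($source, "<a href=") + 8)
$urla = StringLeft($urla, StringInStr($urla, ">") - 2)
MsgBox(0, "Link", $urla)

; Downloads $url to a temporary file and returns its contents (-1 if the download fails)
Func _data()
    Local $source
    If InetGet($url, "C:\data.txt") Then
        $source = FileRead("C:\data.txt", FileGetSize("C:\data.txt"))
        ;FileDelete("C:\data.txt")
        Return $source
    Else
        Return -1
    EndIf
EndFunc

Func Terminate()
    Exit 0
EndFunc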

The problem I'm having is this:

I want it to get ALL of the "<a href" links, but right now it only shows a message box with the first one. How can I get it to search the whole source for every <a href?
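One way to get every link, sketched here under the assumption that the links are written exactly as <a href="..."> (double quotes, href as the first attribute), is to repeat the same StringInStr/StringTrimLeft/StringLeft extraction from the script above for the 1st, 2nd, 3rd... match, using StringInStr's occurrence parameter. The name _AllHrefs is hypothetical. This version also stops at the closing quote instead of at ">", so extra attributes after the URL don't end up in the link.

```autoit
; Sketch only: returns every href="..." value in $sSource as a CRLF-delimited list.
; Assumes links look like <a href="..."> (double quotes, href as the first attribute).
; _AllHrefs is a hypothetical helper name.
Func _AllHrefs($sSource)
    Local $sLinks = "", $iN = 1, $iPos, $sRest, $iQuote
    While 1
        ; Position of the $iN-th '<a href="' (case-insensitive); 0 means no more links
        $iPos = StringInStr($sSource, '<a href="', 0, $iN)
        If $iPos = 0 Then ExitLoop
        ; Same trick as the original: drop everything through the opening quote (9 characters)
        $sRest = StringTrimLeft($sSource, $iPos + 8)
        ; The link ends at the closing quote, so extra attributes are not included
        $iQuote = StringInStr($sRest, '"')
        ; Skip empty href="" values and tags with no closing quote
        If $iQuote > 1 Then $sLinks &= StringLeft($sRest, $iQuote - 1) & @CRLF
        $iN += 1
    WEnd
    Return $sLinks
EndFunc

; Example on a literal string, so nothing is downloaded
ConsoleWrite(_AllHrefs('<a href="a.exe">A</a> <a href="http://host/b.exe" title="B">B</a>'))
```

Searching by occurrence rescans the source from the start each time, which is fine for a single page; the result can be turned into an array with StringSplit(StringStripWS($sLinks, 2), @CRLF, 1).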


Hello there, I've had to deal with this problem too; it's quite annoying.

If you want to get rid of all the <a href thingys in there, you can do the following (or at least this is what I did). It reads the downloaded source line by line and writes a copy with every <, > and = character removed.

You can modify it to remove as many things as you want. It's probably inefficient code, but it works for me.

#include <File.au3>

;----------------------Read InetGet data
InetGet("http://www.google.com/", "C:\data.txt")
$googleget = FileOpen("C:\data.txt", 0) ; 0 = read mode

;----------------------Create file to write to with corrections
_FileCreate("C:\_Stripped.htm")
$googlestrip = FileOpen("C:\_Stripped.htm", 2) ; 2 = write mode (erases old contents)

;----------------------Read the source one line at a time until end of file
While 1
    $line = FileReadLine($googleget) ; reads the next line
    If @error = -1 Then ExitLoop ; -1 = end of file reached

    ;------------------Replace selected things with nothing
    $rep = StringReplace($line, "<", "")
    $rep1 = StringReplace($rep, ">", "")
    $rep2 = StringReplace($rep1, "=", "")
    FileWriteLine($googlestrip, $rep2)
WEnd

FileClose($googleget)
FileClose($googlestrip)
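If the goal is to drop a set of characters, a single StringRegExpReplace call with a character class can do all of the StringReplace steps at once, and removing another character only means adding it inside the brackets. A minimal sketch; the helper name _StripChars is hypothetical:

```autoit
; Sketch only: remove every character listed inside [...] in one pass.
; To strip more characters, add them inside the brackets.
; _StripChars is a hypothetical helper name.
Func _StripChars($sText)
    Return StringRegExpReplace($sText, '[<>=]', '')
EndFunc

; Example: prints a href"http://host/"link/a
ConsoleWrite(_StripChars('<a href="http://host/">link</a>') & @CRLF)
```

The same call can be applied to the whole file at once, e.g. to FileRead("C:\data.txt"), instead of line by line. Inside the brackets, the characters ], \, ^ and - need a backslash in front of them.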


Hope this helps B)

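Here are a couple of my UDFs that separate all the http:// links out of HTML source code. Just give it a variable containing the source code, and it returns a CRLF-delimited list of the links it finds. It only picks up links written as href="http://..." (double quotes, absolute http:// URL).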

$sBigListOLinks = _GetLinks($source)
; Strip the trailing CRLF first so the array has no empty last element
$aLinks = StringSplit(StringStripWS($sBigListOLinks, 2), @CRLF, 1)
;Do something with this array of links....

Func _GetLinks($sData)
    Local $iCounter, $iPointer, $iPointer2, $sLinks = ""
    For $iCounter = 1 To StringLen($sData)
        $iPointer = _SearchTarget($sData, "href=" & Chr(34) & "http://", $iCounter, 1) - 8 ; Looks for any href="http://
        If @error Then ExitLoop ; no more links after $iCounter
        $iCounter = $iPointer
        $iPointer2 = _SearchTarget($sData, Chr(34), $iPointer, 0) - 1 ; Looks for the next "
        If @error Then ExitLoop ; no closing quote, so no more links either
        $iCounter = $iPointer2
        $sLinks = $sLinks & StringMid($sData, $iPointer, $iPointer2 - $iPointer) & @CRLF
    Next ; Loops until it finds all matches in $sData
    If $sLinks = "" Then Return SetError(1, 0, "") ; no links found
    Return $sLinks ; Returns Variable with all found links.
EndFunc

Func _SearchTarget($sString, $sTarget, $iPointerIn, $iAfter)
    Local $iPointerTemp, $iPointerOut = 0
    $iPointerTemp = StringInStr(StringMid($sString, $iPointerIn), $sTarget, 0)
    If $iPointerTemp > 0 Then
        $iPointerOut = $iPointerTemp
        ; Adds given optional offset to Pointer location
        ; (result is 1 past the absolute match position; the -8 and -1 in _GetLinks compensate)
        $iPointerOut = $iPointerOut + $iPointerIn
        If $iAfter = 1 Then
            $iPointerOut = $iPointerOut + StringLen($sTarget) ; Sets Pointer location AFTER Target string
        EndIf
    Else
        Return SetError(1, 0, 0) ;Target not found
    EndIf
    Return $iPointerOut
EndFunc
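For the original goal of downloading only the .exe files, the link list can be narrowed to URLs ending in .exe. The sketch below swaps in a single StringRegExp call (flag 3 returns every match of the capture group) instead of _GetLinks. It assumes the links are double-quoted absolute URLs; the helper names _ExeLinks and _DownloadAll are hypothetical.

```autoit
; Sketch only: list every double-quoted href ending in .exe, then download each one.
; _ExeLinks and _DownloadAll are hypothetical helper names.
Func _ExeLinks($sSource)
    ; Flag 3 = return an array with capture group 1 of every match; (?i) = ignore case
    Local $aLinks = StringRegExp($sSource, '(?i)href="([^"]+\.exe)"', 3)
    If @error Then Return SetError(1, 0, 0) ; no .exe links found
    Return $aLinks
EndFunc

Func _DownloadAll($aLinks, $sDestDir)
    If Not IsArray($aLinks) Then Return SetError(1, 0, 0)
    Local $sName
    For $i = 0 To UBound($aLinks) - 1
        ; File name = everything after the last "/" (negative occurrence searches from the right)
        $sName = StringTrimLeft($aLinks[$i], StringInStr($aLinks[$i], "/", 0, -1))
        InetGet($aLinks[$i], $sDestDir & "\" & $sName)
    Next
    Return 1
EndFunc

; Example on a literal string: prints http://host/setup.exe
Local $aDemo = _ExeLinks('<a href="http://host/setup.exe">setup</a> <a href="http://host/readme.txt">readme</a>')
If IsArray($aDemo) Then ConsoleWrite($aDemo[0] & @CRLF)
```

On AutoIt 3.3 and later, something like _DownloadAll(_ExeLinks(BinaryToString(InetRead($url))), "C:\Install") would read the page into memory without the temporary C:\data.txt file; C:\Install is only an example and should be an existing folder. If the FTP listing uses relative links, the base URL would have to be prepended to each one before downloading.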

Hope this helps.


Edited by TrystianSky
