
Harvest HTML links from a web-page


Here is a UDF that downloads a given web page and returns the links it contains as an array. As with StringSplit(), element [0] holds the number of links found.


;  Determining the links on a webpage.

    ; Alternative test page:
    ; $sURL = "http://www.google.com/nwshp?hl=en&tab=wn"
    $sURL       = "http://www.google.com/"
    $asLinkList = _GetLinks($sURL)

    ; $asLinkList[0] holds the number of links found
    For $nZ = 1 To $asLinkList[0]
        MsgBox(0, "Link: " & $nZ, $asLinkList[$nZ])
    Next

Func _GetLinks($psURL)
;Returns an array of links from a webpage

;Download the HTML to a temporary file
    Local $sTempFile = @TempDir & "\$tridsf13.htm"
    URLDownloadToFile($psURL, $sTempFile)
    Local $sHTML = FileRead($sTempFile, FileGetSize($sTempFile))
    FileDelete($sTempFile)
;Clean up the HTML for better consumption
    $sHTML = StringReplace($sHTML, @CR, "")
    $sHTML = StringReplace($sHTML, @LF, "")
    $sHTML = StringReplace($sHTML, @TAB, " ")

;Break it into chewable bytes
    $sHTML = StringReplace($sHTML, "href=", @LF & "href=")
    $sHTML = StringReplace($sHTML, "</a>", @LF & "scrap")
    $asHTML = StringSplit($sHTML, @LF)
;Spit out the bones
    Local $sLinks = ""
    For $nX = 1 To $asHTML[0]
        ;Process only "href=" lines
        If StringLeft($asHTML[$nX], 5) = "href=" Then
            ;Keep everything up to the first ">" of the tag
            $asLink = StringSplit($asHTML[$nX], ">")
            $sLinks = $sLinks & @LF & $asLink[1]
        EndIf
    Next

;Return the juicy links
    Return StringSplit(StringTrimLeft($sLinks, 1), @LF)
EndFunc
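The split-on-"href=" approach keeps everything from href= up to the next ">", so each result still carries the href= prefix, the quotes and any attributes that follow in the tag. If you only want the URL itself, a regular expression can pull out the quoted value. This is a sketch of a substitute technique, not how _GetLinks() works: _ExtractLinks() is a hypothetical helper name, and it assumes an AutoIt version with StringRegExp() and that href values are wrapped in double or single quotes.

```autoit
; Sketch only: _ExtractLinks() is a hypothetical helper, not a built-in.
; Assumes href values are wrapped in double or single quotes.
Func _ExtractLinks($sHTML)
    ; (?i) makes the match case-insensitive, so HREF= is found too.
    ; The single capture group holds the text between the quotes.
    Local $aMatch = StringRegExp($sHTML, '(?i)href\s*=\s*["'']([^"'']*)["'']', 3)
    If @error Then
        ; No matches: return an array whose count element is 0
        Local $aEmpty[1] = [0]
        Return $aEmpty
    EndIf

    ; Flag 3 returns a 0-based array with no count element, so copy the
    ; matches into an array with the count in [0], like _GetLinks() returns
    Local $nCount = UBound($aMatch)
    Local $asLinks[$nCount + 1]
    $asLinks[0] = $nCount
    For $nX = 0 To $nCount - 1
        $asLinks[$nX + 1] = $aMatch[$nX]
    Next
    Return $asLinks
EndFunc

; Example: list the links found in a small HTML fragment
$asFound = _ExtractLinks('<a href="page1.htm">One</a> <A HREF=''page2.htm''>Two</a>')
For $nZ = 1 To $asFound[0]
    ConsoleWrite($nZ & ": " & $asFound[$nZ] & @CRLF)
Next
```

To combine it with the download step, pass it the $sHTML string that _GetLinks() reads from its temporary file. Unquoted values such as href=page.htm are skipped, and a value containing the other quote character (an apostrophe inside double quotes, say) is cut short at that character.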