Harvest HTML links from a web-page


Here is a UDF that downloads a given web-page and returns the embedded links as an array.


; Example: list every link found on a web page.

    ; Alternative test page: "http://www.google.com/nwshp?hl=en&tab=wn"
    $sURL   = "http://www.google.com/"
    $asLinkList = _GetLinks($sURL)

    For $nZ = 1 to $asLinkList[0]
        MsgBox(0, "Link: " & $nZ, $asLinkList[$nZ])
    Next

Func _GetLinks($psURL)
;Returns an array of links from a webpage

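;Download the HTML to a temporary file, read it, then delete the file
;(URLDownloadToFile is the name used by older AutoIt builds; later releases provide InetGet instead)
    $sTempFile  = "$tridsf13.htm"
    URLDownloadToFile($psURL, $sTempFile)
    $sHTML = FileRead($sTempFile, FileGetSize($sTempFile))
    FileDelete($sTempFile)
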
;Clean up the HTML for better consumption
    $sHTML = StringReplace($sHTML, @CR, "")
    $sHTML = StringReplace($sHTML, @LF, "")
    $sHTML = StringReplace($sHTML, @TAB, " ")

;Break it into chewable bytes
    $sHTML = StringReplace($sHTML, "href=", @LF & "href=")
    $sHTML = StringReplace($sHTML, "</a>", @LF & "scrap")
    $asHTML = StringSplit($sHTML, @LF)
;Spit out the bones
    $sLinks = ""
    For $nX = 1 to $asHTML[0]
       ;Process only "href=" lines
        If StringLeft($asHTML[$nX],5) = "href=" then
           ;Keep everything up to the first ">": the href attribute plus any other attributes in the tag
            $asLink = StringSplit($asHTML[$nX], ">")
            $sLinks = $sLinks & @LF & $asLink[1]
        EndIf
    Next

;Return the juicy links
    Return StringSplit(StringTrimLeft($sLinks,1), @LF)
EndFunc
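
Each entry that _GetLinks returns is the raw text up to the first ">", so it still starts with href= and may carry quotes or further attributes after the URL. If you only want the bare address, something like the following might do. This is only a rough sketch: _CleanLink is a hypothetical helper name, not part of the UDF above, and it assumes the URL is either wrapped in matching quotes or ends at the first space.

```autoit
; Hypothetical helper (not part of the original UDF):
; reduce one entry such as  href="http://www.google.com/" class=q  to the bare URL.
Func _CleanLink($psEntry)
    ; Drop the leading "href=" (5 characters) and trim surrounding whitespace
    Local $sLink = StringStripWS(StringTrimLeft($psEntry, 5), 3)

    Local $sQuote = StringLeft($sLink, 1)
    If $sQuote = '"' Or $sQuote = "'" Then
        ; Quoted value: take the text between the opening quote and its match
        Local $nEnd = StringInStr($sLink, $sQuote, 0, 2)
        If $nEnd > 0 Then
            $sLink = StringMid($sLink, 2, $nEnd - 2)
        Else
            ; No closing quote found: just drop the opening one
            $sLink = StringTrimLeft($sLink, 1)
        EndIf
    Else
        ; Unquoted value (assumption): the URL ends at the first space
        Local $nSpace = StringInStr($sLink, " ")
        If $nSpace > 0 Then $sLink = StringLeft($sLink, $nSpace - 1)
    EndIf

    Return $sLink
EndFunc
```

To use it, pass each element through the helper inside the display loop, for example MsgBox(0, "Link: " & $nZ, _CleanLink($asLinkList[$nZ])). Relative links come back unchanged, so you would still have to resolve them against the page URL yourself.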
