Harvest HTML links from a web-page

Here is a UDF that downloads a given web-page and returns the embedded links as an array.


;  Show each link found on a web page.

    $sURL   = ""
    $asLinkList = _GetLinks($sURL)

    ; Element 0 holds the number of links; the links start at element 1
    For $nZ = 1 to $asLinkList[0]
        Msgbox (0,"Link: " & $nZ, $asLinkList[$nZ])
    Next

Func _GetLinks($psURL)
;Returns an array of links from a webpage

;Download the HTML to a temporary file
    $sTempFile  = "$tridsf13.htm"
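    ; URLDownloadToFile is the name used by older AutoIt releases;
    ; newer releases provide InetGet($psURL, $sTempFile) for the same job.
    URLDownloadToFile($psURL, $sTempFile)
    $sHTML = FileRead($sTempFile, FileGetSize($sTempFile))
    FileDelete($sTempFile)  ; remove the temporary file once it has been read

;Cleanup the HTML for better consumption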
    $sHTML = StringReplace($sHTML, @CR, "")
    $sHTML = StringReplace($sHTML, @LF, "")
    $sHTML = StringReplace($sHTML, @TAB, " ")

;Break it into chewable bytes
    $sHTML = StringReplace($sHTML, "href=", @LF & "href=")
    $sHTML = StringReplace($sHTML, "</a>", @LF & "scrap")
    $asHTML = StringSplit($sHTML, @LF)
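    ; Illustration only (a made-up fragment, not taken from a real page):
    ;     <p><a href="a.htm">A</a> and <a href="b.htm">B</a></p>
    ; is broken into these lines by the three statements above:
    ;     <p><a
    ;     href="a.htm">A
    ;     scrap and <a
    ;     href="b.htm">B
    ;     scrap</p>
    ; Only the lines that start with "href=" are of interest below.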
;Spit out the bones
    $sLinks = ""
    For $nX = 1 to $asHTML[0]
       ;Process only "href=" lines
        If StringLeft($asHTML[$nX],5) = "href=" then
            $asLink = StringSplit($asHTML[$nX], ">")
            $sLinks = $sLinks & @LF & $asLink[1]
        EndIf
    Next

;Return the juicy links
    Return StringSplit(StringTrimLeft($sLinks,1), @LF)
EndFunc
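
Each entry returned by _GetLinks is everything from "href=" up to the first ">", so it still carries the "href=" prefix, any quotes, and whatever attributes followed inside the tag. If only the bare address is wanted, something along these lines might do. This is a rough sketch, assuming an AutoIt release that has StringRegExp; _ExtractURL is a hypothetical helper name and is not part of the UDF above.

```autoit
; Hypothetical helper, not part of the original UDF.
; Takes one entry as returned by _GetLinks, e.g.  href="page.htm" class="x"
; and returns only the address, or "" if none can be found.
Func _ExtractURL($psEntry)
    ; Quoted value: href="..." or href='...'
    ; Group 1 is the quote character, group 2 is the address.
    Local $asMatch = StringRegExp($psEntry, '(?i)^href=\s*(["''])(.*?)\1', 1)
    If Not @error Then Return $asMatch[1]

    ; Unquoted value: href=page.htm (ends at whitespace, a quote or ">")
    $asMatch = StringRegExp($psEntry, '(?i)^href=\s*([^\s"''>]+)', 1)
    If Not @error Then Return $asMatch[0]

    Return ""
EndFunc

; Possible use together with the UDF above:
;     $asLinkList = _GetLinks($sURL)
;     For $nZ = 1 To $asLinkList[0]
;         ConsoleWrite(_ExtractURL($asLinkList[$nZ]) & @CRLF)
;     Next
```

Relative addresses come back exactly as they appear in the page; resolving them against the page's own URL is left out of this sketch.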
