James

Get META information from website

James

Is there a reliable way to retrieve the META information from a website? I have looked at using _IEGetObjByID(), but I'm not sure whether that is the right approach.

Thanks, and sorry if I have missed something obvious.

James

James

I was actually hoping for a faster and smaller way of doing it. I am writing a website crawler for a friend who is running a search engine. I need to retrieve the different META tags such as:

  • Title
  • Author
  • Keywords
  • Robots
  • Description
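One lighter-weight route, sketched here under some assumptions, is to skip the browser entirely and pull the tags out of the raw HTML with StringRegExp(). The helper names _GetMetaFromHtml(), _GetTitleFromHtml() and _MetaLookup() are made up for this example, the sample HTML is invented, and the pattern assumes double-quoted attributes with name appearing before content, so treat it as a starting point rather than a full HTML parser.

```autoit
; Sketch only: the three helper functions below are hypothetical, not part of any UDF.

; Returns a flat array [name1, content1, name2, content2, ...]; sets @error if nothing matches.
Func _GetMetaFromHtml($sHtml)
    ; Assumes double-quoted attributes and name="..." before content="..."
    Local $sPattern = '(?is)<meta\s[^>]*?name\s*=\s*"([^"]*)"[^>]*?content\s*=\s*"([^"]*)"'
    Local $aPairs = StringRegExp($sHtml, $sPattern, 3) ; 3 = array of all global matches
    If @error Then Return SetError(1, 0, 0)
    Return $aPairs
EndFunc   ;==>_GetMetaFromHtml

; Returns the text between <title> and </title>, trimmed; sets @error if there is no title.
Func _GetTitleFromHtml($sHtml)
    Local $aTitle = StringRegExp($sHtml, '(?is)<title[^>]*>(.*?)</title>', 1)
    If @error Then Return SetError(1, 0, "")
    Return StringStripWS($aTitle[0], 3) ; 3 = strip leading and trailing whitespace
EndFunc   ;==>_GetTitleFromHtml

; Looks up one tag by name in the flat array; "=" compares strings case-insensitively.
Func _MetaLookup(ByRef $aPairs, $sName)
    For $i = 0 To UBound($aPairs) - 2 Step 2
        If $aPairs[$i] = $sName Then Return $aPairs[$i + 1]
    Next
    Return SetError(1, 0, "")
EndFunc   ;==>_MetaLookup

; Invented sample page. For a live page, something like
; $sHtml = BinaryToString(InetRead("http://www.james-brooks.net")) could be used instead.
Local $sHtml = '<html><head><title>Example page</title>' & _
        '<meta name="author" content="James">' & _
        '<meta name="keywords" content="autoit, crawler">' & _
        '</head><body></body></html>'

Local $aMeta = _GetMetaFromHtml($sHtml)
If Not @error Then
    ConsoleWrite("Title: " & _GetTitleFromHtml($sHtml) & @CRLF)
    ConsoleWrite("Author: " & _MetaLookup($aMeta, "author") & @CRLF)
    ConsoleWrite("Keywords: " & _MetaLookup($aMeta, "keywords") & @CRLF)
EndIf
```

Because no browser window is created, this should be noticeably quicker per page than the IE.au3 route, at the cost of missing tags that are written in an unusual attribute order or added by scripts.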

James

I don't really know Java, although I have found code to do it. I was just wondering whether it was possible to make one myself. I currently have this:

#include <Array.au3>
#include <File.au3>
#include <IE.au3>

Global $arLinks[1], $IE

GetLinks("http://www.james-brooks.net")

Func GetLinks($io_Website)
    $IE = _IECreate($io_Website, 0, 0) ; Hidden browser window, no attach attempt
    Local $Links = _IELinkGetCollection($IE) ; Retrieve all links
    Local $iNumLinks = @extended ; Number of links found

    ;ConsoleWrite("!>Found " & $iNumLinks & " links" & @CRLF)
    For $Link In $Links
        _ArrayAdd($arLinks, $Link.href)
    Next

    ; Element 0 holds the link count, so deduplicate from index 1
    $arLinks[0] = UBound($arLinks) - 1
    _ArrayUnique($arLinks, '', 1) ; Delete all duplicate links

    ; Output each unique link
    For $i = 1 To $arLinks[0]
        ConsoleWrite($arLinks[$i] & @CRLF)
    Next
    ConsoleWrite(GetMETA() & @CRLF) ; Show what GetMETA() returns
EndFunc  ;==>GetLinks

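; Note: newer Array.au3 releases ship their own _ArrayUnique; rename this one if AutoIt reports a duplicate function name
Func _ArrayUnique(ByRef $aArray, $vDelim = '', $iBase = 0, $iCase = 0)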
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    If $vDelim = '' Then $vDelim = Chr(1)
    Local $sHold = ""
    For $iCC = $iBase To UBound($aArray) - 1
        If Not StringInStr($vDelim & $sHold, $vDelim & $aArray[$iCC] & $vDelim, $iCase) Then
            $sHold &= $aArray[$iCC] & $vDelim
        EndIf
    Next
    $sHold = StringTrimRight($sHold, StringLen($vDelim))
    If $sHold And $iBase = 1 Then
        $aArray = StringSplit($sHold, $vDelim)
        Return SetError(0, 0, $aArray)
    ElseIf $sHold And $iBase = 0 Then
        $aArray = StringRegExp($sHold & $vDelim, "(?s)(.+?)" & $vDelim, 3)
        Return SetError(0, 0, $aArray)
    EndIf
    Return SetError(2, 0, 0)
EndFunc

Func GetMETA()
    Local $Title, $Author, $Robots, $Text, $Description
    $Author = _IEGetObjById ($IE, "author")
    Return _IEPropertyGet($Author, "outerhtml")
EndFunc  ;==>GetMETA

At the moment it retrieves all the links on a page, puts them into an array and deletes all duplicates.

James

Never mind, I had the wrong parameter in the _IEGetObjByID() function.

Now I just don't know how to get the author information out of the tags :duh:
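For reference, one possible way to read the author value with IE.au3 alone is to walk the page's meta elements and read each element's content property instead of its outer HTML. This is a minimal sketch, not a tested solution for this site: _GetMetaContent() is a hypothetical helper name, and it assumes the page actually contains a tag of the form <meta name="author" content="...">.

```autoit
#include <IE.au3>

; Sketch only: _GetMetaContent() is a hypothetical helper, not part of IE.au3.
; Returns the content attribute of the first <meta> tag whose name matches $sName.
Func _GetMetaContent(ByRef $oIE, $sName)
    Local $oMetas = _IETagNameGetCollection($oIE, "meta")
    If @error Then Return SetError(1, 0, "")
    For $oMeta In $oMetas
        ; name and content are exposed as properties of each meta element;
        ; "=" compares strings case-insensitively
        If $oMeta.name = $sName Then Return $oMeta.content
    Next
    Return SetError(2, 0, "") ; No <meta> tag with that name
EndFunc   ;==>_GetMetaContent

Local $oIE = _IECreate("http://www.james-brooks.net", 0, 0) ; Hidden window
ConsoleWrite("Title: " & _IEPropertyGet($oIE, "title") & @CRLF)
ConsoleWrite("Author: " & _GetMetaContent($oIE, "author") & @CRLF)
ConsoleWrite("Keywords: " & _GetMetaContent($oIE, "keywords") & @CRLF)
_IEQuit($oIE) ; Close the hidden browser when done
```

The same helper can be called with "robots" or "description" for the other tags; the page title is not a meta tag, so it is read through _IEPropertyGet() instead.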

Edited by JamesBrooks

senthor

I recommend using the XML DOM Wrapper UDF.

It does exactly what you want: it reads and writes content in an XML structure, such as the author tag.

Look here
