James Posted November 5, 2008 Is there any solid way to retrieve the META information from a website? I have looked at using _IEGetObjById(), but I'm not sure it's the right approach. Thanks, and sorry if I've missed something obvious. Blog - Seriously epic web hosting - Twitter - GitHub - Cachet HQ
senthor Posted November 5, 2008 Try _INetGetSource() in combination with XML DOM Wrapper. FileListToArray UDF - My tools
James Posted November 5, 2008 I was actually hoping for a faster and smaller way of doing it. I am writing a website crawler for a friend who is running a search engine. I need to retrieve the different META tags, such as:
- Title
- Author
- Keywords
- Robots
- Description
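A minimal sketch of a smaller, browser-free approach, assuming the page source is already in a string (for example from _INetGetSource() in Inet.au3, as suggested above). It uses StringRegExp() instead of an XML parser. _GetPageTitle and _GetMetaContent are hypothetical helper names, and the regex assumes name= comes before content= inside each tag and that both values are quoted, so treat it as a rough crawler helper rather than a full HTML parser.

```autoit
; Hypothetical helper: returns the trimmed text inside <title>...</title>, or "".
Func _GetPageTitle($sHtml)
    Local $aMatch = StringRegExp($sHtml, '(?is)<title[^>]*>(.*?)</title>', 1)
    If @error Then Return ""
    Return StringStripWS($aMatch[0], 3) ; 3 = strip leading and trailing whitespace
EndFunc ;==>_GetPageTitle

; Hypothetical helper: returns the content="..." value of <meta name="$sName" ...>,
; or "" if there is no such tag. Assumes name= comes before content= inside the
; tag and that both values are quoted; \Q...\E makes $sName match literally.
Func _GetMetaContent($sHtml, $sName)
    Local $sPattern = '(?is)<meta\s[^>]*?name\s*=\s*["'']\Q' & $sName & _
            '\E["''][^>]*?content\s*=\s*["'']([^"'']*)["'']'
    Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; 1 = groups of the first match
    If @error Then Return ""
    Return $aMatch[0]
EndFunc ;==>_GetMetaContent

; Sample page source; in the crawler this string would come from the download step
Local $sHtml = '<html><head><title> James Brooks </title>' & _
        '<meta name="author" content="James Brooks">' & _
        '<meta name="keywords" content="autoit, crawler">' & _
        '<meta name="robots" content="index, follow">' & _
        '<meta name="description" content="A test page">' & _
        '</head><body></body></html>'

ConsoleWrite("title: " & _GetPageTitle($sHtml) & @CRLF)
Local $aNames[4] = ["author", "keywords", "robots", "description"]
For $sName In $aNames
    ConsoleWrite($sName & ": " & _GetMetaContent($sHtml, $sName) & @CRLF)
Next
```

Because it never starts Internet Explorer, this should be noticeably lighter per page than driving the browser, at the cost of missing tags written in an unusual attribute order.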
senthor Posted November 5, 2008 I don't know if AutoIt is the right language for that... I'd do it in Java.
James Posted November 5, 2008 I don't really know Java, although I have found code that does it. I was just wondering if it was possible to write one in AutoIt. I currently have this:

    #include <Array.au3>
    #include <File.au3>
    #include <IE.au3>

    Global $arLinks[1], $IE

    GetLinks("http://www.james-brooks.net")

    Func GetLinks($io_Website)
        $IE = _IECreate($io_Website, 0, 0) ; Open the page in a hidden IE window
        Local $Links = _IELinkGetCollection($IE) ; Retrieve all links
        Local $iNumLinks = @extended ; Retrieves the amount
        ;ConsoleWrite("!>Found " & $iNumLinks & " links" & @CRLF)
        For $Link In $Links
            _ArrayAdd($arLinks, $Link.href)
        Next
        $arLinks[0] = UBound($arLinks) - 1
        ; Delete all duplicate links. With $iBase = 1 the count in [0] is skipped
        ; and StringSplit() rebuilds it as the new count of unique links
        _ArrayUniqueLinks($arLinks, '', 1)
        ; Output each unique link
        For $i = 1 To $arLinks[0]
            ConsoleWrite($arLinks[$i] & @CRLF)
        Next
        ConsoleWrite(GetMETA() & @CRLF)
        _IEQuit($IE) ; Close the hidden browser
    EndFunc ;==>GetLinks

    ; Renamed from _ArrayUnique() so it does not clash with the _ArrayUnique()
    ; that later versions of Array.au3 include
    Func _ArrayUniqueLinks(ByRef $aArray, $vDelim = '', $iBase = 0, $iCase = 0)
        If Not IsArray($aArray) Then Return SetError(1, 0, 0)
        If $vDelim = '' Then $vDelim = Chr(1)
        Local $sHold = ""
        For $iCC = $iBase To UBound($aArray) - 1
            If Not StringInStr($vDelim & $sHold, $vDelim & $aArray[$iCC] & $vDelim, $iCase) Then
                $sHold &= $aArray[$iCC] & $vDelim
            EndIf
        Next
        $sHold = StringTrimRight($sHold, StringLen($vDelim))
        If $sHold And $iBase = 1 Then
            $aArray = StringSplit($sHold, $vDelim)
            Return SetError(0, 0, $aArray)
        ElseIf $sHold And $iBase = 0 Then
            $aArray = StringRegExp($sHold & $vDelim, "(?s)(.+?)" & $vDelim, 3)
            Return SetError(0, 0, $aArray)
        EndIf
        Return SetError(2, 0, 0)
    EndFunc ;==>_ArrayUniqueLinks

    Func GetMETA()
        Local $Title, $Author, $Robots, $Text, $Description
        ; Only finds an element whose id attribute is "author"
        $Author = _IEGetObjById($IE, "author")
        If @error Then Return ""
        Return _IEPropertyGet($Author, "outerhtml")
    EndFunc ;==>GetMETA

At the moment it retrieves all the links on a page, puts them into an array and deletes all the duplicates.
James Posted November 5, 2008 (edited) Never mind, I had the wrong parameter in the _IEGetObjById() call. Now I just don't know how to get the author information out of the tag :duh: Edited November 5, 2008 by JamesBrooks
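Once _IEGetObjById() has found the element, the value should be in its content attribute, which the DOM exposes as $Author.content. Meta tags usually carry name="author" rather than id="author", though, so a sketch that walks every <meta> element with _IETagNameGetCollection() and matches on .name may be more reliable. _IEGetMetaContent is a hypothetical helper name; the URL is the one from the code above.

```autoit
#include <IE.au3>

; Hypothetical helper: returns the content attribute of the first <meta> element
; whose name attribute equals $sName (case-insensitive), or "" with @error set.
Func _IEGetMetaContent($oIE, $sName)
    Local $oMetas = _IETagNameGetCollection($oIE, "meta")
    If @error Then Return SetError(1, 0, "")
    For $oMeta In $oMetas
        ; The DOM exposes name="..." and content="..." as .name and .content
        If $oMeta.name = $sName Then Return $oMeta.content
    Next
    Return SetError(2, 0, "") ; no matching tag on this page
EndFunc ;==>_IEGetMetaContent

Local $oIE = _IECreate("http://www.james-brooks.net", 0, 0) ; hidden window
ConsoleWrite("author: " & _IEGetMetaContent($oIE, "author") & @CRLF)
ConsoleWrite("description: " & _IEGetMetaContent($oIE, "description") & @CRLF)
_IEQuit($oIE)
```

The same helper covers keywords and robots by changing the name passed in; the title is not a meta tag, but _IEPropertyGet($oIE, "title") should return it.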
senthor Posted November 5, 2008 Now I'd recommend using the XML DOM Wrapper UDF. It does exactly what you want: it reads and writes content in an XML document, such as the author tag. Look here
James Posted November 5, 2008 I think I will look into it then. It may speed the process up too, I guess.
James Posted November 5, 2008 I have no idea how to use this UDF to read the HTML.
James Posted November 6, 2008 Yeah, so this XML thing is not working for me. I can't seem to understand it. Anyone got any ideas?
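A likely reason the XML route struggles: an XML parser only accepts well-formed markup, and ordinary HTML writes <meta ...> without a closing slash. The XML DOM Wrapper UDF is, as far as I know, built on Microsoft's MSXML COM parser, so this sketch calls MSXML directly through ObjCreate("Microsoft.XMLDOM") (an assumption about what the UDF uses internally) to show the parse failing on a typical head section.

```autoit
; Assumption: MSXML is registered under the "Microsoft.XMLDOM" ProgID,
; which is the case on standard Windows installs.
Local $sHtml = '<html><head><meta name="author" content="James"></head><body></body></html>'

Local $oXML = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXML) Then Exit 2 ; MSXML not available
$oXML.async = False
Local $bLoaded = $oXML.loadXML($sHtml)
If $bLoaded Then
    ConsoleWrite("Parsed as XML" & @CRLF)
Else
    ; <meta ...> is never closed, so </head> does not match it and parsing stops
    ConsoleWrite("Not well-formed XML: " & $oXML.parseError.reason & @CRLF)
EndIf
```

Pages written as strict XHTML (every tag closed, such as <meta ... />) would load, but a crawler will meet plenty of pages that aren't, which is why a regex over the source or the browser's own DOM usually copes better with real-world sites.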