James, posted November 5, 2008:
Is there a reliable way to retrieve the META information from a website? I have looked at using _IEGetObjById(), but I'm not sure whether that is the right function. Thanks, and sorry if I've missed something obvious.

senthor, posted November 5, 2008:
Try _INetGetSource() in combination with the XML DOM Wrapper.

James, posted November 5, 2008:
I was actually hoping for a faster and smaller way of doing it. I am writing a website crawler for a friend who is running a search engine. I need to retrieve the different META tags, such as:
- Title
- Author
- Keywords
- Robots
- Description

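A smaller approach along the lines of the _INetGetSource() suggestion might look like the sketch below. It is only a minimal illustration, not a robust HTML parser: the helper name _MetaFromSource is hypothetical, the value of $sName is assumed to contain no regular-expression metacharacters, and the pattern assumes that the name attribute comes before content and that both values are double-quoted.

```autoit
#include <INet.au3>

; Hypothetical helper (not from the thread): pulls the content value of a
; <meta name="..." content="..."> tag out of raw HTML.
; Assumptions: name precedes content, both values are double-quoted,
; and $sName contains no regex metacharacters.
Func _MetaFromSource($sHtml, $sName)
    Local $sPattern = '(?i)<meta\s+name="' & $sName & '"\s+content="([^"]*)"'
    Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; flag 1: array of captured groups
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_MetaFromSource

; Download the page source as a string, without starting a browser
Local $sHtml = _INetGetSource("http://www.james-brooks.net")
If @error Or $sHtml = "" Then
    ConsoleWrite("Could not download the page" & @CRLF)
    Exit 1
EndIf

; The title lives in its own element rather than in a <meta> tag
Local $aTitle = StringRegExp($sHtml, '(?is)<title[^>]*>(.*?)</title>', 1)
If Not @error Then ConsoleWrite("Title: " & StringStripWS($aTitle[0], 3) & @CRLF)

ConsoleWrite("Author: " & _MetaFromSource($sHtml, "author") & @CRLF)
ConsoleWrite("Keywords: " & _MetaFromSource($sHtml, "keywords") & @CRLF)
ConsoleWrite("Robots: " & _MetaFromSource($sHtml, "robots") & @CRLF)
ConsoleWrite("Description: " & _MetaFromSource($sHtml, "description") & @CRLF)
```

Pages that write their attributes in a different order or use single quotes would need a looser pattern, so treat this as a starting point for a crawler rather than a finished extractor.
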
senthor, posted November 5, 2008:
I don't know if AutoIt is the right language for that... I'd do it with Java.

James, posted November 5, 2008:
I don't really know Java, although I have found code to do it. I was just wondering whether it was possible to write one in AutoIt. I currently have this:

```autoit
#include <Array.au3>
#include <File.au3>
#include <IE.au3>

Global $arLinks[1], $IE

GetLinks("http://www.james-brooks.net")

Func GetLinks($io_Website)
    $IE = _IECreate($io_Website, 0, 0) ; Open the page in a hidden IE window
    $Links = _IELinkGetCollection($IE) ; Retrieve all links
    $iNumLinks = @extended ; Retrieves the amount
    ;ConsoleWrite("!>Found " & $iNumLinks & " links" & @CRLF)

    For $Link In $Links
        _ArrayAdd($arLinks, $Link.href)
    Next

    ; Output each site and its link (element 0 holds the link count)
    $inArray = UBound($arLinks) - 1
    $arLinks[0] = $inArray
    For $i = 0 To $inArray
        ConsoleWrite($arLinks[$i] & @CRLF)
    Next

    _ArrayUnique($arLinks) ; Delete all duplicate links
    GetMETA()
EndFunc   ;==>GetLinks

; Note: newer versions of Array.au3 ship their own _ArrayUnique; rename this
; function if AutoIt reports a duplicate function name.
Func _ArrayUnique(ByRef $aArray, $vDelim = '', $iBase = 0, $iCase = 0)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    If $vDelim = '' Then $vDelim = Chr(1)
    Local $sHold = ""
    ; Build a delimited string that contains each value only once
    For $iCC = $iBase To UBound($aArray) - 1
        If Not StringInStr($vDelim & $sHold, $vDelim & $aArray[$iCC] & $vDelim, $iCase) Then
            $sHold &= $aArray[$iCC] & $vDelim
        EndIf
    Next
    $sHold = StringTrimRight($sHold, StringLen($vDelim))
    If $sHold And $iBase = 1 Then
        $aArray = StringSplit($sHold, $vDelim)
        Return SetError(0, 0, $aArray)
    ElseIf $sHold And $iBase = 0 Then
        $aArray = StringRegExp($sHold & $vDelim, "(?s)(.+?)" & $vDelim, 3)
        Return SetError(0, 0, $aArray)
    EndIf
    Return SetError(2, 0, 0)
EndFunc   ;==>_ArrayUnique

Func GetMETA()
    Local $Title, $Author, $Robots, $Text, $Description
    $Author = _IEGetObjById($IE, "author")
    Return _IEPropertyGet($Author, "outerhtml")
EndFunc   ;==>GetMETA
```

At the moment this retrieves all the links on a page, puts them into an array and deletes all duplicates.

James, posted November 5, 2008 (edited November 5, 2008 by JamesBrooks):
Never mind, I had the wrong parameter in the _IEGetObjById() function. Now I just don't know how to get the author information out of the tags.

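For reading the author value out of the tags while staying with IE.au3, one possible route is to walk the page's meta elements and compare their name attribute, since meta tags are normally identified by name rather than by id. The following is a minimal sketch under that assumption; the helper name _GetMetaContent is hypothetical and does not come from the thread or from IE.au3.

```autoit
#include <IE.au3>

; Hypothetical helper (not from the thread): returns the content attribute of
; the first <meta> element whose name matches $sName, or "" if none is found.
Func _GetMetaContent($oIE, $sName)
    Local $oMetas = _IETagNameGetCollection($oIE, "meta")
    If @error Then Return SetError(1, 0, "")
    For $oMeta In $oMetas
        ; AutoIt's = operator compares strings case-insensitively
        If $oMeta.name = $sName Then Return $oMeta.content
    Next
    Return SetError(2, 0, "")
EndFunc   ;==>_GetMetaContent

; Open the page in a hidden IE window, as in the crawler code
Local $oIE = _IECreate("http://www.james-brooks.net", 0, 0)

; The page title is a document property rather than a <meta> tag
ConsoleWrite("Title: " & _IEPropertyGet($oIE, "title") & @CRLF)
ConsoleWrite("Author: " & _GetMetaContent($oIE, "author") & @CRLF)
ConsoleWrite("Keywords: " & _GetMetaContent($oIE, "keywords") & @CRLF)
ConsoleWrite("Robots: " & _GetMetaContent($oIE, "robots") & @CRLF)
ConsoleWrite("Description: " & _GetMetaContent($oIE, "description") & @CRLF)

_IEQuit($oIE)
```

Because the lookup returns the content attribute directly, there is no need to parse the outerhtml string afterwards. A page that omits a given tag simply yields an empty string with @error set to 2.
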
senthor, posted November 5, 2008:
Now I recommend using the XML DOM Wrapper UDF. It does exactly what you want: it reads and writes content in an XML structure, such as the author tag. Look here.

James, posted November 5, 2008:
I think I will look into it then. It may speed the process up too, I guess.

James, posted November 5, 2008:
I have no idea how to use this UDF to read the HTML.

James, posted November 6, 2008:
Yeah, so this XML thing is not working for me. I cannot seem to understand it. Has anyone got some ideas?