Dieuz Posted January 3, 2010

Hey guys,

I am having a hard time extracting the links and their anchor text from a page's source.

    #include <Array.au3>
    #include <IE.au3>

    $Primary_url = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/" ; Any URL
    $IE = _IECreate($Primary_url,0,1,1)
    $pagesource = _IEBodyReadHTML($IE)
    $array = StringRegExp($pagesource,'(?:<A href=")(http.*?)(?:">)(.*?)(?:</A>)',3)
    _IEQuit($IE)
    _ArrayDisplay($array, "Test")

I am trying to extract each URL (http://...) and its related anchor text. The problem is that sometimes there is no anchor text at all, or the link contains other markup such as <B>, <COLOR>, etc., and all of these things break my regular expression.

I am not very good at writing regular expressions, so I would appreciate a little help here.

Thanks!
logmein Posted January 3, 2010 (edited)

OK, simple: _IELinkGetCollection()

    #include <IE.au3>
    _IELinkGetCollection ( ByRef $o_object [, $i_index = -1] )

Parameters

    $o_object   Object variable of an InternetExplorer.Application, Window or Frame object
    $i_index    Optional: specifies whether to return a collection or an indexed instance
                0 or positive integer returns an indexed instance
                -1 = (Default) returns a collection

Edited January 3, 2010 by logmein
Dieuz Posted January 3, 2010 (Author)

_IELinkGetCollection() is great for extracting all the links, but I can't extract the anchor text with it. That's why I would like to use a regular expression...

Is there any way to gather the anchor text with _IELinkGetCollection()?
Dieuz Posted January 4, 2010 (Author, edited)

I can use $oLink.href to retrieve the link, but can I use $oLink.innerText to get the anchor text?

Edited January 4, 2010 by Dieuz
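
The thread does not record a reply to this question. As a rough, untested sketch of the idea being asked about: link objects returned by _IELinkGetCollection() are Internet Explorer DOM anchor elements, which expose both an href and an innerText property, so the two values can be collected side by side without a regular expression. The variable names ($sUrl, $oIE, $aLinks and so on) are illustrative and not from the posts above, and the sketch assumes the page loads normally in IE.

```autoit
#include <Array.au3>
#include <IE.au3>

; URL taken from the first post; any page should work
Local $sUrl = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/"

; Open the page in a hidden IE window and wait for it to finish loading
Local $oIE = _IECreate($sUrl, 0, 0, 1)

; Get every link on the page; @extended holds the number of links found
Local $oLinks = _IELinkGetCollection($oIE)
Local $iCount = @extended

If $iCount > 0 Then
    ; Two columns: [n][0] = URL, [n][1] = anchor text
    Local $aLinks[$iCount][2]
    Local $i = 0
    For $oLink In $oLinks
        $aLinks[$i][0] = $oLink.href
        ; innerText is the visible text only, so nested tags such as <B> are dropped;
        ; it is an empty string when the link has no text (for example an image link)
        $aLinks[$i][1] = $oLink.innerText
        $i += 1
    Next
    _ArrayDisplay($aLinks, "Links and anchor text")
EndIf

_IEQuit($oIE)
```

Because the text comes from the parsed DOM instead of the raw HTML, the two cases that broke the regular expression (no anchor text, or extra markup inside the link) should not need special handling here.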