Dieuz Posted January 3, 2010

Hey guys,

I am having a hard time extracting the links and their anchor text from a page's source.

    #include <Array.au3>
    #include <IE.au3>

    $Primary_url = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/" ; Any URL
    $IE = _IECreate($Primary_url, 0, 1, 1)
    $pagesource = _IEBodyReadHTML($IE)
    $array = StringRegExp($pagesource, '(?:<A href=")(http.*?)(?:">)(.*?)(?:</A>)', 3)
    _IEQuit($IE)
    _ArrayDisplay($array, "Test")

I am trying to extract each URL (http://...) and its related anchor text. The problem is that sometimes there is no anchor text at all, or the link contains other markup such as <B>, <COLOR>, etc., and all of these break my regular expression.

I am not very good at writing regular expressions, so I would appreciate a little help here.

Thanks!
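A note on the pattern above: it only matches when href is the first attribute and the closing quote is immediately followed by ">", so any extra attribute makes it fail. Below is a minimal sketch of a more tolerant pattern, not a definitive fix. The sample markup in $sHtml is hypothetical and only stands in for the string returned by _IEBodyReadHTML, and the sketch assumes href values are double-quoted. Nested tags such as <B> are stripped from the captured anchor text in a second pass.

```autoit
#include <Array.au3>

; Hypothetical sample markup, standing in for the result of _IEBodyReadHTML($IE)
Local $sHtml = '<P><A class=ext href="http://example.com/a"><B>Bold</B> text</A> ' & _
        '<a href="http://example.com/b"></a></P>'

; (?is)   : case-insensitive, and "." also matches newlines
; [^>]*?  : tolerate other attributes before and after href
; Assumption: href values are enclosed in double quotes
Local $sPattern = '(?is)<a\b[^>]*?\bhref="(http[^"]*)"[^>]*>(.*?)</a>'

; Flag 3 returns a flat array: url, text, url, text, ...
Local $aMatches = StringRegExp($sHtml, $sPattern, 3)
If @error Then
    MsgBox(16, "Links", "No links matched.")
    Exit
EndIf

; Repack the flat result into two columns: URL and anchor text
Local $iPairs = Int(UBound($aMatches) / 2)
Local $aLinks[$iPairs][2]
For $i = 0 To $iPairs - 1
    $aLinks[$i][0] = $aMatches[2 * $i]
    ; Remove any tags left inside the anchor text, then trim whitespace
    $aLinks[$i][1] = StringStripWS(StringRegExpReplace($aMatches[2 * $i + 1], '<[^>]+>', ''), 3)
Next

_ArrayDisplay($aLinks, "URL | anchor text")
```

Links with no anchor text still produce a row, with an empty second column. Regular expressions remain fragile on real-world HTML, so treat this as a starting point only.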
logmein Posted January 3, 2010 (edited)

OK, simple: _IELinkGetCollection()

    #include <IE.au3>
    _IELinkGetCollection ( ByRef $o_object [, $i_index = -1] )

Parameters

    $o_object   Object variable of an InternetExplorer.Application, Window or Frame object
    $i_index    Optional: specifies whether to return a collection or an indexed instance
                0 or positive integer = returns an indexed instance
                -1 = (Default) returns a collection

Edited January 3, 2010 by logmein
Dieuz (Author) Posted January 3, 2010

_IELinkGetCollection() is great for extracting all the links, but I can't extract the anchor text with it. That's why I would like to use a regular expression...

Is there any way to get the anchor text with _IELinkGetCollection()?
Dieuz (Author) Posted January 4, 2010 (edited)

I can use $oLink.href to retrieve the link, but can I use $oLink.innerText to get the anchor text?

Edited January 4, 2010 by Dieuz
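For reference, here is a minimal sketch of the approach asked about, assuming the objects in the collection are Internet Explorer DOM anchor elements. innerText is a property of the IE DOM rather than something provided by IE.au3; it returns the visible text of the link with nested tags such as <B> already removed. The variable names are illustrative.

```autoit
#include <Array.au3>
#include <IE.au3>

Local $sUrl = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/"
Local $oIE = _IECreate($sUrl, 0, 1, 1)

Local $oLinks = _IELinkGetCollection($oIE)
Local $iCount = @extended ; on success, @extended holds the number of links
If @error Or $iCount = 0 Then
    _IEQuit($oIE)
    MsgBox(16, "Links", "No links found.")
    Exit
EndIf

; Column 0: URL, column 1: anchor text
Local $aLinks[$iCount][2]
Local $i = 0
For $oLink In $oLinks
    $aLinks[$i][0] = $oLink.href
    ; Assumption: innerText is available on each element (IE DOM property)
    $aLinks[$i][1] = StringStripWS($oLink.innerText, 3)
    $i += 1
Next

_IEQuit($oIE)
_ArrayDisplay($aLinks, "URL | anchor text")
```

Links that wrap only an image or have no text simply yield an empty string in the second column, so no special handling is needed for the "no anchor text" case.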