Posted

Hey guys,

I am having a hard time extracting the links and their anchor text from a page's source.

#include <Array.au3>
#include <IE.au3>

$Primary_url = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/" ; Any URL

; Open the page in a new, visible IE window and wait for it to finish loading
$IE = _IECreate($Primary_url,0,1,1)
$pagesource = _IEBodyReadHTML($IE)

; Flag 3 returns a flat array of all captured groups (URL, anchor text, URL, ...)
$array = StringRegExp($pagesource,'(?:<A href=")(http.*?)(?:">)(.*?)(?:</A>)',3)

_IEQuit($IE)
_ArrayDisplay($array, "Test")

I am trying to extract the URL (http://...) and the related anchor text. The problem is that sometimes there is no anchor text at all, or the anchor contains other markup such as <B>, <COLOR>, etc., and all of this breaks my regular expression.
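For reference, here is a rough sketch of a more tolerant pattern, not a definitive fix. It assumes the href value is always double-quoted, as in the markup above, and it runs on a made-up sample string ($sHtml and the example.com URLs are hypothetical) instead of a live page. The idea is to allow extra attributes inside the <A> tag, capture the raw inner HTML even when it is empty, and strip nested tags such as <B> in a second pass.

```autoit
#include <Array.au3>

; Hypothetical sample markup standing in for the output of _IEBodyReadHTML()
Local $sHtml = '<P><A class=x href="http://example.com/a"><B>Bold</B> text</A> ' & _
        '<A href="http://example.com/b"></A></P>'

; (?is): case-insensitive, and "." also matches newlines.
; Group 1 = URL, group 2 = raw inner HTML of the anchor (may be empty).
Local $aMatches = StringRegExp($sHtml, '(?is)<a\b[^>]*?\bhref="(http[^"]*)"[^>]*>(.*?)</a>', 3)
If @error Then Exit MsgBox(16, "Regex", "No links matched")

; Flag 3 returns a flat array: URL, inner HTML, URL, inner HTML, ...
Local $iPairs = Int(UBound($aMatches) / 2)
Local $aLinks[$iPairs][2]
For $i = 0 To $iPairs - 1
    $aLinks[$i][0] = $aMatches[2 * $i]
    ; Remove nested tags, then trim leading and trailing whitespace
    $aLinks[$i][1] = StringStripWS(StringRegExpReplace($aMatches[2 * $i + 1], '(?s)<[^>]*>', ''), 3)
Next

_ArrayDisplay($aLinks, "URL | Anchor text")
```

Regular expressions remain fragile on real-world HTML (unquoted attributes, nested anchors, comments), so this is probably only good enough for simple pages.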

I am not really good at writing regular expressions, so I would appreciate a little help here.

Thanks!

Posted (edited)

OK, simple: use _IELinkGetCollection()

#include <IE.au3>

_IELinkGetCollection ( ByRef $o_object [, $i_index = -1] )

Parameters

$o_object: Object variable of an InternetExplorer.Application, Window or Frame object
$i_index: Optional: specifies whether to return a collection or an indexed instance
    0 or positive integer = returns an indexed instance
    -1 = (Default) returns a collection
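A short sketch of how the index parameter might be used, assuming a hypothetical target URL (the variable names are illustrative and not from the original post). With the default index the function returns the whole collection and sets @extended to the link count; with a zero-based index it returns a single link object.

```autoit
#include <IE.au3>

; Hypothetical target page; any URL would do
Local $oIE = _IECreate("http://www.autoitscript.com")

; Default index (-1): the whole link collection, with the link count in @extended
Local $oLinks = _IELinkGetCollection($oIE)
Local $iCount = @extended

; Index 0: only the first link on the page
Local $oFirst = _IELinkGetCollection($oIE, 0)
If IsObj($oFirst) Then ConsoleWrite("First of " & $iCount & " links: " & $oFirst.href & @CRLF)

_IEQuit($oIE)
```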

Edited by logmein
Posted

_IELinkGetCollection() is great for extracting all the links, but I can't extract the anchor text with it. That's why I would like to use a regular expression...

Is there any way to gather the anchor text with _IELinkGetCollection()?

Posted (edited)

I can use $oLink.href to retrieve the link, but can I use $oLink.innerText to get the anchor text?
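Something along these lines might work, as an untested sketch: each item in the collection is an IE DOM anchor element, which exposes both .href and .innerText. The URL and variable names below are placeholders, not from the original script.

```autoit
#include <Array.au3>
#include <IE.au3>

; Hypothetical target page; replace with the page of interest
Local $oIE = _IECreate("http://www.autoitscript.com")

Local $oLinks = _IELinkGetCollection($oIE)
; Capture both macros before any other function call resets them
Local $iErr = @error, $iCount = @extended
If $iErr Or $iCount = 0 Then
    _IEQuit($oIE)
    Exit MsgBox(16, "Links", "No links found")
EndIf

; Column 0 = URL, column 1 = visible anchor text (nested tags already removed by IE)
Local $aLinks[$iCount][2]
Local $i = 0
For $oLink In $oLinks
    $aLinks[$i][0] = $oLink.href
    $aLinks[$i][1] = $oLink.innerText
    $i += 1
Next

_IEQuit($oIE)
_ArrayDisplay($aLinks, "href | innerText")
```

Since innerText returns only the rendered text, anchors that wrap their text in <B> or similar tags should come back clean, and anchors without text should simply give an empty string.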

Edited by Dieuz
