Dieuz

IE - Extract Links from Source Page

4 posts in this topic

Hey guys,

I am having a hard time extracting the links and their anchor text from a page's source.

#include <Array.au3>
#include <IE.au3>

$Primary_url = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/" ; Any URL

; Open the page in a visible IE window and wait for it to finish loading
$IE = _IECreate($Primary_url, 0, 1, 1)
$pagesource = _IEBodyReadHTML($IE)

; Flag 3 returns all matches as one flat array: URL, anchor text, URL, anchor text, ...
$array = StringRegExp($pagesource, '(?:<A href=")(http.*?)(?:">)(.*?)(?:</A>)', 3)

_IEQuit($IE)
_ArrayDisplay($array, "Test")

I am trying to extract the URL (http://...) and the related anchor text. The problem is that sometimes there is no anchor text at all, or the anchor contains other tags such as <B>, <COLOR>, etc., and all of these break my regular expression.
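To make the failure cases above concrete, here is a minimal sketch of a more tolerant pattern. It is only an illustration under stated assumptions: the sample markup in $sHtml is made up, and the pattern assumes that href values are double-quoted and start with "http". It allows other attributes around href, ignores case, accepts empty anchor text, and strips nested tags in a second pass.

```autoit
#include <Array.au3>

; Hypothetical sample markup standing in for the output of _IEBodyReadHTML()
Local $sHtml = '<P><A class=ext href="http://example.com/a"><B>Bold link</B></A> ' & _
        '<a href="http://example.com/b"></a> ' & _
        '<A href="http://example.com/c" target=_blank>Plain</A></P>'

; (?i) ignores case, (?s) lets "." span line breaks.
; Group 1 = URL, group 2 = raw inner HTML of the anchor (may be empty).
Local $sPattern = '(?is)<a\s[^>]*?href="(http[^"]*)"[^>]*>(.*?)</a>'
Local $aMatches = StringRegExp($sHtml, $sPattern, 3)
If @error Then Exit MsgBox(16, "Regex", "No links matched.")

; Flag 3 returns a flat array: URL, text, URL, text, ...
; The odd indexes hold the anchor text.
For $i = 1 To UBound($aMatches) - 1 Step 2
    ; Strip nested tags such as <B> so only the visible text remains
    $aMatches[$i] = StringStripWS(StringRegExpReplace($aMatches[$i], '(?s)<[^>]*>', ''), 3)
Next

_ArrayDisplay($aMatches, "URL / anchor text pairs")
```

A regular expression like this still cannot cope with every form of real-world HTML (single-quoted or unquoted href values, for example), so treat it as a starting point rather than a complete parser.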

I am not really good at writing regular expressions, so I would appreciate a little help here.

Thanks!




#2 ·  Posted (edited)

OK, simple: use _IELinkGetCollection()

#include <IE.au3>

_IELinkGetCollection ( ByRef $o_object [, $i_index = -1] )

Parameters

$o_object: Object variable of an InternetExplorer.Application, Window or Frame object

$i_index: Optional. Specifies whether to return a collection or an indexed instance:

    0 or a positive integer returns an indexed instance

    -1 (default) returns a collection
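As a rough illustration of the signature above, the sketch below requests the whole collection (the default index of -1) and prints each link's href. The URL is simply the one from the first post, and the variable names are placeholders. It relies on @extended holding the number of links after a successful call, as the IE.au3 help file describes.

```autoit
#include <IE.au3>

; Assumption: any reachable page works here; this is the URL from the first post
Local $sUrl = "http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/"

Local $oIE = _IECreate($sUrl)
Local $oLinks = _IELinkGetCollection($oIE) ; default index -1 returns the whole collection
Local $iCount = @extended                  ; read immediately: number of links in the collection

ConsoleWrite("Links found: " & $iCount & @CRLF)
For $oLink In $oLinks
    ConsoleWrite($oLink.href & @CRLF)
Next

_IEQuit($oIE)
```

Passing 0 or a positive integer as the second argument would return that single link object instead of the collection.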

Edited by logmein


_IELinkGetCollection() is great for extracting all the links, but I can't extract the anchor text with it. That's why I would like to use a regular expression...

Is there any way to get the anchor text with _IELinkGetCollection()?


#4 ·  Posted (edited)

I can use $oLink.href to retrieve the link, but can I use $oLink.innerText to get the anchor text?
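For reference, innerText is a property that Internet Explorer's DOM exposes on elements, including the anchor objects in the link collection, so something along these lines should work. This is a minimal sketch rather than a tested solution for this particular page; the array layout and variable names are my own choices.

```autoit
#include <Array.au3>
#include <IE.au3>

Local $oIE = _IECreate("http://www.britannica.com/blogs/2008/04/are-newspapers-doomed-do-we-care-newspapers-the-net-forum/")
Local $oLinks = _IELinkGetCollection($oIE)
Local $iCount = @extended ; number of links, saved before any other function call resets it

If Not IsObj($oLinks) Or $iCount = 0 Then
    _IEQuit($oIE)
    Exit
EndIf

; One row per link: column 0 = URL, column 1 = anchor text
Local $aLinks[$iCount][2]
Local $i = 0
For $oLink In $oLinks
    $aLinks[$i][0] = $oLink.href
    ; innerText returns only the visible text, so nested tags such as <B> are dropped;
    ; it is empty for links that contain no text (for example image-only links)
    $aLinks[$i][1] = StringStripWS($oLink.innerText, 3)
    $i += 1
Next

_IEQuit($oIE)
_ArrayDisplay($aLinks, "href | anchor text")
```

Because the browser has already parsed the page, this avoids the nested-tag and missing-text problems that made the regular expression fragile.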

Edited by Dieuz
