Copy text from a tag (HTML source code) that contains a specific word

Marnie

Hi all,

I need help please :)

I have an HTML source file that contains:

<a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora">

The source has almost 6000 lines. The line containing "productDetailLink" (above) is unique in the whole file.

I need a script that copies the text from href (that is, the link inside the quotes) and saves it to an Excel file.

Many thanks

orbs

try to code this:

read the entire file to a single string var

use StringSplit() with the string "productDetailLink" as the delimiter

you will have an array of strings

loop the array for every string:

- use StringInStr() to locate only the first instance of "href"

- locate the 1st and 2nd instances of the double-quote character following the 1st "href"

- read what's in the middle

- write into a new line of a text file

when all done, open the text file with Excel.
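
The steps above can be sketched roughly as follows. This is only an illustration: the function name `_ExtractHrefs` and the sample string are made up here, and in a real script `$sHtml` would presumably come from `FileRead()` on the saved source file.

```autoit
; Sketch of the StringSplit approach described above.
; _ExtractHrefs is a hypothetical helper name, not a built-in function.
Func _ExtractHrefs($sHtml, $sMarker)
    Local $sResult = ""
    ; Flag 1: use the whole marker string as a single delimiter.
    Local $aParts = StringSplit($sHtml, $sMarker, 1)
    ; $aParts[0] is the element count and $aParts[1] is the text before
    ; the first marker, so the interesting pieces start at index 2.
    For $i = 2 To $aParts[0]
        ; First "href" after the marker.
        Local $iHref = StringInStr($aParts[$i], "href")
        If $iHref = 0 Then ContinueLoop
        ; First and second double quote following that "href".
        Local $iQ1 = StringInStr($aParts[$i], '"', 0, 1, $iHref)
        If $iQ1 = 0 Then ContinueLoop
        Local $iQ2 = StringInStr($aParts[$i], '"', 0, 1, $iQ1 + 1)
        If $iQ2 = 0 Then ContinueLoop
        ; Keep what is between the two quotes, one link per line.
        $sResult &= StringMid($aParts[$i], $iQ1 + 1, $iQ2 - $iQ1 - 1) & @CRLF
    Next
    Return $sResult
EndFunc

; Sample input taken from the first post.
Local $sHtml = '<a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora">'
ConsoleWrite(_ExtractHrefs($sHtml, "productDetailLink"))
```

Writing the returned text to a .txt or .csv file with `FileWrite()` should give a file that Excel can open, with one link per line.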

mikell

Or you can fetch the source with InetRead() and then run this regular expression over the whole text to extract the link:

; $text holds the complete HTML source as one string
$res = StringRegExpReplace($text, '(?s).+productDetailLink.+?href="([^"]+).+', "$1")
MsgBox(0, "", $res)
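
Since the original request was to save the link for Excel, here is a minimal sketch that applies the same pattern and appends the result to a CSV file. The function name `_SaveProductLink` and the file name `links.csv` are assumptions made for this example, not something from the posts.

```autoit
; Sketch only: _SaveProductLink and the output file name are hypothetical.
Func _SaveProductLink($text, $sCsvPath)
    ; Same pattern as above: capture the href that follows "productDetailLink".
    Local $res = StringRegExpReplace($text, '(?s).+productDetailLink.+?href="([^"]+).+', "$1")
    ; @extended holds the number of replacements; 0 means nothing matched.
    If @extended = 0 Then Return SetError(1, 0, "")
    ; FileWriteLine appends to the file and adds the line break itself.
    If Not FileWriteLine($sCsvPath, $res) Then Return SetError(2, 0, "")
    Return $res
EndFunc

; Sample input based on the first post.
Local $sSample = '<html><a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora"></html>'
_SaveProductLink($sSample, @ScriptDir & "\links.csv")
```

For a live page, `$text` could come from `BinaryToString(InetRead($sUrl), 4)`, where `$sUrl` is the page address and flag 4 assumes the page is UTF-8 encoded.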

Chimp

hi Marnie

you could try something like this:

#include <IE.au3>
Local $oIE = _IECreate("www.yoursite.com") ; <--- change this link 
Local $oLinks = _IELinkGetCollection($oIE)
For $olink In $oLinks
    If StringInStr($olink.outerhtml, "productDetailLink") Then
        ConsoleWrite("href Info ===>" & $olink.href & @CRLF)
    EndIf
Next

bye


small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Marnie

mikell, thanks very much... that's exactly what I was looking for. Appreciated :)

Marnie

What should I do in the exceptional case where the link with "productDetailLink" is not present? I tried to handle it as follows, but with no luck:

$res = StringRegExpReplace($address, '(?s).+productDetailLink.+?href="([^"]+).+', "$1")
   If @error = 0  Then 
      ; do if error occurs    
   Else  
      FileWriteLine($file, $res & @CRLF)
      
   EndIf

dragan

Local $text1 = '<a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora">'
Local $text2 = '<a class="productDetailLinkXXXXX" title="Bačkora" href="http://www.hracky.cz/backora">';<---- addition with XXXXX
;===============================================================================================
;===============================================================================================
Local $Link1 = _GetLinkBackFrom($text1, 'productDetailLink')
If NOT @error Then
    MsgBox(0, 'Success - $text1', $Link1) ; <----- this one will succeed
Else
    MsgBox(0, 'Failure - $text1', $Link1)
EndIf

Local $Link2 = _GetLinkBackFrom($text2, 'productDetailLink')
If NOT @error Then
    MsgBox(0, 'Success - $text2', $Link2)
Else
    MsgBox(0, 'Failure - $text2', 'Link does not exist') ; <---- this one will fail
EndIf
;===============================================================================================


;==================================== Function: ================================================
;===============================================================================================
;   $__text = text from which you want to extract the URL
;   $__attributName = attribute value to match (e.g. the value of class, id, name, ...)

Func _GetLinkBackFrom($__text, $__attributName)
    Local $__pattern = '(?s).+\"' & $__attributName & '\".+?href="([^"]+).+'
    If StringRegExp($__text, $__pattern) Then
        Return StringRegExpReplace($__text, $__pattern, "$1")
    Else
        Return SetError(1, 0, '')
    EndIf
EndFunc
;===============================================================================================
;===============================================================================================
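
A note on why the `@error` test in the question does not help: StringRegExpReplace only sets @error for an invalid pattern. When nothing matches, it returns the input unchanged and leaves @extended at 0. As an alternative to the function above, here is a sketch that does a single regex pass with StringRegExp in array mode (flag 1), which sets @error to 1 when there is no match. The name `_GetLinkOrError` is made up for this example.

```autoit
; Sketch only: _GetLinkOrError is a hypothetical name.
Func _GetLinkOrError($sText, $sValue)
    ; Flag 1 returns an array holding only the captured groups.
    ; The quotes around $sValue keep "productDetailLinkXXXXX" from matching.
    Local $aMatch = StringRegExp($sText, '(?s)"' & $sValue & '".+?href="([^"]+)', 1)
    ; @error is non-zero when nothing matched (or the pattern is invalid).
    If @error Then Return SetError(1, 0, '')
    Return $aMatch[0]
EndFunc

Local $sGood = '<a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora">'
Local $sLink = _GetLinkOrError($sGood, 'productDetailLink')
If @error Then
    ConsoleWrite("Link does not exist" & @CRLF)
Else
    ConsoleWrite($sLink & @CRLF)
EndIf
```

In the loop from the question, the `FileWriteLine($file, ...)` call would then go in the `Else` branch only. Note that `$sValue` is inserted into the pattern as-is, so this assumes it contains no regex metacharacters.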

Chimp

Excuse me, maybe I misunderstood the question,
but isn't it easier to extract the links with this simple script?

If you have the source code in an HTML file and not on a web site,

you can still use _IECreate() and point it at that local file instead of a URL:

#include <IE.au3>
Local $oIE = _IECreate(@ScriptDir & "\file.html") ; <--- full path of the local HTML file (here assumed to sit next to the script)
Local $oLinks = _IELinkGetCollection($oIE)
For $olink In $oLinks
    If StringInStr($olink.outerhtml, "productDetailLink") Then
        ConsoleWrite("href Info ===>" & $olink.href & @CRLF)
    EndIf
Next

bye


Chimp

Hi Marnie,

it would be nice if you showed us your solution :)

bye


remin

I was wondering whether it is also possible to copy a link's text with AutoIt?

I use a "Copy Link Text" plugin in Chrome and in Firefox.

(right click on link --> "Copy Link Text")

It would be much easier if I could do it with AutoIt.
