
Copy text from a tag (HTML source code) that contains a specific word


Marnie

Hi all,

I need help please :)

I have some HTML source code that contains:

<a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora">

The source code has almost 6,000 lines. The line containing "productDetailLink" (above) is unique in the whole file.

I need a script that copies the value of the href attribute (i.e. the link between the quotes) and saves it to an Excel file.

 

Many thanks


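Try coding this:

read the entire file into a single string variable

use StringSplit() with the whole string "productDetailLink" as the delimiter (pass flag 1, $STR_ENTIRESPLIT; otherwise every single character of "productDetailLink" is treated as a separate delimiter)

you will have an array of strings; element [0] holds the count and element [1] is the text before the first "productDetailLink", so skip it

loop the array for every remaining string:

- use StringInStr() to locate only the first instance of "href"

- locate the 1st and 2nd instances of the double-quote character following the 1st "href"

- read what's in between

- write it into a new line of a text file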

when all done, open the text file with Excel.
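The steps above could look roughly like this in AutoIt. This is a minimal, untested sketch: the helper name _ExtractHrefsAfterMarker and the file names source.html and links.csv are hypothetical placeholders. It collects every match into an array, so it also copes with a page where the marker appears more than once.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; _ExtractHrefsAfterMarker is a hypothetical helper name.
; Returns an array with the href value found after each occurrence of
; $sMarker, or sets @error = 1 if no such href exists.
Func _ExtractHrefsAfterMarker($sHtml, $sMarker)
    ; $STR_ENTIRESPLIT: split on the whole marker, not on each of its characters
    Local $aParts = StringSplit($sHtml, $sMarker, $STR_ENTIRESPLIT)
    If $aParts[0] < 2 Then Return SetError(1, 0, 0) ; marker not found

    Local $aLinks[$aParts[0] - 1], $iCount = 0
    Local $iHref, $iQuote1, $iQuote2
    ; $aParts[1] is the text before the first marker, so start at index 2
    For $i = 2 To $aParts[0]
        $iHref = StringInStr($aParts[$i], "href") ; first "href" only
        If $iHref = 0 Then ContinueLoop
        ; 1st and 2nd double quotes after that "href"
        $iQuote1 = StringInStr($aParts[$i], '"', 0, 1, $iHref)
        If $iQuote1 = 0 Then ContinueLoop
        $iQuote2 = StringInStr($aParts[$i], '"', 0, 1, $iQuote1 + 1)
        If $iQuote2 = 0 Then ContinueLoop
        ; the text between the two quotes is the link
        $aLinks[$iCount] = StringMid($aParts[$i], $iQuote1 + 1, $iQuote2 - $iQuote1 - 1)
        $iCount += 1
    Next
    If $iCount = 0 Then Return SetError(1, 0, 0)
    ReDim $aLinks[$iCount]
    Return $aLinks
EndFunc

; Example usage: source.html and links.csv are placeholder file names
Local $sHtml = FileRead(@ScriptDir & "\source.html")
If @error Then
    ConsoleWrite("Could not read source.html" & @CRLF)
Else
    Local $aLinks = _ExtractHrefsAfterMarker($sHtml, "productDetailLink")
    If @error Then
        ConsoleWrite("No productDetailLink href found" & @CRLF)
    Else
        ; one link per line in a one-column file
        Local $hOut = FileOpen(@ScriptDir & "\links.csv", $FO_OVERWRITE)
        If $hOut = -1 Then
            ConsoleWrite("Could not create links.csv" & @CRLF)
        Else
            For $i = 0 To UBound($aLinks) - 1
                FileWriteLine($hOut, $aLinks[$i]) ; adds @CRLF itself
            Next
            FileClose($hOut)
        EndIf
    EndIf
EndIf
```

Writing the file with a .csv extension means it can usually be opened directly in Excel, one link per row.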


Hi Marnie,

you could try something like this:

#include <IE.au3>
Local $oIE = _IECreate("www.yoursite.com") ; <--- change this link
; get every <a> element on the page
Local $oLinks = _IELinkGetCollection($oIE)
For $olink In $oLinks
    ; keep only the link whose HTML contains "productDetailLink"
    If StringInStr($olink.outerhtml, "productDetailLink") Then
        ConsoleWrite("href Info ===>" & $olink.href & @CRLF)
    EndIf
Next

bye

 

Chimp

What should I do if the page does not contain a link with "productDetailLink"? I tried to handle that case as follows, but with no luck:

$res = StringRegExpReplace($address, '(?s).+productDetailLink.+?href="([^"]+).+', "$1")
   If @error = 0  Then 
      ; do if error occurs    
   Else  
      FileWriteLine($file, $res & @CRLF)
      
   EndIf
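Why the check above never fires: StringRegExpReplace() does not set @error when the pattern simply fails to match. It returns the input string unchanged and reports the number of replacements in @extended; @error is only set (to 2) for an invalid pattern. The If branches are also the wrong way round, since @error = 0 is the success case. A minimal sketch of a corrected check, wrapped in a hypothetical helper _GetProductLink:

```autoit
; _GetProductLink is a hypothetical helper name.
; Returns the href that follows "productDetailLink", or sets @error = 1
; when the text contains no such link.
Func _GetProductLink($sAddress)
    Local $sRes = StringRegExpReplace($sAddress, '(?s).+productDetailLink.+?href="([^"]+).+', "$1")
    ; @error is only set for an invalid pattern; when the pattern does not
    ; match, the input comes back unchanged and @extended is 0
    If @error Or @extended = 0 Then Return SetError(1, 0, "")
    Return $sRes
EndFunc

; Example usage: one page with the link, one without (sample HTML is made up)
Local $aPages[2] = ['<a class="productDetailLink" title="Backora" href="http://www.hracky.cz/backora">', _
        '<a class="somethingElse" href="http://www.hracky.cz/other">']
Local $sLink
For $i = 0 To 1
    $sLink = _GetProductLink($aPages[$i])
    If @error Then
        ConsoleWrite("Page " & $i & ": no productDetailLink, skipped" & @CRLF)
    Else
        ConsoleWrite("Page " & $i & ": " & $sLink & @CRLF) ; or FileWriteLine($file, $sLink)
    EndIf
Next
```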

Local $text1 = '<a class="productDetailLink" title="Bačkora" href="http://www.hracky.cz/backora">'
Local $text2 = '<a class="productDetailLinkXXXXX" title="Bačkora" href="http://www.hracky.cz/backora">';<---- addition with XXXXX
;===============================================================================================
;===============================================================================================
Local $Link1 = _GetLinkBackFrom($text1, 'productDetailLink')
If NOT @error Then
    MsgBox(0, 'Success - $text1', $Link1);<----- will be successful
Else
    MsgBox(0, 'Failure - $text1', $Link1)
EndIf

Local $Link2 = _GetLinkBackFrom($text2, 'productDetailLink')
If NOT @error Then
    MsgBox(0, 'Success - $text2', $Link2)
Else
    MsgBox(0, 'Failure - $text2', 'Link does not exist');<---- will have error
EndIf
;===============================================================================================


;==================================== Function: ================================================
;===============================================================================================
;   $__text = text from which you want to extract the url
;   $__attributName = attribute value to match, as it appears in double quotes (e.g. the value of class, id, name, etc.)

Func _GetLinkBackFrom($__text, $__attributName)
    Local $__pattern = '(?s).+\"' & $__attributName & '\".+?href="([^"]+).+'
    If StringRegExp($__text, $__pattern) Then
        Return StringRegExpReplace($__text, $__pattern, "$1")
    Else
        Return SetError(1, 0, '')
    EndIf
EndFunc
;===============================================================================================
;===============================================================================================
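A possible variation on _GetLinkBackFrom(), offered as a sketch rather than a drop-in replacement: StringRegExp() with flag $STR_REGEXPARRAYMATCH returns the captured group directly and sets @error when nothing matches, so the pattern only has to run once. The name _GetLinkBackFromArray is hypothetical, and \Q...\E makes PCRE treat the searched value literally. Unlike the greedy leading .+ above, this returns the first match instead of the last, which makes no difference while "productDetailLink" is unique.

```autoit
#include <StringConstants.au3>

; _GetLinkBackFromArray is a hypothetical variant of _GetLinkBackFrom().
; Returns the href after the first quoted "$sValue", or sets @error = 1.
Func _GetLinkBackFromArray($sText, $sValue)
    ; \Q...\E: match $sValue literally, even if it contains regex characters
    Local $aMatch = StringRegExp($sText, '(?s)"\Q' & $sValue & '\E".+?href="([^"]+)"', $STR_REGEXPARRAYMATCH)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0] ; first capture group = the URL
EndFunc

Local $sLink = _GetLinkBackFromArray('<a class="productDetailLink" title="Backora" href="http://www.hracky.cz/backora">', 'productDetailLink')
If Not @error Then ConsoleWrite($sLink & @CRLF)
```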


Excuse me, maybe I did not understand the question,
but isn't it easier to extract the links with this simple script?

If you have the source code in an HTML file rather than on a website,

you can always point _IECreate() at that local file instead of a URL:

#include <IE.au3>
Local $oIE = _IECreate(@ScriptDir & "\file.html") ; <--- full path of the local HTML file (safer than a relative path)
Local $oLinks = _IELinkGetCollection($oIE)
For $olink In $oLinks
    If StringInStr($olink.outerhtml, "productDetailLink") Then
        ConsoleWrite("href Info ===>" & $olink.href & @CRLF)
    EndIf
Next

bye
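Both IE snippets only print the links to the console, while the original request was to save them to an Excel file. A minimal sketch, assuming a one-column CSV file is good enough (Excel can open .csv files directly); the helper name _WriteSingleColumnCsv is hypothetical. Values containing a comma, semicolon or double quote are wrapped in quotes with inner quotes doubled, the usual CSV convention, so each value stays in a single cell.

```autoit
#include <FileConstants.au3>

; _WriteSingleColumnCsv is a hypothetical helper name.
; Writes each element of $aLines as one row of a single-column CSV file.
; Returns True on success, or False with @error = 1 if the file can't be created.
Func _WriteSingleColumnCsv($aLines, $sPath)
    Local $hFile = FileOpen($sPath, $FO_OVERWRITE)
    If $hFile = -1 Then Return SetError(1, 0, False)
    Local $sValue
    For $i = 0 To UBound($aLines) - 1
        $sValue = $aLines[$i]
        ; quote values that contain separators or quotes, doubling inner quotes
        If StringRegExp($sValue, '[,;"]') Then $sValue = '"' & StringReplace($sValue, '"', '""') & '"'
        FileWriteLine($hFile, $sValue) ; adds @CRLF itself
    Next
    FileClose($hFile)
    Return True
EndFunc

; Example usage with made-up links; links.csv is a placeholder file name
Local $aExample[2] = ["http://www.hracky.cz/backora", "http://www.hracky.cz/?a=1,2"]
If _WriteSingleColumnCsv($aExample, @ScriptDir & "\links.csv") Then ConsoleWrite("links.csv written" & @CRLF)
```

To combine it with the loop above, collect each $olink.href into an array instead of (or as well as) calling ConsoleWrite(), then pass that array to the helper.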

 

Chimp

  • 1 year later...

I was wondering whether it is also possible to copy a link's text with AutoIt.

I use a Copy Link Text extension in my Chrome and Firefox browsers

(right-click on a link --> "Copy Link Text").

It would be much easier if I could do it with AutoIt.
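One possible approach, in the spirit of the source-code parsing used earlier in the thread: the link text is whatever sits between the opening <a ...> tag and the closing </a>. This is a rough sketch, assuming the text has no nested tags or HTML entities; the helper name _GetLinkText and the sample HTML are hypothetical. ClipPut() then puts the result on the clipboard, which is what a "Copy Link Text" extension does. When working with IE.au3 link objects as in the earlier replies, the DOM innerText property ($olink.innerText) should give the same text without a regex.

```autoit
#include <StringConstants.au3>

; _GetLinkText is a hypothetical helper name.
; Returns the text of the first <a> element whose opening tag contains
; $sMarker, or sets @error = 1 if there is none.
; Assumes the link text has no nested tags and no HTML entities.
Func _GetLinkText($sHtml, $sMarker)
    ; (?is): case-insensitive, and "." may cross line breaks
    Local $aMatch = StringRegExp($sHtml, '(?is)<a\b[^>]*\Q' & $sMarker & '\E[^>]*>(.*?)</a>', $STR_REGEXPARRAYMATCH)
    If @error Then Return SetError(1, 0, "")
    ; drop whitespace and line breaks around the text
    Return StringStripWS($aMatch[0], $STR_STRIPLEADING + $STR_STRIPTRAILING)
EndFunc

; Example usage; the sample HTML (including the link text) is made up
Local $sText = _GetLinkText('<a class="productDetailLink" href="http://www.hracky.cz/backora"> Backora </a>', "productDetailLink")
If Not @error Then
    ClipPut($sText) ; copy the link text to the clipboard
    ConsoleWrite("Copied: " & $sText & @CRLF)
EndIf
```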
