Posted

Example:

<A href="showthread.php?s=61997dac20687aa843bddf8b3a45e836&p=20495#post20495" rel=nofollow><IMG class=inlineimg title="View" alt="View" src="phuot_images/buttons/viewpost-right.png"></A>

How can I extract this link (or any other kind of URL) from the HTML?

"showthread.php?s=61997dac20687aa843bddf8b3a45e836&amp;p=20495#post20495"

 

Posted

Please post a slightly wider HTML snippet (i.e. also include a few tags before <a....)

Signature beginning:
Please remember: "AutoIt"..... * Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Code

Signature last update: 2023-04-24

Posted

I want to extract the text of a web page while keeping some of the URLs and image links it contains. So I think I have to go through each HTML tag (from _IEBodyReadHTML) and remove the ones I don't need.

Posted

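Try enumerating the <a> tags:

#include <IE.au3>

; $oIE is assumed to be an existing IE object (e.g. from _IECreate or _IEAttach)
Local $oATags_coll = _IETagNameGetCollection($oIE, 'a')
For $oATag_enum In $oATags_coll
    ConsoleWrite($oATag_enum.href & @CRLF)
Next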
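
The same enumeration should also work for image tags, which covers the "Image link" part of the question. This is only a minimal sketch: the hidden IE window, the example.com URLs and the variable names are assumptions made for the demo, not something from the original page.

```autoit
#include <IE.au3>

; Assumption: a hidden IE instance with a small test document stands in for the real page.
Local $oIE = _IECreate("about:blank", 0, 0)
_IEBodyWriteHTML($oIE, '<a href="http://example.com/page"><img src="http://example.com/a.png" alt="View"></a>')

; Links: read the href property of every <a> element
Local $oLinks = _IETagNameGetCollection($oIE, "a")
For $oLink In $oLinks
    ConsoleWrite("URL: " & $oLink.href & @CRLF)
Next

; Images: read the src property of every <img> element
Local $oImgs = _IETagNameGetCollection($oIE, "img")
For $oImg In $oImgs
    ConsoleWrite("IMG: " & $oImg.src & @CRLF)
Next

_IEQuit($oIE)
```

Note that the href and src properties are resolved by the browser, so relative links in the page source usually come back as absolute URLs.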

 


Posted

A simple _StringBetween (which uses a regular expression internally) does the same job:

#include <String.au3>
#include <Array.au3>

$sHTML='<A href="showthread.php?s=61997dac20687aa843bddf8b3a45e836&p=20495#post20495" rel=nofollow><IMG class=inlineimg title="View" alt="View" src="phuot_images/buttons/viewpost-right.png"></A>'
$sHTML&='<A href="showthread.php?s=61997dac20687aa843bddf8b3a45e836&p=20496#post20496" rel=nofollow><IMG class=inlineimg title="View" alt="View" src="phuot_images/buttons/viewpost-right.png"></A>'

$aLinks=_StringBetween($sHTML,'href="','"')
_ArrayDisplay($aLinks)
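
If the image sources are needed as well, a single StringRegExp call can probably collect both attributes at once. The following is a rough sketch under a few assumptions: the pattern only handles double-quoted attribute values (so something like rel=nofollow is ignored), and the variable names are made up for this example.

```autoit
Local $sHTML = '<A href="showthread.php?s=61997dac20687aa843bddf8b3a45e836&p=20495#post20495" rel=nofollow><IMG class=inlineimg title="View" alt="View" src="phuot_images/buttons/viewpost-right.png"></A>'

; Flag 3 returns all capture groups of all matches in one flat array,
; so the entries alternate: attribute name, attribute value, name, value, ...
Local $aMatches = StringRegExp($sHTML, '(?i)\b(href|src)="([^"]*)"', 3)

If @error Then
    ConsoleWrite("No quoted href/src attributes found" & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 2 Step 2
        ConsoleWrite($aMatches[$i] & " -> " & $aMatches[$i + 1] & @CRLF)
    Next
EndIf
```

Keeping the attribute name next to its value makes it easy to tell links from images later on.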

 

Posted

Thanks all!!

I got it working, using the _ConvertHTMLTag function from our forum :D

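#include <Array.au3> ; _ArraySearch, _ArrayAdd, _ArrayDelete, _ArrayUnique, _ArraySort

; For every line that contains an <img> tag: a plain image (border=0, empty alt, no class)
; is replaced by [URL]/[IMG] markers; any other image line is stripped with _ConvertHTMLTag.
Func _RemoveIMG_URLTag($Data)
   local $Result[1][3] = [["","",""]]
   Local $SearchIMG = '(?i)(href|src)="([^"]+)'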
   $Data = StringRegExpReplace($Data, '(?i)Border="0"', 'border=0', 0)
   $Data = StringRegExpReplace($Data, '(?i)<(DIV|/DIV)>', @CRLF, 0)
   $DataSP = StringReplace($Data, '<BR>', @CRLF)
   $DataSP = StringSplit($DataSP, @CRLF)
   ;_ArrayDisplay($DataSP, 'Before')
         While 1
            $LineIMG = _ArraySearch($DataSP, '<img ' , 0, 0, 0, 1)
            if @error then ExitLoop
            ;MsgBox(0, 'LineIMG - before', $DataSP[$LineIMG])
            if StringRegExp($DataSP[$LineIMG], "(?i) Border=0 ", 0) = 1 AND StringRegExp($DataSP[$LineIMG], '(?i) alt=""', 0) = 1 AND StringRegExp($DataSP[$LineIMG], '(?i) class=', 0) = 0 Then
               $DataRemoveIMGTag = StringRegExp($DataSP[$LineIMG], $SearchIMG, 4)
               local $Result[1][3] = [["","",""]]
               for $a1 =0 to UBound($DataRemoveIMGTag) - 1
                  _ArrayAdd($Result, "")
                  $Result[$a1+1][0] = ($DataRemoveIMGTag[$a1])[0]
                  $Result[$a1+1][1] = ($DataRemoveIMGTag[$a1])[1]
                  $Result[$a1+1][2] = ($DataRemoveIMGTag[$a1])[2]
               Next
               _ArrayDelete($Result, 0)
               ;_ArrayDisplay($Result, 'result')
               $DataSP[$LineIMG] = ''
               for $a2 = 0 to UBound($Result) - 1
                  ; case-insensitive check, to match the (?i) flag used in $SearchIMG
                  if StringInStr($Result[$a2][1], 'href') then $Result[$a2][2] = '[URL] ' & $Result[$a2][2] & ' [/URL]'
                  if StringInStr($Result[$a2][1], 'src') then $Result[$a2][2] = '[IMG] ' & $Result[$a2][2] & ' [/IMG]'
                  $DataSP[$LineIMG] &= $Result[$a2][2] & @CRLF
               Next
            Else
               $DataSP[$LineIMG] = _ConvertHTMLTag($DataSP[$LineIMG])
            EndIf
            ;MsgBox(0, 'LineIMG - after', $DataSP[$LineIMG])
         WEnd

  ; _ArrayDisplay($DataSP, 'After')
   $DataAll = ''
   for $c = 1 to UBound($DataSP) - 1
      $DataAll &= $DataSP[$c] & '<BR>'
   Next
   return $DataAll
EndFunc

Func _ConvertHTMLTag($Text)
   $text = StringRegExpReplace($text, '(?si)<head>.*?</head>', '')
   $text = StringRegExpReplace($text, '(?si)<script[^>]*?>.*?</script>', '')
   $text = StringRegExpReplace($text, '>\s+<', '> <')
   ; $text = StringRegExpReplace($text, '[\v]', '')
   $text = StringRegExpReplace($text, '<(br|BR|p)>', @CRLF)
   $text = StringRegExpReplace($text, '<(DIV|div)>', @CRLF)
   $text = StringRegExpReplace($text, '<[/!]*?[^<>]*?>', '')
   ; $text = StringRegExpReplace($text, '<[\/\!]*?[^<>]*?>', @CRLF)
   $text = StringReplace($text, '&quot;', '"')
   $text = StringReplace($text, '&amp;', '&')
   $text = StringReplace($text, '&lt;', '<')
   $text = StringReplace($text, '&gt;', '>')
   $text = StringReplace($text, '&nbsp;', ' ')
   $text = StringReplace($text, '&iexcl;', '&#161;')
   $text = StringReplace($text, '&cent;', '&#162;')
   $text = StringReplace($text, '&pound;', '&#163;')
   $text = StringReplace($text, '&copy;', '&#169;')

   ; &#444 -> character
   $a = StringRegExp($text, '&#(\d+);', 3)
   If Not @error Then
       ; $log &= UBound($a) & '   &#(\d+);' & @CRLF
       $a = _ArrayUnique($a)
       For $i = 1 To $a[0]
           $a[$i] = Number($a[$i])
       Next
       _ArraySort($a, 1, 1)
       ;_ArrayDisplay($a)
       For $i = 1 To $a[0]
           $text = StringReplace($text, '&#' & $a[$i] & ';', ChrW($a[$i]))
       Next
   EndIf
   $text = StringRegExpReplace($text, '([\r\n])[\s]+', @CRLF)
   return $text
EndFunc
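
One detail to watch in a chain of StringReplace calls like the one in _ConvertHTMLTag is where the '&amp;' replacement sits. If it runs before the others, text that was escaped on purpose (for example a forum post that shows '&lt;BR&gt;' literally) gets decoded twice. Here is a small sketch of the safer ordering; _DecodeBasicEntities is a hypothetical helper written for this example, not part of the code above.

```autoit
; Hypothetical helper: decodes a few named entities, with '&amp;' handled last.
Func _DecodeBasicEntities($sText)
    $sText = StringReplace($sText, '&quot;', '"')
    $sText = StringReplace($sText, '&lt;', '<')
    $sText = StringReplace($sText, '&gt;', '>')
    $sText = StringReplace($sText, '&nbsp;', ' ')
    ; '&amp;' goes last so that its output is never fed into another replacement
    $sText = StringReplace($sText, '&amp;', '&')
    Return $sText
EndFunc

; '&amp;lt;BR&amp;gt;' is the escaped form of the literal text '&lt;BR&gt;'
ConsoleWrite(_DecodeBasicEntities('&amp;lt;BR&amp;gt;') & @CRLF) ; prints &lt;BR&gt;
```

With '&amp;' replaced first, the same input would end up as '<BR>' instead.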

 
