

Posted

Hello all!

I have this HTML sample:

href="/site/create/time=12">String</a><div class="bp"></div></div></div><div><h3 class="bq">main site </h3><ul class="br"><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/64889time=12">site3</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/66595time=12">site4</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/31461time=12">site5</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li>

I need to get this:

/site/151919 site1
/site/54769 site2
/site/64889 site3
/site/66595 site4
/site/31461 site5

I tried this code:

#Include <Array.au3>
#Include <String.au3>
#include <File.au3>
Local $Text = FileRead("example.txt")

Global $left1 = '<li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="'
Global $right1 = 'time=12">'

Global $left2 = 'time=12">'
Global $right2 = '</a></td>'

$source1 = _StringBetween($Text, $left1,$right1)
$source2 = _StringBetween($Text, $left2,$right2)

For $i = 0 To UBound($source1) - 1
        $source1[$i] = $source1[$i] & " " & $source2[$i]
Next

_FileWriteFromArray('output.txt', $source1)

But I get:

/site/151919 String</a><div class="bp"></div></div></div><div><h3 class="bq">main site </h3><ul class="br"><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1
/site/54769 site2
/site/64889 site3
/site/66595 site4
/site/31461 site5

I also tried StringRegExp, but it is too hard for a beginner with poor English like me.

Any solutions?
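The stray first line comes from `$left2`: the string `time=12">` also occurs in the very first link, `href="/site/create/time=12">String`, so the first match of `_StringBetween($Text, $left2, $right2)` starts there and runs all the way to the first `</a></td>`. One way to keep the `_StringBetween` approach, sketched below, is to cut out each whole `path time=12">text` chunk with a single, more specific start marker and then replace `time=12">` with a space. This assumes every wanted link sits in a `<td class="u"><a href="...">` cell and always carries `time=12`, exactly as in the sample; the HTML is inlined (shortened) so the snippet runs on its own.

```autoit
#include <String.au3>
#include <File.au3>

; Shortened copy of the sample; for the real file use:
; Local $sText = FileRead("example.txt")
Local $sText = 'href="/site/create/time=12">String</a><div class="bp"></div></div></div>' & _
        '<li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td>' & _
        '<td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li>' & _
        '<li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td>'

; Each element looks like '/site/151919time=12">site1'. The start marker includes
; '<td class="u"><a', so the "create" link at the top is never matched.
Local $aLinks = _StringBetween($sText, '<td class="u"><a href="', '</a></td>')
If @error Then
    ConsoleWrite("No links found" & @CRLF)
    Exit 1
EndIf

For $i = 0 To UBound($aLinks) - 1
    ; '/site/151919time=12">site1' -> '/site/151919 site1'
    $aLinks[$i] = StringReplace($aLinks[$i], 'time=12">', ' ')
Next

; One element per line
_FileWriteFromArray("output.txt", $aLinks)
```

If the `time=` value can differ between links, a fixed `StringReplace` is no longer enough and a regular expression is the better fit.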

Posted

Here is an example to work with:

#include <IE.au3>


_Example()
Func _Example()
    Local $sHTML = '<table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/64889time=12">site3</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/66595time=12">site4</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/31461time=12">site5</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table>'

    Local $oIE = _IECreate()
    _IEDocWriteHTML($oIE,$sHTML)
    Local $oTags_coll = _IETagNameGetCollection($oIE, 'a')

    For $oTag_enum In $oTags_coll
        ConsoleWrite($oTag_enum.href & ' ' & $oTag_enum.innertext & @CRLF )
    Next
EndFunc

 

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:


Signature last update: 2023-04-24

Posted

Like this:

#include <array.au3>
#include <StringConstants.au3>

_Example()
Func _Example()
    Local $sHTML = '<table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/64889time=12">site3</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/66595time=12">site4</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/31461time=12">site5</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table>'
    Local $aHTML_Parsed = StringRegExp($sHTML, '(?i)<a.*?href="(.*?)">(.*?)<', $STR_REGEXPARRAYGLOBALFULLMATCH)
    If Not @error Then
        _ArrayDisplay($aHTML_Parsed, '$aHTML_Parsed')
        For $iOuter_idx = 0 To UBound($aHTML_Parsed) -1
            _ArrayDisplay($aHTML_Parsed[$iOuter_idx],'$aHTML_Parsed[$iOuter_idx]')
        Next
    EndIf
EndFunc   ;==>_Example

Is that what you need?

 


Posted (edited)

It's so close, but how do I write this

/site/151919 site1
/site/54769 site2
/site/64889 site3
/site/66595 site4
/site/31461 site5

to a file? I can't adapt your code to my needs.

Edited by blackandwhite
Posted (edited)

hm....

#include <array.au3>
#include <StringConstants.au3>

_Example()
Func _Example()
    Local Enum $eFullMatch, $eHref, $eText
    Local $sHTML = '<table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/64889time=12">site3</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/66595time=12">site4</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/31461time=12">site5</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table>'
    Local $aInner
    Local $aHTML_Parsed = StringRegExp($sHTML, '(?i)<a.*?href="(.*?)">(.*?)<', $STR_REGEXPARRAYGLOBALFULLMATCH)
    If Not @error Then
        _ArrayDisplay($aHTML_Parsed, '$aHTML_Parsed')
        For $iOuter_idx = 0 To UBound($aHTML_Parsed) -1
            $aInner = $aHTML_Parsed[$iOuter_idx]
            ConsoleWrite($aInner[$eHref] & ' ' & $aInner[$eText] & ' ' & @CRLF)
        Next
    EndIf
EndFunc   ;==>_Example

Now you only need to use:

FileOpen(...
FileWrite(...
FileClose(...

You should do it yourself as this is the easiest part of this job.
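For reference, a minimal sketch of the file part with exactly those three functions. It also keeps `time=\d+` outside the first capture group, so the path comes out as `/site/151919` rather than `/site/151919time=12`, and it uses `$STR_REGEXPARRAYGLOBALMATCH` (flag 3), which returns a flat array `href, text, href, text, ...` instead of the array of arrays returned by `$STR_REGEXPARRAYGLOBALFULLMATCH`. The pattern assumes links shaped like the sample; `output.txt` is the name from the first post, and `$FO_OVERWRITE` replaces any existing file.

```autoit
#include <StringConstants.au3>
#include <FileConstants.au3>

; Shortened copy of the sample HTML (two links).
Local $sHTML = '<td class="u"><a href="/site/151919time=12">site1</a></td><td class="o"><span class="bt bu">0</span></td>' & _
        '<td class="u"><a href="/site/54769time=12">site2</a></td><td class="o"><span class="bt bu">0</span></td>'

; Flat array of captures: [0] href, [1] text, [2] href, [3] text, ...
; "time=\d+" sits outside group 1, so it is not part of the captured path.
Local $aParsed = StringRegExp($sHTML, '(?i)<a[^>]*?href="([^"]*?)time=\d+">([^<]*)<', $STR_REGEXPARRAYGLOBALMATCH)
If @error Then Exit 1 ; no link matched

Local $hFile = FileOpen("output.txt", $FO_OVERWRITE)
If $hFile = -1 Then Exit 2 ; file could not be opened for writing

; Walk the flat array two elements at a time: one output line per link
For $i = 0 To UBound($aParsed) - 1 Step 2
    FileWrite($hFile, $aParsed[$i] & " " & $aParsed[$i + 1] & @CRLF)
Next
FileClose($hFile)
```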

mLipok

Edited by mLipok


Posted

Easier...

Local $sHTML = '<table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/64889time=12">site3</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/66595time=12">site4</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj"><table class="n bs"><tbody><tr><td class="u"><a href="/site/31461time=12">site5</a></td><td class="o"><span class="bt bu">0</span></td></tr></tbody></table>'

Local $s = StringRegExpReplace(StringRegExpReplace($sHTML, _
        '(?s).*?href="(.*?)time=\d+">(.*?)<', "$1  $2" & @CRLF), '\R.*$', "")
MsgBox(0, "", $s)
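The nested one-liner works in two passes. Split into separate, commented steps on a shortened sample: the first pass turns every `...href="PATHtime=N">TEXT<` run into `PATH  TEXT` plus a line break, and the second removes the last line break together with whatever HTML is left after the final link. The two spaces between path and text come from the replacement string `"$1  $2"`.

```autoit
Local $sHTML = '<table class="n bs"><tbody><tr><td class="u"><a href="/site/151919time=12">site1</a></td>' & _
        '<td class="o"><span class="bt bu">0</span></td></tr></tbody></table></li><li class="bj">' & _
        '<table class="n bs"><tbody><tr><td class="u"><a href="/site/54769time=12">site2</a></td>' & _
        '<td class="o"><span class="bt bu">0</span></td></tr></tbody></table>'

; Pass 1: '(?s).*?' swallows everything up to the next href=" (line breaks included),
; group 1 is the path before "time=", group 2 is the link text before the next '<'.
; The text after the last link ('/a></td>...</table>') is not matched and stays as-is.
Local $sPass1 = StringRegExpReplace($sHTML, '(?s).*?href="(.*?)time=\d+">(.*?)<', "$1  $2" & @CRLF)

; Pass 2: without (?s), '.' stops at line breaks, so '\R.*$' can only match
; the final line break plus the leftover tail, which is removed.
Local $s = StringRegExpReplace($sPass1, '\R.*$', "")

ConsoleWrite($s & @CRLF)
```

This assumes the HTML after the last link contains no line breaks, as in the sample; on a multi-line page only the last line of that tail would be removed.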

 

Posted (edited)

For example, I have hundreds of thousands of HTML pages with official tender notices: about 3 GB of packed data stored on my disk.
Analyzing them via IE is far too slow...
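For bulk work like that, a sketch that never starts a browser: it reads every `*.html` file in a folder with `FileRead` and applies a `StringRegExp` pattern similar to the ones in this thread. The function name `_ExtractLinksFromFolder`, the `*.html` filter and the `href="...time=N">text<` link shape are assumptions for illustration; the sketch has not been timed on a corpus of that size, and if the pages are stored compressed they would need to be extracted first.

```autoit
#include <File.au3>
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical helper: scans every *.html file in $sDir with StringRegExp and
; writes "path text" lines to $sOutFile. Returns the number of lines written,
; or -1 if no .html files were found or the output file cannot be opened.
Func _ExtractLinksFromFolder($sDir, $sOutFile)
    Local $aFiles = _FileListToArray($sDir, "*.html", $FLTA_FILES, True) ; full paths
    If @error Then Return -1

    Local $hOut = FileOpen($sOutFile, $FO_OVERWRITE)
    If $hOut = -1 Then Return -1

    Local $aParsed, $iCount = 0
    For $f = 1 To $aFiles[0] ; element [0] holds the file count
        ; Flat capture array: href, text, href, text, ...
        $aParsed = StringRegExp(FileRead($aFiles[$f]), '(?i)<a[^>]*?href="([^"]*?)time=\d+">([^<]*)<', $STR_REGEXPARRAYGLOBALMATCH)
        If @error Then ContinueLoop ; no matching links in this file
        For $i = 0 To UBound($aParsed) - 1 Step 2
            FileWrite($hOut, $aParsed[$i] & " " & $aParsed[$i + 1] & @CRLF)
            $iCount += 1
        Next
    Next
    FileClose($hOut)
    Return $iCount
EndFunc   ;==>_ExtractLinksFromFolder
```

Usage, with hypothetical paths: `ConsoleWrite(_ExtractLinksFromFolder("D:\tenders", "D:\tenders_links.txt") & @CRLF)`. Opening the output file once and writing through the handle avoids reopening it for every line.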
 

Edited by mLipok
wording

