Pachan

How to use StringRegExp() function to get text running across multiple lines between html tags

4 posts in this topic

Hi Team,

I want to extract text that runs across multiple lines between HTML tags.

I have used the following code:

********************************************************

$fileForRead = FileOpen("test.html", 0)

; Check if file opened for reading OK
If $fileForRead = -1 Then
    MsgBox(0, "Error", "Unable to open file test.html.")
    Exit
EndIf

; Read the whole file into a string
$sText = FileRead($fileForRead)
FileClose($fileForRead)

$subString = StringRegExp($sText, '<div id="likely_problems">(.*?)</div>', 1)
ConsoleWrite($subString[0])

*******************************************************

The text file I use as input is:

***********************************************************

abcdefghijk

<div id="likely_problems"><span class='congrats_msg'><img src='http://test.com/images/feedback.gif' alt='Feedback' /> Congratulations! No likely problems.</span> ukjkjghjka

ksadfhjhagk;

jadghkl;aosdjl

aklgoruiouporip

</div>

abcdefghijk

**************************************************************

It does not work and throws the following error: "D:\AccesibilityTesting\regExpsimple.au3 (15) : ==> Subscript used with non-Array variable.:"

The pattern matches if the input file has <div id="likely_problems"> and </div> on the same line.

How can I match a pattern that spans multiple lines (ignoring white space and newlines)?

Thanks,

Thomas


$nOffset = 1
While 1
    ; Flag 1 returns the first match found at or after $nOffset
    $array = StringRegExp('<div id="likely_problems"><span class="congrats_msg"><img src="http://test.com/images/feedback.gif" alt="Feedback"/> Congratulations! No likely problems.</div>', '<(?i)div id="likely_problems">(.*?)</(?i)div>', 1, $nOffset)

    If @error = 0 Then
        ; @extended holds the offset at which to resume searching
        $nOffset = @extended
    Else
        ExitLoop
    EndIf
    For $i = 0 To UBound($array) - 1
        MsgBox(0, "RegExp Test with Option 1 - " & $i, $array[$i])
    Next
WEnd

Keep in mind the " and ' characters in the HTML file. Put a line like:

MsgBox(0, "", $sText)
to see your string before you feed it to StringRegExp.


#3 ·  Posted (edited)

You didn't really say what portion of that code you wanted returned. I'll assume it's everything within the div tags.

$sText = FileRead("test.html") ;; No need to open a file in read mode to use FileRead()
If $sText Then
    ;; (?s) lets "." match newlines, so the capture can span several lines
    $subString = StringRegExp($sText, "(?i)(?s)<div id=.?likely_.+?>(.+?)\s*</div>", 1)
    If Not @error Then
        ConsoleWrite($subString[0] & @CRLF)
        ;;  To clean it up even further use this line
        $sCleaned = StringRegExpReplace($subString[0], "<.+?>", "")
        If @extended Then ConsoleWrite($sCleaned & @CRLF)
    EndIf
EndIf

Look in my signature for a toolkit to test AutoIt PCRE expressions.

EDIT: $subString[0] should contain

<span class='congrats_msg'><img src='http://test.com/images/feedback.gif' alt='Feedback' /> Congratulations! No likely problems.</span> ukjkjghjka

ksadfhjhagk;

jadghkl;aosdjl

aklgoruiouporip

After cleaning that would become

Congratulations! No likely problems. ukjkjghjka

ksadfhjhagk;

jadghkl;aosdjl

aklgoruiouporip
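
For reference, the original pattern most likely fails on the multi-line input because "." does not match newline characters by default in PCRE, so (.*?) cannot cross a line break. With no match, StringRegExp sets @error and does not return an array, which would explain the "Subscript used with non-Array variable" error. The (?s) flag makes "." match newlines as well. A minimal sketch of the difference, assuming a hypothetical inline string ($sHtml) in place of the contents of test.html:

```autoit
; Hypothetical stand-in for the contents of test.html (the div spans three lines)
Local $sHtml = '<div id="likely_problems">first line' & @CRLF & 'second line' & @CRLF & '</div>'

; Without (?s), "." stops at the line break, so nothing matches and @error is set
Local $aPlain = StringRegExp($sHtml, '<div id="likely_problems">(.*?)</div>', 1)
Local $iErr = @error ; save @error before any other function call resets it
ConsoleWrite("Without (?s): @error = " & $iErr & ", IsArray = " & IsArray($aPlain) & @CRLF)

; With (?s), "." also matches CR and LF, so the capture can span lines
Local $aDotAll = StringRegExp($sHtml, '(?s)<div id="likely_problems">(.*?)</div>', 1)
If Not @error Then ConsoleWrite("With (?s): " & $aDotAll[0] & @CRLF)
```

Checking @error (or IsArray) before indexing the result is what keeps the script from crashing when the pattern does not match.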

Edited by GEOSoft

George

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.



Thank you. It worked right away.
