How to use StringRegExp() function to get text running across multiple lines between html tags

Hi Team,

I want to extract text that spans multiple lines between HTML tags.

I used the following code:

********************************************************
$fileForRead = FileOpen("test.html", 0)
; Check if file opened for reading OK
If $fileForRead = -1 Then
    MsgBox(0, "Error", "Unable to open file test.html.")
    Exit
EndIf
; Read the whole file into a string
$sText = FileRead($fileForRead)
FileClose($fileForRead)
$subString = StringRegExp($sText,'<div id="likely_problems">(.*?)</div>', 1)
ConsoleWrite($subString[0])
*******************************************************

The text file I use as input is

***********************************************************

abcdefghijk

<div id="likely_problems"><span class='congrats_msg'><img src='http://test.com/images/feedback.gif' alt='Feedback' /> Congratulations! No likely problems.</span> ukjkjghjka

ksadfhjhagk;

jadghkl;aosdjl

aklgoruiouporip

</div>

abcdefghijk

**************************************************************

It does not work and throws the following error: "D:\AccesibilityTesting\regExpsimple.au3 (15) : ==> Subscript used with non-Array variable.:"

The pattern matches if <div id="likely_problems"> and </div> are on the same line of the input file.

How can I match a pattern that spans multiple lines (ignoring whitespace and newlines)?

Thanks,

Thomas


$nOffset = 1
While 1
    $array = StringRegExp('<div id="likely_problems"><span class="congrats_msg"><img src="http://test.com/images/feedback.gif" alt="Feedback"/> Congratulations! No likely problems.</div>', '(?i)<div id="likely_problems">(.*?)</div>', 1, $nOffset)

    If @error = 0 Then
        $nOffset = @extended
    Else
        ExitLoop
    EndIf
    For $i = 0 To UBound($array) - 1
        MsgBox(0, "RegExp Test with Option 1 - " & $i, $array[$i])
    Next
WEnd

Keep in mind the " and ' characters in the HTML file. Add a line like:

MsgBox(0, "", $sText)
to see your string before you feed it to StringRegExp().

You didn't really say what portion of that code you wanted returned. I'll assume it's everything within the div tags.

$sText = FileRead("test.html") ;; No need to open a file in read mode to use FileRead()
If $sText Then
    $subString = StringRegExp($sText, "(?i)(?s)<div id=.?likely_.+?>(.+?)\s*</div>", 1)
    If Not @error Then
        ConsoleWrite($subString[0] & @CRLF)
        ;; To clean it up even further, strip the remaining tags
        $sCleaned = StringRegExpReplace($subString[0], "<.+?>", "")
        If @extended Then ConsoleWrite($sCleaned & @CRLF)
    EndIf
EndIf
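
The (?s) flag is what makes the multi-line case work: by default "." in AutoIt's PCRE engine does not match a newline, so (.*?) cannot cross a line break and StringRegExp() sets @error instead of returning an array. Here is a minimal sketch of the difference, assuming a made-up two-line sample string; the names $sSample, $aPlain and $aDotAll are illustrative and not taken from the original script.

```autoit
; Hypothetical sample: a div whose content spans two lines
Local $sSample = '<div id="likely_problems">first line' & @CRLF & 'second line</div>'

; Without (?s), "." stops at the line break, so there is no match
Local $aPlain = StringRegExp($sSample, '<div id="likely_problems">(.*?)</div>', 1)
If @error Then ConsoleWrite("No match without (?s)" & @CRLF)

; With (?s), "." also matches newlines, so the capture can span lines
Local $aDotAll = StringRegExp($sSample, '(?s)<div id="likely_problems">(.*?)</div>', 1)
If Not @error Then ConsoleWrite($aDotAll[0] & @CRLF)
```

Checking @error before indexing the result also avoids the "Subscript used with non-Array variable" error from the original script.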

Look in my signature for a toolkit to test AutoIt PCRE expressions.

EDIT: $subString[0] should contain:

<span class='congrats_msg'><img src='http://test.com/images/feedback.gif' alt='Feedback' /> Congratulations! No likely problems.</span> ukjkjghjka

ksadfhjhagk;

jadghkl;aosdjl

aklgoruiouporip

After cleaning that would become

Congratulations! No likely problems. ukjkjghjka

ksadfhjhagk;

jadghkl;aosdjl

aklgoruiouporip

Edited by GEOSoft

George

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.


Thank you. It worked right away.