Mingre

[Again in conflict; post 21] StringRegExp help - on nested HTML tags


#include <Array.au3>
; Script Start - Add your code below here
Local $test = "<li>One<li>Inner<li>Innermost</li></li></li>" & _
            "<li>Two</li> "
$loob = StringRegExp($test, '\Q<li>\E(.*?)\Q</li>\E', 3)
_ArrayDisplay($loob, "How to return the One... and Two?")

Hello, can somebody help me:

(1) How can I make the regexp match the two outermost bullets, such that:
 

Quote

$array[0] = "One<li>Inner<li>Innermost</li></li>"

$array[1] = "Two"

(2) How can I match the "Innermost" bullet?

Thanks so much.

Edited by Mingre
Added the second question; added [SOLVED] on the title; changed title to not solved.


Do you mean something like this:

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"
Local $regex = _
    '(?imsx)' & _
    '(?(DEFINE) (?<LiStart> <li>  ) )' & _
    '(?(DEFINE) (?<LiEnd>   <\/li> ) )' & _
    '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
    '(?&LiBlock)'
$data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data)

Note that this seemingly complex regexp uses an explicitly recursive pattern. Using named sub-patterns makes it more verbose but much clearer. The x (extended) option, which makes unescaped whitespace insignificant, also adds to readability. Refer to https://regex101.com/ for an English translation of the regexp semantics and for debugging. Also read the official PCRE documentation for details on the available constructs.
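The pattern above returns each outermost block with its enclosing tags. As a possible variation (a sketch only, not part of the original post), a capturing group around the block content would return just the inner text asked for in post #1. The pattern layout, the tempered `(?! <\/?li> ) .` step, and the window title are assumptions made for illustration.

```autoit
#include <Array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

; Assumed variant: group 1 wraps the content of the current block.
; Inside it, each step is either a whole nested block, matched by recursing
; into the entire pattern with (?R), or one character that does not start
; an <li> or </li> tag.
Local $regex = _
    '(?imsx)' & _
    '<li>' & _
    '(' & _
    '   (?: (?R) | (?! <\/?li> ) . )*' & _
    ')' & _
    '<\/li>'

; Flag 3 returns the captured group of every global match.
Local $data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data, "Outermost contents")
```

With the sample string this is meant to yield two rows, `One<li>Inner<li>Innermost</li></li>` and `Two`. Group values set inside a recursion are discarded when the recursion returns, so only the outermost content ends up in group 1.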

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


@jchd Thanks so much! That's exactly what I need tho I don't really understand that regexp :D Will try to learn it.

Again, thanks!


Thanks also for the tip re: unescaped whitespaces. I was having a hard time reading through regexps because of the lack of spaces. :lol:


Assuming that the regex engine works left to right, couldn't something like this be enough?

$data = StringRegExp($s, '(?s)<li>(.*?)</li>' , 3)

Edit
BTW jchd, thanks for this recursion example. This will make a nice cogitation for the next weekend  :)

Edited by mikell


@mikell If it's done that way, the first encountered "</li>" from the left will be a match, which isn't exactly the pair of the outermost "<li>". :(


@mikell, the issue isn't whether or not the end markup is included, but the nested <li>...</li> blocks. The naive "<li>(.*?)</li>" will anchor at the first <li> and match up to the first </li> after it, pairing the wrong tags. In the example below, the outermost <li> gets paired with the innermost </li>:

<li>One<li>Inner<li>Innermost</li>something else</li>more stuff</li>
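To make the mismatch concrete, here is a minimal sketch (added for illustration, not from the original post) that runs the naive lazy pattern on that example string. The variable names and the window title are arbitrary.

```autoit
#include <Array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li>something else</li>more stuff</li>"

; Lazy, non-recursive pattern: it anchors at the first <li> and stops at the
; first </li> it meets, which closes the innermost block, not the outermost.
Local $data = StringRegExp($s, '(?s)<li>(.*?)</li>', 3)

; Single result: "One<li>Inner<li>Innermost"
; The rest of the string contains no further <li>, so nothing else matches.
_ArrayDisplay($data, "Naive lazy match")
```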

Edited by jchd


what about:

#include<array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

_ArrayDisplay(stringregexp($s , "(<li>.*?(?:</li>)+)" , 3))

 



@jchd
Thanks, but I somewhat know that  :)
I thought that Mingre was interested in grabbing the content but not the end tags
This produces the result mentioned in post #1

$data = StringRegExp($s, '<li>(.*?)</li>(?!</li>)' , 3)

Anyway I'll try to understand your recursive thing. For the moment there is a missing connection in my brain which makes me unable to understand it  :sweating:

2 minutes ago, mikell said:

I thought that Mingre was interested in grabbing the content but not the end tags

I actually trimmed the ends after getting the content :lol:

Here's what I'm working with, straight from SciTE, hehe. Sorry for my messy coding style!

#include <Array.au3>
Local $s = "Lal<li>One<li>Inner<li hehe>Innermost</li></li><li>Inner 2<li>Innermost 2</li></li></li><li>Two</li>"
;GLobal $iRecursion = 0
;Local $s = "Innermost"
Local $ha[1][2]
hehe($ha, $s)
_ArrayDisplay($ha)

Func hehe(ByRef $array, $x, $iRecursion = -1)
    $iRecursion += 1
    Local Const $regEx_Start = "<li[^>]*?>"
    Local Const $start = '(?(DEFINE) (?<LiStart> ' & $regEx_Start & ' ) )'
    Local Const $regEx_End = "<\/li>"
    Local Const $end = '(?(DEFINE) (?<LiEnd>  ' & $regEx_End & '  ) )'
    Local $regex = _
            '(?imsx)' & _
            $start & _
            $end & _
            '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
            '(?&LiBlock)'
    Local $data = StringRegExp($x, $regex, 3)
    ;Consolewrite(@LF & 'x' & ' - ' & $x)
    Local $left, $wilLRecurse
    For $i = 0 To UBound($data) - 1 Step +1
        $data[$i] = StringRegExpReplace($data[$i], _
                '(?imsx)\A' & $regEx_Start & '(.*)' & $regEx_End & '\Z', '$1')

        $wilLRecurse = False
        If StringRegExp($data[$i], $regex) Then
            $wilLRecurse = True
            $left = $data[$i]
            $hi = StringRegExp($data[$i], '(?imsx)(\A.*?)(?:' & $regEx_Start & ')', 3)
            $data[$i] = $hi[0]
        EndIf


        ;_ArrayDisplay($left)
        _ArrayAdd($array, $iRecursion)
        $array[UBound($array) -1][1] = $data[$i]
        ConsoleWrite(@LF & $iRecursion & ' - ' & $data[$i])
        If $wilLRecurse Then hehe($array, $left, $iRecursion)

        ;EndIf
    Next
    ;
    $iRecursion -= 1
    ;_ArrayDisplay($data, $x)
    Return $data
EndFunc   ;==>hehe

 


@mikell

Its structure is quite similar to this one. But still your last example doesn't do the job in the general case. See the difference:

Local $s = "<li>One<li>Inner<li>Innermost</li>Rha ... lovely!</li>oops</li><li>Two</li>"
Local $regex = _
    '(?imsx)' & _
    '(?(DEFINE) (?<LiStart> <li>  ) )' & _
    '(?(DEFINE) (?<LiEnd>   <\/li> ) )' & _
    '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
    '(?&LiBlock)'
$data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data)
$data = StringRegExp($s, '<li>(.*?)</li>(?!</li>)' , 3)
_ArrayDisplay($data)

 


13 minutes ago, iamtheky said:

what about:

#include<array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

_ArrayDisplay(stringregexp($s , "(<li>.*?(?:</li>)+)" , 3))

 

I don't know proper HTML, but sometimes there is intervening text between two "</li>" tags. :(

#include<array.au3>

Local $s = "<li><b>One<li>Inner<li>Innermost</li></li></b></li><li>Two</li>"

_ArrayDisplay(stringregexp($s , "(<li>.*?(?:</li>)+)" , 3))

 


Exactly why I pointed that detail out.



Of course it's up to you to rewrite it with numbered references. It's a bit faster to parse (by a few µs), but much more confusing. I find named patterns very useful when a complex regex has to break down several similar structures.



@jchd and @mikell: I don't understand how JC's code could be written with numbered subpatterns. As I understand it, each subpattern number corresponds to a capturing group, which is not the case for DEFINEd subroutines, since they are not captured. (?1) refers to the first capturing group, so how can it refer to a non-capturing group? Is that even possible? (I have been searching for a long time, so I would be grateful for an answer!)
...or else the StringRegExp result will contain more entries than with JC's "defined-subroutine" approach.

By the way, JC, your beautiful regex is not so hard to dissect, but it really takes an extra-evolved brain to build something like it.


You want variants? Okay.

Local $s = "<li>One<li>Inner<li>Innermost</li>rhagnagna</li>gloups</li><li>Two</li>"
Local $regex = _
    '(?imsx)' & _
    '(?(DEFINE) (?<LiStart> <li>  ) )' & _
    '(?(DEFINE) (?<LiEnd>   <\/li> ) )' & _
    '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
    '(?&LiBlock)'
$data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data)

$regex = _
    '(?imsx)' & _
    '(?<LiBlock>' & _
    '   <li>' & _
    '   (?: (?&LiBlock)* | .*? )*' & _
    '   <\/li>' & _
    ')'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = _
    '(?imsx)' & _
    '(' & _
    '   <li>' & _
    '   (?: (?-1)* | .*? )*' & _
    '   <\/li>' & _
    ')'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = '(?ims)(<li>(?:(?-1)*|.*?)*<\/li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

 



I could have added a couple (English "couple" often means 3) of shorter versions:

$regex = '(?ims)(<li>(?:(?1)*|.*?)*</li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = '(?ims)(<li>(?:(?0)*|.*?)*</li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = '(?ims)(<li>(?:(?R)*|.*?)*</li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

Note that all of these, and the ones above, are merely cosmetic rewrites: the structure and behavior are exactly the same.

Edited by jchd

