Mingre

[Again in conflict; post 21] StringRegExp help - on nested HTML tags


#include <Array.au3>
; Script Start - Add your code below here
Local $test = "<li>One<li>Inner<li>Innermost</li></li></li>" & _
            "<li>Two</li> "
$loob = StringRegExp($test, '\Q<li>\E(.*?)\Q</li>\E', 3)
_ArrayDisplay($loob, "How to return the One... and Two?")

Hello, can somebody help me?

(1) How can I make the regexp match the two outermost bullets, so that:

$array[0] = "One<li>Inner<li>Innermost</li></li>"

$array[1] = "Two"

(2) How can I match the "Innermost" bullet?

Thanks so much.

Edited by Mingre
Added the second question; added [SOLVED] on the title; changed title to not solved.


Do you mean something like this:

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"
Local $regex = _
    '(?imsx)' & _
    '(?(DEFINE) (?<LiStart> <li>  ) )' & _
    '(?(DEFINE) (?<LiEnd>   <\/li> ) )' & _
    '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
    '(?&LiBlock)'
$data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data)

Note that this seemingly complex regexp uses an explicitly recursive pattern. Named sub-patterns make it more verbose but much clearer. The x (extended) option, which makes unescaped whitespace insignificant, also helps readability. See https://regex101.com/ for a plain-English breakdown of the regexp and for debugging. Also read the official PCRE documentation for details on the available constructs.
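For question (2), recursion is not strictly needed. Here is a minimal sketch, assuming "innermost" means an <li> block that contains no further <li> tags. The negative lookahead refuses to step over any opening or closing li tag, so only leaf items can match. The variable names are illustrative and do not come from the posts above.

```autoit
#include <Array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

; (?is)              : case-insensitive, dot also matches newlines
; <li\b[^>]*>        : an opening li tag, with or without attributes
; (?:(?!</?li\b).)*  : consume one character at a time, but never start
;                      consuming at an opening or closing li tag
Local $sLeafPattern = '(?is)<li\b[^>]*>((?:(?!</?li\b).)*)</li>'
Local $aLeaves = StringRegExp($s, $sLeafPattern, 3)
If @error Then
    ConsoleWrite("No leaf <li> block found" & @CRLF)
Else
    _ArrayDisplay($aLeaves, "Leaf items")
EndIf
```

On the sample string this returns "Innermost" and "Two", because "Two" has no nested item either. If only the deepest item is wanted, the results would still need to be filtered by nesting depth.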

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


@jchd Thanks so much! That's exactly what I need tho I don't really understand that regexp :D Will try to learn it.

Again, thanks!


Thanks also for the tip re: unescaped whitespaces. I was having a hard time reading through regexps because of the lack of spaces. :lol:


Assuming that the regex engine works left to right, couldn't something like this be enough?

$data = StringRegExp($s, '(?s)<li>(.*?)</li>' , 3)

Edit
BTW jchd, thanks for this recursion example. This will make a nice cogitation for the next weekend  :)

Edited by mikell


@mikell If it's done that way, the first "</li>" encountered from the left will end the match, and that isn't the partner of the outermost "<li>". :(


@mikell, the issue isn't whether the end markup is included, but the nested <li>...</li> blocks. The naive "<li>(.*?)</li>" anchors at the first <li> and matches up to the first </li> after it, so in the string below it pairs the outermost <li> with the innermost </li>:

<li>One<li>Inner<li>Innermost</li>something else</li>more stuff</li>
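The mismatch is easy to reproduce. Here is a small sketch using the string above and the lazy pattern under discussion. The variable names are illustrative.

```autoit
#include <Array.au3>

Local $sNested = "<li>One<li>Inner<li>Innermost</li>something else</li>more stuff</li>"

; The lazy quantifier stops at the FIRST </li> it meets. That tag closes the
; innermost item, not the outermost one where the match started.
Local $aNaive = StringRegExp($sNested, '(?s)<li>(.*?)</li>', 3)
If Not @error Then
    ConsoleWrite($aNaive[0] & @CRLF) ; One<li>Inner<li>Innermost
EndIf
```

The captured text still contains two unclosed <li> tags, and "something else" and "more stuff" are never reached because no further <li> follows the first match.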

Edited by jchd


what about:

#include<array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

_ArrayDisplay(stringregexp($s , "(<li>.*?(?:</li>)+)" , 3))

 



@jchd
Thanks, but I somewhat know that  :)
I thought that Mingre was interested in grabbing the content but not the end tags
This produces the result mentioned in post #1

$data = StringRegExp($s, '<li>(.*?)</li>(?!</li>)' , 3)

Anyway I'll try to understand your recursive thing. For the moment there is a missing connection in my brain which makes me unable to understand it  :sweating:

2 minutes ago, mikell said:

I thought that Mingre was interested in grabbing the content but not the end tags

I actually trimmed the ends after getting the content :lol:

Here's what I'm working with, straight from SciTE, hehe. Sorry for my messy coding style!

#include <Array.au3>
Local $s = "Lal<li>One<li>Inner<li hehe>Innermost</li></li><li>Inner 2<li>Innermost 2</li></li></li><li>Two</li>"
Local $ha[1][2] ; column 0 = nesting depth, column 1 = item text (row 0 stays empty)
hehe($ha, $s)
_ArrayDisplay($ha)

; Appends every <li> item found in $x to $array, recursing into nested items.
Func hehe(ByRef $array, $x, $iRecursion = -1)
    $iRecursion += 1 ; current nesting depth, 0 for the outermost items
    Local Const $regEx_Start = "<li[^>]*?>"
    Local Const $start = '(?(DEFINE) (?<LiStart> ' & $regEx_Start & ' ) )'
    Local Const $regEx_End = "<\/li>"
    Local Const $end = '(?(DEFINE) (?<LiEnd>  ' & $regEx_End & '  ) )'
    Local $regex = _
            '(?imsx)' & _
            $start & _
            $end & _
            '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
            '(?&LiBlock)'
    Local $data = StringRegExp($x, $regex, 3)
    Local $left, $wilLRecurse, $hi
    For $i = 0 To UBound($data) - 1
        ; Strip the enclosing <li ...> and </li> from the matched block.
        $data[$i] = StringRegExpReplace($data[$i], _
                '(?imsx)\A' & $regEx_Start & '(.*)' & $regEx_End & '\Z', '$1')

        $wilLRecurse = False
        If StringRegExp($data[$i], $regex) Then
            ; Nested items remain: keep the full text for the recursive call
            ; and store only the text that precedes the first nested <li>.
            $wilLRecurse = True
            $left = $data[$i]
            $hi = StringRegExp($data[$i], '(?imsx)(\A.*?)(?:' & $regEx_Start & ')', 3)
            $data[$i] = $hi[0]
        EndIf

        _ArrayAdd($array, $iRecursion)
        $array[UBound($array) - 1][1] = $data[$i]
        ConsoleWrite(@LF & $iRecursion & ' - ' & $data[$i])
        If $wilLRecurse Then hehe($array, $left, $iRecursion)
    Next
    Return $data
EndFunc   ;==>hehe

 


@mikell

Its structure is quite similar to this one. But still your last example doesn't do the job in the general case. See the difference:

Local $s = "<li>One<li>Inner<li>Innermost</li>Rha ... lovely!</li>oops</li><li>Two</li>"
Local $regex = _
    '(?imsx)' & _
    '(?(DEFINE) (?<LiStart> <li>  ) )' & _
    '(?(DEFINE) (?<LiEnd>   <\/li> ) )' & _
    '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
    '(?&LiBlock)'
$data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data)
$data = StringRegExp($s, '<li>(.*?)</li>(?!</li>)' , 3)
_ArrayDisplay($data)

 


13 minutes ago, iamtheky said:

what about:

#include<array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

_ArrayDisplay(stringregexp($s , "(<li>.*?(?:</li>)+)" , 3))

 

I don't know proper HTML, but sometimes there is intervening text between two "</li>" tags. :(

#include<array.au3>

Local $s = "<li><b>One<li>Inner<li>Innermost</li></li></b></li><li>Two</li>"

_ArrayDisplay(stringregexp($s , "(<li>.*?(?:</li>)+)" , 3))

 


Exactly why I pointed that detail out.



Of course, it's up to you to rewrite it with numbered references. That is a bit faster to parse (a few µs), but much more confusing. I find named patterns very useful when a complex regex has to break down several similar structures.



@jchd and @mikell: I don't understand how JC's code could be written with numbered subpatterns. As I understand it, each subpattern number corresponds to a capturing group, which is not the case for defined subroutines, since they are not captured. (?1) refers to the first capturing group, so how can it be made to refer to a non-capturing group? Is it possible? (I searched for a long time, so I would be grateful for an answer!)
...Or else the StringRegExp result will contain more entries than with JC's "defined-subroutine" approach.

By the way, JC, your beautiful regex is not so hard to dissect, but it really takes an extra-evolved brain to build something like it.


You want variants? Okay.

Local $s = "<li>One<li>Inner<li>Innermost</li>rhagnagna</li>gloups</li><li>Two</li>"
Local $regex = _
    '(?imsx)' & _
    '(?(DEFINE) (?<LiStart> <li>  ) )' & _
    '(?(DEFINE) (?<LiEnd>   <\/li> ) )' & _
    '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
    '(?&LiBlock)'
$data = StringRegExp($s, $regex, 3)
_ArrayDisplay($data)

$regex = _
    '(?imsx)' & _
    '(?<LiBlock>' & _
    '   <li>' & _
    '   (?: (?&LiBlock)* | .*? )*' & _
    '   <\/li>' & _
    ')'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = _
    '(?imsx)' & _
    '(' & _
    '   <li>' & _
    '   (?: (?-1)* | .*? )*' & _
    '   <\/li>' & _
    ')'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = '(?ims)(<li>(?:(?-1)*|.*?)*<\/li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

 



I could have added a couple (English "couple" often means 3) of shorter versions:

$regex = '(?ims)(<li>(?:(?1)*|.*?)*</li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = '(?ims)(<li>(?:(?0)*|.*?)*</li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

$regex = '(?ims)(<li>(?:(?R)*|.*?)*</li>)'
$data = StringRegExp($s, $regex , 3)
_ArrayDisplay($data)

Note that all of this and the above is only a series of cosmetic rewrites; the structure and behavior are exactly the same.
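One way to check that claim is to run the short variants side by side on the same input. This is a rough sketch only: the array and loop are illustrative additions, and it prints whatever each pattern returns so the lines can be compared, without asserting what that output is.

```autoit
#include <Array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li>rhagnagna</li>gloups</li><li>Two</li>"

; The short variants from this thread; each one recurses into the whole <li> block.
Local $aPatterns[4] = [ _
        '(?ims)(<li>(?:(?-1)*|.*?)*</li>)', _
        '(?ims)(<li>(?:(?1)*|.*?)*</li>)', _
        '(?ims)(<li>(?:(?0)*|.*?)*</li>)', _
        '(?ims)(<li>(?:(?R)*|.*?)*</li>)']

For $i = 0 To UBound($aPatterns) - 1
    Local $aResult = StringRegExp($s, $aPatterns[$i], 3)
    If @error Then
        ConsoleWrite($aPatterns[$i] & " -> no match (@error set)" & @CRLF)
    Else
        ConsoleWrite($aPatterns[$i] & " -> " & _ArrayToString($aResult, " | ") & @CRLF)
    EndIf
Next
```

If the rewrites are equivalent, the four console lines should differ only in the pattern shown on the left.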

Edited by jchd
