Mingre, posted July 28, 2016 (edited)

    #include <Array.au3>

    ; Script Start - Add your code below here
    Local $test = "<li>One<li>Inner<li>Innermost</li></li></li>" & _
            "<li>Two</li> "
    Local $loob = StringRegExp($test, '\Q<li>\E(.*?)\Q</li>\E', 3)
    _ArrayDisplay($loob, "How to return the One... and Two?")

Hello, can somebody help me?

(1) How can I make the regexp match the two outermost bullets, such that:

    $array[0] = "One<li>Inner<li>Innermost</li></li>"
    $array[1] = "Two"

(2) How can I match the "Innermost" bullet?

Thanks so much.

Edited July 29, 2016 by Mingre: Added the second question; added [SOLVED] on the title; changed title to not solved.
jchd, posted July 28, 2016 (edited)

Do you mean something like this:

    #include <Array.au3>

    Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"
    Local $regex = _
            '(?imsx)' & _
            '(?(DEFINE) (?<LiStart> <li> ) )' & _
            '(?(DEFINE) (?<LiEnd> <\/li> ) )' & _
            '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
            '(?&LiBlock)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

Note that this seemingly complex regexp uses an explicitly recursive pattern. Using named sub-patterns makes it more verbose but much clearer. The X (eXtended) option, which makes unescaped whitespace insignificant, also adds to readability.

Refer to https://regex101.com/ for an English translation of the regexp semantics and for debugging. Also read the official PCRE documentation for details on the available constructs.

Edited July 28, 2016 by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must-have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of up-to-date pcretest.exe and pcregrep.exe here.
RegExp tutorial: enough to get started.
PCRE v8.33 regexp documentation: latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well).
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt).
Mingre, posted July 28, 2016

@jchd Thanks so much! That's exactly what I need, though I don't really understand that regexp. I will try to learn it. Again, thanks!
Mingre, posted July 28, 2016

Thanks also for the tip about unescaped whitespace. I was having a hard time reading through regexps because of the lack of spaces.
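To illustrate the whitespace tip, here is a minimal sketch of one pattern written twice, once compactly and once with the x option. The sample string and the variable names ($sInput, $sCompact, $sSpaced) are made up for illustration and do not come from the thread.

```autoit
#include <Array.au3>

; Hypothetical sample input, not taken from the thread.
Local $sInput = "<li>One</li><li>Two</li>"

; Compact form: every character in the pattern is significant.
Local $sCompact = '(?i)<li>(\w+)</li>'

; Extended form: (?x) makes unescaped whitespace in the pattern insignificant,
; so the pieces can be spaced out for readability.
Local $sSpaced = '(?ix) <li> ( \w+ ) </li>'

Local $aCompact = StringRegExp($sInput, $sCompact, 3)
Local $aSpaced = StringRegExp($sInput, $sSpaced, 3)

; Both arrays should hold the same captures: "One" and "Two".
_ArrayDisplay($aCompact, "compact")
_ArrayDisplay($aSpaced, "extended")
```

One consequence of the x option is that a space that must really be matched has to be written explicitly, for example as \x20.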
mikell, posted July 28, 2016 (edited)

Assuming that the regex engine works left to right, couldn't something like this be enough?

    $data = StringRegExp($s, '(?s)<li>(.*?)</li>', 3)

Edit: BTW jchd, thanks for this recursion example. This will make a nice cogitation for the next weekend.

Edited July 28, 2016 by mikell
Mingre, posted July 28, 2016

@mikell If it's done that way, the first "</li>" encountered from the left will be matched, and that is not the closing tag that pairs with the outermost "<li>".
mikell, posted July 28, 2016

OK, sorry. I didn't think that including the "</li>" in the captured match was something important.
jchd, posted July 28, 2016 (edited)

@mikell, the issue isn't whether the end markup is included, but how nested <li>...</li> blocks are handled. The naive "<li>(.*?)</li>" will anchor at the first <li> and match up to the first </li> after it, pairing the wrong tags (the original post marked the pairs in color). In

    <li>One<li>Inner<li>Innermost</li>something else</li>more stuff</li>

the first <li> ends up paired with the </li> that follows "Innermost".

Edited July 28, 2016 by jchd
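The mismatch described above can be reproduced with a short sketch, assuming the sample string from the first post. The variable name $aNaive is only illustrative.

```autoit
#include <Array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

; Lazy, non-recursive pattern: each match stops at the first </li>
; found after its opening <li>, regardless of nesting.
Local $aNaive = StringRegExp($s, '(?s)<li>(.*?)</li>', 3)

; Expected: $aNaive[0] = "One<li>Inner<li>Innermost"  (inner <li> tags left unclosed)
;           $aNaive[1] = "Two"
_ArrayDisplay($aNaive, "naive lazy match")
```

The first element contains two opening tags without their closing tags, which is the wrong pairing the recursive pattern is meant to avoid.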
iamtheky, posted July 28, 2016

What about:

    #include <Array.au3>

    Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"
    _ArrayDisplay(StringRegExp($s, "(<li>.*?(?:</li>)+)", 3))
mikell, posted July 28, 2016

@jchd Thanks, but I more or less know that. I thought that Mingre was interested in grabbing the content but not the end tags. This produces the result mentioned in post #1:

    $data = StringRegExp($s, '<li>(.*?)</li>(?!</li>)', 3)

Anyway, I'll try to understand your recursive thing. For the moment there is a missing connection in my brain which makes me unable to understand it.
Mingre, posted July 28, 2016

2 minutes ago, mikell said: I thought that Mingre was interested in grabbing the content but not the end tags

I actually trimmed the ends after getting the content. Here's what I'm working with, straight from SciTE, hehe. Sorry for my messy coding style!

    #include <Array.au3>

    Local $s = "Lal<li>One<li>Inner<li hehe>Innermost</li></li><li>Inner 2<li>Innermost 2</li></li></li><li>Two</li>"
    ;GLobal $iRecursion = 0
    ;Local $s = "Innermost"
    Local $ha[1][2]
    hehe($ha, $s)
    _ArrayDisplay($ha)

    Func hehe(ByRef $array, $x, $iRecursion = -1)
        $iRecursion += 1
        Local Const $regEx_Start = "<li[^>]*?>"
        Local Const $start = '(?(DEFINE) (?<LiStart> ' & $regEx_Start & ' ) )'
        Local Const $regEx_End = "<\/li>"
        Local Const $end = '(?(DEFINE) (?<LiEnd> ' & $regEx_End & ' ) )'
        Local $regex = _
                '(?imsx)' & _
                $start & _
                $end & _
                '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
                '(?&LiBlock)'
        Local $data = StringRegExp($x, $regex, 3)
        ;Consolewrite(@LF & 'x' & ' - ' & $x)
        Local $left, $wilLRecurse
        For $i = 0 To UBound($data) - 1 Step +1
            $data[$i] = StringRegExpReplace($data[$i], _
                    '(?imsx)\A' & $regEx_Start & '(.*)' & $regEx_End & '\Z', '$1')
            $wilLRecurse = False
            If StringRegExp($data[$i], $regex) Then
                $wilLRecurse = True
                $left = $data[$i]
                $hi = StringRegExp($data[$i], '(?imsx)(\A.*?)(?:' & $regEx_Start & ')', 3)
                $data[$i] = $hi[0]
            EndIf
            ;_ArrayDisplay($left)
            _ArrayAdd($array, $iRecursion)
            $array[UBound($array) - 1][1] = $data[$i]
            ConsoleWrite(@LF & $iRecursion & ' - ' & $data[$i])
            If $wilLRecurse Then hehe($array, $left, $iRecursion)
            ;EndIf
        Next
        ; $iRecursion -= 1
        ;_ArrayDisplay($data, $x)
        Return $data
    EndFunc   ;==>hehe
jchd, posted July 28, 2016

@mikell Its structure is quite similar to this one. But still, your last example doesn't do the job in the general case. See the difference:

    #include <Array.au3>

    Local $s = "<li>One<li>Inner<li>Innermost</li>Rha ... lovely!</li>oops</li><li>Two</li>"
    Local $regex = _
            '(?imsx)' & _
            '(?(DEFINE) (?<LiStart> <li> ) )' & _
            '(?(DEFINE) (?<LiEnd> <\/li> ) )' & _
            '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
            '(?&LiBlock)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

    $data = StringRegExp($s, '<li>(.*?)</li>(?!</li>)', 3)
    _ArrayDisplay($data)
Mingre, posted July 28, 2016

13 minutes ago, iamtheky said: what about:

    #include <Array.au3>

    Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"
    _ArrayDisplay(StringRegExp($s, "(<li>.*?(?:</li>)+)", 3))

I don't know proper HTML, but sometimes there is intervening text between two "</li>" tags:

    #include <Array.au3>

    Local $s = "<li><b>One<li>Inner<li>Innermost</li></li></b></li><li>Two</li>"
    _ArrayDisplay(StringRegExp($s, "(<li>.*?(?:</li>)+)", 3))
jchd, posted July 28, 2016

Exactly why I pointed that detail out.
mikell, posted July 28, 2016

@jchd Right. I surrender. Is there a way to write your code using numbered subpatterns instead of named ones?
jchd, posted July 28, 2016

Of course, it's up to you to rewrite it with numbered references. That is a bit faster to parse (a few µs), but much more confusing. I find named patterns very useful when a complex regex has to break down several similar structures.
jguinch, posted July 28, 2016

@jchd and @mikell: I don't understand how it's possible to write JC's code with numbered subpatterns. As I understand it, each subpattern number corresponds to a capturing group, which is not the case with defined subroutines, since they are not captured. (?1) refers to the first capturing group, so how can it be made to refer to a non-capturing group? Is it possible? (I have searched for a long time, so if you have an answer, I would be grateful!) Otherwise the StringRegExp result will contain more entries than with JC's "defined-subroutine" way.

By the way, JC, your beautiful regex is not so hard to dissect, but it really takes an extra-evolved brain to build something like it.

Network configuration UDF, _DirGetSizeByExtension, _UninstallList, Firefox Configuration, Array multi-dimensions, Printer Management UDF
jchd, posted July 28, 2016

You want variants? Okay.

    #include <Array.au3>

    Local $s = "<li>One<li>Inner<li>Innermost</li>rhagnagna</li>gloups</li><li>Two</li>"

    ; Named subroutines declared in DEFINE blocks
    Local $regex = _
            '(?imsx)' & _
            '(?(DEFINE) (?<LiStart> <li> ) )' & _
            '(?(DEFINE) (?<LiEnd> <\/li> ) )' & _
            '(?(DEFINE) (?<LiBlock> (?&LiStart) (?: (?&LiBlock)* | .*? )* (?&LiEnd) ) )' & _
            '(?&LiBlock)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

    ; A single named group that recurses into itself
    $regex = _
            '(?imsx)' & _
            '(?<LiBlock>' & _
            ' <li>' & _
            ' (?: (?&LiBlock)* | .*? )*' & _
            ' <\/li>' & _
            ')'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

    ; Unnamed group with a relative reference (?-1)
    $regex = _
            '(?imsx)' & _
            '(' & _
            ' <li>' & _
            ' (?: (?-1)* | .*? )*' & _
            ' <\/li>' & _
            ')'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

    ; The same pattern on one line, without the x option
    $regex = '(?ims)(<li>(?:(?-1)*|.*?)*<\/li>)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)
jchd, posted July 28, 2016 (edited)

I could have added a couple (English "couple" often means 3) of shorter versions:

    ; $s as defined in the previous post

    ; Absolute reference to group 1
    $regex = '(?ims)(<li>(?:(?1)*|.*?)*</li>)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

    ; Reference to group 0, i.e. the whole pattern
    $regex = '(?ims)(<li>(?:(?0)*|.*?)*</li>)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

    ; (?R) recurses into the whole pattern
    $regex = '(?ims)(<li>(?:(?R)*|.*?)*</li>)'
    $data = StringRegExp($s, $regex, 3)
    _ArrayDisplay($data)

Note that all of these, like the variants above, are only cosmetic rewrites: the structure and the way the pattern works are exactly the same.

Edited July 28, 2016 by jchd
jguinch, posted July 29, 2016

Wow, impressive! Nice.
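The second question from the first post (matching the "Innermost" bullet) is not answered directly in the thread. One possible approach, offered only as a sketch, swaps the recursive pattern for a different technique: a lazy match guarded by a negative lookahead, so that the captured content may not cross another opening <li>. It assumes plain <li> tags without attributes, and the variable name $aLeaves is illustrative.

```autoit
#include <Array.au3>

Local $s = "<li>One<li>Inner<li>Innermost</li></li></li><li>Two</li>"

; (?:(?!<li>).)*?  consumes one character at a time, but only while the
; next characters do not start another <li>. A match therefore can only
; begin at an <li> that has no nested <li> before its closing </li>.
Local $aLeaves = StringRegExp($s, '(?is)<li>((?:(?!<li>).)*?)</li>', 3)

; Expected: $aLeaves[0] = "Innermost"
;           $aLeaves[1] = "Two"
_ArrayDisplay($aLeaves, "items without nested <li>")
```

Note that this returns every item that contains no nested <li>, so "Two" is returned alongside "Innermost". Picking only the deepest item would need extra logic, such as tracking the nesting level the way the hehe() function above does.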