RegEx how to make one group match multiple times


Iczer

For some reason I can't make one group match multiple times:

#include <Array.au3>

$sText = "$$$ 12345 aaa  bbb  ccc  ddd  eee  678 $$$"

$pattern = ".*?(\d\d\d\d\d).+?([^\h]{3}\h\h)+?(\d\d\d)"

$rezult = StringRegExp($sText,$pattern,3)

ConsoleWrite("@error = "&@error&@CRLF)
_ArrayDisplay($rezult)

What I need is an array like this:

$array = ["12345", "aaa", "bbb", "ccc", "ddd", "eee", "678"]

But all I get is this array:

$array = ["12345", "eee", "678"]

Is there a way to make it work with a single regexp pattern?

Malkey

Try

$pattern = "\w+"

Melba23

Iczer,

This seems to work:

#include <Array.au3>

$sText = "$$$ 12345 aaa  bbb  ccc  ddd  eee  678 $$$"

$pattern = "(?i)(\d+|[a-z]+)"

$rezult = StringRegExp($sText,$pattern,3)

ConsoleWrite("@error = "&@error&@CRLF)
_ArrayDisplay($rezult)
It captures any size grouping of all numbers or all letters. :)

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

Iczer

No, my example is only an example. My ultimate goal is to know how I can get multiple results from one group:

$pattern1 = "\d\d\d\d\d"
$pattern2 = "[^\h]{3}\h\h"
$pattern3 = "\d\d\d"

; the "+?" right after the $pattern2 group is the repetition whose every iteration I want returned
$pattern = ".*?("&$pattern1&").+?("&$pattern2&")+?("&$pattern3&")"

jchd

The number of elements in the resulting array (leaving PHP mode alone) is equal to the number of capturing groups present in the pattern, not their repetition specification(s).
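A minimal sketch of that rule, using a made-up subject "abc" and flag 1 (which returns the groups of the first match only): three separate groups give three elements, while a single group repeated with {3} gives one element that holds only its last iteration.

```autoit
; Three separate capturing groups -> three elements: a, b, c
Local $aThree = StringRegExp("abc", "(\w)(\w)(\w)", 1)
ConsoleWrite(UBound($aThree) & @CRLF) ; 3

; One capturing group repeated three times -> still one element,
; and it holds only the last iteration
Local $aOne = StringRegExp("abc", "(\w){3}", 1)
ConsoleWrite(UBound($aOne) & " " & $aOne[0] & @CRLF) ; 1 c
```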


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

mikell

But it is possible to combine patterns using alternation

#include <Array.au3>

$sText = "$$$  12345 aaa  bbb  ccc  ddd  eee  678 $$$"

$pattern1 = "(?:\d+)"
$pattern2 = "(?:[a-z]+)"
$pattern = "(" & $pattern1 & "|" & $pattern2 & ")"

$rezult = StringRegExp($sText, $pattern, 3)
_ArrayDisplay($rezult)

Iczer

jchd wrote: "The number of elements in the resulting array (leaving PHP mode alone) is equal to the number of capturing groups present in the pattern, not their repetition specification(s)."

So the simple answer would be something like:

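#include <Array.au3>

$sText = "$$$ 12345  aaa  bbb  ccc  ddd  eee  678 $$$"

; one item of the repeated part: 3 non-blank characters followed by 2 blanks
$internal = "[^\h]{3}\h\h"

; step 1: group 2 wraps the WHOLE repetition, so it returns the full run "aaa  bbb  ...  eee  "
$pattern = ".*?(\d\d\d\d\d)\h\h((?:"&$internal&")+)(\d\d\d)"

$rezult1 = StringRegExp($sText,$pattern,3)
$iErr = @error ; save @error before another function call resets it
ConsoleWrite("@error = "&$iErr&@CRLF)
If $iErr Then Exit

; step 2: split that run into its items; the group keeps the trailing blanks out of the result
$rezult2 = StringRegExp($rezult1[1],"([^\h]{3})\h\h",3)

; put the first and last numbers back around the items
_ArrayInsert($rezult2,0,$rezult1[0])
_ArrayAdd($rezult2,$rezult1[2])

_ArrayDisplay($rezult2)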

But what is "PHP mode"? Is it possible with it?

Also, I found this:


Backtracking information is discarded when a match is found, so there's no way to tell after the fact that the group had a previous iteration that matched abc. (The only exception to this is the .NET regex engine, which does preserve backtracking information for capturing groups after the match attempt.)


mikell wrote: "But it is possible to combine patterns using alternation"

Thanks, but it only works if the patterns do not depend on each other:

#include <Array.au3>

$sText = "qqq  $$$ 12345  aaa  bbb  ccc  ddd  eee  678 $$$"

$pattern1 = "\d\d\d\d\d"
$internal = "[^\h]{3}\h\h"
$pattern2 = "\d\d\d"

$pattern = "((?:"&$pattern1&")|(?:"&$internal&")|(?:"&$pattern2&"))"

$rezult = StringRegExp($sText,$pattern,3)

ConsoleWrite("@error = "&@error&@CRLF)
_ArrayDisplay($rezult)

jchd

You're confusing two things. StringRegExp with flag = 3 internally restarts matching of the pattern at successive offsets in the subject string, each time just after the previous match. So the resulting array may have as many elements as (number of capturing groups) * (number of matches). When you apply a repetition to a capturing group, the engine must honour it while matching, but what is left in the output array is always the last iteration.
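A small sketch of the (groups) * (matches) count, with a made-up subject: the pattern below has two capturing groups and matches three times, so flag 3 should return six elements, the group values laid out in match order.

```autoit
; 2 capturing groups x 3 matches ("a1", "b2", "c3") = 6 elements
Local $aPairs = StringRegExp("a1 b2 c3", "([a-z])(\d)", 3)
For $i = 0 To UBound($aPairs) - 1
    ConsoleWrite($i & ": " & $aPairs[$i] & @CRLF)
Next
; prints 0: a / 1: 1 / 2: b / 3: 2 / 4: c / 5: 3
```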

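#include <Array.au3>
; $ar = [a, bbB, c]: the second group captures all three middle characters
Local $ar = StringRegExp("abbBc", "(?i)(\w)(\w{3})(\w)", 1)
_ArrayDisplay($ar)
; $ar = [a, B, c]: the repeated second group keeps only its last iteration
$ar = StringRegExp("abbBc", "(?i)(\w)(\w){3}(\w)", 1)
_ArrayDisplay($ar)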

It remains that, basically (i.e. with flag = 1), the number of elements in the output array is equal to the number of capturing groups.

PHP modes (flag = 2 or 4) are something else and would only add confusion to this discussion.


mikell

Iczer, your last example works...

Notice that $pattern2 doesn't match "123" because of the order of the subpatterns in the alternation.

Would you please post an example where what you get is not the expected result?

Iczer

jchd,

I expected quantifiers to work the same way on single characters and on groups:

StringRegExp ( "123test", "(?:\d+)(\w+)", 3); --> $array[1] : "[test]"

StringRegExp ( "123test", "(?:\d+)(\w)+", 3); --> $array[4] : "[t,e,s,t]"

and I cannot see any valid reason for that to be wrong. But it seems I'm wrong.



mikell,

I expect this:

array = [12345,aaa,bbb,ccc,ddd,eee,678]

but I get this:

array = [qqq,12345,aaa,bbb,ccc,ddd,eee,678]

where "qqq" is an unwanted part. I do not think it is possible to get rid of it unless the first pattern sets a starting mark for matching the second pattern, so "|" does not work in this case.
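For what it's worth, PCRE's \G anchor may provide that "starting mark" inside a single pattern. The sketch below is untested and assumes that StringRegExp with flag 3 starts each new search exactly where the previous match ended, so that \G anchors there; the variable names are illustrative only. The ^ branch can only succeed at the start of the subject and skips ahead to the 5-digit number; every later match has to begin exactly where the previous one stopped, so "qqq" is never reached.

```autoit
Local $sText = "qqq  $$$ 12345  aaa  bbb  ccc  ddd  eee  678 $$$"

; (?:^.*?(?=\d{5})   first match only: skip everything up to the 5-digit number
;  |\G)              later matches: start exactly where the previous match ended
; (...)              one capturing group per match: the 5-digit number, a 3-character
;                    item followed by two blanks, or the final 3-digit number
; \h*                eat the blanks so the next match starts on the next item
; The ^ branch must come first; otherwise \G would also match at offset 0 and grab "qqq".
Local $sPattern = "(?:^.*?(?=\d{5})|\G)(\d{5}|[^\h]{3}(?=\h\h)|\d{3})\h*"

Local $aItems = StringRegExp($sText, $sPattern, 3)
If @error Then
    ConsoleWrite("@error = " & @error & @CRLF)
Else
    For $i = 0 To UBound($aItems) - 1
        ConsoleWrite($aItems[$i] & @CRLF) ; should print 12345, aaa, bbb, ccc, ddd, eee, 678
    Next
EndIf
```

One design caveat: if an item in the middle fitted none of the alternatives, the \G chain would simply stop there, so the result would be truncated rather than padded with unwanted text.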

jchd


In the first pattern, "(\w+)" is a capturing group (group 1, since "(?:\d+)" does not capture) matching a sequence of one or more "word" characters. With flag = 3 it adds one element to the resulting array for every match of the whole pattern.

In the second pattern, "(\w)+" is a repeated capturing group (also group 1) matching exactly one "word" character per iteration, and only its last iteration is kept. With the flag = 3 option, the whole pattern is applied again and again at successive offsets, as many times as it matches.

With flag 3 the effect is easier to see with several occurrences:

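#include <Array.au3>

; each match contributes its whole (\w+) run: test, NEXT, end
$array = StringRegExp ( "123test    456NEXT  789end", "(?:\d+)(\w+)", 3)
_ArrayDisplay($array, "Results (\w+)")
; each match contributes only the last character of its (\w)+ repetition: t, T, d
$array = StringRegExp ( "123test    456NEXT  789end", "(?:\d+)(\w)+", 3)
_ArrayDisplay($array, "Results (\w)+")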

kylomas

Iczer,

Is this what you are looking for?

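#include <array.au3>
$sText = "qqq  $$$ 12345  aaa  bbb  ccc  ddd  eee  678 $$$"
; capture every run of characters other than "$" and space that has a space on both sides ("qqq" has none before it)
local $aTest = stringregexp($sText,' ([^\$ ]+) ',3)
_arraydisplay($aTest)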

kylomas

Edit: Or are you looking for a way to use any repeating pattern as boundaries for the capture?


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

kylomas

Iczer,

This might interest you.

kylomas


