Repeating Regex Pattern

InclusiveExclusion · March 18, 2021

Hey Guys,

I'm trying to understand repeating regex patterns. I know the example below could easily be done with StringSplit, but it's a good one for addressing my long-time neglect of the repeating regex stuff.

I've attached the datafile so anyone who wants to help has exactly what I'm using.

What I'm trying to achieve is to turn this text file into a 1D array with each line as its own element (except the empty line between blocks).

My code looks like this:

#include <Array.au3>

$a = FileRead(@ScriptDir & "\Versions.log")

$regexp = "(.+)\r\n(.+)\r\n(.+)\r\n(.+)\r\n(.+)\r\n"   ;this works
;~ $regexp = "(?:(.+)\r\n){1}"        ;this also works and i have no idea how

$a_regfind = StringRegExp($a,$regexp,3)

_ArrayDisplay($a_regfind,"$a_registryfind")

The long regex at the top works fine.

The second regex also works fine, but I thought that since there are 5 repeats of the pattern I should put 5 in there. That only returns the last line of each block. I changed it to 1 and it works.

Feel like an ape that just accidentally bumped a light switch on and screeched at the scary lightbulb 🦍

If anyone has time would you please show me how to do this repeating regex properly?

Versions.log

FrancescoDiMuro · March 18, 2021

@InclusiveExclusion
Since you are using a global match ($STR_REGEXPARRAYGLOBALMATCH = 3), your pattern is applied repeatedly across the whole file, so it would work even without the {1}.
Look at this example:

#include <Array.au3>
#include <StringConstants.au3>

Test()

Func Test()

    Local $strFileName = @ScriptDir & "\Versions.log", _
          $strFileContent, _
          $arrResult


    $strFileContent = FileRead($strFileName)
    If @error Then Return ConsoleWrite("FileRead ERR: " & @error & @CRLF)

    $arrResult = StringRegExp($strFileContent, '(?m)^\s*([^\r\n]+)\s*$', $STR_REGEXPARRAYGLOBALMATCH)
    If IsArray($arrResult) Then _ArrayDisplay($arrResult)

EndFunc

Splitting the pattern, you have this:

(?m) is the multi-line modifier: it makes ^ and $ match at the start/end of each line, not only at the start/end of the string
^ asserts position at the start of a line
\s* matches any whitespace character (equivalent to [\r\n\t\f\v ]) zero or more times, as many as possible, giving back as needed (greedy)
([^\r\n]+) is the 1st capturing group
    [^\r\n] matches a single character that is neither a carriage return (\r, ASCII 13) nor a line feed (\n, ASCII 10)
    + repeats that token one or more times, as many as possible, giving back as needed (greedy)
\s* again matches zero or more whitespace characters (greedy)
$ asserts position at the end of a line

And since StringRegExp is called with $STR_REGEXPARRAYGLOBALMATCH here too, the pattern is applied globally to the string: no matter how many lines the file has, every match of the pattern is returned.
By the way, use this website to check out what's going on with your patterns
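
On the original {5} versus {1} question, a minimal sketch may help. It assumes a made-up five-line block (the real Versions.log content is not shown in the thread) and relies on the fact that a capturing group inside a repeated group only keeps its last repetition:

```autoit
#include <StringConstants.au3>

; Hypothetical sample standing in for one 5-line block of Versions.log
Local $sBlock = "a1" & @CRLF & "a2" & @CRLF & "a3" & @CRLF & "a4" & @CRLF & "a5" & @CRLF

; {1}: each match consumes one line, so the global flag collects every line
Local $aOne = StringRegExp($sBlock, "(?:(.+)\r\n){1}", $STR_REGEXPARRAYGLOBALMATCH)

; {5}: one match consumes all five lines, but group 1 is overwritten on each repetition
Local $aFive = StringRegExp($sBlock, "(?:(.+)\r\n){5}", $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("{1} returned " & UBound($aOne) & " elements" & @CRLF)
ConsoleWrite("{5} returned " & UBound($aFive) & " element(s), first = " & $aFive[0] & @CRLF)
```

So the {5} version is not broken: it matches the whole block, and only the fifth capture survives.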

Nine · March 18, 2021

$regexp = "([^\v]*)\v*"

Maybe ?

FrancescoDiMuro · March 18, 2021

@Nine
Use + instead of * in the capturing group. With * the group can match an empty string, so the function also returns an empty element for the last blank line.
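
A small sketch of the difference, assuming a made-up two-line string in place of the real file:

```autoit
#include <StringConstants.au3>

; Hypothetical two-line sample ending with a line break
Local $sText = "a" & @CRLF & "b" & @CRLF

; * lets the group match nothing, so an empty match at the end of the string is returned too
Local $aStar = StringRegExp($sText, "([^\v]*)\v*", $STR_REGEXPARRAYGLOBALMATCH)

; + requires at least one character, so no empty element appears
Local $aPlus = StringRegExp($sText, "([^\v]+)\v*", $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("* variant: " & UBound($aStar) & " elements" & @CRLF)
ConsoleWrite("+ variant: " & UBound($aPlus) & " elements" & @CRLF)
```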

Nine · March 18, 2021

Didn't see that last line...

InclusiveExclusion · March 18, 2021

Nice. Thanks for the detailed info lads 👍

March 18, 2021

5 minutes ago, Nine said:

Didn't see that last line...

I guess, @FrancescoDiMuro means this :

[Attached image: LastLine.png]

Nine · March 18, 2021

@Musashi Yes I know. I just didn't bother to go all the way down...

mikell · March 18, 2021

$regexp = "\N+"

:huh2:
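
For anyone wondering why something this short is enough: \N matches any character that is not part of a newline sequence, so \N+ grabs each non-empty line and simply skips the blank ones. A quick sketch, with made-up data standing in for Versions.log:

```autoit
#include <StringConstants.au3>

; Hypothetical sample: two lines, an empty separator line, then one more line
Local $sText = "a" & @CRLF & "b" & @CRLF & @CRLF & "c" & @CRLF

; Each run of non-newline characters becomes one array element
Local $aLines = StringRegExp($sText, "\N+", $STR_REGEXPARRAYGLOBALMATCH)

For $i = 0 To UBound($aLines) - 1
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```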

FrancescoDiMuro · March 18, 2021

I knew I should have brought a dog here

mikell · March 18, 2021

7 minutes ago, FrancescoDiMuro said:

I knew I should have brought a dog here

InclusiveExclusion · March 18, 2021

13 minutes ago, mikell said:
$regexp = "\N+"

drops mic 😄

mikell · March 18, 2021

BTW I suppose that when saying 'repeating regex patterns' you meant something like this

$regexp = "(.+)(?=\R){5}"

But the example you chose is not the best one for this because in this case the 'repeating' feature is automatic so such a syntax is useless

Nine · March 18, 2021

31 minutes ago, FrancescoDiMuro said:

I knew I should have brought a dog here

Or a mouse, he would have gone playing with it

FrancescoDiMuro · March 18, 2021

@Nine
Nice idea, but what do you think about this one?

Here @mikell, *smooch smooch smooch*

(Joke, you know :D)

[Image: a ball of wool]

mikell · March 18, 2021

A gaming mouse ? sounds much better

InclusiveExclusion · March 19, 2021

8 hours ago, mikell said:
BTW I suppose that when saying 'repeating regex patterns' you meant something like this
$regexp = "(.+)(?=\R){5}"
But the example you chose is not the best one for this because in this case the 'repeating' feature is automatic so such a syntax is useless

That's what I was looking for. I'm just trying to understand repeating patterns.

seadoggie01 · March 19, 2021

$regexp = "((?:.+\R){5})"

Something like this makes sense if you want to capture the whole paragraph at a time. The inside non-capturing group gets a whole line and a newline, and the outer group captures 5 of them at once.
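
A minimal sketch of that paragraph-at-a-time capture, assuming made-up data with two 5-line blocks separated by an empty line (the real file content is not shown in the thread):

```autoit
#include <StringConstants.au3>

; Hypothetical data: two blocks of five lines, each followed by an empty line
Local $sData = ""
For $i = 1 To 2
    For $j = 1 To 5
        $sData &= "block" & $i & "-line" & $j & @CRLF
    Next
    $sData &= @CRLF
Next

; The outer group captures five consecutive "line + newline" units as one element
Local $aBlocks = StringRegExp($sData, "((?:.+\R){5})", $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("Blocks found: " & UBound($aBlocks) & @CRLF)
ConsoleWrite($aBlocks[0])
```

Each element still contains its line breaks, including the trailing one.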

Also, if you need help with Regular Expressions, RegEx101.com has nice explanations and a cool way to share RegEx --> https://regex101.com/r/7Ma34B/1

Sorry for being a day late @FrancescoDiMuro, but I'll chase @mikell away! :muttley:

FrancescoDiMuro · March 19, 2021

@seadoggie01

Thanks!

Now I know who to call when we need to chase flavoured cats

Edited March 19, 2021 by FrancescoDiMuro

mikell · March 20, 2021

14 hours ago, seadoggie01 said:

if you want to capture the whole paragraph at a time

Yes
But if the purpose is to get the lines of each paragraph in a subarray, then a 2nd step with StringSplit is needed.
StringRegExp with flag 4 could be used too, but in that case the 'long' syntax (the OP's one in post #1) is unavoidable:

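#include <Array.au3>

$a = FileRead(@ScriptDir & "\Versions.log")
$regexp = "(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R"
; flag 4 returns an array of arrays: one subarray per match (full match, then the 5 groups)
$a_regfind = StringRegExp($a,$regexp,4)

; loop over however many blocks were matched instead of a fixed count
For $i = 0 to UBound($a_regfind) - 1
    _ArrayDisplay($a_regfind[$i],"$a_regfind[" & $i & "]")
Next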
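
The two-step alternative mentioned above (whole-paragraph capture first, StringSplit second) could be sketched roughly like this. The sample data is made up, and the split assumes the file uses CRLF line endings:

```autoit
#include <StringConstants.au3>

; Hypothetical one-block sample; the real Versions.log content is not shown in the thread
Local $sData = "v1" & @CRLF & "v2" & @CRLF & "v3" & @CRLF & "v4" & @CRLF & "v5" & @CRLF & @CRLF

; Step 1: one array element per 5-line paragraph
Local $aBlocks = StringRegExp($sData, "((?:.+\R){5})", $STR_REGEXPARRAYGLOBALMATCH)

; Step 2: drop the trailing line break, then split each paragraph into its lines
For $i = 0 To UBound($aBlocks) - 1
    Local $sBlock = StringStripWS($aBlocks[$i], $STR_STRIPTRAILING)
    Local $aLines = StringSplit($sBlock, @CRLF, $STR_ENTIRESPLIT + $STR_NOCOUNT)
    ConsoleWrite("Block " & $i & ": " & UBound($aLines) & " lines" & @CRLF)
Next
```

Stripping the trailing whitespace first avoids an empty last element after the split.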

Why are so many dogs mentioned around here?
