
Repeating Regex Pattern



Hey Guys,

I'm trying to understand repeating regex patterns. I know the example could easily be done with StringSplit, but it's a good excuse to address my long-time neglect of repeating regex patterns.

I've attached the data file so anyone who wants to help has exactly what I'm using.

What I'm trying to achieve is to turn this text file into a 1D array with each line as its own element (except the empty lines between blocks).

My code looks like this:

#include <Array.au3>

; Read the whole log file into a single string
$a = FileRead(@ScriptDir & "\Versions.log")

$regexp = "(.+)\r\n(.+)\r\n(.+)\r\n(.+)\r\n(.+)\r\n"   ;this works
;~ $regexp = "(?:(.+)\r\n){1}"        ;this also works and I have no idea how

; Flag 3 (global match): the groups of every match go into one flat 1D array
$a_regfind = StringRegExp($a,$regexp,3)

_ArrayDisplay($a_regfind,"$a_registryfind")

The long regex at the top works fine.

The second regex also works fine, but I thought that since there are 5 repeats of the pattern, I should put 5 in there. That only returns the last line of each block. I changed it to 1 and it works.
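Why {5} keeps only the last line: in PCRE (the engine behind StringRegExp), a capturing group inside a repeated group is refilled on every repetition, so after {5} it holds only the text from the fifth one, while {1} behaves exactly like having no quantifier. A minimal sketch, assuming a small hypothetical sample string in place of Versions.log (its real contents aren't shown here), and [^\r\n]+ instead of .+ so the result doesn't depend on how . treats a carriage return:

```autoit
#include <StringConstants.au3>

; Hypothetical stand-in for Versions.log: two blocks of 5 lines,
; CRLF line endings, separated by one empty line.
Global $sSample = "A1" & @CRLF & "A2" & @CRLF & "A3" & @CRLF & "A4" & @CRLF & "A5" & @CRLF & @CRLF & _
        "B1" & @CRLF & "B2" & @CRLF & "B3" & @CRLF & "B4" & @CRLF & "B5" & @CRLF

; {1}: the group is used once per match, so every global match yields one line.
Global $aOnce = StringRegExp($sSample, "(?:([^\r\n]+)\r\n){1}", $STR_REGEXPARRAYGLOBALMATCH)

; {5}: one match spans a whole block, but group 1 is overwritten on each
; repetition, so only the text from the 5th (last) repetition is kept.
Global $aFive = StringRegExp($sSample, "(?:([^\r\n]+)\r\n){5}", $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("{1}: " & UBound($aOnce) & " elements, first = " & $aOnce[0] & @CRLF)
ConsoleWrite("{5}: " & UBound($aFive) & " elements, first = " & $aFive[0] & @CRLF)
```

With this sample, $aOnce holds all 10 lines, while $aFive holds only A5 and B5: one element per block, the last line of each.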

Feel like an ape that just accidentally bumped a light switch on and screeched at the scary lightbulb 🦍

If anyone has time would you please show me how to do this repeating regex properly?

Versions.log


@InclusiveExclusion
Since you are using a global match ($STR_REGEXPARRAYGLOBALMATCH = 3), your pattern is applied repeatedly across the whole file, so it would work even without the {1}.
Look at this example:

#include <Array.au3>
#include <StringConstants.au3>

Test()

Func Test()

    Local $strFileName = @ScriptDir & "\Versions.log", _
          $strFileContent, _
          $arrResult


    $strFileContent = FileRead($strFileName)
    If @error Then Return ConsoleWrite("FileRead ERR: " & @error & @CRLF)

    $arrResult = StringRegExp($strFileContent, '(?m)^\s*([^\r\n]+)\s*$', $STR_REGEXPARRAYGLOBALMATCH)
    If IsArray($arrResult) Then _ArrayDisplay($arrResult)

EndFunc

Broken down token by token, the pattern reads:

^ asserts position at start of a line
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
  * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ([^\r\n]+)
  Match a single character not present in the list below [^\r\n]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \r matches a carriage return (ASCII 13)
    \n matches a line-feed (newline) character (ASCII 10)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
  * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Global pattern flags
  (?m) modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
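A small sketch of what the (?m) modifier changes, running the same pattern with and without it on a hypothetical three-line string (not the real Versions.log). Without (?m), ^ and $ can only match at the very start and end of the whole string, and [^\r\n]+ can't cross a line break, so no match is found and StringRegExp sets @error to 1.

```autoit
#include <StringConstants.au3>

; Hypothetical sample text (not the real Versions.log): three short lines.
Global $sText = "first" & @CRLF & "second" & @CRLF & "third" & @CRLF
Global $sPattern = "^\s*([^\r\n]+)\s*$"

; With (?m), ^ and $ can match at every line boundary, so each line is returned.
Global $aMulti = StringRegExp($sText, "(?m)" & $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
ConsoleWrite("with (?m): " & UBound($aMulti) & " matches" & @CRLF)

; Without (?m), ^ and $ only match at the start/end of the whole string,
; and [^\r\n]+ cannot span a line break, so there is no match (@error = 1).
Global $aSingle = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
Global $iSingleErr = @error
ConsoleWrite("without (?m): @error = " & $iSingleErr & @CRLF)
```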

And since StringRegExp (SRE) is called with $STR_REGEXPARRAYGLOBALMATCH here too, the pattern is applied globally to the string: no matter how many lines the file has, every line that matches the pattern is returned by SRE.
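The effect of the global flag can be sketched by running one pattern with $STR_REGEXPARRAYMATCH (flag 1), which stops after the first match, and with $STR_REGEXPARRAYGLOBALMATCH (flag 3). The sample text is hypothetical; only the flag differs between the two calls.

```autoit
#include <StringConstants.au3>

; Hypothetical sample lines (not the real Versions.log).
Global $sText = "alpha" & @CRLF & "beta" & @CRLF & "gamma" & @CRLF
Global $sPattern = "(?m)^\s*([^\r\n]+)\s*$"

; Flag 1: only the capturing groups of the FIRST match are returned.
Global $aFirst = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYMATCH)

; Flag 3: the search resumes after each match, and the groups of EVERY
; match are appended to one flat array.
Global $aAll = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("flag 1: " & UBound($aFirst) & " element(s)" & @CRLF)
ConsoleWrite("flag 3: " & UBound($aAll) & " element(s)" & @CRLF)
```

Here flag 1 gives just "alpha", while flag 3 gives all three lines.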
By the way, use this website to check out what's going on with your patterns :)



8 hours ago, mikell said:

BTW, I suppose that by 'repeating regex patterns' you meant something like this:

$regexp = "(.+)(?=\R){5}"

But the example you chose is not the best one for this, because in this case the 'repeating' is automatic, so such a syntax is useless.
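A quick way to see why the {5} adds nothing there: a lookahead such as (?=\R) consumes no characters, so repeating it just re-tests the same position, and (?=\R){5} asks for nothing more than a single (?=\R). A minimal sketch, assuming a hypothetical three-line sample and [^\r\n]+ in place of .+; both calls should return the same lines.

```autoit
#include <StringConstants.au3>

; Hypothetical sample text (not the real Versions.log), CRLF line endings.
Global $sText = "one" & @CRLF & "two" & @CRLF & "three" & @CRLF

; The lookahead only checks that a line break follows, without consuming it.
; Repeating a zero-width check at the same spot cannot demand five line breaks.
Global $aRepeated = StringRegExp($sText, "([^\r\n]+)(?=\R){5}", $STR_REGEXPARRAYGLOBALMATCH)
Global $aPlain = StringRegExp($sText, "([^\r\n]+)(?=\R)", $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite("(?=\R){5}: " & UBound($aRepeated) & " matches" & @CRLF)
ConsoleWrite("(?=\R):    " & UBound($aPlain) & " matches" & @CRLF)
```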

That's what I was looking for. I'm just trying to understand repeating patterns.


$regexp = "((?:.+\R){5})"

Something like this makes sense if you want to capture the whole paragraph at a time. The inner non-capturing group matches a whole line plus its line break, and the outer group captures 5 of them at once.
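A minimal sketch of that pattern in use, assuming a hypothetical sample string with two 5-line blocks and CRLF line endings (not the real Versions.log), and [^\r\n]+ in place of .+. Each element of the result is one whole block, line breaks included.

```autoit
#include <StringConstants.au3>

; Hypothetical stand-in for Versions.log: two 5-line blocks separated by an empty line.
Global $sSample = "A1" & @CRLF & "A2" & @CRLF & "A3" & @CRLF & "A4" & @CRLF & "A5" & @CRLF & @CRLF & _
        "B1" & @CRLF & "B2" & @CRLF & "B3" & @CRLF & "B4" & @CRLF & "B5" & @CRLF

; The outer group wraps the whole repetition, so it captures all 5 lines
; (with their line breaks) instead of only the last one.
Global $aBlocks = StringRegExp($sSample, "((?:[^\r\n]+\R){5})", $STR_REGEXPARRAYGLOBALMATCH)

For $i = 0 To UBound($aBlocks) - 1
    ConsoleWrite("--- block " & $i & " ---" & @CRLF & $aBlocks[$i])
Next
```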

Also, if you need help with Regular Expressions, RegEx101.com has nice explanations and a cool way to share RegEx --> https://regex101.com/r/7Ma34B/1

Sorry for being a day late @FrancescoDiMuro, but I'll chase @mikell away!

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.



14 hours ago, seadoggie01 said:

if you want to capture the whole paragraph at a time

Yes :)
But if the purpose is to get the lines of each paragraph in a subarray, then a second step with StringSplit is needed.
StringRegExp with flag 4 could be used too, but in that case the 'long' syntax (the OP's one in post #1) is unavoidable:

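#include <Array.au3>
#include <StringConstants.au3>

$a = FileRead(@ScriptDir & "\Versions.log")
; One capturing group per line; \R matches any line break (CRLF, LF, ...)
$regexp = "(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R"
; Flag 4 ($STR_REGEXPARRAYGLOBALFULLMATCH): one subarray per match,
; [0] = full match, [1] to [5] = the five captured lines
$a_regfind = StringRegExp($a,$regexp,$STR_REGEXPARRAYGLOBALFULLMATCH)

; Loop over however many blocks were found instead of a fixed 0 to 3
For $i = 0 to UBound($a_regfind) - 1
    _ArrayDisplay($a_regfind[$i],"$a_registryfind")
Next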
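The two-step approach mentioned above (capture whole blocks, then StringSplit each one) might look like this. It's a sketch, assuming a hypothetical sample string with two 5-line blocks and CRLF line endings; $aLinesPerBlock is an illustrative name, not something from the thread.

```autoit
#include <StringConstants.au3>

; Hypothetical stand-in for Versions.log: two 5-line blocks, CRLF line endings.
Global $sSample = "A1" & @CRLF & "A2" & @CRLF & "A3" & @CRLF & "A4" & @CRLF & "A5" & @CRLF & @CRLF & _
        "B1" & @CRLF & "B2" & @CRLF & "B3" & @CRLF & "B4" & @CRLF & "B5" & @CRLF

; Step 1: one element per 5-line block (trailing line break included).
Global $aBlocks = StringRegExp($sSample, "((?:[^\r\n]+\R){5})", $STR_REGEXPARRAYGLOBALMATCH)

; Step 2: split every block into its lines and store each result as a subarray.
; StringTrimRight drops the block's final @CRLF (assumes CRLF endings) so the
; split doesn't end with an empty element. $STR_ENTIRESPLIT treats @CRLF as one
; delimiter; $STR_NOCOUNT leaves out the count StringSplit normally puts in [0].
Global $aLinesPerBlock[UBound($aBlocks)]
For $i = 0 To UBound($aBlocks) - 1
    $aLinesPerBlock[$i] = StringSplit(StringTrimRight($aBlocks[$i], 2), @CRLF, $STR_ENTIRESPLIT + $STR_NOCOUNT)
Next

; Usage: report the size and first line of each block.
Global $aLines
For $i = 0 To UBound($aLinesPerBlock) - 1
    $aLines = $aLinesPerBlock[$i]
    ConsoleWrite("block " & $i & ": " & UBound($aLines) & " lines, first = " & $aLines[0] & @CRLF)
Next
```

Compared with flag 4, the block size appears only once in the pattern ({5}), and each subarray holds just the lines, without the full match in element 0.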

Why are so many dogs mentioned around here?

