[Solved] Regular expression to capture multiple groups with prefix


Good morning, y'all!
I'm wondering how I could capture values from a string in which a particular prefix appears multiple times, each time followed by a different value.

This is the input string:

User: SomeUser
Login-name: SomeLoginName
NTSecurity: YES
Domain: SomeDomain
Timeout: 00:00:00
Member: MemberOfFirstGroup
Member: MemberOfSecondGroup
Member: MemberOfThirdGroup

This is the output it should produce:

SomeUser
SomeLoginName
YES
MemberOfFirstGroup
MemberOfSecondGroup
MemberOfThirdGroup

So, in practice, the Domain and Timeout parameters are ignored by the pattern, which looks like this (without the capturing group for "Member:"):

User:\s([^\r\n]+)\s*
Login\-name:\s([^\r\n]+)\s*
.*?\s*
NTSecurity:\s([^\r\n]+)\s*

I'm using this pattern in VBScript, so I had to change it a bit from the AutoIt regex pattern style.
Could you please enlighten me on how to do that?
Thanks a lot.

Francesco ^_^
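
For readers who want to see the capture idea in isolation, here is a minimal sketch in Python (not VBScript or AutoIt, so treat it only as an illustration of the pattern). It assumes CRLF line endings and uses hypothetical names (`SAMPLE`, `WANTED`, `capture_values`). A single alternation of the wanted prefixes is followed by one capture group, and every match is collected, so the repeated "Member:" lines need no special handling.

```python
import re

# Sample input from the post above, joined with CRLF (an assumption).
SAMPLE = "\r\n".join([
    "User: SomeUser",
    "Login-name: SomeLoginName",
    "NTSecurity: YES",
    "Domain: SomeDomain",
    "Timeout: 00:00:00",
    "Member: MemberOfFirstGroup",
    "Member: MemberOfSecondGroup",
    "Member: MemberOfThirdGroup",
])

# One alternation of wanted prefixes, one capture group for the value.
# [^\r\n]* stops at the line break, so no trailing CR ends up in the capture.
WANTED = re.compile(
    r"^(?:User|Login-name|NTSecurity|Member):[ \t]*([^\r\n]*)",
    re.MULTILINE,
)


def capture_values(text):
    """Return every value whose line starts with one of the wanted prefixes."""
    return WANTED.findall(text)


for value in capture_values(SAMPLE):
    print(value)
```

In VBScript the same idea would presumably be expressed with `Global = True`, `Execute`, and a loop over the returned matches reading `SubMatches(0)`; that translation is not tested here.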


This?

#include <Constants.au3>

$sText = _
"User: SomeUser" & @CRLF & _
"Login-name: SomeLoginName" & @CRLF & _
"NTSecurity: YES" & @CRLF & _
"Domain: SomeDomain" & @CRLF & _
"Timeout: 00:00:00" & @CRLF & _
"Member: MemberOfFirstGroup" & @CRLF & _
"Member: MemberOfSecondGroup" & @CRLF & _
"Member: MemberOfThirdGroup"

MsgBox ($MB_SYSTEMMODAL, "", StringRegExpReplace($sText, "User:\s|Login-name:\s|NTSecurity:\s|Domain:\s.*\v+|Timeout.*\v+|Member:\s", ""))

 


@Nine
Always a pleasure to see you around here :)
I forgot to mention that this information is part of a much larger string containing a lot of other data, so the best solution would be to capture those values rather than remove what is not needed.
I like the approach, by the way. Thanks :)


Do you prefer this one?

#include <Constants.au3>

$sText = _
"User: SomeUser" & @CRLF & _
"Login-name: SomeLoginName" & @CRLF & _
"NTSecurity: YES" & @CRLF & _
"Domain: SomeDomain" & @CRLF & _
"Timeout: 00:00:00" & @CRLF & _
"Anything: Before" & @CRLF & _
"Member: MemberOfFirstGroup" & @CRLF & _
"Member: MemberOfSecondGroup" & @CRLF & _
"Member: MemberOfThirdGroup" & @CRLF & _
"Anything: After"

MsgBox ($MB_SYSTEMMODAL, "", StringRegExpReplace($sText, "(?|User|Login-name|NTSecurity|Member):\s(.*)|.+\v*", "$1"))

 


@Nine
Bad news...
It seems that branch reset groups are not supported by VBScript, so I currently can't use your solution, even though I had already adapted my code to work with it.

Is it possible to get the same result without branch reset groups?
Thanks for your help :)
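
A note on the branch reset group: in the pattern `(?|User|Login-name|NTSecurity|Member):\s(.*)|.+\v*` the `(?|...)` group contains no capture groups of its own, so a plain non-capturing group `(?:...)` should give the same numbering, with the value still in group 1. The sketch below uses Python as a stand-in for an engine without branch reset support (Python's `re` rejects `(?|` as well). The names `SAMPLE`, `PATTERN`, `keep_values` and `supports_branch_reset` are hypothetical, and `[^\r\n]` is used instead of `.` and `\v` so the result does not depend on how a given engine treats `\r`.

```python
import re

# Sample input from the post above, joined with CRLF (an assumption).
SAMPLE = "\r\n".join([
    "User: SomeUser",
    "Login-name: SomeLoginName",
    "NTSecurity: YES",
    "Domain: SomeDomain",
    "Timeout: 00:00:00",
    "Anything: Before",
    "Member: MemberOfFirstGroup",
    "Member: MemberOfSecondGroup",
    "Member: MemberOfThirdGroup",
    "Anything: After",
])


def supports_branch_reset():
    """Check whether this regex engine accepts the (?|...) syntax."""
    try:
        re.compile(r"(?|User|Member):\s(.*)")
        return True
    except re.error:
        return False


# Left alternative: wanted line, value captured in group 1.
# Right alternative: any other non-empty line plus at most one line break.
PATTERN = re.compile(
    r"(?:User|Login-name|NTSecurity|Member):[ \t]*([^\r\n]*)"
    r"|[^\r\n]+(?:\r?\n)?"
)


def keep_values(text):
    """Replace each match with group 1 (empty when the line is unwanted)."""
    return PATTERN.sub(lambda m: m.group(1) or "", text)


print(supports_branch_reset())
print(keep_values(SAMPLE))
```

Whether VBScript accepts this exact pattern is untested here; it relies only on alternation, a non-capturing group, and negated character classes.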


@Nine
The above pattern returns a string without the blank line (\n) between groups, so the data is formatted like this:

SomeUser1
SomeLoginName1
YES
MemberOfFirstGroup1
MemberOfSecondGroup1
MemberOfThirdGroup1
SomeUser2
SomeLoginName2
YES
MemberOfFirstGroup2

instead of:

SomeUser1
SomeLoginName1
YES
MemberOfFirstGroup1
MemberOfSecondGroup1
MemberOfThirdGroup1

SomeUser2
SomeLoginName2
YES
MemberOfFirstGroup2

Because of that, I have no delimiter to tell where a new "group" of information starts (or where the previous one ends).
The fun fact is that when I try both patterns on RegEx101.com, I get a different result from the VBScript one.
Ideally, and based on the result I got from RegEx101.com, each group of information is separated by two \n, so I can split the output into chunks, each holding one user's information, which I then use throughout the script.
Thanks again for your help.
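
The chunking step described here can be sketched as follows. This is an illustrative Python version, not the VBScript used in the thread; `FIELD`, `parse_records` and `TEXT` are hypothetical names, and the input is assumed to separate users with one blank line (CRLF or LF). Splitting on the blank line first and capturing per chunk means the regex itself never has to preserve the separator.

```python
import re

# Prefix in group 1, value in group 2; [^\r\n]* keeps CR out of the value.
FIELD = re.compile(
    r"^(User|Login-name|NTSecurity|Member):[ \t]*([^\r\n]*)",
    re.MULTILINE,
)


def parse_records(text):
    """Split the text on blank lines and build one dict per user block."""
    records = []
    for chunk in re.split(r"(?:\r?\n){2,}", text.strip()):
        if not chunk.strip():
            continue  # skip empty input
        record = {"Member": []}
        for key, value in FIELD.findall(chunk):
            if key == "Member":
                record["Member"].append(value)  # prefix may repeat
            else:
                record[key] = value
        records.append(record)
    return records


# Two user blocks separated by a blank line, CRLF line endings (assumed).
TEXT = (
    "User: SomeUser1\r\nLogin-name: SomeLoginName1\r\nNTSecurity: YES\r\n"
    "Domain: SomeDomain\r\nMember: MemberOfFirstGroup1\r\n"
    "Member: MemberOfSecondGroup1\r\n"
    "\r\n"
    "User: SomeUser2\r\nLogin-name: SomeLoginName2\r\nNTSecurity: YES\r\n"
    "Member: MemberOfFirstGroup2\r\n"
)

for record in parse_records(TEXT):
    print(record)
```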


If each line ends with only @LF (\n) and the groups are separated by an extra \n, this should work:

#include <Constants.au3>

$sText = _
"User: SomeUser" & @LF & _
"Login-name: SomeLoginName" & @LF & _
"NTSecurity: YES" & @LF & _
"Domain: SomeDomain" & @LF & _
"Timeout: 00:00:00" & @LF & _
"Anything: Before" & @LF & _
"Member: MemberOfFirstGroup" & @LF & _
"Member: MemberOfSecondGroup" & @LF & _
"Member: MemberOfThirdGroup" & @LF & _
"Anything: After" & @LF & @LF & _
"User: SomeUser2" & @LF & _
"Login-name: SomeLoginName2" & @LF & _
"NTSecurity: YES" & @LF & _
"Domain: SomeDomain" & @LF & _
"Timeout: 00:00:00" & @LF & _
"Anything: Before" & @LF & _
"Member: MemberOfFirstGroup2" & @LF & _
"Member: MemberOfSecondGroup2" & @LF & _
"Member: MemberOfThirdGroup2" & @LF & _
"Anything: After"

MsgBox ($MB_SYSTEMMODAL, "", StringRegExpReplace($sText, "(User|Login-name|NTSecurity|Member):\s(.*)|.+\n?", "$2"))

 


@Nine

[Attached image]

As you can see, the lines are separated by @CR and @LF, but even with the pattern above, the script returns a string with just @CR @LF after each line, without the blank lines that appear in the original file (image below), so I assume they are being removed.

[Attached image]

As I said before, AutoIt and VBScript use different regex engines, so what works in AutoIt may not work in VBScript.
Thanks for your kind help :) 
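
One possible explanation for the vanishing blank lines, offered as a hypothesis rather than a confirmed diagnosis: if the engine's `.` matches `\r` (excluding only `\n`), then on CRLF input a blank line is the two characters `\r\n`, which `.+\n?` matches and deletes. Python's `re` treats `.` this way, so it can reproduce the symptom; whether VBScript's engine behaves identically is an assumption. The sketch compares the pattern from the thread with a variant that uses `[^\r\n]` instead of `.`. The names `TEXT`, `NAIVE`, `SAFE` and `keep_values` are hypothetical.

```python
import re

# Two user blocks separated by one blank line, CRLF line endings (assumed).
TEXT = (
    "User: SomeUser1\r\n"
    "Domain: SomeDomain\r\n"
    "Member: MemberOfFirstGroup1\r\n"
    "\r\n"
    "User: SomeUser2\r\n"
    "Member: MemberOfFirstGroup2\r\n"
)

# Pattern from the thread: "." can swallow the CR of an otherwise empty line.
NAIVE = re.compile(r"(User|Login-name|NTSecurity|Member):\s(.*)|.+\n?")

# Variant: [^\r\n] never matches a line-break character, so a bare CRLF
# (a blank line) matches neither alternative and is left untouched.
SAFE = re.compile(
    r"(User|Login-name|NTSecurity|Member):\s([^\r\n]*)|[^\r\n]+(?:\r?\n)?"
)


def keep_values(pattern, text):
    """Replace each match with group 2 (empty when the line is unwanted)."""
    return pattern.sub(lambda m: m.group(2) or "", text)


naive_out = keep_values(NAIVE, TEXT)
safe_out = keep_values(SAFE, TEXT)

print(repr(naive_out))  # blank line between the two users is gone
print(repr(safe_out))   # blank line between the two users is preserved
```

If the hypothesis holds, replacing `.` with `[^\r\n]` in the VBScript pattern would be worth trying; negated character classes are basic syntax, so no engine-specific feature is needed.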


@Deye

You are definitely right.

I am sorry that I haven't posted it yet, but I have been busy all day trying to figure out how to structure the script, and since the file contains only sensitive data, it would have taken a long time to replace it with test data.

In fact, what I posted is what I have in the source file, just "filtered". @Nine provided valid patterns that work outside VBScript but not with it, so the problem has nothing to do with the data itself; rather, it seems to be something about VBScript.

I need to run some tests with AutoIt instead of VBScript and see whether the results are the same, but I already know they won't be.

I'll let you know as soon as possible.

Thanks :)


@Nine, @Deye
Here I am with a sample file and some tests done.
The uploaded file has the same structure as the original one, but with far less information, since most of it is not needed in a sample file.
I tested this pattern in AutoIt, and the blank lines are left as they were, while in the same test with VBScript they are removed from the source text.

Below you can find both the AutoIt script and the VBScript to test with the sample file.
Let me know what results you get with the scripts, even though I am quite convinced that VBScript won't be able to produce the same result as AutoIt does.
As always, a big thanks to both of you!

P.S.: Happy fishing :D 

AutoIt:

#include <FileConstants.au3>
#include <StringConstants.au3>

Test()

Func Test()

    Local $strFileName = @ScriptDir & "\SECURITY.RPT", _
          $hdlFile, _
          $strFileContent


    $hdlFile = FileOpen($strFileName, $FO_READ)
    ; FileOpen returns -1 on failure, so "Not $hdlFile" would never catch the error
    If $hdlFile = -1 Then Return ConsoleWrite("FileOpen ERR: " & $hdlFile & @CRLF)

    $strFileContent = FileRead($hdlFile)
    Local $iReadError = @error ; save @error before the next function call resets it
    FileClose($hdlFile)        ; close the handle on every path, including errors
    If $iReadError Then Return ConsoleWrite("FileRead ERR: " & $iReadError & @CRLF)

    $strFileContent = StringRegExpReplace($strFileContent, "(?|User|Login-name|NTSecurity|Member):\s(.*)|.+\v*", "$1")
    If @error Then Return ConsoleWrite("StringRegExpReplace ERR: " & @error & @CRLF)

    ConsoleWrite($strFileContent & @CRLF)

EndFunc

VBScript:

Dim strFileName: strFileName = "SECURITY.rpt"
Dim objFSO
Dim objTextStreamIn
Dim objTextStreamOut
Dim strFileContent
Dim objRegEx
  

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextStreamIn = objFSO.OpenTextFile(strFileName, 1, -1)
Set objTextStreamOut = objFSO.CreateTextFile(".\Result.txt")

strFileContent = objTextStreamIn.ReadAll

objTextStreamIn.Close

Set objRegEx = CreateObject("VBScript.RegExp")

With objRegEx
	.Pattern = "(User|Login-name|NTSecurity|Member):\s(.*)|.+\n?"
	.Global = True
	.IgnoreCase = False
	.Multiline = True
	
	objTextStreamOut.Write(.Replace(strFileContent, "$2"))
	
End With

objTextStreamOut.Close

Set objRegEx = Nothing
Set objTextStreamOut = Nothing
Set objTextStreamIn = Nothing
Set objFSO = Nothing

SECURITY.RPT

2 hours ago, FrancescoDiMuro said:

it leaves blank lines at the top of the string

Without examining too closely how, this somehow moves everything to the top:

"(User|Login-name|NTSecurity|Member):\s(.*)|(.+)\r\n?(?:\r?\n)"

 

  • FrancescoDiMuro changed the title to [Solved] Regular expression to capture multiple groups with prefix
