Regular expression extracting a repeating sub pattern

22 posts in this topic

I need some help to extract some registry data. I need to do it by a regular expression as that expression will go into another application (PCRE compatible).

If you dump one of your hives you will see that the registry dump follows a pattern: a start pattern, a sub pattern (0 or more times), and an end pattern.

I thought I could do it with something like this:

$flags = '(?m)'                 ;To use ^ and $
$start = '^\[.+\]\s$'
$sub = '(^".+"=.*\s$){2,}'     ;The pattern inside the ( ... ) is repeated at least twice
$end = '^\s+$'               ;I thought ^$ should be an empty line but it did not work out that way?
$regexp = $flags & $start & $sub & $end

Should repeating pattern groups like this work? Or did I misinterpret the help file?

Happy Scripting..:)

Uten

What data are you trying to extract exactly?

#3 ·  Posted (edited)

Data from a registry file (as explained in the TP?)

Registry files look like this:

[HKEY_CURRENT_CONFIG\Software]

[HKEY_CURRENT_CONFIG\Software\Fonts]
"FIXEDFON.FON"="vgafix.fon"
"FONTS.FON"="vgasys.fon"
"OEMFONT.FON"="vgaoem.fon"
"LogPixels"=dword:00000060

[HKEY_CURRENT_CONFIG\Software\Microsoft]

[HKEY_CURRENT_CONFIG\Software\Microsoft\windows]

[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion]

[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion\Internet Settings]
"ProxyEnable"=dword:00000002
"EnableAutodial"=hex:00,00,FF,00
"NoNetAutodial"=hex:00,33,00,00

[HKEY_CURRENT_CONFIG\System]

And I would like an array that looks something like this (including the line breaks):

arr[0] = 2

arr[1]=[HKEY_CURRENT_CONFIG\Software\Fonts]
"FIXEDFON.FON"="vgafix.fon"
"FONTS.FON"="vgasys.fon"
"OEMFONT.FON"="vgaoem.fon"
"LogPixels"=dword:00000060

arr[2]=[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion\Internet Settings]
"ProxyEnable"=dword:00000002
"EnableAutodial"=hex:00,00,FF,00
"NoNetAutodial"=hex:00,33,00,00

EDIT: Tried to make arr[?] bold in the code block but it did not work as planned.

Edited by Uten

#4 ·  Posted (edited)

Ah, I see...

Well could it be that you are over-thinking this? :)

#include <Array.au3>
Func testRegistryextract()
    RunWait('regedit /E c:\testData.reg "HKEY_CURRENT_USER"')
    $data = FileRead("c:\testData.reg")
    If @error or $data = "" Then
        ConsoleWrite("---> @error:=" & @error & ", or data read failure" & @crlf )
    Else
        $arr = StringSplit($Data, @CRLF&@CRLF,1)
        If @error then
           ConsoleWrite("--> @error:=" & @error & @CRLF)
        Else
            _ArrayDisplay($arr,"")
            ConsoleWrite(UBound($arr)-1 &@crlf&"============"&@CRLF)
            For $i = 1 to UBound($arr) - 1
                If $i = 100 Then ExitLoop
                ConsoleWrite("--- " & $arr[$i] & @CRLF)
            Next
        EndIf
    EndIf
    FileDelete("c:\testData.reg")
EndFunc
testRegistryextract()

This works pretty well for me....

Edit: good vs well

Edit2:

I just re-read the thread and you said you had to use StringRegExp. ;) Perhaps there is a way to perform a string split with StringRegExp()?

Edited by Paulie

#7 ·  Posted (edited)

I wrote a _StringSplitRegExp() function not long ago.

http://www.autoitscript.com/forum/index.ph...c=65662&hl=

#include<Array.au3>
$fileData = FileRead("test.reg")
$result = _StringSplitRegExp($fileData, @CRLF & @CRLF)
_ArrayDisplay($result)

For string splitting that could be nice, but not for building a PCRE-compatible regular expression that extracts data.

Hi,

maybe you get the answer if you take a look at this script and how it works ... :)

http://www.autoitscript.com/forum/index.ph...other+regto.au3

Just use the IniRead Functions to handle *.reg files ...

Does not seem to have a sample of the special case where you have a repeating pattern inside the pattern you want to extract:

( ... ) Group. The elements in the group are treated in order and can be repeated together. e.g. (ab)+ will match "ab" or "abab", but not "aba". A group will also store the text matched for use in back-references and in the array returned by the function, depending on flag value.

So I want to extract only the parts of a registry file which start with a line having a starting [ and an ending ], followed by at least two lines that start with a ", then a word pattern, then "= and some arbitrary data. I think the regular expression should look something like this: regexp:= (?m)^\[[^\[\]]+\]\s+(^"\w+"=.*$){2,}.

I'm not sure if I'm allowed to use the ^ (as in start of line) inside the repeating block ( ... ). Anyone know if this is allowed?
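
As far as PCRE goes, ^ is accepted inside a repeated group; with (?m) it just asserts a line start on each repetition. A minimal sketch, assuming made-up sample data with bare @LF line ends (real .reg exports use @CRLF, which may need extra care depending on the newline convention in effect); the variable names are illustrative only:

```autoit
; Made-up sample: one key followed by two value lines, LF line ends only.
Local $sData = '[Key]' & @LF & '"a"=1' & @LF & '"b"=2' & @LF

; (?m) lets ^ match at every line start, also inside the repeated group.
; (?: ... ) groups without capturing, so flag 3 returns the whole match.
Local $aHit = StringRegExp($sData, '(?m)^\[[^\[\]]+\]\n(?:^"\w+"=.*\n){2,}', 3)

If @error Then
    ConsoleWrite("no match" & @CRLF)
Else
    ConsoleWrite($aHit[0])
EndIf
```

If the {2,} is raised to {3,} the sample should no longer match, which is a quick way to check that the repetition count applies to whole lines.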

Think I will have to dig up the PCRE test application and use results from there as a reference.

Thanks for the suggestions so far.

Uten

EDIT: Added some space

Edited by Uten

What is this repeating pattern you keep talking about?

Don't think I can explain it better than I already have tried to. The words just don't come to me. Take a look at post #3. And my post above.

#10 ·  Posted (edited)

#include <array.au3>
$sString = '[HKEY_CURRENT_CONFIG\Software]' & @CRLF & @CRLF & _
'[HKEY_CURRENT_CONFIG\Software\Fonts]' & @CRLF & _
'"FIXEDFON.FON"="vgafix.fon"' & @CRLF & _
'"FONTS.FON"="vgasys.fon"' & @CRLF & _
'"OEMFONT.FON"="vgaoem.fon"' & @CRLF & _
'"LogPixels"=dword:00000060' & @CRLF & @CRLF & _
'[HKEY_CURRENT_CONFIG\Software\Microsoft]' & @CRLF & @CRLF & _
'[HKEY_CURRENT_CONFIG\Software\Microsoft\windows]' & @CRLF & @CRLF & _
'[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion]' & @CRLF & @CRLF & _
'[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion\Internet Settings]' & @CRLF & _
'"ProxyEnable"=dword:00000002' & @CRLF & _
'"EnableAutodial"=hex:00,00,FF,00' & @CRLF & _
'"NoNetAutodial"=hex:00,33,00,00' & @CRLF & @CRLF & _
'[HKEY_CURRENT_CONFIG\System]'

$aSRE = StringRegExp($sString, '(?s)(?i)(\[[a-z0-9_\- \\]+\]\r\n".+?)\r\n\r\n', 3)
_ArrayDisplay($aSRE)

Edit:

Fixed expression to allow hyphens and numbers in the bracket string.

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

I can't get the (?s) to work correctly so I am having some difficulty.

This should output 12345 but it outputs nothing:

$result = StringRegExp("12345", "(?s)(?s)(?s)(?s)(?s)",3)
For $X = 0 to Ubound($result) - 1
    ConsoleWrite("[" & $X & "] = " & $result[$X] & @CRLF)
Next
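
A note on the snippet above: (?s) is an inline option, not a placeholder. It only makes . match newlines as well and consumes no characters itself, so five of them in a row still form an empty pattern. A sketch of one way to get the characters back, assuming the aim is simply one array element per character:

```autoit
; (?s) switches on "dot matches newline"; the capturing (.) does the actual matching.
Local $aResult = StringRegExp("12345", "(?s)(.)", 3)

If Not @error Then
    For $i = 0 To UBound($aResult) - 1
        ConsoleWrite("[" & $i & "] = " & $aResult[$i] & @CRLF)
    Next
EndIf
```

With flag 3 every global match of the group becomes its own element, so each digit should end up in a separate slot.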

#12 ·  Posted (edited)

I couldn't get this to output my test file correctly:

[HKEY_CURRENT_CONFIG\Software\Fonts]
"FIXEDFON.FON"="vgafix.fon"
"FONTS.FON"="vgasys.fon"
"OEMFONT.FON"="vgaoem.fon"
"LogPixels"=dword:00000060

[HKEY_CURRENT_CONFIG\Software\Microsoft]
"TESTProxyEnable"=dword:00000002
"TESTEnableAutodial"=hex:00,00,FF,00

[HKEY_CURRENT_CONFIG\Software\Microsoft\windows]

[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion]

[HKEY_CURRENT_CONFIG\Software\Microsoft\windows\CurrentVersion\Internet Settings]
"ProxyEnable"=dword:00000002
"EnableAutodial"=hex:00,00,FF,00
"NoNetAutodial"=hex:00,33,00,00

Only the first 2 sections are being displayed; I am using $data = FileRead("test.reg") to read the file.

Edited by weaponx

Ok, I'll have a look, but just FYI, if you follow the flow of the strings:

[HKEY_CURRENT_CONFIG\Software\Fonts]

would never be the first string. The primary location always seems to come first, then the next directory, and so on until the full location is reached, such as:

[HKEY_CURRENT_CONFIG\Software]

[HKEY_CURRENT_CONFIG\Software\Fonts]

But I'll see what can be done when a section is not followed by another one in the registry file being read, i.e. when it is the last string in that file, which is the case in the example you provided.

How's this:

#include <array.au3>
$sString = FileRead(@DesktopDir & "\testreg.reg")

$aSRE = StringRegExp($sString, '(?s)(?i)(\[[a-z0-9_\- \\]+\]\r\n".+?)(?m:\z|\r\n\r\n)', 3)
_ArrayDisplay($aSRE)
?


If I understand you correctly, you just want to get all registry keys that have two values from a *.reg file ...

So, try this ...

$arResult = StringRegExp($sRegFile, '(\[HKEY.*?\]\r\n".*?"="?.*?"?\r\n".*?"="?.*?"?\r\n\r)', 3)
_ArrayDisplay($arResult, 'matching strings')

Dim $sMatch
$arSectionNames = IniReadSectionNames($pathRegFile)
For $i = 1 To $arSectionNames[0]
    $arSections = IniReadSection($pathRegFile, $arSectionNames[$i])
    If $arSections[0][0] = 2 Then $sMatch &= $arSectionNames[$i] & '|'
Next

If $sMatch Then
    $arMatchingSections = StringSplit(StringTrimRight($sMatch, 1), '|')
EndIf

_ArrayDisplay($arMatchingSections, 'matching sections containing two values')

Greetz

Greenhorn

@SmOke_N has a nice try in post #10.

I had forgotten about the ? after a repeating char, so a lesson learned there. As @weaponx points out in post #12, @SmOke_N's pattern does not catch the last section, because the pattern requires a trailing CRLF.
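
The effect of the ? after a repeating char can be seen on a tiny example. This is only a sketch with made-up two-section data, not the patterns from the posts above: a greedy .+ runs to the last blank line, while the lazy .+? stops at the first one.

```autoit
; Two made-up "sections", each terminated by an empty line.
Local $sText = "[a]x" & @CRLF & @CRLF & "[b]y" & @CRLF & @CRLF

; Greedy: .+ swallows as much as possible before the final \r\n\r\n.
Local $aGreedy = StringRegExp($sText, '(?s)(\[.+)\r\n\r\n', 3)
; Lazy: .+? stops at the first \r\n\r\n it can reach.
Local $aLazy = StringRegExp($sText, '(?s)(\[.+?)\r\n\r\n', 3)

ConsoleWrite("greedy matches: " & UBound($aGreedy) & @CRLF)
ConsoleWrite("lazy matches:   " & UBound($aLazy) & @CRLF)
```

The greedy version should report a single match spanning both sections, the lazy version one match per section.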

@SmOke_N provided a new pattern in post #14. This pattern works almost perfectly. Thanks @SmOke_N, I will work with that a bit..:) The only imperfect thing is that I wanted to extract keys with two or more values. That's what I thought I could use (ab)* for, as described in the help file. I realise I could be asking too much on this point.

@Greenhorn has posted a pattern in post #16 that extracts only keys with two values in it.

I'll study all the patterns a bit more later today and see if I can improve on them.

Thanks for your interest and participation.

happy Scripting

Uten

Boy, this could get ugly:
'(?s)(?i)(\[[a-z0-9_\- \\]+\]\r\n"[a-z0-9]+"=[a-z0-9:,]*\r\n"[a-z0-9]+"=[a-z0-9:,]*)(?m:\z|\r\n\r\n)'

Hi @SmOke_N, thanks for your effort. It is kind of ugly..:)

Could you or anyone else interested help me understand this part of the StringRegExp documentation:

( ... ) Group. The elements in the group are treated in order and can be repeated together. e.g. (ab)+ will match "ab" or "abab", but not "aba". A group will also store the text matched for use in back-references and in the array returned by the function, depending on flag value.

I thought this code should return only ababab:

Func testSubPattern()
    $data = '1ab 2abab 3ababab 4abab 5aba'
    $arr = StringRegExp($data, '(\d(ab){3,})',3)
    ArrDump($arr)
EndFunc

Func testSubPattern()
    $data = '1ab 2abab 3ababab 4abab 5aba'
    $arr = StringRegExp($data, '(\d(?:ab){2,})',3)
    ArrDump($arr)
EndFunc

NOTE: I have not verified that I have included all legal possibilities inside the class definitions [\w\d\\ \.] and so forth.
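
For readers looking for the registry case itself, here is a sketch of how a non-capturing repeated group could be applied to it. It is not the pattern from the post above; the sample data, the variable names and the character classes are assumptions, and it ignores default values (@=), value names containing quotes, and hex data continued over several lines.

```autoit
; Hypothetical sample in the shape of the exports shown earlier (@CRLF line ends).
Local $sReg = '[HKEY_CURRENT_CONFIG\Software]' & @CRLF & @CRLF & _
        '[HKEY_CURRENT_CONFIG\Software\Fonts]' & @CRLF & _
        '"FIXEDFON.FON"="vgafix.fon"' & @CRLF & _
        '"LogPixels"=dword:00000060' & @CRLF & @CRLF & _
        '[HKEY_CURRENT_CONFIG\System]' & @CRLF & _
        '"OnlyOne"="x"' & @CRLF

; Outer ( ... ) captures the whole key block.
; Inner (?: ... ){2,} repeats "CRLF + value line" at least twice without adding captures.
Local $sPattern = '(\[[^\]\r\n]+\](?:\r\n"[^"\r\n]+"=[^\r\n]*){2,})'
Local $aKeys = StringRegExp($sReg, $sPattern, 3)

If @error Then
    ConsoleWrite("No key with two or more values" & @CRLF)
Else
    For $i = 0 To UBound($aKeys) - 1
        ConsoleWrite("--- " & $aKeys[$i] & @CRLF)
    Next
EndIf
```

Spelling out \r\n and using [^\r\n]* instead of . keeps the pattern independent of the (?m) and (?s) options, which may make it easier to carry over to another PCRE-compatible application. On the sample above only the Fonts key should be listed, since the other two keys have fewer than two values.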

Thanks for all the valuable input. Without it it would probably have taken a lot longer to figure it out..;)

Uten

Hey, it's your fault I got hooked on RegExp anyway ;)
