guinness

Parsing a .md5 checksum file

15 posts in this topic

The following code works, but how would you have written the regular expression? I currently have two groups, one for the MD5 checksum and one for the file path >>

(\w{32}) - Match a word of 32 characters.

\s\* - Match the delimiter space-asterisk.

(\V+) - Match anything which isn't vertical whitespace e.g. @LF or @CRLF.

Any suggestions for improvements are welcome.

#include <Array.au3>

Local $sMD5FileData = '405b5ad0eb79a9736d6bf8a76c0153f1 *Example.exe' & @CRLF & _
        'e5c13b7e919d8a36c308cf25e11efef4 *Example.exe' & @CRLF & _
        '087102d22e8a313a3bc192f3fa6e19b6 *Example.exe' & @CRLF & _
        '0637924f96cc8243fe49d811cf603784 *Example.exe' & @CRLF & _
        'c43282ed2ce63a31d015dd1f6e98e1c6 *C:\Example\Example.exe' & @CRLF

; Create an array like the following:
; $aArray[0] = MD5 checksum
; $aArray[1] = FilePath
; $aArray[2] = MD5 checksum
; $aArray[3] = FilePath
; $aArray[n] = MD5 checksum
; $aArray[n + 1] = FilePath
Local $aSRE = StringRegExp($sMD5FileData, '(\w{32})\s\*(\V+)', 3)
_ArrayDisplay($aSRE, 'SRE: ' & @error)


Depends. If this is just for parsing it's good, although I'd put a + on the \s because some implementations use two spaces to separate the MD5 from the file path. The asterisk might not be there either. If it's also for validating, then it's not strict enough.

((?i)[a-f0-9]{32}) - Match hexadecimal value of 32 characters, case-insensitive.
\s+\*?             - Match one or more delimiter spaces and an optional asterisk.
(\V+)              - Match anything which isn't vertical whitespace e.g. @LF or @CRLF.


#3 ·  Posted (edited)

Hey dany,

Thanks for replying, I like the idea of ((?i)[a-f0-9]{32}), though in terms of optimisation isn't this better?

([A-Fa-f0-9]{32}) - Match hexadecimal value of 32 characters.

I should have mentioned that this is an example for parsing an MD5 checksum file created by the application from getmd5checker.com, which uses \s\* as the delimiter. The standard options for a delimiter are space-asterisk, pipe and double-space, so if this is to be a general parser, perhaps the following is better >>

#include <Array.au3>

Local $aDelimiter[3] = [' *', '|', '  '], $iDelimiter = Random(0, 2, 1)
Local $sMD5FileData = '405b5ad0eb79a9736d6bf8a76c0153f1' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
        'e5c13b7e919d8a36c308cf25e11efef4' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
        '087102d22e8a313a3bc192f3fa6e19b6' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
        '0637924f96cc8243fe49d811cf603784' & $aDelimiter[$iDelimiter] & 'Example.exe' & @CRLF & _
        'c43282ed2ce63a31d015dd1f6e98e1c6' & $aDelimiter[$iDelimiter] & 'C:\Example\Example.exe' & @CRLF
ConsoleWrite('Delimiter option was - ' & StringReplace($aDelimiter[$iDelimiter], ' ', '<space>') & @CRLF)

; Create an array like the following:
; $aArray[0] = MD5 checksum
; $aArray[1] = FilePath
; $aArray[2] = MD5 checksum
; $aArray[3] = FilePath
; $aArray[n] = MD5 checksum
; $aArray[n + 1] = FilePath
Local $aSRE = StringRegExp($sMD5FileData, '([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)', 3) ; Or is this better? ([A-Fa-f0-9]{32})\s{0,2}(?:\*|\|)?(\V+)
_ArrayDisplay($aSRE, 'SRE: ' & @error)
Edited by guinness


([A-Fa-f0-9]{32}) - Match hexadecimal value of 32 characters.

Yea that's optimized imho, I imagine that the regexp machinery does this internally anyway. It's also less ambiguous when reading code.

\s means [ \t\r\n], so maybe use ' [ |*]'. And I wouldn't have used the * or ? quantifiers but +: you know the delimiters are there and you want them, so demand them.

([A-Fa-f0-9]{32}) [ |*](\V+)
([A-Fa-f0-9]{32})  - Match hexadecimal value of 32 characters, case-insensitive.
 [ |*]             - Match a 2-character delimiter: a space followed by a space, pipe or asterisk.
(\V+)              - Match anything which isn't vertical whitespace e.g. @LF or @CRLF.

; Or is this better? ([A-Fa-f0-9]{32})\s{0,2}(?:\*|\|)?(\V+)

Character classes are a little faster than alternating groups and in this case they're a better choice imho. Sometimes you can't do without alternating groups but they can be quite costly:

$sRegExp = '(autoit|aut2exe|au3check)' ; Ignoring actual casing in favor of example.
$sTest = 'au3check'

$sTest will be checked against 'autoit' first. Index 1 is a match. Index 2 as well. Index 3 fails. Back to index 1 and start matching against 'aut2exe'. Again matches up to index 2 but fails at index 3. Back to index 1 and test against 'au3check'. Now matches up to last index. Return True.

This pattern would be marginally better:

$sRegExp = 'au(3check|toit|t2exe)' ; Also note reordering to fail sooner.

As I said, sometimes you can't do without them, but if you just want to match a single position then character classes are much faster:

$sRegExp = 'gr(a|e)y'
$sRegExp = 'gr[ae]y' ; Optimized.
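
To check that the refactored alternation above still accepts the same names, a minimal sketch along these lines could be run. The ^ and $ anchors, the non-capturing (?:...) groups and the $aNames array are additions for illustration only; they are not part of the patterns as posted.

```autoit
; Sketch: confirm that the reordered pattern matches the same names as the original.
; $aNames and the anchored, non-capturing forms are assumptions made for this example.
Local $aNames[3] = ['autoit', 'aut2exe', 'au3check']
Local $sOriginal = '^(?:autoit|aut2exe|au3check)$'
Local $sReordered = '^au(?:3check|toit|t2exe)$'

For $i = 0 To UBound($aNames) - 1
    ; With the default flag, StringRegExp() returns 1 for a match and 0 for no match.
    ConsoleWrite($aNames[$i] & ': ' & StringRegExp($aNames[$i], $sOriginal) & ' / ' & _
            StringRegExp($aNames[$i], $sReordered) & @CRLF)
Next
```

Each line should report 1 for both patterns, which suggests the reordering changes only how early the engine can fail, not what is accepted.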


You could also use ([[:xdigit:]]{32}) I think. It might be slightly faster since it is a predefined character class.



\s means [ \t\r\n], so maybe use ' [ |*]'. And I wouldn't have used the * or ? quantifiers but +: you know the delimiters are there and you want them, so demand them.

Good tip, though I need the quantifiers + for the class, otherwise the array returns incorrect values.

$sTest will be checked against 'autoit' first. Index 1 is a match. Index 2 as well. Index 3 fails. Back to index 1 and start matching against 'aut2exe'. Again matches up to index 2 but fails at index 3. Back to index 1 and test against 'au3check'. Now matches up to last index. Return True.

This pattern would be marginally better:

Thanks for the lesson. It's one thing I'm starting to learn with regular expressions, optimisation. Just like in AutoIt it takes time to know what is good and what is not OR if it's worth over complicating something just for a couple of milliseconds.

You could also use ([[:xdigit:]]{32}) I think. It might be slightly faster since it is a predefined character class.

I completely forgot about that character class. Thanks.

I ran each of the following regular expressions through a loop 1000 times and then divided the total time by 1000. These were the results.

0.0405123961529231 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)

0.0374282045654834 - ([A-Fa-f0-9]{32})[ *|](\V+)

0.0377551973027254 - ([[:xdigit:]]{32})[ *|](\V+)



You've hit one subtle barrier. Your timing is probably correct, but PCRE works in two passes (which are hidden in the current implementation of regexp support within AutoIt): first PCRE compiles the expression, then it runs it. If you had access to a handle to the compiled expression (to be run 100 or so times), you might well find that different patterns run at different speeds once compiled.

Pattern compilation is what eats most of the time, unless the pattern is applied to a huge string (e.g. a large text file).

A similar two-step process occurs with SQLite, where queries are first compiled into a runnable VM (yes, a virtual machine) and only then run for real.

In demanding PCRE and SQLite applications, common practice is to compile the expression or query once, then run it as many times as needed inside the application. This can be done for SQLite (albeit not recommended for beginners), but current AutoIt doesn't expose PCRE's two-step behaviour, hence forcing a recompile of the pattern at every invocation.



#8 ·  Posted (edited)

Good tip, though I need the quantifiers + for the class, otherwise the array returns incorrect values.

What I said only applied to the little bit between the capturing groups, \s{0,2}[*|]*. It will also match an empty string. So in your case it would also match a SHA-256 checksum, delimiter and filename: the 64-character checksum would be split after the first 32 characters, and the rest would end up in the second array element along with the whitespace and the filename. That's why I said that if you know what to expect at a given position, demand it in your regexp.

Also you're using quote rather than code and it looks like you're missing a space:

([A-Fa-f0-9]{32}) [ *|](\V+)
                 ^ whitespace!

edit: typo

Edited by dany


#9 ·  Posted (edited)

Also you're using quote rather than code and it looks like you're missing a space:

([A-Fa-f0-9]{32}) [ *|](\V+)
                 ^ whitespace!

That extra space should be in the group really, as the pipe delimiter will fail, due to it being formatted as MD5|FILEPATH (no whitespace.)

Final results and regular expressions.

0.257127732943223 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)
0.408735544990339 - ([A-Fa-f0-9]{32})[  *|](\V+)
0.455178289246338 - ([[:xdigit:]]{32})[  *|](\V+)

dany,

Your help and input has been very much appreciated. It proves I'm learning everyday and willing to accept when I don't know something.

You've hit one subtle barrier. Your timing is probably correct, but PCRE works in two passes (which are hidden in the current implementation of regexp support within AutoIt): first PCRE compiles the expression, then it runs it. If you had access to a handle to the compiled expression (to be run 100 or so times), you might well find that different patterns run at different speeds once compiled.

Pattern compilation is what eats most of the time, unless the pattern is applied to a huge string (e.g. a large text file).

A similar two-step process occurs with SQLite, where queries are first compiled into a runnable VM (yes, a virtual machine) and only then run for real.

In demanding PCRE and SQLite applications, common practice is to compile the expression or query once, then run it as many times as needed inside the application. This can be done for SQLite (albeit not recommended for beginners), but current AutoIt doesn't expose PCRE's two-step behaviour, hence forcing a recompile of the pattern at every invocation.

Suffice to say your input and knowledge have increased my understanding of how AutoIt processes regular expressions.

Edited by guinness


#10 ·  Posted (edited)

That extra space should be in the group really, as the pipe delimiter will fail, due to it being formatted as MD5|FILEPATH (no whitespace.)

Ok, but this is not a group but a character class. [aa] is the same as [a] or [aaa]: it will match a single a only. I'm rather curious about your actual matches because, as I see it, the space will be matched, moving the engine on to the next part of the pattern, the (\V+), which will then gobble up the second space or the asterisk. It wouldn't exactly fail, but it isn't the result you want. I would suggest this:

([A-Fa-f0-9]{32}) ?[ *|](\V+) ; make it optional.

Coming from an MVP, I'm rather humbled and also glad my input isn't wasted. On the other hand...

Final results and regular expressions.

0.257127732943223 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)
0.408735544990339 - ([A-Fa-f0-9]{32})[  *|](\V+)
0.455178289246338 - ([[:xdigit:]]{32})[  *|](\V+)

Funny how the first expression is now the fastest where it was the slowest...

edit: Yea, jchd is absolutely right. That will slow down things, something you can't optimize against.

Edited by dany


#11 ·  Posted (edited)

#include <Array.au3>

Local $sMD5FileData = '405b5ad0eb79a9736d6bf8a76c0153f1 *Example.exe' & @CRLF & _
        'e5c13b7e919d8a36c308cf25e11efef4 *Example.exe' & @CRLF & _
        '087102d22e8a313a3bc192f3fa6e19b6 *Example.exe' & @CRLF & _
        '0637924f96cc8243fe49d811cf603784 *Example.exe' & @CRLF & _
        'c43282ed2ce63a31d015dd1f6e98e1c6 *C:\Example\Example.exe' & @CRLF
        ; MsgBox(0, 'Message', $sMD5FileData)

Local $timer = TimerInit()
For $i = 1 To 10000
    Local $aSRE = StringRegExp($sMD5FileData, '([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)', 3)
Next
MsgBox(0, "???", 'time : ' & Round(TimerDiff($timer), 2) & ' msec')
_ArrayDisplay($aSRE, 'SRE: ' & @error)

; Alternative patterns, anchored to the start and end of each line with (?m):
$aSRE = StringRegExp($sMD5FileData, '(?m)^([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)\r?$', 3)
$aSRE = StringRegExp($sMD5FileData, '(?m)^(.{32})\h+\*(.+)\r$', 3)
Edited by AZJIO


Thanks AZJIO. I just think it goes to show there are plenty of ways to go about this.

Ok, but this is not a group but a character class. [aa] is the same as [a] or [aaa]: it will match a single a only. I'm rather curious about your actual matches because, as I see it, the space will be matched, moving the engine on to the next part of the pattern, the (\V+), which will then gobble up the second space or the asterisk. It wouldn't exactly fail, but it isn't the result you want. I would suggest this:

([A-Fa-f0-9]{32}) ?[ *|](\V+) ; make it optional.

Of course, making it optional. Duh! Sometimes the obvious things are the obvious things. I also see your point about the character class, but it did match when I used a double space.

Coming from an MVP, I'm rather humbled and also glad my input isn't wasted. On the other hand...

Well credit where credit is due.
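
For anyone wanting to verify the optional-space pattern against all three delimiters discussed in this thread, a small sketch such as the following might help. The variable names and the single test line per delimiter are assumptions chosen for illustration.

```autoit
; Sketch: run the optional-space pattern against each delimiter style.
; $sHash, $aDelimiter and the test file name are illustrative assumptions.
Local $sHash = '405b5ad0eb79a9736d6bf8a76c0153f1'
Local $aDelimiter[3] = [' *', '|', '  ']
Local $sPattern = '([A-Fa-f0-9]{32}) ?[ *|](\V+)'
Local $aMatch

For $i = 0 To UBound($aDelimiter) - 1
    $aMatch = StringRegExp($sHash & $aDelimiter[$i] & 'Example.exe' & @CRLF, $sPattern, 3)
    If @error Then
        ConsoleWrite('No match for delimiter ' & $i & @CRLF)
    Else
        ; The brackets make any stray leading space or asterisk in the file path visible.
        ConsoleWrite('[' & $aMatch[1] & ']' & @CRLF)
    EndIf
Next
```

If the second group ever shows a leading space or asterisk inside the brackets, the delimiter part of the pattern has not consumed the whole delimiter, which is the problem described above for a class containing two spaces.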


#13 ·  Posted (edited)

dany,

I was wrong, you were right, don't know what I saw last night but the double-space in the character class didn't work. Learnt something today, always double check.

Edit: Here are the results, the last results were wrong because I didn't reset the timer!

Delimiter option was - <space><space>

0.0425633102418133 - ([A-Fa-f0-9]{32})\s{0,2}[*|]*(\V+)

0.044208049495654 - ([A-Fa-f0-9]{32}) ?[ *|](\V+)

0.0386496617409211 - ([[:xdigit:]]{32}) ?[ *|](\V+)

0.0430682183099314 - ([[:xdigit:]]{32})\h*[ *|]([^*?/|"<>[:cntrl:]]+)
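
A timing loop of the kind described here could look roughly like the sketch below. It is not the script that produced the figures above; the sample data, the run count and the variable names are assumptions. The point it illustrates is that the timer is reset for every pattern, so one pattern's time is not added to the next.

```autoit
; Sketch of a per-pattern timing loop (not the original benchmark script).
Local $sData = '405b5ad0eb79a9736d6bf8a76c0153f1  Example.exe' & @CRLF & _
        'e5c13b7e919d8a36c308cf25e11efef4  Example.exe' & @CRLF
Local $aPattern[2] = ['([A-Fa-f0-9]{32}) ?[ *|](\V+)', '([[:xdigit:]]{32}) ?[ *|](\V+)']
Local $iRuns = 1000, $hTimer = 0

For $i = 0 To UBound($aPattern) - 1
    $hTimer = TimerInit() ; Reset the timer for each pattern.
    For $j = 1 To $iRuns
        StringRegExp($sData, $aPattern[$i], 3)
    Next
    ; TimerDiff() returns milliseconds; dividing by the run count gives the mean per call.
    ConsoleWrite(TimerDiff($hTimer) / $iRuns & ' - ' & $aPattern[$i] & @CRLF)
Next
```

Because AutoIt recompiles the pattern on every call, figures from a loop like this include compilation time as well as matching time.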

Edited by guinness


#14 ·  Posted (edited)

0.0199695295556956 - ([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)

This pattern only accepts one or more horizontal whitespace characters followed by a star as the delimiter. Replace \h+\* with \h*[ *|] for optional horizontal whitespace followed by a space, star or pipe.

Edited by ProgAndy


#15 ·  Posted (edited)

0.0199695295556956 - ([[:xdigit:]]{32})\h+\*([^*?/|"<>[:cntrl:]]+)

This pattern only accepts one or more horizontal whitespace characters followed by a star as the delimiter. Replace \h+\* with \h*[ *|] for optional horizontal whitespace followed by a space, star or pipe.

I didn't check the output, so didn't notice this and took it at face value. Cheers for that.

Edit: I changed the results above, seems you were right ProgAndy that the hexadecimal character class is an optimised approach.

Edited by guinness
