
[Solved] Split a repeating string pattern using StringRegExp


I am trying to split a string containing multiple GUIDs. I want to capture a particular string format every time it occurs and return an array of matches. The string formats are described here, and the format I am interested in is {BRMMmmmm-PPPP-LLLL-p000-D000000FF1CE}.

I have created a few functions, but I am getting stuck when it comes to returning just the PPPP, LLLL or p parts. I would love to be able to split out the individual B, R, MM and mmmm parts, but I have settled for getting the whole string and trimming it to get the parts I need. What I have is below:

#include <Array.au3>
#include <StringConstants.au3>

Local $s_Code = '{90140000-001C-0000-1000-0000000FF1CE};{90140000-0016-081A-1000-0000000FF1CE};{90140000-0016-0402-1000-0000000FF1CE}' ; could be 100+ product codes in the returned string
;~ Local $v_Return = _ProductCode_CheckFormat($s_Code)
;~ Local $v_Return = _ProductCode_GetAll($s_Code)
;~ Local $v_Return = _ProductCode_GetVersionFormat($s_Code)
Local $v_Return = _ProductCode_GetProductID($s_Code)
;~ Local $v_Return = _ProductCode_GetLanguage($s_Code)
;~ Local $v_Return = _ProductCode_GetPlatform($s_Code)

If IsArray($v_Return) Then
    _ArrayDisplay($v_Return) ; show every match returned by the function
Else ; error or single value
    MsgBox(0, '', $v_Return)
EndIf

Func _ProductCode_CheckFormat($s_Code) ; returns 1 or 0 if code is in the correct format (Only used on a single product code)
    Return StringRegExp($s_Code, '[[:digit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:digit:]]{4}-[[:xdigit:]]{12}') ; LLLL can contain hex digits, e.g. 081A
EndFunc   ;==>_ProductCode_CheckFormat

Func _ProductCode_GetAll($s_Code) ; returns all the product codes (90140000-001C-0000-1000-0000000FF1CE) in the string
    Return StringRegExp($s_Code, '[^{};]+', $STR_REGEXPARRAYGLOBALMATCH)
EndFunc   ;==>_ProductCode_GetAll

Func _ProductCode_GetPlatform($s_Code) ; gets the format with the platform (p000) info (********-****-****-1***-************) just the p part

EndFunc   ;==>_ProductCode_GetPlatform

Func _ProductCode_GetProductID($s_Code) ; gets the format with the Product (PPPP) info (********-001C-****-****-************)
    Return StringRegExp($s_Code, '-[[:xdigit:]]{4}-', $STR_REGEXPARRAYGLOBALMATCH) ; split a product code
EndFunc   ;==>_ProductCode_GetProductID

Func _ProductCode_GetLanguage($s_Code) ; gets the format with the Language (LLLL) info (********-****-0000-****-************)

EndFunc   ;==>_ProductCode_GetLanguage

Func _ProductCode_GetVersionFormat($s_Code) ; gets the format with the version (BRMMmmmm) info (90140000-****-****-****-***********)
    Return StringRegExp($s_Code, '[[:digit:]]{8}', $STR_REGEXPARRAYGLOBALMATCH) ; split a product code
EndFunc   ;==>_ProductCode_GetVersionFormat

I am currently stuck on the _ProductCode_GetProductID function. I am trying to find a way to drop the '-' characters that are returned with each match, and to return only the strings that currently appear in rows 0, 2 and 4. The Language part should follow a similar pattern.

Any ideas on how to accomplish this? I am sure regexp is up to the task, but I am still trying to understand it.


Edited by benners

I can't wrap my head around the outcome. How will


be formatted when it's done the way you want? Could you convert this manually and show what it will look like? From there it would be much easier to understand how to do it programmatically.


I suggest using the little-known flag 4 ($STR_REGEXPARRAYGLOBALFULLMATCH).
Here is a rough example:

#include <Array.au3>
#include <StringConstants.au3>

Local $s_Code = '{90140000-001C-0000-1000-0000000FF1CE};{90140000-0016-081A-1000-0000000FF1CE};{90140000-0016-0402-1000-0000000FF1CE}'

;$res = StringRegExp($s_Code, '(.)(.)(..)(....)-(....)-(....)-(.)(...)-(.)(.{11})', $STR_REGEXPARRAYGLOBALFULLMATCH)  ; flag 4

$res = StringRegExp($s_Code, '([[:xdigit:]])([[:xdigit:]])([[:xdigit:]]{2})([[:xdigit:]]{4})-([[:xdigit:]]{4})-([[:xdigit:]]{4})-([[:xdigit:]])([[:xdigit:]]{3})-([[:xdigit:]])([[:xdigit:]]{11})', $STR_REGEXPARRAYGLOBALFULLMATCH)

Local $list[UBound($res)]

For $i = 0 To UBound($res) - 1
    $list[$i] = ($res[$i])[0]  ; get first element (the full match) of each sub-array
    _ArrayDisplay($res[$i], "splitted result " & $i + 1)
Next
_ArrayDisplay($list, "GUID list")
MsgBox(0, "", "Product Id in 2nd GUID is : " & ($res[1])[5])
; etc

The purpose is to get an array of arrays. Each sub-array contains the GUID (full match) in element [0], and the wanted parts (capturing groups) in elements [1] to [10], split as described in the link provided in post #1.
All-in-one, sort of  :)
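
As a follow-up to the array-of-arrays idea, here is a minimal sketch of pulling one part out of every GUID at once. The helper name _ProductCode_GetField and its $iGroup parameter are hypothetical (not from the posts above); the pattern is the same one used in the example, and the index follows the sub-array layout just described (0 = full GUID, 5 = PPPP, 6 = LLLL, 7 = p).

```autoit
#include <Array.au3>
#include <StringConstants.au3>

Local $s_Code = '{90140000-001C-0000-1000-0000000FF1CE};{90140000-0016-081A-1000-0000000FF1CE};{90140000-0016-0402-1000-0000000FF1CE}'

; Collect the Product ID (sub-array index 5) of every GUID in the string.
Local $aProductIDs = _ProductCode_GetField($s_Code, 5)
If IsArray($aProductIDs) Then _ArrayDisplay($aProductIDs, "Product IDs")

; Hypothetical helper: returns a 1D array holding capture group $iGroup of each match.
; Sets @error to 1 if nothing matched (or the pattern failed), 2 if $iGroup is out of range.
Func _ProductCode_GetField($s_Code, $iGroup)
    If $iGroup < 0 Or $iGroup > 10 Then Return SetError(2, 0, 0)

    Local $aMatches = StringRegExp($s_Code, '([[:xdigit:]])([[:xdigit:]])([[:xdigit:]]{2})([[:xdigit:]]{4})-([[:xdigit:]]{4})-([[:xdigit:]]{4})-([[:xdigit:]])([[:xdigit:]]{3})-([[:xdigit:]])([[:xdigit:]]{11})', $STR_REGEXPARRAYGLOBALFULLMATCH)
    If @error Then Return SetError(1, 0, 0)

    Local $aField[UBound($aMatches)]
    For $i = 0 To UBound($aMatches) - 1
        $aField[$i] = ($aMatches[$i])[$iGroup] ; one element from each sub-array
    Next
    Return $aField
EndFunc   ;==>_ProductCode_GetField
```

With the sample string this should list 001C, 0016 and 0016. Passing 6 or 7 instead of 5 would give the Language or Platform parts in the same way.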

Edited by mikell

Here is an alternative example using StringRegExp with flag 3, with the data put into a 2D array.
In the RE pattern, the first opening parenthesis pairs with the last closing parenthesis, so the first captured group is the entire GUID.

#include <Array.au3>
#include <StringConstants.au3>

Local $s_Code = '{90140000-001C-0000-1000-0000000FF1CE};{90140000-0016-081A-1000-0000000FF1CE};{90140000-0016-0402-1000-0000000FF1CE}'

Local $aRes = StringRegExp($s_Code, '(' & _                                   ; Start of 1st capture group.
        '([[:xdigit:]])([[:xdigit:]])([[:xdigit:]]{2})([[:xdigit:]]{4})-' & _ ; 2nd (1-xdigit), 3rd (1-xdigit), 4th (2-xdigits), 5th (4-xdigits) capture groups.
        '([[:xdigit:]]{4})-' & _                                              ; 6th (4-xdigits) capture group.
        '([[:xdigit:]]{4})-' & _                                              ; 7th (4-xdigits) capture group.
        '([[:xdigit:]])([[:xdigit:]]{3})-' & _                                ; 8th (1-xdigit), 9th (3-xdigits) capture groups.
        '([[:xdigit:]])([[:xdigit:]]{11})' & _                                ; 10th (1-xdigit), 11th (11-xdigits) capture groups.
        ')', $STR_REGEXPARRAYGLOBALMATCH)                                     ; End of 1st capture group. Flag 3 returns all captures in one flat array.

; Put data from 1D array, $aRes, into 2D array, $aRes2D, with 11 columns.
Local $iNum = UBound($aRes) / 11 ; Number of GUID's in code string, $s_Code. ($iNum will be number of lines in 2D array.)
Local $aRes2D[$iNum][11]
For $i = 0 To UBound($aRes) - 1
    $aRes2D[Int($i / 11)][Mod($i, 11)] = $aRes[$i] ; row = GUID number, column = capture group
Next

MsgBox(0, "", "Product Id in 2nd GUID is : 0x" & $aRes2D[1][5])



Nice, mikell, you are the daddy :king: Thanks for that. I was initially going to make functions that returned separate arrays, but this saves a lot of time and lines of code. I can dump a few other functions by doing this now.

@InunoTaishou, as mikell mentioned, the idea was to use the functions to get all the specified properties from the passed string, use regexp and return an array.

The strings come from Office patch files and they denote which versions, product names and platforms the patch file is compatible with. For example, if there were 20 product codes, running _ProductCode_GetProductID would parse the string and return only the parts that relate to the Product ID, in this case the hex characters at the position in each code where PPPP starts and finishes. I would then join the array into a string and run another function to check whether the Office application would be updated by this patch. Hope this makes sense.
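
For the separate-function approach described here, one possible sketch is to put a single capture group around the wanted part and keep flag 3. When a pattern contains groups, flag 3 returns only the captured text, so the hyphens and the other fields are matched but not returned. The patterns below are my own assumption, not taken from the earlier posts; the function names are the ones from post #1.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

Local $s_Code = '{90140000-001C-0000-1000-0000000FF1CE};{90140000-0016-081A-1000-0000000FF1CE};{90140000-0016-0402-1000-0000000FF1CE}'

Local $aIDs = _ProductCode_GetProductID($s_Code)
; _ArrayToString takes the array ByRef, so it needs a variable rather than a function call.
If IsArray($aIDs) Then MsgBox(0, 'Product IDs', _ArrayToString($aIDs, ';')) ; shows 001C;0016;0016

Func _ProductCode_GetProductID($s_Code) ; returns only the PPPP part of each product code
    ; "{" plus 8 hex digits and "-" are matched but not captured; only PPPP is inside ( ).
    Return StringRegExp($s_Code, '\{[[:xdigit:]]{8}-([[:xdigit:]]{4})-', $STR_REGEXPARRAYGLOBALMATCH)
EndFunc   ;==>_ProductCode_GetProductID

Func _ProductCode_GetLanguage($s_Code) ; returns only the LLLL part of each product code
    ; Same idea, skipping one more 4-digit field before the capture group.
    Return StringRegExp($s_Code, '\{[[:xdigit:]]{8}-[[:xdigit:]]{4}-([[:xdigit:]]{4})-', $STR_REGEXPARRAYGLOBALMATCH)
EndFunc   ;==>_ProductCode_GetLanguage
```

Anchoring on the opening "{" is what stops the pattern from also matching the p000 field, which is why the original version returned the wanted values only in rows 0, 2 and 4.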




Malkey, you sir are my hero. This is excellent. It is what I was trying to achieve with the individual functions, but now it's all in one array. I wish there was a mic drop emoji :thumbsup:

I can now mark this as solved thanks to both of your excellent examples. I have never seen anything like this:

$aRes2D[Int($i / 11)][Mod($i, 11)] = $aRes[$i]
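
For anyone else puzzling over that line, here is a small sketch of what the index arithmetic does, assuming 11 captured values per GUID as in Malkey's pattern. The sample indices and the $iCols name are arbitrary choices for illustration.

```autoit
; Map a position in the flat array to [row][column] in the 2D array.
; Int($i / 11) is the GUID number (row); Mod($i, 11) is the capture group (column).
Local Const $iCols = 11
Local $aSample[4] = [0, 5, 11, 16]

For $i In $aSample
    ConsoleWrite($i & ' -> [' & Int($i / $iCols) & '][' & Mod($i, $iCols) & ']' & @CRLF)
Next
; Writes:
; 0 -> [0][0]
; 5 -> [0][5]
; 11 -> [1][0]
; 16 -> [1][5]
```

So flat element 16 is the sixth value of the second GUID, which is the Product ID shown by the MsgBox in the example.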

