nikink

Need help parsing strings

14 posts in this topic

#1 ·  Posted (edited)

Hi folks, I have taken on a little project that requires the parsing of certain strings into mandatory and optional components.

So given a test string "aaaa(bbb)c(d)(e)", the optional components are between the parentheses, and the mandatory components are the other bits.

I need the output to be every combination of mandatory+optional components (without parentheses).

Thus for my test string the output would be:

"aaaac"

"aaaabbbcde"

"aaaabbbcd"

"aaaabbbce"

"aaaacde"

"aaaacd"

"aaaace"

"aaaabbbc"

#include <array.au3>

Global $var = "st(pw)n(ta)b"
Global $optionalvar, $testforerror
Global $results[1]

; Need to get all optional parameters out... all those params between each pair of ( and )
If StringInStr($var, "(") Then
    
    $optionalvar = StringTrimLeft($var,StringInStr($var, "(")) 
    ConsoleWrite("Found a ( : " & $optionalvar  & @CR)
    If StringInStr($optionalvar, ")") Then
        $optionalvar = StringTrimRight($optionalvar,(StringLen($var) - StringInStr($var, ")")+1)) 
        ConsoleWrite("Found matching )... $optionalvar = " & $optionalvar & @CR)
        If StringInStr($optionalvar, ")") Or StringInStr($optionalvar, "(") Then
            ConsoleWrite("Error! Mismatching parenthesis in $optionalvar: " & $optionalvar & @CR)
            Exit
        EndIf
    Else
        ConsoleWrite("Parsing error! NO ) FOUND!" & @CR)
        Exit
    EndIf

Else
    If StringInStr($var, ")") Then
        ConsoleWrite("Parsing error! FOUND ), did not find ( to match!" & @CR)
        Exit
    EndIf
    ConsoleWrite("$optionalvar = " & $optionalvar & @CR) 
EndIf

For $i = 1 To Ubound($results)-1
    ConsoleWrite($results[$i] & @CR)
Next

But I'm getting in a bit over my head and could use some help or advice. Note I'm also trying to validate the string so that "aaaa(bb(:)" or "aaaa(bb)^_^" (mismatched brackets in other words) are not accepted / error out nicely.

I figured storing the results in an array is the sensible option (even if it requires a lot of ReDimming to collect all the permutations), but I'm open to other suggestions if someone can see an easier / more efficient way.

Can anyone help me?

Edited by nikink




If I were you, I would divide data in the script using commas (or any other delimiter) like:

$var = "aaaa,bbb,c,d,e"

If you want to omit a component, you may add double commas:

$var = "aaaa,,c,,"

After that you may StringSplit your variable using the comma delimiter and in this way you will know if an optional parameter was given (not empty) or not (empty).

It's all about string manipulation: check StringSplit in the help file and follow some rules in the variable syntax so that your script works correctly.
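A minimal sketch of the comma-delimited scheme suggested above, assuming an omitted optional component is written as an empty field. With StringSplit's default flag, element 0 holds the field count and empty fields come back as empty strings, so "given" versus "omitted" is just a test for "":

```autoit
; Sketch only: assumes components are separated by commas and an omitted
; optional component is written as an empty field, e.g. "aaaa,,c,,".
Local $aParts = StringSplit("aaaa,,c,,", ",")
; With the default flag, $aParts[0] holds the number of fields (5 here).
For $i = 1 To $aParts[0]
    If $aParts[$i] = "" Then
        ConsoleWrite("Field " & $i & ": (omitted)" & @LF)
    Else
        ConsoleWrite("Field " & $i & ": " & $aParts[$i] & @LF)
    EndIf
Next
```

The trade-off is that the input syntax changes: every position must always be present, even when empty.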


This version will test for matched parens, and then split the input into parts as you described:

#include <array.au3>

Global $var = "st(pw)n(ta)b"
Global $optionalvar, $testforerror
Global $results[1]
Global $fMatchError = False

; Test for matched parens.  Does not allow nested parens.
$sTest = $var
While 1
    $iLParen = StringInStr($sTest, "(")
    If $iLParen Then
        $sTest = StringTrimLeft($sTest, $iLParen)
        $iLParen = StringInStr($sTest, "(")
        $iRParen = StringInStr($sTest, ")")
        If ($iRParen = 0) Or (($iLParen > 0) And ($iLParen < $iRParen)) Then
            $fMatchError = True
            ExitLoop
        Else
            $sTest = StringTrimLeft($sTest, $iRParen)
        EndIf
    Else
        ExitLoop
    EndIf
WEnd
If $fMatchError Or StringInStr($sTest, ")") Then
    MsgBox(16, "Error", "Input string has mismatched parens:  " & $var)
    Exit
EndIf

; Split string into parts
$results = StringSplit($var, "()", 0)
For $i = 1 To UBound($results) - 1
    ConsoleWrite("Debug: $results[" & $i & "] = " & $results[$i] & @LF)
Next

:)
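For comparison, a single left-to-right scan can do the same validation and also rejects a stray ")" anywhere in the string. This is a different technique from the trimming loop above (a simple inside/outside state flag); like the loop, it rejects nested parens. `_IsValidGroups` is a hypothetical helper name:

```autoit
; Sketch: validate non-nested parentheses in one pass.
; _IsValidGroups is a hypothetical helper, not part of AutoIt or array.au3.
Func _IsValidGroups($sInput)
    Local $bInside = False ; True while we are between "(" and ")"
    Local $sChar
    For $i = 1 To StringLen($sInput)
        $sChar = StringMid($sInput, $i, 1)
        If $sChar = "(" Then
            If $bInside Then Return False     ; nested "(" is not allowed
            $bInside = True
        ElseIf $sChar = ")" Then
            If Not $bInside Then Return False ; stray ")" with no open group
            $bInside = False
        EndIf
    Next
    Return Not $bInside ; an unclosed "(" at the end is also an error
EndFunc

ConsoleWrite(_IsValidGroups("st(pw)n(ta)b") & @LF)  ; True
ConsoleWrite(_IsValidGroups("st(pw))n(ta)b") & @LF) ; False
```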


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


Awesome PsaltyDS! Thanks!

I notice your solution doesn't pick up a stray ), but that's a (probably) simple tweak... now all I gotta do is work out how to create all the permutations of mandatory and optional components. It's a frustratingly 'simple' task... Seems so straightforward, but whenever I attempt to code a solution I end up in a mess... B-)

For Each Instance Of Optional Component 1
    For Each Instance Of Optional Component 2
        ...
            For Each Instance Of Optional Component n-1
                For Each Instance Of Optional Component n
                    Return String

Each Instance Of Optional Component = "string" and ""

So maybe Mandatory components = "string" and "string"

Then the Results Array in PsaltyDS' script could be an array of 2 element arrays...

And some kind of recursion function could cycle through the Results Array elements...

Hmmm. Any thoughts from you clever AutoIters? Hints, tips, advice all welcome here! :)
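The nested-loop idea above maps naturally onto recursion: walk the parts left to right, always take a mandatory part, and branch twice on every optional part (once with it, once with ""). A sketch, assuming the parts come from `StringSplit($var, "()")` on an already-validated string, so odd indexes are mandatory and even indexes optional; `_Expand` is a hypothetical helper name:

```autoit
; Sketch of the recursive idea. _Expand is a hypothetical helper name.
; Assumes $aParts = StringSplit($var, "()") on a validated string:
; odd indexes are mandatory parts, even indexes are optional parts.
Func _Expand(ByRef $aParts, $iIndex, $sSoFar, ByRef $sOut)
    If $iIndex > $aParts[0] Then
        $sOut &= $sSoFar & @LF ; every part decided: emit one combination
        Return
    EndIf
    If Mod($iIndex, 2) = 1 Then
        ; Mandatory part: always included, only one branch
        _Expand($aParts, $iIndex + 1, $sSoFar & $aParts[$iIndex], $sOut)
    Else
        ; Optional part: branch once with it and once without it
        _Expand($aParts, $iIndex + 1, $sSoFar & $aParts[$iIndex], $sOut)
        _Expand($aParts, $iIndex + 1, $sSoFar, $sOut)
    EndIf
EndFunc

Global $aSplit = StringSplit("aaaa(bbb)c(d)(e)", "()")
Global $sResults = ""
_Expand($aSplit, 1, "", $sResults)
ConsoleWrite($sResults)
```

With k optional parts this emits 2^k lines, each keeping the parts in their original order; for "aaaa(bbb)c(d)(e)" the first line is "aaaabbbcde" and the last is "aaaac". The recursion depth is only the number of parts plus one, so it stays shallow for realistic inputs.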


I notice your solution doesn't pick up a stray ), but that's a (probably) simple tweak...

It picks up stray right parens in my testing. Can you post a value for $var that doesn't work?

:)



Sure,

Global $var = "st(pw))n(ta)b"

Debug: $results[1] = st
Debug: $results[2] = pw
Debug: $results[3] =
Debug: $results[4] = n
Debug: $results[5] = ta
Debug: $results[6] = b

:)

I was thinking of first checking that there are equal numbers of left and right parentheses. Your script already catches an unmatched (, which takes care of malformed input strings like "st(pw))n(tab", where the counts happen to be equal.
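One way to get those counts without building arrays, as a sketch: `StringReplace` sets `@extended` to the number of replacements it made, so replacing each paren with an empty string counts it. Equal counts alone do not prove the string is valid ("st(pw))n(tab" has two of each), so the ordering check is still needed afterwards:

```autoit
; Sketch: count "(" and ")" before any other parsing.
; StringReplace sets @extended to the number of replacements it performed.
Global $var = "st(pw))n(ta)b"
StringReplace($var, "(", "")
Global $iLeft = @extended
StringReplace($var, ")", "")
Global $iRight = @extended
If $iLeft <> $iRight Then
    ConsoleWrite("Mismatched parens: " & $iLeft & " '(' vs " & $iRight & " ')'" & @LF)
EndIf
```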


Sure enough, thanks for the example. This adds exactly that check:

#include <array.au3>

Global $var = "st(pw))n(ta)b"
Global $optionalvar, $testforerror
Global $results[1]
Global $fMatchError = False

; Test for same number of L/R parens
$avL = StringSplit($var, "(")
$avR = StringSplit($var, ")")
If $avL[0] = $avR[0] Then
    ; Test for matched parens.  Does not allow nested parens.
    $sTest = $var
    While 1
        $iLParen = StringInStr($sTest, "(")
        If $iLParen Then
            $sTest = StringTrimLeft($sTest, $iLParen)
            $iLParen = StringInStr($sTest, "(")
            $iRParen = StringInStr($sTest, ")")
            If ($iRParen = 0) Or (($iLParen > 0) And ($iLParen < $iRParen)) Then
                $fMatchError = True
                ExitLoop
            Else
                $sTest = StringTrimLeft($sTest, $iRParen)
            EndIf
        Else
            ExitLoop
        EndIf
    WEnd
Else
    $fMatchError = True
EndIf

If $fMatchError Or StringInStr($sTest, ")") Then
    MsgBox(16, "Error", "Input string has mismatched parens:  " & $var)
    Exit
EndIf

; Split string into parts
$results = StringSplit($var, "()", 0)
For $i = 1 To UBound($results) - 1
    ConsoleWrite("Debug: $results[" & $i & "] = " & $results[$i] & @LF)
Next

I don't think this is optimal, and I keep thinking there is a clever RegExp out there for it. But my RegExp-Fu is not strong enough.

:)
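The RegExp could look something like this, as a sketch assuming nesting is not allowed: a valid string is some text without parens, followed by any number of "(group)text" pairs. Anchored with ^ and $, the PCRE pattern below rejects stray, unclosed and nested parens in one call. `_HasValidParens` is a hypothetical helper name:

```autoit
; Sketch: one anchored pattern describing a valid, non-nested string:
; text without parens, then any number of "(group)" + "text" pairs.
Global Const $sValidPattern = "^[^()]*(?:\([^()]*\)[^()]*)*$"

Func _HasValidParens($sInput) ; hypothetical helper name
    ; Flag 0 makes StringRegExp return 1 on a match and 0 otherwise
    Return StringRegExp($sInput, $sValidPattern, 0) = 1
EndFunc

ConsoleWrite(_HasValidParens("st(pw)n(ta)b") & @LF)  ; True
ConsoleWrite(_HasValidParens("st(pw))n(ta)b") & @LF) ; False
```

Every repetition of the group has to start with a literal "(", so there is only one way for the pattern to match a given string, which should keep backtracking cheap on long inputs.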



Yeah, I would've thought so too... But my regex-fu is practically non-existent! :)

Now, does anyone have any ideas on how to generate all the combinations of elements?

Anyone? Anyone? Bueller?


Generating every possible combination of a set of n elements is an interesting programming problem. Here's my shot at it:

;aaaa(bbb)c(d)e
Global $results[6] = [5, "aaaa", "bbb", "c", "d", "e"]
Global $sOut = ""
Global $iMax = 2^$results[0] - 1
ConsoleWrite("Debug: There will be " & $iMax & " results." & @LF)
For $n = 1 To $iMax
    $sOut &= $n & ":  "
    For $b = 0 To $results[0] - 1
        If BitAND($n, 2^$b) Then $sOut &= $results[$b + 1]
    Next
    $sOut &= @CRLF  
Next
MsgBox(64, "Results", $sOut)

Cheers!

^_^



That's... umm... very cool, and clever... but I've been looking at it for a day now, and can't understand what's happening...

Can you (or someone) explain it to me?

I guess it's the BitAND that's throwing me, I don't understand what that's doing, what its purpose is...

Thanks though, for all your help so far, your scripts are a lot more efficient than mine (and more importantly, they work! :) )


The loop is based on the fact that you can associate each element with a bit in a binary number. By incrementing a binary number from 0 to all bits set, you produce every possible combination of those bits. But we don't want 0, because that means no bits set and no element selected, so we start from 1 instead.

; Create an array to simulate the output of the earlier function
Global $results[6] = [5, "aaaa", "bbb", "c", "d", "e"]

; Declare a variable to hold the output string
Global $sOut = ""

; Create a binary number representing 1 bit set for each element in the set.
; The formula is 2 to the nth power, minus one.
; In this case 2^5 - 1 = 31, which in binary is 11111 (five ones).
Global $iMax = 2^$results[0] - 1

; Since all zeroes (no element) is not an option, there are going to be 2^n - 1 results, too.
ConsoleWrite("Debug: There will be " & $iMax & " results." & @LF)

; A loop to increment a number from 1 to $iMax (31 in this case).  This series of numbers
; will represent every possible combination of bits set, and since we associate an element in the set
; with each bit, every possible combination of elements.
For $n = 1 To $iMax
    ; String format for the line, i.e. "1: " thru "31: "
    $sOut &= $n & ":  "

    ; For each value of $n, test the bits to see which elements to include
    ; In this case the 5 bits are 0 thru 4
    For $b = 0 To $results[0] - 1
        ; Test the bit with a logical AND, if the bit is set, add that element to the output line
        If BitAND($n, 2^$b) Then $sOut &= $results[$b + 1]
    Next

    ; End the current line before going on to the next $n
    $sOut &= @CRLF
Next

; After the loop is done, display the results
MsgBox(64, "Results", $sOut)

^_^
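To see the bit-to-element mapping on a single value, the sketch below decodes one mask at a time using the same 5-element array. It uses `BitShift(1, -$b)` (1 shifted left by $b bits) in place of `2^$b`; both give the value of bit $b. `_MaskToString` is a hypothetical helper name:

```autoit
; Sketch: decode one mask value to see what the loop does.
; Same 5-element array as above; bit $b selects element $b + 1.
Global $aSet[6] = [5, "aaaa", "bbb", "c", "d", "e"]

Func _MaskToString(ByRef $aItems, $iMask) ; hypothetical helper name
    Local $sLine = ""
    For $b = 0 To $aItems[0] - 1
        ; BitShift(1, -$b) is 1 shifted left by $b bits, the same value as 2^$b
        If BitAND($iMask, BitShift(1, -$b)) Then $sLine &= $aItems[$b + 1]
    Next
    Return $sLine
EndFunc

; 5 is binary 00101: bits 0 and 2 are set, so elements 1 and 3 are chosen
ConsoleWrite(_MaskToString($aSet, 5) & @LF)  ; aaaac
; 31 is binary 11111: every bit is set
ConsoleWrite(_MaskToString($aSet, 31) & @LF) ; aaaabbbcde
```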



#12 ·  Posted (edited)

Ok, I think I've got it all working. I'm sure it can be optimised though... :)

This problem has been driving me up the wall for weeks now.

This messy script will generate every combination of mandatory and optional components given a string of characters where optional characters are within parentheses and every result contains all mandatory components. All combinations of mandatory and optional components remain in order. The input string can be of any length and any combination of mandatory and optional components - but the more optional components listed, the slower the script (cuz of all the array juggling and combination generating).

I'm sure there should be a faster way... and I know I've confused myself numerous times during this project, as I'm sure someone with fresh eyes, and better scripting skills will see immediately upon perusal! ^_^ Comments and critiques and suggestions for improvement very welcome. Very very welcome!

Ideally I'd love to get rid of the array juggling and do more with regex... anyway:

#include <array.au3>

Global $var = "(0)1(0)1(0)1(0)(0)(0)1"
Global $optionalvar, $testforerror
Global $results[1]
Global $fMatchError = False
#region - test for parentheses validity - Thanks to PsaltyDS for this
; Test for same number of L/R parens
$avL = StringSplit($var, "(")
$avR = StringSplit($var, ")")
If $avL[0] = $avR[0] Then
    ; Test for matched parens.  Does not allow nested parens.
    $sTest = $var
    While 1
        $iLParen = StringInStr($sTest, "(")
        If $iLParen Then
            $sTest = StringTrimLeft($sTest, $iLParen)
            $iLParen = StringInStr($sTest, "(")
            $iRParen = StringInStr($sTest, ")")
            If ($iRParen = 0) Or (($iLParen > 0) And ($iLParen < $iRParen)) Then
                $fMatchError = True
                ExitLoop
            Else
                $sTest = StringTrimLeft($sTest, $iRParen)
            EndIf
        Else
            ExitLoop
        EndIf
    WEnd
Else
    $fMatchError = True
EndIf

If $fMatchError Or StringInStr($sTest, ")") Then
    MsgBox(16, "Error", "Input string has mismatched parens:  " & $var)
    Exit
EndIf
#endregion

; Split string into parts
$results = StringSplit($var, "()", 0)
Global $NumMandatoryComponents = Round((Ubound($results)/2))

For $i = 1 To UBound($results) - 1
    If Mod($i, 2) = 1 Then ; odd indexes are mandatory parts
        $results[$i] = "<M>" & $results[$i]
    Else
        $results[$i] = "<O>" & $results[$i]
    EndIf
Next

Global $sOut = ""
Global $iMax = 2^$results[0] - 1
ConsoleWrite("Debug: There will be " & $iMax & " results." & @LF)


; Thanks to PsaltyDS for this construction
For $n = 1 To $iMax
    For $b = 0 To $results[0] - 1
        If BitAND($n, 2^$b) Then $sOut &= $results[$b + 1]; working code. nice.
    Next
    $sOut &= @LF ;@CRLF 
Next

Global $arraySOut = StringSplit($sOut, @LF)
Global $tempresult = ""


For $i = 1 To Ubound($arraySOut) - 1
    $compare = StringRegExp($arraySOut[$i], "<M>", 3)

    If $NumMandatoryComponents = Ubound($compare) Then 
        $tempresult = StringRegExpReplace($arraySOut[$i], "<M>|<O>", "") ; strip the markers from this combination
        ConsoleWrite("Debug: " & $i & ":" & $tempresult & @LF)
    EndIf
Next

If anyone can see ways to streamline this, or make it faster, that would be very much appreciated!
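One possible streamlining, sketched under the assumption that the input has already passed the paren checks: since only the optional parts vary, loop over 2^k masks where k is the number of optional parts, instead of 2^N masks over every part followed by filtering. Mandatory parts are appended unconditionally, so no markers or regex are needed, and because the result count (2^k) is known in advance the output array can be sized once without ReDim. `_Combine` is a hypothetical helper name:

```autoit
; Sketch only. _Combine is a hypothetical helper name. Assumes $aParts is
; StringSplit($var, "()") on a string that passed the paren checks, so odd
; indexes hold mandatory parts and even indexes hold optional parts.
Func _Combine(ByRef $aParts)
    Local $iOptional = Int(($aParts[0] - 1) / 2) ; k groups give 2k + 1 parts
    Local $iCount = BitShift(1, -$iOptional)     ; 2^k combinations
    Local $aOut[$iCount + 1]                     ; size known up front, no ReDim
    Local $sLine, $iBit
    $aOut[0] = $iCount
    For $n = 0 To $iCount - 1  ; 0 is valid here: every optional part omitted
        $sLine = ""
        $iBit = 0
        For $i = 1 To $aParts[0]
            If Mod($i, 2) = 1 Then
                $sLine &= $aParts[$i] ; mandatory part: always kept
            Else
                ; optional part number $iBit is kept when bit $iBit of $n is set
                If BitAND($n, BitShift(1, -$iBit)) Then $sLine &= $aParts[$i]
                $iBit += 1
            EndIf
        Next
        $aOut[$n + 1] = $sLine
    Next
    Return $aOut
EndFunc

Global $aSplit = StringSplit("aaaa(bbb)c(d)(e)", "()")
Global $aCombos = _Combine($aSplit)
For $i = 1 To $aCombos[0]
    ConsoleWrite($i & ": " & $aCombos[$i] & @LF)
Next
```

For "aaaa(bbb)c(d)(e)" StringSplit yields 7 parts, so this visits 8 masks rather than the 127 visited by looping over all parts; element 1 is "aaaac" (mask 0, every optional part omitted) and element 8 is "aaaabbbcde".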

And thanks to PsaltyDS for his/her input!

(Edited to show improvements in efficiency I could find after a night's sleep)

Edited by nikink


And here it is commented:

#include <array.au3>

Global $var = "(1)(1)(1)(1)m"
Global $optionalvar, $testforerror
Global $results[1]
Global $fMatchError = False
#region - test for parentheses validity - Thanks to PsaltyDS for this
; Test for same number of L/R parens
$avL = StringSplit($var, "(")
$avR = StringSplit($var, ")")
If $avL[0] = $avR[0] Then
    ; Test for matched parens.  Does not allow nested parens.
    $sTest = $var
    While 1
        $iLParen = StringInStr($sTest, "(")
        If $iLParen Then
            $sTest = StringTrimLeft($sTest, $iLParen)
            $iLParen = StringInStr($sTest, "(")
            $iRParen = StringInStr($sTest, ")")
            If ($iRParen = 0) Or (($iLParen > 0) And ($iLParen < $iRParen)) Then
                $fMatchError = True
                ExitLoop
            Else
                $sTest = StringTrimLeft($sTest, $iRParen)
            EndIf
        Else
            ExitLoop
        EndIf
    WEnd
Else
    $fMatchError = True
EndIf

If $fMatchError Or StringInStr($sTest, ")") Then
    MsgBox(16, "Error", "Input string has mismatched parens:  " & $var)
    Exit
EndIf
#endregion
; Split string into parts
$results = StringSplit($var, "()", 0)
Global $NumMandatoryComponents = Round((Ubound($results)/2))

For $i = 1 To UBound($results) - 1 ; Mark each component as either Mandatory or Optional
    If Mod($i, 2) = 1 Then      ; Mandatory components always fall in Odd indexes of $results
        $results[$i] = "<M>" & $results[$i]
    Else                        ; Optional components always fall in Even indexes of $results
        $results[$i] = "<O>" & $results[$i]
    EndIf
Next

Global $sOut = "" ; create a string to hold the marked results
Global $iMax = 2^$results[0] - 1 ; This is the total number of combinations of the results, the number of *valid* results is 2^(number of optional components)
;ConsoleWrite("Debug: There will be " & $iMax & " results." & @LF)

; Thanks to PsaltyDS for this construction. Generates every combination of the components in $results and puts them in a @LF delimited string
For $n = 1 To $iMax
    For $b = 0 To $results[0] - 1
        If BitAND($n, 2^$b) Then $sOut &= $results[$b + 1]; working code. nice.
    Next
    $sOut &= @LF 
Next

; Split the @LF delimited string of results into an array containing strings of Marked Components. This array is $iMax in size!
Global $arraySOut = StringSplit($sOut, @LF) 

For $i = 1 To Ubound($arraySOut) - 1
    ; $compare is the array of Mandatory components found by Regex looking for "<M>" upon each element within $arraySOut
    $compare = StringRegExp($arraySOut[$i], "<M>", 3) 
    If $NumMandatoryComponents = Ubound($compare) Then 
        ; If the regex returns an array of Mandatory components where the size equals the Number of Mandatory components then a valid combination has been found.
        $arraySOut[$i] = StringRegExpReplace($arraySOut[$i], "<M>|<O>", "") ; Strip the implanted markings
        ConsoleWrite("Debug: " & $i & ":" & $arraySOut[$i] & @LF) ; Output
    EndIf   
Next

A single character to mark mandatory and optional would probably be a bit faster - and preferably non-printable, as the string of characters could be ANY characters in theory. Thus it's not likely, but I suppose possible, that the combination <M> or <O> could actually be part of the string.
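If the all-parts loop is kept, the markers can be dropped entirely: the mandatory parts sit at fixed bit positions (every odd index of the StringSplit result), so their bits can be combined into one mask up front, and a combination is valid exactly when `BitAND($n, $iMandatoryMask) = $iMandatoryMask`. This sketch still visits all 2^N - 1 masks, so it keeps the original loop's cost, but it never inserts marker text that could collide with the input and needs no regex pass:

```autoit
; Sketch: replace the <M>/<O> markers with a precomputed bit mask.
; Assumes $aParts = StringSplit($var, "()") (odd indexes are mandatory).
Global $aParts = StringSplit("aaaa(bbb)c(d)(e)", "()")
Global $iMandatoryMask = 0
For $b = 0 To $aParts[0] - 1
    ; bit $b stands for element $b + 1; odd elements are mandatory
    If Mod($b + 1, 2) = 1 Then $iMandatoryMask = BitOR($iMandatoryMask, BitShift(1, -$b))
Next

Global $sOut = "", $iValid = 0
For $n = 1 To BitShift(1, -$aParts[0]) - 1
    ; skip any combination that is missing a mandatory part
    If BitAND($n, $iMandatoryMask) <> $iMandatoryMask Then ContinueLoop
    $iValid += 1
    For $b = 0 To $aParts[0] - 1
        If BitAND($n, BitShift(1, -$b)) Then $sOut &= $aParts[$b + 1]
    Next
    $sOut &= @LF
Next
ConsoleWrite($sOut)
```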


And thanks to PsaltyDS for his/her input!

He/she says "You're welcome." :)


Glad you got it working. Merry Christmas!

^_^


