Pumbaa

How to do StringSplit when delimiter is included in split parts

18 posts in this topic

Hi All!

I need to split a string like the one below into parts, using a comma as the delimiter:

text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4,text5

As you can see, the 3rd field includes text enclosed in { and } signs. Such constructions can appear in any field, and not just once per string or even per field; one can also make up the entire content of a field.

Using a simple StringSplit gives me the wrong result. I've tried running StringSplit first on the { and } signs and then, after analysing the result, running StringSplit again on the parts that are not between {}, but that's a rather messy method.

I was thinking of replacing the "right" commas with, for example, @ so that I can use a regular StringSplit after that. Maybe the RegExp functions could be useful here, but I'm not familiar enough with them to solve my problem.

Any suggestions?


Recommendations.

1. I recommend you read the specs for the CSV format.

2. Use one of the CSV scripts in Example Scripts. Search the forum for

3. Don't use commas within fields, or use Chr(130) instead.

4. Use a different delimiter such as a semicolon or TAB.

5. You may be able to create a regular expression if you can identify a pattern.


Hope this helps.

#include <Array.au3>
Local $sString = "text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
; Create an array of each instance of the text between "{" and "}".
; The lazy quantifier (.*?) keeps each match inside a single pair of brackets.
Local $aArray = StringRegExp($sString, "\{(.*?)\}", 3)
; Replace all occurrences of "{..text in between..}" with a comma.
Local $sNewString = StringRegExpReplace($sString, "\{.*?\}", ",")
If StringRight($sNewString, 1) = "," Then $sNewString = StringTrimRight($sNewString, 1) ; Delete trailing comma if one exists.
Local $aArray2 = StringSplit($sNewString, ",", 2)
Local $iUbndA2 = UBound($aArray2)
; Join the arrays: append the bracketed texts after the plain fields.
ReDim $aArray2[$iUbndA2 + UBound($aArray)]
For $i = $iUbndA2 To UBound($aArray2) - 1
    $aArray2[$i] = $aArray[$i - $iUbndA2]
Next
_ArrayDisplay($aArray2)


To czardas: Unfortunately, I work with externally defined files and the string structures in them, so converting them to CSV format and similar recommendations are out of my reach. I wrote about possible patterns and my willingness to use RegEx, but I lack understanding of and experience with it. Thanks anyway. The CSV UDF could be useful in the future.

To Malkey: Thanks, I'll try to build on your suggestion.

Examples & suggestions are still welcome.


#5 ·  Posted (edited)

Perhaps this will help or give you some ideas.

#include <Array.au3>
Local $sString = "text0,{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $sReplacement = ",", $sTemp = $sString ; Work on a copy in case you need the original string later.

; Create an array of each instance of the text from "{" up to (but not including) the next "}".
Local $aArray = StringRegExp($sString, "{[^}]*", 3)
;_ArrayDisplay($aArray)

If IsArray($aArray) Then ; Added error check!
    For $i = 255 To 1 Step -1 ; Search for a replacement character.
        If Not StringInStr($sString,Chr($i)) Then ExitLoop
    Next
    If $i = 0 Then Exit ; In the most unlikely event that no suitable delimiter is found.

    $sReplacement = Chr($i) ; Already declared above, so no second Local is needed.
    For $i = 0 To UBound($aArray) -1 ; Replace the commas we wish to ignore
        $sTemp = StringReplace($sTemp, $aArray[$i], StringReplace($aArray[$i], ",", $sReplacement))
    Next
EndIf

$aArray = StringSplit($sTemp, ",", 2) ; Might as well use the same array name again.
For $i = 0 To UBound($aArray) -1 ; Put the removed commas back.
    $aArray[$i] = StringReplace($aArray[$i], $sReplacement, ",")
Next
_ArrayDisplay($aArray)

Edit

Added an error check to the code.

Edited by czardas


#6 ·  Posted (edited)

Good. Using RegEx with "{[^}]*" finally gives me something like this:

#include <Array.au3>

Local $sString = "text0,{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
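Local $ExprArray = StringRegExp ($sString, "{[^}]*", 3) ; array of all {formulas} in string (each without its closing "}")
Local $j = @error ; 0 if at least one {formula} was found; doubles as the starting index into $ExprArray
$sString = StringRegExpReplace ($sString, "{[^}]*", "{") ; replacement of all {formula} in string with "{}" (the "}" is left in place)
Local $FinalArray = StringSplit ($sString, ",")
If $j = 0 Then ; if {formula} existed then recover them
    For $i = 1 To $FinalArray [0] ; element [0] holds the count, so the fields start at index 1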
        If StringInStr ($FinalArray [$i], "}") > 0 Then
            $FinalArray [$i] = StringReplace ($FinalArray [$i], "{}", $ExprArray [$j] & "}", 1)
            $j = $j + 1
        EndIf
    Next
EndIf

_ArrayDisplay ($FinalArray)

It seems to work with all possible variants.

The only thing left for me to understand is what "{[^}]*" really means.

RegEx rules :)

Thanks.

Edited by Pumbaa


#7 ·  Posted (edited)

Ha, I introduced a bug when I made changes to the above code. It should be okay now.

The only thing left for me to understand is what "{[^}]*" really means.

This is easy to pick apart.

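{ = Find a pattern starting with an opening curly bracket (followed by)

[^ = a character which is not

} = a closing curly bracket

]* = which may or may not appear and may also repeat (zero or more times), so the match runs up to, but does not include, the next closing curly bracket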
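
To see the pattern in action, here is a minimal sketch on a made-up sample string ($sSample and its contents are assumptions for illustration, not data from this thread). With flag 3 and no capturing group, StringRegExp should return each whole match, so every element starts at "{" and stops just before the next "}".

```autoit
; Hypothetical sample input, chosen only to illustrate the pattern.
Local $sSample = "a,{b,c}d,{e}"
; Flag 3 returns an array of all matches; without a capturing group each element is the whole match.
Local $aMatches = StringRegExp($sSample, "{[^}]*", 3)
If @error Then
    ConsoleWrite("No opening curly bracket found." & @CRLF)
Else
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite($aMatches[$i] & @CRLF) ; Each match stops before the closing "}".
    Next
EndIf
```

This should print {b,c and {e. The closing bracket is never part of a match, which is why Pumbaa's script appends "}" again when it restores each formula.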

Edited by czardas


#8 ·  Posted (edited)

Yes, I've got it. Tried some more combinations, but yours is the most useful. Thanks again.

Upgraded my script to take into consideration several occurrences of {formula} in one field.

#include <Array.au3>

Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $ExprArray = StringRegExp ($sString, "{[^}]*", 3) ; array of all {formulas} in string
Local $j = @error
$sString = StringRegExpReplace ($sString, "{[^}]*", "{") ; replacement of all {formula} in string with "{}" (the "}" is left in place)
Local $FinalArray = StringSplit ($sString, ",")
Local $NumOfExpr [1], $k
If $j = 0 Then ; if {formula} existed, then recover them
    For $i = 1 To $FinalArray [0]
        If StringInStr ($FinalArray [$i], "}") > 0 Then
            $NumOfExpr = StringSplit ($FinalArray [$i], "{}", 1) ; count the "{}" placeholders, in case there is more than 1 {formula} in the field
            If $NumOfExpr [0] > 1 Then
                For $k = 1 To $NumOfExpr [0] - 1
                    $FinalArray [$i] = StringReplace ($FinalArray [$i], "{}", $ExprArray [$j] & "}", 1)
                    $j = $j + 1
                Next
            EndIf
        EndIf
    Next
EndIf

_ArrayDisplay ($FinalArray)
Edited by Pumbaa


#10 ·  Posted (edited)

Well... after taking care of all the damn possible {formula} occurrences, my old code seems to be shorter & easier:

#include <Array.au3>

Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $sStringWithNewDelimiter = ""
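Local $WithFormulasArray = StringSplit ($sString, "{}") ; split on every "{" and every "}"
For $i = 1 To $WithFormulasArray [0]
    If Mod ($i, 2) <> 0 Then ; odd parts lie outside the brackets: their commas are real delimiters
        $WithFormulasArray [$i] = StringReplace ($WithFormulasArray [$i], ",", @TAB)
    Else ; even parts are {formula} bodies: restore the brackets and keep the commas
        $WithFormulasArray [$i] = "{" & $WithFormulasArray [$i] & "}"
    EndIf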
    $sStringWithNewDelimiter = $sStringWithNewDelimiter & $WithFormulasArray [$i]
Next

Local $FinalArray = StringSplit ($sStringWithNewDelimiter, @TAB)
_ArrayDisplay ($FinalArray)

But that was still a good experience of using RegEx.

Edited by Pumbaa


Ufff... I'm still not satisfied.

I imagined something like

#include  <Array.au3>

Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"

$sString = StringRegExpReplace ($sString, "some tricky RegEx to define all commas that are not situated somewhere between {}", @TAB)

Local $FinalArray = StringSplit ($sString, @TAB)

_ArrayDisplay ($FinalArray)
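
For what it's worth, a negative lookahead comes close to the "tricky RegEx" imagined here. The sketch below is a suggestion under stated assumptions rather than something posted in this thread: it treats a comma as a real delimiter only when it is not followed by a run of non-bracket characters ending in "}". That works only if the brackets are balanced and never nested, and if the input contains no @TAB characters.

```autoit
#include <Array.au3>

Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"

; ,            a comma ...
; (?!          ... that is NOT followed by
;   [^{}]*\}   a run of non-bracket characters and then a closing bracket
; )            (such a comma would sit inside a {formula}).
Local $sTabbed = StringRegExpReplace($sString, ",(?![^{}]*\})", @TAB)

Local $FinalArray = StringSplit($sTabbed, @TAB) ; Element [0] holds the field count.
_ArrayDisplay($FinalArray)
```

On the test string this should leave six fields, with {m,o,r,e}{m,o,r,e} and text3{Join(...)}text4 each kept in one piece.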


#12 ·  Posted (edited)

Don't overcomplicate things. Provided the input follows a clear set of rules that can be used to extract the required information, you should be able to parse it. Sometimes a single regular expression will do all (or most of) the job, but a few extra lines of code may be easier to write and understand. You also need to be clear about exactly how you want the returned data to be formatted.

Edited by czardas


I'm pretty much new to programming and AutoIt in general, but to answer the topic of the thread without any super coding like the examples already provided: I usually change the delimiter symbol inside the cell to something entirely unused, like one of the extended Latin characters, then run StringSplit, then replace that symbol with the delimiter symbol again.

If a cell contains ",", swap it to Œ, then run your string split. Run another replace to turn "Œ" back into ",".

This obviously doesn't work in 100% of the situations where you have to do this, but... it's fast and easy to do for us noob programmers. :P

Good luck and have fun.


Indeed, don't overcomplicate things. You can do this without regular expressions.

#include <String.au3> ;  _StringBetween
#include <Array.au3> ;  _ArrayDisplay

Local $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $sBracket, $sTabs, $aFields
While 1
    $sBracket = _StringBetween($sFields, '{', '}')
    If Not IsArray($sBracket) Then ExitLoop ; _StringBetween returns an array only when a {...} pair was found.
    $sTabs = StringReplace($sBracket[0], ',', @TAB)
    ; Remove the brackets, or the loop will never exit.
    $sFields = StringReplace($sFields, '{' & $sBracket[0] & '}', '_A_' & $sTabs & '_Z_')
WEnd
; Put brackets back in place.
$sFields = StringReplace($sFields, '_A_', '{')
$sFields = StringReplace($sFields, '_Z_', '}')
$aFields = StringSplit($sFields, ',')
_ArrayDisplay($aFields)


#15 ·  Posted (edited)

Here is my version for this particular string:

#include <Array.au3>
#include <String.au3>

$s = "text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4,text5"
$aRes = _StringBetween($s, "{", "}")
$aNew = StringSplit(StringReplace(StringReplace(StringReplace($s, $aRes[0], StringReplace($aRes[0], ",", "°^°")), ",", "|"), "°^°", ","), "|", 2)
_ArrayDisplay($aNew)

Or

#include <Array.au3>
#include <String.au3>

$s = "text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4,text5"
$aRes = _StringBetween($s, "{", "}")
$aNew = StringSplit(StringReplace(StringReplace(StringReplace(StringReplace(StringReplace($s, $aRes[0], StringReplace($aRes[0], ",", "°^°")), ",", "|"), "°^°", ","), "{", "|"), "}", "|"), "|", 2)
_ArrayDisplay($aNew)

Br,

UEZ

Edited by UEZ


Thanks guys. These are also possible ways to reach my goal. I'll put them in my "scripts" bank.

But the imagined RegEx solution still seems more efficient to me, given the total number of loop iterations and the calls to complex functions like Split, Between & Replace in the other approaches. On a large number of long strings it should be noticeably faster.

Maybe someday a RegEx genius will visit this topic & give us a master class... or explain that it's impossible, or that it would take much more CPU time than any other string operations ;)


#17 ·  Posted (edited)

On a large number of long strings it should be noticeably faster.

Actually not always. It heavily depends on the complexity of the regular expression pattern and your ability to write efficient patterns.

The more complex the pattern, the slower the RegExp function will be. RegExp functions can be slower than ordinary String* functions because a backtracking engine tries the pattern at each starting position in the subject and, whenever an attempt fails partway through, backs up and tries the remaining alternatives. With very long strings this can become a slow process.

An unoptimized pattern can have a severe speed impact as well. For instance, using groups ( ... ) extensively will severely slow down any RegExp if they aren't optimized. The pattern

\b(integer|insert|in)\b
is slower than
\b(?:integer|insert|in)\b
for the subject 'integers'. Neither will match, but the first RegExp will take more time to figure that out. The reason why is explained here http://www.regular-expressions.info/atomic.html (the linked page makes the comparison with an atomic group, (?>integer|insert|in), which also stops the engine from backtracking into the alternation; a non-capturing group (?:...) only saves the cost of storing the capture).

If you want to treat that test string you gave with only one regular expression, well, that's going to be a real beasty if it has to take into account all edge cases you've given. Therefore it actually will be slower than my method with String* functions.

For more info and insight on the inner workings of regular expressions I recommend http://www.regular-expressions.info/

edit: forum software screwed up the links...

edit 2: Well, it's Sunday and I got nothing to do, so I had a stab at your test string. To my surprise my RegExp was actually faster than the String* functions. However, the general rule of thumb that String* functions are faster than RegExp* functions still stands. It just depends heavily on what you want to do. I once wrote a syntax highlighter in PHP and found str* faster than preg*. Anyway, here's what I got:

#include <String.au3> ;  _StringBetween
#include <Array.au3> ;  _ArrayDisplay

Local $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $sBracket, $sTabs, $aFields
Local $iStart = TimerInit()
While 1
    $sBracket = _StringBetween($sFields, '{', '}')
    If Not IsArray($sBracket) Then ExitLoop ; _StringBetween returns an array only when a {...} pair was found.
    $sTabs = StringReplace($sBracket[0], ',', @TAB)
    $sFields = StringReplace($sFields, '{' & $sBracket[0] & '}', '_A_' & $sTabs & '_Z_')
WEnd
$sFields = StringReplace($sFields, '_A_', '{')
$sFields = StringReplace($sFields, '_Z_', '}')
$aFields = StringSplit($sFields, ',')
_ArrayDisplay($aFields, TimerDiff($iStart) / 1000)

$sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $rPattern = '([a-z0-9]+{[^}]+}|{[^}]+}|[^{},]+)'
$iStart = TimerInit()
Local $aMatches = StringRegExp($sFields, $rPattern, 3)
_ArrayDisplay($aMatches, TimerDiff($iStart) / 1000)

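Also note that they produce different results: the regular expression ends an element at every closing bracket, so {m,o,r,e}{m,o,r,e} comes back as two elements and text4 is separated from text3{...}, while the String* version keeps whole comma-delimited fields together.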

Edited by dany


#18 ·  Posted (edited)

Splendid work. I tried to compare the timings with my own code as well, but got different results each time. It seems to depend on background processes in Windows.
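
Single timings like these are noisy, so one common way to get steadier numbers is to repeat the call many times and average. A minimal sketch, assuming a hypothetical helper name (_AverageRegExpTime) and an arbitrary run count of 10000; neither comes from the code above:

```autoit
; Hypothetical helper: average time in milliseconds of one StringRegExp call.
Func _AverageRegExpTime($sSubject, $sPattern, $iRuns = 10000)
    Local $hTimer = TimerInit()
    For $i = 1 To $iRuns
        StringRegExp($sSubject, $sPattern, 3) ; The result is discarded; only the elapsed time matters.
    Next
    Return TimerDiff($hTimer) / $iRuns ; TimerDiff returns milliseconds.
EndFunc   ;==>_AverageRegExpTime

Local $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $fAverage = _AverageRegExpTime($sFields, '([a-z0-9]+{[^}]+}|{[^}]+}|[^{},]+)')
ConsoleWrite("Average per call: " & $fAverage & " ms" & @CRLF)
```

The same kind of loop can wrap the String* version to compare the two; the absolute numbers will still vary between machines and runs.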

"text3{Join('Errors','Name','id',Error,'Group',Group)}text4" - text4 is a part of the pattern, but nevertheless that's what I was looking for. Thanks for your collaboration.

Edited by Pumbaa
