Pumbaa Posted September 8, 2012

Hi all! I need to split a string like the one below into parts, using the comma as the delimiter:

    text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4,text5

As you can see, the 3rd field includes text delimited by { and } signs. Such constructions can appear in any field, more than once per string or even per field, and can also make up the whole content of a field. A simple StringSplit gives me the wrong result. I tried running StringSplit on the { and } signs first and then, after analysing the result, splitting the parts that are not between {} on commas, but that is a rather messy method. I was also thinking of replacing the "right" commas with, for example, @ so that a regular StringSplit would work afterwards. Maybe the RegExp functions could be useful here, but I'm not familiar enough with them to solve my problem. Any suggestions?

czardas Posted September 8, 2012

Recommendations:
1. I recommend you read the specs for the CSV format.
2. Use one of the CSV scripts in Example Scripts; search the forum.
3. Don't use commas within fields, or use Chr(130) instead.
4. Use a different delimiter such as semicolon or TAB.
5. You may be able to create a regular expression if you can identify a pattern.

Malkey Posted September 8, 2012

Hope this helps.

    #include <Array.au3>

    Local $sString = "text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"

    ; Create an array of each instance of the text between "{" and "}".
    ; The lazy ".*?" keeps each match inside its own pair of braces.
    Local $aArray = StringRegExp($sString, "\{(.*?)\}", 3)

    ; Replace all occurrences of "{..text in between..}" with a comma.
    Local $sNewString = StringRegExpReplace($sString, "(\{.*?\})", ",")
    If StringRight($sNewString, 1) = "," Then $sNewString = StringTrimRight($sNewString, 1) ; Delete trailing comma if one exists.

    Local $aArray2 = StringSplit($sNewString, ",", 2)
    Local $iUbndA2 = UBound($aArray2)

    ; Join arrays: append the captured brace contents after the plain fields.
    ReDim $aArray2[$iUbndA2 + UBound($aArray)]
    For $i = $iUbndA2 To UBound($aArray2) - 1
        $aArray2[$i] = $aArray[$i - $iUbndA2]
    Next

    _ArrayDisplay($aArray2)

Pumbaa Posted September 8, 2012

To czardas: Unfortunately I work with externally defined files and the string structures in them, so converting them to CSV format and similar recommendations are out of my reach. I did write about the possible patterns and my willingness to use RegEx, but also about my lack of understanding and experience with it. Thanks anyway. A CSV UDF could be useful in the future.

To Malkey: Thanks, I'll try to develop your suggestion.

Examples and suggestions are still welcome.

czardas Posted September 8, 2012

Perhaps this will help or give you some ideas.

    #include <Array.au3>

    Local $sString = "text0,{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $sReplacement = ",", $sTemp = $sString ; In case you need the original string later.

    ; Create an array of each instance of the text between "{" and "}".
    Local $aArray = StringRegExp($sString, "{[^}]*", 3)
    ;_ArrayDisplay($aArray)

    If IsArray($aArray) Then ; Added error check!
        For $i = 255 To 1 Step -1 ; Search for a replacement character.
            If Not StringInStr($sString, Chr($i)) Then ExitLoop
        Next
        If $i = 0 Then Exit ; In the most unlikely event that no suitable delimiter is found.
        $sReplacement = Chr($i)

        For $i = 0 To UBound($aArray) - 1 ; Replace the commas we wish to ignore.
            $sTemp = StringReplace($sTemp, $aArray[$i], StringReplace($aArray[$i], ",", $sReplacement))
        Next
    EndIf

    $aArray = StringSplit($sTemp, ",", 2) ; Might as well use the same array name again.

    For $i = 0 To UBound($aArray) - 1 ; Put the removed commas back.
        $aArray[$i] = StringReplace($aArray[$i], $sReplacement, ",")
    Next

    _ArrayDisplay($aArray)

Edit: Added an error check to the code.

Edited September 8, 2012 by czardas

Pumbaa Posted September 8, 2012

Good. Using RegEx with "{[^}]*" finally gives me something like this:

    #include <Array.au3>

    Local $sString = "text0,{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $ExprArray = StringRegExp($sString, "{[^}]*", 3) ; array of all {formulas} in the string
    Local $j = @error ; 0 when at least one {formula} was found
    $sString = StringRegExpReplace($sString, "{[^}]*", "{") ; reduce every {formula} in the string to "{}"
    Local $FinalArray = StringSplit($sString, ",")

    If $j = 0 Then ; if a {formula} existed, then recover it
        For $i = 1 To $FinalArray[0] ; element 0 holds the count, so start at 1
            If StringInStr($FinalArray[$i], "}") > 0 Then
                $FinalArray[$i] = StringReplace($FinalArray[$i], "{}", $ExprArray[$j] & "}", 1)
                $j = $j + 1
            EndIf
        Next
    EndIf

    _ArrayDisplay($FinalArray)

It seems to work with all possible variants. The only thing which remains for me to understand is what "{[^}]*" really means. RegEx rules! Thanks.

Edited September 8, 2012 by Pumbaa

czardas Posted September 8, 2012

Ha, I introduced a bug when I made changes to the above code. It should be okay now.

Pumbaa wrote: "The only thing which remains to understand myself is what "{[^}]*" really means."

This is easy to pick apart.

    {   = find a pattern starting with an opening curly bracket (followed by)
    [^  = a character which is not
    }   = a closing curly bracket
    ]*  = which may or may not appear and may also repeat

Edited September 8, 2012 by czardas

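To see what that pattern actually captures, here is a minimal sketch. The sample string is made up for illustration, and ConsoleWrite is used instead of _ArrayDisplay so the snippet runs without a GUI. Note that each match starts at "{" and stops just before the next "}", so the closing bracket itself is not part of the match.

```autoit
; Hypothetical sample string, not taken from the real input files.
Local $sSample = "a,{b,c},d{e}f"

; Flag 3 returns an array of all matches in the string.
Local $aMatches = StringRegExp($sSample, "{[^}]*", 3)

If IsArray($aMatches) Then
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite($i & ": " & $aMatches[$i] & @CRLF)
    Next
EndIf
; Writes "0: {b,c" and "1: {e"
```

This is why the scripts in this thread append "}" again when they put the formulas back.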
Pumbaa Posted September 8, 2012

Yes, I've got it. I tried some more combinations, but yours is the most useful. Thanks again. I upgraded my script to take into account several appearances of {formula} in one field.

    #include <Array.au3>

    Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $ExprArray = StringRegExp($sString, "{[^}]*", 3) ; array of all {formulas} in the string
    Local $j = @error ; 0 when at least one {formula} was found
    $sString = StringRegExpReplace($sString, "{[^}]*", "{") ; reduce every {formula} in the string to "{}"
    Local $FinalArray = StringSplit($sString, ",")
    Local $NumOfExpr[1], $k

    If $j = 0 Then ; if a {formula} existed, then recover it
        For $i = 1 To $FinalArray[0]
            If StringInStr($FinalArray[$i], "}") > 0 Then
                $NumOfExpr = StringSplit($FinalArray[$i], "{}", 1) ; in case there is more than 1 {formula} in the field
                If $NumOfExpr[0] > 1 Then
                    For $k = 1 To $NumOfExpr[0] - 1
                        $FinalArray[$i] = StringReplace($FinalArray[$i], "{}", $ExprArray[$j] & "}", 1)
                        $j = $j + 1
                    Next
                EndIf
            EndIf
        Next
    EndIf

    _ArrayDisplay($FinalArray)

Edited September 8, 2012 by Pumbaa

czardas Posted September 8, 2012

I'm glad you found it useful.

Pumbaa Posted September 8, 2012

Well... after taking care of all the possible appearances of the damn {formula}, my old code seems to be shorter and easier:

    #include <Array.au3>

    Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $sStringWithNewDelimiter = ""

    ; Split on both "{" and "}": odd elements are outside the brackets, even elements inside.
    Local $WithFormulasArray = StringSplit($sString, "{}")

    For $i = 1 To $WithFormulasArray[0]
        If Mod($i, 2) <> 0 Then
            $WithFormulasArray[$i] = StringReplace($WithFormulasArray[$i], ",", @TAB)
        Else
            $WithFormulasArray[$i] = "{" & $WithFormulasArray[$i] & "}"
        EndIf
        $sStringWithNewDelimiter = $sStringWithNewDelimiter & $WithFormulasArray[$i]
    Next

    Local $FinalArray = StringSplit($sStringWithNewDelimiter, @TAB)
    _ArrayDisplay($FinalArray)

But that was still good experience in using RegEx.

Edited September 8, 2012 by Pumbaa

Pumbaa Posted September 8, 2012

Ufff... I'm not satisfied. I had imagined something like:

    #include <Array.au3>

    Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    $sString = StringRegExpReplace($sString, "some tricky RegEx to define all commas that are not situated somewhere between {}", @TAB)
    Local $FinalArray = StringSplit($sString, @TAB)
    _ArrayDisplay($FinalArray)

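One possible shape for such a pattern is a negative lookahead, sketched below. This is only an illustration under two assumptions: the brackets are always balanced and never nested, and the input contains no TAB characters of its own. The pattern ",(?![^{}]*\})" is not taken from any post in this thread; it matches a comma only when the next bracket after it (if there is one) is not a closing bracket.

```autoit
Local $sString = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"

; Replace only the commas that are not followed by "...}" without a "{" in between,
; i.e. the commas that sit outside every {formula}.
Local $sTabbed = StringRegExpReplace($sString, ",(?![^{}]*\})", @TAB)
Local $aFields = StringSplit($sTabbed, @TAB)

; ConsoleWrite instead of _ArrayDisplay so the sketch runs without a GUI.
For $i = 1 To $aFields[0]
    ConsoleWrite($i & ": " & $aFields[$i] & @CRLF)
Next
```

With the sample string this yields six fields, with "{m,o,r,e}{m,o,r,e}" and "text3{Join(...)}text4" each kept in one piece. If nested brackets can occur, a single lookahead like this is not enough.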
czardas Posted September 8, 2012

Don't overcomplicate things. Provided the input follows a clear set of rules that can be used to extract the required information, you should be able to parse it. Sometimes a single regular expression will do all (or most) of the job, but a few extra lines of code may be easier to write and understand. You also need to be clear about exactly how you want the returned data to be formatted.

Edited September 8, 2012 by czardas

xeroTechnologiesLLC Posted September 8, 2012

I'm pretty much new to programming and AutoIt in general, but to answer the topic of the thread without any super coding like what has already been provided: I usually change the delimiter symbol inside the cell to something entirely unused, like one of the extended Latin characters, then run StringSplit, then change that symbol back to the delimiter symbol. If the cell contains ",", swap it to Œ, then run your string split. Run another replace to turn "Œ" back into ",". This obviously doesn't work in 100% of the situations where you have to do this, but it's fast and easy to do for us noob programmers. Good luck and have fun.

dany Posted September 8, 2012

Indeed, don't overcomplicate things. You can do this without regular expressions.

    #include <String.au3> ; _StringBetween
    #include <Array.au3> ; _ArrayDisplay

    Local $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $sBracket, $sTabs, $aFields

    While 1
        $sBracket = _StringBetween($sFields, '{', '}')
        If Not IsArray($sBracket) Then ExitLoop ; No brackets left.
        $sTabs = StringReplace($sBracket[0], ',', @TAB)
        ; Remove the brackets, or the loop will never exit.
        $sFields = StringReplace($sFields, '{' & $sBracket[0] & '}', '_A_' & $sTabs & '_Z_')
    WEnd

    ; Put brackets back in place.
    $sFields = StringReplace($sFields, '_A_', '{')
    $sFields = StringReplace($sFields, '_Z_', '}')

    $aFields = StringSplit($sFields, ',')

    ; Restore the commas that were protected as tabs inside the brackets.
    For $i = 1 To $aFields[0]
        $aFields[$i] = StringReplace($aFields[$i], @TAB, ',')
    Next

    _ArrayDisplay($aFields)

UEZ Posted September 8, 2012

Here is my version for this particular string:

    #include <Array.au3>
    #include <String.au3>

    $s = "text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4,text5"
    $aRes = _StringBetween($s, "{", "}")
    $aNew = StringSplit(StringReplace(StringReplace(StringReplace($s, $aRes[0], StringReplace($aRes[0], ",", "°^°")), ",", "|"), "°^°", ","), "|", 2)
    _ArrayDisplay($aNew)

Or

    #include <Array.au3>
    #include <String.au3>

    $s = "text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4,text5"
    $aRes = _StringBetween($s, "{", "}")
    $aNew = StringSplit(StringReplace(StringReplace(StringReplace(StringReplace(StringReplace($s, $aRes[0], StringReplace($aRes[0], ",", "°^°")), ",", "|"), "°^°", ","), "{", "|"), "}", "|"), "|", 2)
    _ArrayDisplay($aNew)

Br, UEZ

Edited September 8, 2012 by UEZ

Pumbaa Posted September 9, 2012

Thanks guys. These are also possible ways to reach my goal. I'll put them in my "scripts" bank. But the imagined RegEx solution still seems more efficient to me, given the total number of loops and the calls to complex functions like Split, Between and Replace in the other versions. On a large amount of long strings it should be noticeably faster. Maybe someday a RegEx genius will visit this topic and show us a master class... or explain that it's impossible or will take much more CPU time than any other string operations.

dany Posted September 9, 2012

Pumbaa wrote: "On large amount of long strings it should be noticeably faster."

Actually, not always. It heavily depends on the complexity of the regular expression pattern and your ability to write efficient patterns. The more complex the pattern, the slower the RegExp function will be. RegExp functions can be slower than ordinary String* functions because the engine tries the pattern at each position in the subject and may backtrack through many alternatives before it can accept or reject a match. With very long strings this becomes a slow process.

An unoptimized pattern can have a severe speed impact as well. For instance, using groups ( ... ) extensively will severely slow down any RegExp if they aren't optimized. The pattern \b(integer|insert|in)\b is slower than the atomic-group version \b(?>integer|insert|in)\b for the subject 'integers'. Neither will match, but the first RegExp will take more time to figure that out. The reason why is explained here: http://www.regular-expressions.info/atomic.html

If you want to treat that test string you gave with only one regular expression, well, that's going to be a real beast if it has to take into account all the edge cases you've given. Therefore it will actually be slower than my method with String* functions. For more info and insight on the inner workings of regular expressions I recommend http://www.regular-expressions.info/

Edit: The forum software screwed up the links...

Edit 2: Well, it's Sunday and I've got nothing to do, so I had a stab at your test string. To my surprise my RegExp was actually faster than the String* functions. However, the general rule of thumb that String* functions are faster than RegExp* functions still stands. It just depends heavily on what you want to do. I once wrote a syntax highlighter in PHP and found str* faster than preg*.

Anyway, here's what I got:

    #include <String.au3> ; _StringBetween
    #include <Array.au3> ; _ArrayDisplay

    Local $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $sBracket, $sTabs, $aFields

    ; String* version; the elapsed time in seconds is shown as the window title.
    Local $iStart = TimerInit()
    While 1
        $sBracket = _StringBetween($sFields, '{', '}')
        If Not IsArray($sBracket) Then ExitLoop ; No brackets left.
        $sTabs = StringReplace($sBracket[0], ',', @TAB)
        $sFields = StringReplace($sFields, '{' & $sBracket[0] & '}', '_A_' & $sTabs & '_Z_')
    WEnd
    $sFields = StringReplace($sFields, '_A_', '{')
    $sFields = StringReplace($sFields, '_Z_', '}')
    $aFields = StringSplit($sFields, ',')
    ; Restore the commas that were protected as tabs inside the brackets.
    For $i = 1 To $aFields[0]
        $aFields[$i] = StringReplace($aFields[$i], @TAB, ',')
    Next
    _ArrayDisplay($aFields, TimerDiff($iStart) / 1000)

    ; RegExp version.
    $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
    Local $rPattern = '([a-z0-9]+{[^}]+}|{[^}]+}|[^{},]+)'
    $iStart = TimerInit()
    Local $aMatches = StringRegExp($sFields, $rPattern, 3)
    _ArrayDisplay($aMatches, TimerDiff($iStart) / 1000)

Also note that they produce different results.

Edited September 9, 2012 by dany

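The page on atomic grouping linked in the post above contrasts a capturing group with an atomic group, which is written (?>...). As a rough sketch of that comparison, the snippet below checks that both forms accept and reject the same subjects here; the subject list is made up for illustration, and the speed difference itself is not measured.

```autoit
; Hypothetical subjects chosen for illustration.
Local $aSubjects[3] = ["integers", "insert", "in"]
Local $sCapturing = "\b(integer|insert|in)\b"
Local $sAtomic = "\b(?>integer|insert|in)\b"

For $i = 0 To UBound($aSubjects) - 1
    ; With the default flag 0, StringRegExp returns 1 for a match and 0 for no match.
    ConsoleWrite($aSubjects[$i] & ": capturing=" & StringRegExp($aSubjects[$i], $sCapturing) & _
            ", atomic=" & StringRegExp($aSubjects[$i], $sAtomic) & @CRLF)
Next
```

For 'integers' both return 0. The atomic form simply gives up sooner, because once "integer" has matched inside the group the engine does not go back to try "insert" or "in". The order of the alternatives matters with atomic groups, so the two forms are not interchangeable in general.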
Pumbaa Posted September 10, 2012

Splendid work. I tried to compare the timings with my own code as well, but got different results each time. It seems to depend on some internal processes in Windows.

In "text3{Join('Errors','Name','id',Error,'Group',Group)}text4", text4 is part of that field, but nevertheless that's what I was looking for. Thanks for your collaboration.

Edited September 10, 2012 by Pumbaa

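A single timed run is easily dominated by whatever else the system is doing, which would explain timings that change from run to run. One common way to get steadier numbers is to repeat each call many times and report the average, as in this sketch. The repetition count $iRuns is an arbitrary assumption, and the second loop times only a plain StringSplit on the brackets, so this is not a like-for-like comparison of complete solutions.

```autoit
Local $sFields = "text0,{m,o,r,e}{m,o,r,e},text1,text2,text3{Join('Errors','Name','id',Error,'Group',Group)}text4, text5"
Local $rPattern = '([a-z0-9]+{[^}]+}|{[^}]+}|[^{},]+)' ; pattern from dany's post
Local Const $iRuns = 10000 ; hypothetical repetition count, adjust to taste

; Average time per StringRegExp call, in milliseconds.
Local $hTimer = TimerInit()
For $i = 1 To $iRuns
    StringRegExp($sFields, $rPattern, 3)
Next
Local $fRegExpMs = TimerDiff($hTimer) / $iRuns

; Average time per StringSplit call, in milliseconds.
$hTimer = TimerInit()
For $i = 1 To $iRuns
    StringSplit($sFields, "{}")
Next
Local $fSplitMs = TimerDiff($hTimer) / $iRuns

ConsoleWrite("StringRegExp: " & $fRegExpMs & " ms per call" & @CRLF)
ConsoleWrite("StringSplit:  " & $fSplitMs & " ms per call" & @CRLF)
```

The absolute numbers will still differ between machines and runs; only the averages over many repetitions are worth comparing.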