How to remove extra spaces in a string but keep the leading spaces?


Solved by SOLVE-SMART

  • Solution

Hi @VIP,

not the most elegant way, but it works fine 😅:

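ConsoleWrite(_StripWhitespacesExceptPrefixedOnes("    A    B    C ") & @CRLF)

Func _StripWhitespacesExceptPrefixedOnes($sString)
    ; Anchored with ^ so that only whitespace at the very start is captured.
    ; Note: this assumes $sString starts with whitespace (as in the example above).
    Local Const $sRegExPattern    = '^(\s+)'
    Local Const $iReturnMatchFlag = 1 ; 1 = return array of matches

    Local Const $sPrefixSpaces = StringRegExp($sString, $sRegExPattern, $iReturnMatchFlag)[0]
    Local Const $iStripLeadingTrailingDoubleFlag = 7 ; 1 (leading) + 2 (trailing) + 4 (double spaces)

    Return $sPrefixSpaces & StringStripWS($sString, $iStripLeadingTrailingDoubleFlag)
EndFunc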

Best regards
Sven

________________
Stay innovative!

Edited by SOLVE-SMART

Thank you very much, I made a little tweak to make sure the code will run in all cases:

Func _StripWhitespacesExceptPrefixedOnes($sString)
    ; Anchored with ^ so that only whitespace at the very start is captured.
    Local Const $sRegExPattern    = '^(\s+)'
    Local Const $iReturnMatchFlag = 1 ; 1 = return array of matches
    Local Const $iStripLeadingTrailingDoubleFlag = 7 ; 1 (leading) + 2 (trailing) + 4 (double spaces)
    Local Const $aPrefixSpaces = StringRegExp($sString, $sRegExPattern, $iReturnMatchFlag)
    If IsArray($aPrefixSpaces) Then
        Return $aPrefixSpaces[0] & StringStripWS($sString, $iStripLeadingTrailingDoubleFlag)
    Else
        ; No leading whitespace: nothing to preserve, so just strip.
        Return StringStripWS($sString, $iStripLeadingTrailingDoubleFlag)
    EndIf
EndFunc

Regards,

Hi @VIP,

Sure, good point 👍. I was only focusing on that specific example. Anyway, good improvement 😀.

Best regards
Sven

Hi @ioa747,

No, not really! The output should look like this:

2 hours ago, VIP said:
;Input    : ....A....B.... .C..D.
;Output ok: ....A.B.C

At least those were @VIP's requirements (the dots "." represent spaces). Your example doesn't produce this output 😉.

Best regards
Sven

Edited by SOLVE-SMART

Hi everybody :)
Why not a RegEx pattern with 2 capturing groups ?

Local $sInput = '   AB   C   D   '
Local $sPattern = '(\s*)(.*\S)'
Local $aOutPut = StringRegExp($sInput, $sPattern, 3) ; 3 = array of global matches
Local $sOutPut = $aOutPut[0] & StringStripWS($aOutPut[1], 4) ; 4 = strip double (or more) spaces between words

; ConsoleWrite($sOutPut & @crlf)
ConsoleWrite(">" & $sOutPut & "<" & @crlf) ; line with delimiters during debug phase
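
A minimal sketch of the same two-group idea wrapped in a function, for the case where the input may contain no non-whitespace character at all (StringRegExp then returns no array, so $aOutPut[0] above would fail). The function name _KeepLeadingCollapseRest is hypothetical, and the ^ anchor, the single-line input and the choice to return such input unchanged are all assumptions, not something stated in this thread:

```autoit
; Sketch only: _KeepLeadingCollapseRest is a hypothetical name, not taken from the thread.
Func _KeepLeadingCollapseRest($sInput)
    ; Group 1 = leading whitespace, group 2 = everything up to the last non-whitespace char.
    Local $aParts = StringRegExp($sInput, '^(\s*)(.*\S)', 1) ; 1 = array of matches (first match only)
    If @error Then Return $sInput ; assumption: no non-whitespace char found, return input as-is
    Return $aParts[0] & StringStripWS($aParts[1], 4) ; 4 = strip double (or more) spaces between words
EndFunc

ConsoleWrite(">" & _KeepLeadingCollapseRest('   AB   C   D   ') & "<" & @CRLF) ; >   AB C D<
ConsoleWrite(">" & _KeepLeadingCollapseRest('     ') & "<" & @CRLF)            ; >     <
```

The @error check sits directly after the StringRegExp call because most later function calls reset @error.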

@mikell I grouped the three approaches (Nine's, yours, and mine) in the following script; I'll ask a question at the end of this post.

Local $sInput =     '       AA     B C      D       '
; expected result : '       AA B C D'  (i.e. whitespace : keep all leading, remove all trailing, no doubling between words)

;=============================================================================
; Pixelsearch's way :
Local $sPattern = '(\s*)(.*\S)'
Local $aOutPut = StringRegExp($sInput, $sPattern, 3) ; 3 = array of global matches
Local $sOutPut = $aOutPut[0] & StringStripWS($aOutPut[1], 4) ; 4 = strip double (or more) spaces between words
ConsoleWrite("Pixel" & @TAB & "<" & $sOutPut & ">" & @crlf) ; delimiters < > during debug phase

;=============================================================================
; Nine's way :
Local $sOutPut = StringStripWS(StringRegExpReplace($sInput, "(\S+)(\s*)", "$1 "), 2) ; Nine's original (variable names reworked for the script)
ConsoleWrite("Nine" & @TAB & "<" & $sOutPut & ">" & @crlf)

;=============================================================================
; Mikell's way :
Local $sOutPut = StringRegExpReplace($sInput, '(?<=\S)(\s+?)(?=\s\S|$)', "") ; Mikell's original (variable names reworked for the script)
ConsoleWrite("Mikell" & @TAB & "<" & $sOutPut & ">" & @crlf)

;=============================================================================
#cs
Personal notes :
mikell's way is a good way to learn RegEx Assertions, as described in AutoIt help file, topic StringRegExp :
"Assertions : test the character(s) preceding (look-behind), at (word boundary) or following (look-ahead) the current matching point."

Note: the following input (with additional @crlf and spaces) won't give the same result when using Mikell's original pattern (why ?)
Local $sInput = '       AA     B C      D       ' & @crlf   ; & '   ' & @crlf
#ce

Console output is the same for the 3 of us :

Pixel   <       AA B C D>
Nine    <       AA B C D>
Mikell  <       AA B C D>

Your approach is interesting because it teaches us regex “lookaround” expressions, as described in the AutoIt help file (topic Regex), for example the positive look-behind and positive look-ahead found in your pattern:

(?<=\S)(\s+?)(?=\s\S|$)

(?<=X)
Positive look-behind: matches when the subpattern X matches characters preceding the current position. [...]

(?=X)
Positive look-ahead: matches when the subpattern X matches starting at the current position.

I tried your pattern with "RegExpQuickTester.au3", explanations will follow :

[Image: mikell's lookaround pattern tested in RegExpQuickTester]

The explanation of the 4 lookaround expressions (available directly in the RegExpQuickTester right-click help menu) is nice; it helped me a lot to understand your pattern :D

Now if I understood correctly, we're searching for :

Whitespace character(s) in the capturing group (\s+?)
... only if they're preceded by a non-whitespace char, because of the Positive look-behind (?<=\S)
... only if they're followed by a whitespace char AND (a non-whitespace char OR if the end of string is reached), because of the Positive look-ahead (?=\s\S|$)

In the preceding picture, the Replace Pattern tab is green (thanks, TCM_HIGHLIGHTITEM!). This indicates that something is typed in the edit control on the Replace Pattern tab. I typed <$1> in it, so the Result field (at the bottom of the picture) shows exactly the 3 groups that matched your pattern, and everything seems correct.

A question now : how should your pattern be modified so it gives a correct result with this $sInput code :

Local $sInput = '       AA     B C      D       ' & @crlf   ; & '   ' & @crlf

If we run the preceding script with this input line, then it outputs like this in the console :

Pixel   <       AA B C D>
Nine    <       AA B C D>
Mikell  <       AA B C D
>

Nine's way (and mine) output correctly, so how should your pattern be amended to output correctly too ?
Thanks :)

Edit: maybe this is the answer to my question. Mikell's original group is written like this, with a ? that makes the quantifier lazy instead of greedy (why was the laziness required?):

(\s+?)

When I try it without the question mark, the group becomes greedy again and grabs everything at the end of the input string, including @CRLF and any additional whitespace. The following greedy group also seems to work fine when applied to the original input line:

(\s+)
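
To make this concrete, here is a short sketch that applies the greedy group to the input with the trailing @CRLF. As far as I understand it, the lazy version stops early because $ can also match just before a final line break, while the greedy version first takes the whole trailing run (spaces plus @CRLF) and only then checks the look-ahead. The variable name $sGreedy is mine:

```autoit
Local $sInput = '       AA     B C      D       ' & @CRLF

; Greedy group: the trailing run (spaces plus @CRLF) is consumed up to the very end of the string.
Local $sGreedy = StringRegExpReplace($sInput, '(?<=\S)(\s+)(?=\s\S|$)', "")
ConsoleWrite("<" & $sGreedy & ">" & @CRLF) ; <       AA B C D>
```

Between words, the greedy group backtracks by one character so that the look-ahead can still find "whitespace followed by non-whitespace", which is why exactly one space survives there.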

Edited by pixelsearch
a possible answer to my question

Nice analysis ! ;)

15 hours ago, pixelsearch said:

... only if they're followed by a whitespace char AND (a non-whitespace char OR if the end of string is reached), because of the Positive look-ahead (?=\s\S|$)

Hmm, I'd rather phrase it like this (note the parentheses):
... only if they're followed by (a whitespace char AND a non-whitespace char) OR (if the end of string is reached)

16 hours ago, pixelsearch said:

how should your pattern be modified so it gives a correct result with this $sInput code

Ha. As often, I was a bit lazy and didn't account for a possible trailing newline :)
So there are 2 answers that keep the lazy "+?":

$out = StringRegExpReplace($in, '(?<=\S)(\s+?)(?=\s\S|\z)', "")

$out = StringRegExpReplace($in, '(*CR)(?<=\S)(\s+?)(?=\s\S|$)', "")

But (\s+) is indeed the best one. I forgot the regex engine's ability to backtrack, so the \s in the look-ahead is not consumed by the group.
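
For anyone who wants to compare the variants side by side, a small sketch follows. It runs the original lazy pattern and the two amended ones on the input with a trailing @CRLF, and rewrites any surviving line-break characters as visible \r and \n markers. The variable names ($sIn, $aPatterns, $sOut) are only illustrative. If the patterns behave as described in this thread, only the first output line should still show \r\n before the closing >:

```autoit
Local $sIn = '       AA     B C      D       ' & @CRLF

Local $aPatterns[3]
$aPatterns[0] = '(?<=\S)(\s+?)(?=\s\S|$)'      ; original lazy pattern
$aPatterns[1] = '(?<=\S)(\s+?)(?=\s\S|\z)'     ; \z = very end of the subject string
$aPatterns[2] = '(*CR)(?<=\S)(\s+?)(?=\s\S|$)' ; (*CR) = only @CR counts as a newline

For $sPattern In $aPatterns
    Local $sOut = StringRegExpReplace($sIn, $sPattern, "")
    ; Make any surviving line-break characters visible in the console.
    $sOut = StringReplace(StringReplace($sOut, @CR, "\r"), @LF, "\n")
    ConsoleWrite("<" & $sOut & ">" & @CRLF)
Next
```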

7 hours ago, mikell said:

Hmm I prefer told like this (parenthesis...)
... only if they're followed by (a whitespace char AND a non-whitespace char) OR (if the end of string is reached)

Very true, thanks for the correction about the pipe symbol: I wrongly applied it only to the single character on its left. Here is another example (without spaces!) that confirms this behavior:

Input :
a1b2c3b2de

Pattern :
(b)(?=2c|d)

Replace pattern :
<$1>

Result :
a1<b>2c3b2de

[Image: RegExpQuickTester Result field without any superfluous @CRLF]

Capturing group (b) ... only if b is immediately followed by (2 AND c) OR d

So only one b is captured, as expected, whereas both b's would have been captured under my wrong reading "2 AND (c OR d)", which in fact corresponds to this pattern: (b)(?=2(?:c|d))
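
The two readings can be checked quickly in a script. This is just a sketch reusing the input and patterns from this post:

```autoit
Local $sInput = 'a1b2c3b2de'

; Alternation spans the whole look-ahead: b followed by ("2c") OR ("d").
ConsoleWrite(StringRegExpReplace($sInput, '(b)(?=2c|d)', '<$1>') & @CRLF)     ; a1<b>2c3b2de

; Alternation limited by a non-capturing group: b followed by "2" AND ("c" OR "d").
ConsoleWrite(StringRegExpReplace($sInput, '(b)(?=2(?:c|d))', '<$1>') & @CRLF) ; a1<b>2c3<b>2de
```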

For the record, I just made a new change in "RegExpQuickTester.au3" (bringing it to version 2.5e) to get rid of any disturbing superfluous @CRLF that could be found at the end of the read-only Result Edit field.

I mean... we always take care of trailing spaces and last @CRLF's in all these edit fields (input string, search pattern, replace pattern, result prefix) so why shouldn't we in the Result field ?

Just because the original scripter (years ago) made every line in the Result Edit field end with @CRLF doesn't mean it should stay that way forever. As you can see in the picture above, the blinking caret (deliberately visible at the moment the screenshot was taken) sits at the very end of the result string. Pressing the Down arrow key does nothing at this point, as there is no superfluous @CRLF anymore in this read-only Result field. A leftover trailing @CRLF sometimes confused me (in RegEx modes 1 to 4) when dealing with patterns that return trailing spaces and/or trailing @CRLFs.

This little inconvenience never happens with leading spaces / leading @CRLF's as they're clearly visible before the matched result :D
