How to remove extra spaces in a string but keep the leading spaces?

Trong · January 4, 2023

;Input    : ....A....B.... .C..D.
;Output ok: ....A.B.C

Local $sLine = "    A    B    C "
$sLine = StringStripWS($sLine, 2) ; 2 = strip trailing white space only
ConsoleWrite($sLine & @CRLF)

I am stuck on this. If you have a solution, please share it. Thanks.

SOLVE-SMART · January 4, 2023

Hi @VIP,

not the most elegant way, but it works fine 😅 :

ConsoleWrite(_StripWhitespacesExceptPrefixedOnes("    A    B    C ") & @CRLF)

Func _StripWhitespacesExceptPrefixedOnes($sString)
    Local Const $sRegExPattern    = '(\s+).*?$'
    Local Const $iReturnMatchFlag = 1

    Local Const $sPrefixSpaces = StringRegExp($sString, $sRegExPattern, $iReturnMatchFlag)[0]
    Local Const $iStripLeadingTrailingDoubleFlag = 7

    Return $sPrefixSpaces & StringStripWS($sString, $iStripLeadingTrailingDoubleFlag)
EndFunc

Best regards
Sven

________________
Stay innovative!

Edited January 4, 2023 by SOLVE-SMART

Trong · January 4, 2023

Thank you very much, I made a little tweak to make sure the code will run in all cases:

Func _StripWhitespacesExceptPrefixedOnes($sString)
    Local Const $sRegExPattern    = '(\s+).*?$'
    Local Const $iReturnMatchFlag = 1
    Local Const $aPrefixSpaces = StringRegExp($sString, $sRegExPattern, $iReturnMatchFlag)
    If IsArray($aPrefixSpaces) Then
        Local Const $sPrefixSpaces = $aPrefixSpaces[0]
        Local Const $iStripLeadingTrailingDoubleFlag = 7
        Return $sPrefixSpaces & StringStripWS($sString, $iStripLeadingTrailingDoubleFlag)
    Else
        Return $sString
    EndIf
EndFunc
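
One case the IsArray check does not seem to cover: the pattern is not anchored, so for an input without leading spaces (for example "A    B") the first run of spaces after "A" would be captured and prepended. Below is a minimal sketch of an anchored variant. The function name _KeepLeadingCollapseRest and the pattern '^(\s*)' are assumptions for illustration and do not come from the thread.

```autoit
; Hypothetical variant: only whitespace at the very start of the string is captured.
Func _KeepLeadingCollapseRest($sString)
    ; '^(\s*)' can match an empty prefix, so element [0] may be an empty string.
    Local $aLeading = StringRegExp($sString, '^(\s*)', 1) ; 1 = return array of matches
    If @error Then Return $sString ; defensive fallback if the call fails

    ; 7 = 1 (leading) + 2 (trailing) + 4 (double or more spaces between words)
    Return $aLeading[0] & StringStripWS($sString, 7)
EndFunc

; Delimiters make leading and trailing spaces visible in the console.
ConsoleWrite(">" & _KeepLeadingCollapseRest("    A    B    C ") & "<" & @CRLF)
ConsoleWrite(">" & _KeepLeadingCollapseRest("A    B    C ") & "<" & @CRLF)
```

The second call is the interesting one: with the anchor, no spaces should be added in front of "A".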

SOLVE-SMART · January 4, 2023

Hi @VIP,

sure, good point 👍 . I was only focusing on that specific example. Anyways, good improvement 😀 .

Best regards
Sven


ioa747 · January 4, 2023

... another way:

Local $sLine = "    A    B    C "

$sLine = StringStripWS($sLine, 4)
ConsoleWrite(" like this:" & $sLine& @CRLF)

$sLine = " " & StringStripWS($sLine, 8)
ConsoleWrite("or like this:" & $sLine & @CRLF)

Edited January 4, 2023 by ioa747

SOLVE-SMART · January 4, 2023

Hi @ioa747,

no, not really! The output should look like this:

2 hours ago, VIP said:

;Input    : ....A....B.... .C..D.
;Output ok: ....A.B.C

At least those were the requirements of @VIP (the dots "." are spaces). Your example doesn't produce this output 😉 .

Best regards
Sven


Edited January 4, 2023 by SOLVE-SMART

pixelsearch · January 4, 2023

Hi everybody
Why not a RegEx pattern with 2 capturing groups ?

Local $sInput = '   AB   C   D   '
Local $sPattern = '(\s*)(.*\S)'
Local $aOutPut = StringRegExp($sInput, $sPattern, 3) ; 3 = array of global matches
Local $sOutPut = $aOutPut[0] & StringStripWS($aOutPut[1], 4) ; 4 = strip double (or more) spaces between words

; ConsoleWrite($sOutPut & @crlf)
ConsoleWrite(">" & $sOutPut & "<" & @crlf) ; line with delimiters during debug phase

Nine · January 4, 2023

Or this one-liner?

Local $sString = StringStripWS(StringRegExpReplace("   A  B    C     D   ", "(\S+)(\s*)", "$1 "), 2)
ConsoleWrite($sString & @CRLF)

mikell · January 5, 2023

A simple SRER (StringRegExpReplace) does the job:

$in = '       AA     B C      D       '
$out = StringRegExpReplace($in, '(?<=\S)(\s+?)(?=\s\S|$)', "")
msgbox(0,"", "=" & $out & "=")

pixelsearch · January 6, 2023

@mikell I grouped the 3 ways we did it (Nine, you, me) in the following script, then I'll ask a question at the end of this post.

Local $sInput =     '       AA     B C      D       '
; expected result : '       AA B C D'  (i.e. white spaces : keep all leading, remove all trailing, no doubling between words)

;=============================================================================
; Pixelsearch's way :
Local $sPattern = '(\s*)(.*\S)'
Local $aOutPut = StringRegExp($sInput, $sPattern, 3) ; 3 = array of global matches
Local $sOutPut = $aOutPut[0] & StringStripWS($aOutPut[1], 4) ; 4 = strip double (or more) spaces between words
ConsoleWrite("Pixel" & @TAB & "<" & $sOutPut & ">" & @crlf) ; delimiters < > during debug phase

;=============================================================================
; Nine's way :
Local $sOutPut = StringStripWS(StringRegExpReplace($sInput, "(\S+)(\s*)", "$1 "), 2) ; Nine's original (variable names reworked for the script)
ConsoleWrite("Nine" & @TAB & "<" & $sOutPut & ">" & @crlf)

;=============================================================================
; Mikell's way :
Local $sOutPut = StringRegExpReplace($sInput, '(?<=\S)(\s+?)(?=\s\S|$)', "") ; Mikell's original (variable names reworked for the script)
ConsoleWrite("Mikell" & @TAB & "<" & $sOutPut & ">" & @crlf)

;=============================================================================
#cs
Personal notes :
mikell's way is a good way to learn RegEx Assertions, as described in AutoIt help file, topic StringRegExp :
"Assertions : test the character(s) preceding (look-behind), at (word boundary) or following (look-ahead) the current matching point."

Note: the following input (with additional @crlf and spaces) won't give the same result when using Mikell's original pattern (why ?)
Local $sInput = '       AA     B C      D       ' & @crlf   ; & '   ' & @crlf
#ce

Console output is the same for the 3 of us :

Pixel   <       AA B C D>
Nine    <       AA B C D>
Mikell  <       AA B C D>

Your way is interesting because it teaches us Regex “lookaround” expressions, as described in AutoIt helpfile, topic Regex, for example the Positive look-behind and Positive look-ahead found in your pattern :

(?<=\S)(\s+?)(?=\s\S|$)

(?<=X)
Positive look-behind: matches when the subpattern X matches characters preceding the current position. [...]

(?=X)
Positive look-ahead: matches when the subpattern X matches starting at the current position.

I tried your pattern with "RegExpQuickTester.au3", explanations will follow :

[Screenshot: mikell's lookaround pattern tested in RegExpQuickTester]

There is a nice explanation of the 4 lookaround expressions (directly in the RegExpQuickTester right-click help menu); it helped me a lot to understand your pattern.

Now if I understood correctly, we're searching for :

Whitespace character(s) in the capturing group (\s+?)
... only if they're preceded by a non-whitespace char, because of the Positive look-behind (?<=\S)
... only if they're followed by a whitespace char AND (a non-whitespace char OR if the end of string is reached), because of the Positive look-ahead (?=\s\S|$)

In the preceding picture, the Replace Pattern Tab is green (thanks TCM_HIGHLIGHTITEM !). This indicates that something is typed in the edit control of the Replace Pattern Tab. I typed <$1> in it, so we can see exactly the 3 groups that matched your pattern in the Result field (at the bottom of the picture), and everything seems correct.
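
The same check can be reproduced without the tester. This is a minimal sketch that reuses the input string from the script above and the <$1> replacement: each whitespace run the pattern matches is wrapped in < > instead of being deleted. The variable names are illustrative only.

```autoit
Local $sIn = '       AA     B C      D       '
Local $sPattern = '(?<=\S)(\s+?)(?=\s\S|$)'

; Wrap every match in < > so the captured whitespace runs become visible.
Local $sMarked = StringRegExpReplace($sIn, $sPattern, '<$1>')
; @extended must be read right after the call: it holds the number of replacements.
Local $iCount = @extended

ConsoleWrite($sMarked & @CRLF)
ConsoleWrite("Replacements: " & $iCount & @CRLF)
```

The single space between B and C should stay untouched, because a match needs at least one whitespace character in the group plus one more in the lookahead.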

A question now : how should your pattern be modified so it gives a correct result with this $sInput code :

Local $sInput = '       AA     B C      D       ' & @crlf   ; & '   ' & @crlf

If we run the preceding script with this input line, then it outputs like this in the console :

Pixel   <       AA B C D>
Nine    <       AA B C D>
Mikell  <       AA B C D
>

Nine's way (and mine) output correctly, so how should your pattern be amended to output correctly too ?
Thanks

Edit: maybe this is the answer to my question. Mikell's original group is scripted like this, with a lazy ? that inverts the greediness (why was the laziness required ?)

(\s+?)

When I try it without the question mark, then the greediness reappears and grabs everything at the end of the input string, including @CRLF and additional whitespaces. Also, the following greedy pattern seems to work fine when applied to the original input line :

(\s+)

Edited January 6, 2023 by pixelsearch
a possible answer to my question

mikell · January 6, 2023

Nice analysis !

15 hours ago, pixelsearch said:

... only if they're followed by a whitespace char AND (a non-whitespace char OR if the end of string is reached), because of the Positive look-ahead (?=\s\S|$)

Hmm, I'd rather put it like this (mind the parentheses...)
... only if they're followed by (a whitespace char AND a non-whitespace char) OR (if the end of string is reached)

16 hours ago, pixelsearch said:

how should your pattern be modified so it gives a correct result with this $sInput code

Ha. As often, I was a bit lazy and didn't take care of a possible trailing newline.
So there are 2 answers that keep the lazy "+":

$out = StringRegExpReplace($in, '(?<=\S)(\s+?)(?=\s\S|\z)', "")

$out = StringRegExpReplace($in, '(*CR)(?<=\S)(\s+?)(?=\s\S|$)', "")

But (\s+) is the best one indeed. I forgot the regex engine's ability to backtrack, so the \s in the lookahead is not consumed by the group.
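
A minimal sketch to compare the three variants on the input that ends with @CRLF (variable names are illustrative). The < > delimiters make any leftover trailing whitespace or line break visible. Going by the discussion above, all three should print the same line with no break before the closing delimiter.

```autoit
Local $sIn = '       AA     B C      D       ' & @CRLF

; Lazy group, \z matches only at the very end of the subject string.
ConsoleWrite("<" & StringRegExpReplace($sIn, '(?<=\S)(\s+?)(?=\s\S|\z)', "") & ">" & @CRLF)

; Lazy group, (*CR) makes CR the only newline sequence, so $ no longer
; matches just before the final @CRLF.
ConsoleWrite("<" & StringRegExpReplace($sIn, '(*CR)(?<=\S)(\s+?)(?=\s\S|$)', "") & ">" & @CRLF)

; Greedy group with the original lookahead: the group first grabs everything
; up to the end, and backtracks only where the lookahead needs one \s before a \S.
ConsoleWrite("<" & StringRegExpReplace($sIn, '(?<=\S)(\s+)(?=\s\S|$)', "") & ">" & @CRLF)
```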

pixelsearch · January 7, 2023

7 hours ago, mikell said:

Hmm, I'd rather put it like this (mind the parentheses...)
... only if they're followed by (a whitespace char AND a non-whitespace char) OR (if the end of string is reached)

Very true, thanks for this correct explanation. It concerns the pipe symbol, which I wrongly applied to only the 1 character on its left. Another example (without spaces !) to confirm that behavior :

Input :
a1b2c3b2de

Pattern :
(b)(?=2c|d)

Replace pattern :
<$1>

Result :
a1<b>2c3b2de

[Screenshot: RegExpQuickTester Result field without any superfluous @CRLF]

Capturing group (b) ... only if b is immediately followed by (2 AND c) OR d

So only 1 b is correctly captured, whereas 2 b's would have been captured under my wrong explanation "2 AND (c OR d)", which in fact corresponds to this pattern: (b)(?=2(?:c|d))
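
Both readings can be checked side by side. This is a small sketch that uses the same input and the <$1> replace pattern as above.

```autoit
Local $sInput = "a1b2c3b2de"

; Alternation spans the whole lookahead: b followed by "2c", OR b followed by "d".
ConsoleWrite(StringRegExpReplace($sInput, '(b)(?=2c|d)', '<$1>') & @CRLF)
; prints a1<b>2c3b2de

; Alternation limited by a non-capturing group: b followed by "2", then "c" OR "d".
ConsoleWrite(StringRegExpReplace($sInput, '(b)(?=2(?:c|d))', '<$1>') & @CRLF)
; prints a1<b>2c3<b>2de
```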

For the record, I just made a new change in "RegExpQuickTester.au3" (bringing it to version 2.5e) to get rid of any disturbing superfluous @CRLF that could be found at the end of the read-only Result Edit field.

I mean... we always take care of trailing spaces and last @CRLF's in all these edit fields (input string, search pattern, replace pattern, result prefix) so why shouldn't we in the Result field ?

Just because the original scripter (years ago) made every line in the Result Edit field end with @CRLF doesn't mean it should stay like this forever. As you see in the pic above, the blinking caret (visible on purpose at the exact moment the pic was taken) is placed at the very end of the result string. Pressing the "down" key won't do anything at this point, as there is no superfluous @CRLF anymore in this read-only result field. Keeping a possible trailing @CRLF sometimes confused me (in RegEx modes 1 to 4) when dealing with patterns that return trailing spaces and/or trailing @CRLF's.

This little inconvenience never happens with leading spaces / leading @CRLF's as they're clearly visible before the matched result

Trong · January 10, 2023

To me, everyone here is a hero and a teacher. Thanks for always helping me.

