Can RegEx remove a variable number of spaces before a string?

zuladabef · June 7, 2021

I have a text file that I am reading line by line. I am trying to replace the run of blank spaces that precedes one of the listed ethnicities with a TAB character, and only there. The number of spaces before the ethnicity varies. Can RegEx accomplish this? If not, is there another solution? Thanks!

;~ ;Parse Ethnicity
Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
For $iCC = 1 To $iUboundOfFile -1 
    Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.
    ;Find Ethnicity
    Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
    For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one.  I may not need this, but not sure.
        ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
    Next
Next

TheXman · June 7, 2021

12 minutes ago, zuladabef said:

Can RegEx accomplish this?

Yes

zuladabef · June 7, 2021

;~ ;Parse Ethnicity
Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
For $iCC = 1 To $iUboundOfFile -1 
    Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.
    ;Find Ethnicity
    Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
    For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one.  I may not need this, but not sure.
        ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
        Local $iPositionNumberOfEthnicity = StringInStr($sFileRead_LineByLine, $aEthnicities[$xCC], 0, 1)
        If $iPositionNumberOfEthnicity == 0 Then ExitLoop
        $sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(^\s*" & $aEthnicities[$xCC] & ")" , $aEthnicities[$xCC] & @TAB, 1)
        ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sOutPut_AfterStep1 = ' & $sOutPut_AfterStep1 & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
    Next
Next

I tried adding a few lines, but it is not getting me there. What do you suggest?
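One likely reason the attempt above makes no replacements: `^` only matches at the start of the string, so `(^\s*BLA )` can match only when the ethnicity is the first non-blank text on the line. The replacement also puts the TAB after the code instead of before it. The snippet below is a minimal sketch of that difference; the sample line is made up, since no real data was posted, and it should report 0 replacements for the anchored pattern and 1 for the unanchored one.

```autoit
; Hypothetical sample line; the real file contents were not posted.
Local $sLine = "Date: 2021-06-07   Ethnicity:     BLA  "

; Pattern from the attempt above: ^ only matches at the start of the string.
Local $sAnchored = StringRegExpReplace($sLine, "(^\s*BLA )", "BLA " & @TAB, 1)
Local $iAnchoredCount = @extended ; number of replacements performed

; Same idea without the anchor, with the TAB placed before the captured code.
Local $sUnanchored = StringRegExpReplace($sLine, "\h+(BLA )", @TAB & "\1", 1)
Local $iUnanchoredCount = @extended

ConsoleWrite("Anchored replacements:   " & $iAnchoredCount & @CRLF)
ConsoleWrite("Unanchored replacements: " & $iUnanchoredCount & @CRLF)
```

Note also that `ExitLoop` in the attempt leaves the inner loop as soon as one ethnicity is not found, so the remaining ethnicities are never tried for that line.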

JockoDundee · June 7, 2021

1 hour ago, TheXman said:

Yes

Not even a period!

TheXman · June 7, 2021

1 hour ago, zuladabef said:

What do you suggest?

In the future, especially when dealing with regular expressions, I would suggest that you provide accurate test data covering as many different scenarios as possible. I would also suggest that you show exactly what the expected result should look like for that data.

Since you provided no data, I used my own. You can adapt the regular expression as necessary. As with any language, including regular expressions, there are multiple ways to achieve the same result. The example below is just one very simple way to accomplish what you described. It is not case-sensitive and only replaces the defined ethnicities when they appear as whole words (not part of a bigger string of characters).

Const $TEST_DATA = "Date: 2021-06-07   Ethnicity: BLA  "     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: BLACK  "   & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: WHI"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: WHITE"     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: AMI"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: Oth"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: MEX  "     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: CHI   "    & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: ESK     "  & @CRLF

example()

Func example()

    ConsoleWrite(StringRegExpReplace($TEST_DATA, "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)

EndFunc

Console Output:

Date: 2021-06-07   Ethnicity:   BLA  
Date: 2021-06-07   Ethnicity: BLACK  
Date: 2021-06-07   Ethnicity:   WHI
Date: 2021-06-07   Ethnicity: WHITE
Date: 2021-06-07   Ethnicity:   AMI
Date: 2021-06-07   Ethnicity:   Oth
Date: 2021-06-07   Ethnicity:   MEX  
Date: 2021-06-07   Ethnicity:   CHI   
Date: 2021-06-07   Ethnicity: ESK

Edited June 7, 2021 by TheXman
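For the line-by-line case in the original question, the same pattern can be applied to each line as it is read. This is only a sketch: the file path is a placeholder, and the output goes to the console rather than back to a file. Reading until `@error` is set avoids having to count the lines first.

```autoit
#include <FileConstants.au3>

; Hypothetical input path; substitute the real file.
Local Const $sFilePath = @ScriptDir & "\data.txt"
; Pattern taken from the example above.
Local Const $sPattern = "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b"

Local $hFile = FileOpen($sFilePath, $FO_READ)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sFilePath & @CRLF)
    Exit 1
EndIf

Local $sLine
While True
    $sLine = FileReadLine($hFile) ; no line number given: reads the next line on each call
    If @error Then ExitLoop       ; non-zero @error means end of file or a read error
    ; Replace the run of horizontal whitespace before the ethnicity with one TAB.
    ConsoleWrite(StringRegExpReplace($sLine, $sPattern, @TAB & "\1") & @CRLF)
WEnd

FileClose($hFile)
```

Because the alternation already covers every ethnicity, no inner loop over an array of ethnicities is needed.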

zuladabef · June 8, 2021

@TheXman Great point about providing test data, I will make sure to include that next time.

I am trying to get a better understanding of the switches you so graciously provided. Let's see if I understand it correctly:

(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)

(?i) - this makes it NOT case sensitive?
\h+ - 1 or more horizontal spaces?
\b - matches the empty string at the beginning or end of a word. We need this before and after the string(s) so that only whole words match?
\1 - I think this means to back reference whichever of the strings was found. If so, this is super cool.

Do these seem accurate?

TheXman · June 8, 2021

1 hour ago, zuladabef said:

Do these seem accurate?

Yes, for the most part.

1 hour ago, zuladabef said:

\1 - I think this means to back reference whichever of the strings was found. If so, this is super cool.

\1 is a back reference to the first capture group.
Which in this case is: (BLA|WHI|AMI|OTH|MEX|CHI)
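As a small illustration of how capture groups are numbered, here is a sketch with three groups. The date string is just sample input; groups are numbered by the position of their opening parenthesis, left to right.

```autoit
; Hypothetical input, chosen only to show group numbering.
Local $sDate = "2021-06-07"

; \1 = year, \2 = month, \3 = day; the replacement reorders them.
Local $sReordered = StringRegExpReplace($sDate, "(\d{4})-(\d{2})-(\d{2})", "\3/\2/\1")

ConsoleWrite($sReordered & @CRLF) ; 07/06/2021
```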


(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b

Use these options for the whole regular expression «(?i)»
   Case insensitive «i»
Match a single character that is a “horizontal whitespace character” (tab or any Unicode space separator) «\h+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at a word boundary (position preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»
Match the regex below and capture its match into backreference number 1 «(BLA|WHI|AMI|OTH|MEX|CHI)»
   Match this alternative (attempting the next alternative only if this one fails) «BLA»
      Match the character string “BLA” literally (case insensitive) «BLA»
   Or match this alternative (attempting the next alternative only if this one fails) «WHI»
      Match the character string “WHI” literally (case insensitive) «WHI»
   Or match this alternative (attempting the next alternative only if this one fails) «AMI»
      Match the character string “AMI” literally (case insensitive) «AMI»
   Or match this alternative (attempting the next alternative only if this one fails) «OTH»
      Match the character string “OTH” literally (case insensitive) «OTH»
   Or match this alternative (attempting the next alternative only if this one fails) «MEX»
      Match the character string “MEX” literally (case insensitive) «MEX»
   Or match this alternative (the entire group fails if this one fails to match) «CHI»
      Match the character string “CHI” literally (case insensitive) «CHI»
Assert position at a word boundary (position preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»

Created with RegexBuddy

Edited June 8, 2021 by TheXman

zuladabef · June 8, 2021

@TheXman Okay, that makes sense, thank you!

Now that I understand this a bit better, I think I should probably rewrite some of my previous code from the same script. Is it okay to post here, or should I create a new post? I'll make sure to add some test data this time too.

JockoDundee · June 8, 2021

1 hour ago, zuladabef said:

Is it okay to post here, or should I create a new post?

Either way, don’t forget to mark the topic as solved and give proper credit.

zuladabef · June 8, 2021

So how would I modify the Regex pattern if I want to look for a nine-character string where the first seven characters are digits and the last two are either "LP" or "UP"?

For example:
1018003LP
1016001UP
1031002UP
1015004LP

I was thinking something like this would do the trick, but it's not working.

$sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(?i)\h+\b(\d{7}\w{2})\b", @TAB & "\1" & @TAB)

*EDIT:

Actually, that does seem to be working now!

Edited June 8, 2021 by zuladabef
Solved it, I think
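One caveat on the pattern above: `\w{2}` accepts any two word characters (letters, digits, or underscore), not only "LP" or "UP". If the suffix should be restricted, a non-capturing alternation is one option. The snippet below is a sketch comparing the two patterns; the surrounding text in the sample lines and the "ZZ" entry are invented for illustration.

```autoit
; Hypothetical lines; only the LP/UP IDs come from the post above.
Local $aLines = ["ABC   1018003LP  X", "ABC   1016001UP  X", "ABC   1018003ZZ  X"]

Local Const $sLoose  = "(?i)\h+\b(\d{7}\w{2})\b"       ; any two word characters
Local Const $sStrict = "(?i)\h+\b(\d{7}(?:LP|UP))\b"   ; only LP or UP

Local $iLoose, $iStrict
For $i = 0 To UBound($aLines) - 1
    ; Only the replacement count is needed here, so the returned string is discarded.
    StringRegExpReplace($aLines[$i], $sLoose, @TAB & "\1" & @TAB)
    $iLoose = @extended
    StringRegExpReplace($aLines[$i], $sStrict, @TAB & "\1" & @TAB)
    $iStrict = @extended
    ConsoleWrite($aLines[$i] & " -> loose: " & $iLoose & ", strict: " & $iStrict & @CRLF)
Next
```

The loose pattern should match all three lines, while the strict one should skip the "ZZ" line. Since both suffixes end in "P", `\d{7}[LU]P` would be an equivalent, shorter way to write the strict version.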
