Can RegEx remove variable number of spaces before a string?


Go to solution Solved by TheXman,

I have a text file that I am reading line by line.  I am trying to replace the blank spaces with a TAB character, but only the spaces that come immediately before one of the listed ethnicities.  The number of spaces before the ethnicity varies.  Can RegEx accomplish this?  If not, is there another solution?  Thanks!

 

;~ ;Parse Ethnicity
Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
For $iCC = 1 To $iUboundOfFile -1 
    Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.
    ;Find Ethnicity
    Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
    For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one.  I may not need this, but not sure.
        ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
    Next
Next

 

 

;~ ;Parse Ethnicity
Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
For $iCC = 1 To $iUboundOfFile -1 
    Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.
    ;Find Ethnicity
    Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
    For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one.  I may not need this, but not sure.
        ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
        Local $iPositionNumberOfEthnicity = StringInStr($sFileRead_LineByLine, $aEthnicities[$xCC], 0, 1)
        If $iPositionNumberOfEthnicity == 0 Then ExitLoop
        $sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(^\s*" & $aEthnicities[$xCC] & ")" , $aEthnicities[$xCC] & @TAB, 1)
        ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sOutPut_AfterStep1 = ' & $sOutPut_AfterStep1 & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
    Next
Next

 

I tried adding a few lines, but it is not getting me there.  What do you suggest?

 

 

  • Solution
Posted (edited)
1 hour ago, zuladabef said:

What do you suggest?

In the future, especially when dealing with regular expressions, I would suggest that you provide accurate test data that covers as many different scenarios as possible.  I would also suggest that you show exactly what the expected result should look like for the data that you provide.

Since you provided no data, I used my own.  You can adapt the regular expression as necessary.  As with any language, including regular expressions, there are multiple ways to achieve the same result.  The example below is just one very simple way to accomplish what you described.  It is not case-sensitive and only replaces the defined ethnicities when they appear as whole words (not part of a bigger string of characters).

Const $TEST_DATA = "Date: 2021-06-07   Ethnicity: BLA  "     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: BLACK  "   & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: WHI"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: WHITE"     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: AMI"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: Oth"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: MEX  "     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: CHI   "    & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: ESK     "  & @CRLF

example()

Func example()

    ConsoleWrite(StringRegExpReplace($TEST_DATA, "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)

EndFunc

Console Output:

Date: 2021-06-07   Ethnicity:   BLA  
Date: 2021-06-07   Ethnicity: BLACK  
Date: 2021-06-07   Ethnicity:   WHI
Date: 2021-06-07   Ethnicity: WHITE
Date: 2021-06-07   Ethnicity:   AMI
Date: 2021-06-07   Ethnicity:   Oth
Date: 2021-06-07   Ethnicity:   MEX  
Date: 2021-06-07   Ethnicity:   CHI   
Date: 2021-06-07   Ethnicity: ESK
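
For the line-by-line case in the original question, the same pattern could be applied to each line of the file.  The following is only a rough sketch under assumptions: the file name is a placeholder, and it reads the whole file with FileReadToArray rather than using the FileOpen handle from the question.

```autoit
; Sketch only: the path below is a made-up placeholder, adjust it to your data file.
Local $sFilePath_DataBuild = @ScriptDir & "\DataBuild.txt"

; Read every line of the file into a 0-based array.
Local $aLines = FileReadToArray($sFilePath_DataBuild)
If @error Then
    ConsoleWrite("Could not read " & $sFilePath_DataBuild & @CRLF)
    Exit 1
EndIf

Local Const $PATTERN = "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b"

For $i = 0 To UBound($aLines) - 1
    ; Replace the run of spaces/tabs in front of a whole-word code with a single TAB.
    $aLines[$i] = StringRegExpReplace($aLines[$i], $PATTERN, @TAB & "\1")
    ConsoleWrite($aLines[$i] & @CRLF)
Next
```

Because the alternation already covers every code, no inner loop over an array of ethnicities is needed in this version.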

 

Edited by TheXman

@TheXman Great point about providing test data, I will make sure to include that next time.

I am trying to get a better understanding of the switches you so graciously provided.  Let's see if I understand it correctly:

(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)
  • (?i) - this makes it NOT case sensitive?
  • \h+ -  1 or more horizontal spaces?
  • \b match the empty string at the beginning or end of a word.  We need this at the start and end of the string(s) to signify to match the whole word?
  • \1 - I think this means to back reference whichever of the strings was found.  If so, this is super cool.

Do these seem accurate?

Posted (edited)
1 hour ago, zuladabef said:

Do these seem accurate?

Yes, for the most part.

1 hour ago, zuladabef said:

\1 - I think this means to back reference whichever of the strings was found.  If so, this is super cool.

\1 is a back reference to the first capture group. 
Which in this case is: (BLA|WHI|AMI|OTH|MEX|CHI)
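
To see the back reference at work, a small sketch like the one below may help.  The sample line is made up, and the square brackets in the replacement are only there to make the captured text visible.  After the call, @extended holds the number of replacements that were performed.

```autoit
; Made-up sample line with several spaces in front of the code.
Local $sLine = "Date: 2021-06-07   Ethnicity:      WHI"

; "\1" in the replacement is whatever capture group 1 matched (here the code itself).
Local $sResult = StringRegExpReplace($sLine, "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "[\1]")
Local $iCount = @extended ; number of replacements performed by the call above

ConsoleWrite($sResult & @CRLF)
ConsoleWrite("Replacements: " & $iCount & @CRLF)
```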

Spoiler

(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b

Use these options for the whole regular expression «(?i)»
   Case insensitive «i»
Match a single character that is a “horizontal whitespace character” (tab or any Unicode space separator) «\h+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at a word boundary (position preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»
Match the regex below and capture its match into backreference number 1 «(BLA|WHI|AMI|OTH|MEX|CHI)»
   Match this alternative (attempting the next alternative only if this one fails) «BLA»
      Match the character string “BLA” literally (case insensitive) «BLA»
   Or match this alternative (attempting the next alternative only if this one fails) «WHI»
      Match the character string “WHI” literally (case insensitive) «WHI»
   Or match this alternative (attempting the next alternative only if this one fails) «AMI»
      Match the character string “AMI” literally (case insensitive) «AMI»
   Or match this alternative (attempting the next alternative only if this one fails) «OTH»
      Match the character string “OTH” literally (case insensitive) «OTH»
   Or match this alternative (attempting the next alternative only if this one fails) «MEX»
      Match the character string “MEX” literally (case insensitive) «MEX»
   Or match this alternative (the entire group fails if this one fails to match) «CHI»
      Match the character string “CHI” literally (case insensitive) «CHI»
Assert position at a word boundary (position preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»

Created with RegexBuddy
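
The effect of the two word boundaries can be checked in isolation with StringRegExp, which in its default mode returns 1 for a match and 0 for no match.  This is just an illustrative sketch using two of the test strings from earlier in the thread.

```autoit
Local Const $PATTERN = "(?i)\bBLA\b"

; "BLA" stands alone as a word, so both boundaries are satisfied.
ConsoleWrite(StringRegExp("Ethnicity: BLA  ", $PATTERN) & @CRLF)

; In "BLACK" the "A" is followed by another word character, so the trailing \b fails.
ConsoleWrite(StringRegExp("Ethnicity: BLACK", $PATTERN) & @CRLF)
```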

 

 

Edited by TheXman

@TheXman Okay, that makes sense, thank you!

Now that I understand this a bit better, I think I should probably rewrite some of my previous code from the same script.  Is it okay to post here, or should I create a new post?  I'll make sure to add some test data this time too.

1 hour ago, zuladabef said:

Is it okay to post here, or should I create a new post?

Either way, don’t forget to mark the topic as solved and give proper credit.

Code hard, but don’t hard code...

Posted (edited)

So how would I modify the RegEx pattern if I want to look for a nine-character string where the first seven characters are digits and the last two are either "LP" or "UP"?

For example:
1018003LP
1016001UP
1031002UP
1015004LP
 

I was thinking something like this would do the trick, but it's not working.

$sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(?i)\h+\b(\d{7}\w{2})\b", @TAB & "\1" & @TAB)

 

*EDIT:

Actually, that does seem to be working now!
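
One thing worth noting: \w{2} accepts any two word characters, not just "LP" or "UP".  If only those two suffixes should match, a tighter variant might look like the sketch below.  The sample lines are made up for illustration.

```autoit
; [LU]P allows only "LP" or "UP" after the seven digits ((?i) also allows lower case).
Local Const $PATTERN = "(?i)\h+\b(\d{7}[LU]P)\b"

; Made-up sample lines: the first should be changed, the second left as it is.
Local $sMatch = "Site A   1018003LP   notes"
Local $sNoMatch = "Site A   1018003XX   notes"

ConsoleWrite(StringRegExpReplace($sMatch, $PATTERN, @TAB & "\1" & @TAB) & @CRLF)
ConsoleWrite(StringRegExpReplace($sNoMatch, $PATTERN, @TAB & "\1" & @TAB) & @CRLF)
```

Note that the spaces after the ID are not consumed by this pattern, so they remain behind the trailing TAB.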

Edited by zuladabef
Solved it, I think