
Can RegEx remove variable number of spaces before a string?


Solved by TheXman

Posted

I have a text file that I am reading line by line.  I am trying to replace the blank spaces with a TAB character, but only the spaces that come right before one of the listed ethnicities.  The number of spaces that occur before the ethnicity is variable.  Can RegEx accomplish this?  Is there another solution if not?  Thanks!

 

;~ ;Parse Ethnicity
Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
For $iCC = 1 To $iUboundOfFile -1 
    Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.
    ;Find Ethinicity
    Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
    For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one.  I may not need this, but not sure.
        ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
    Next
Next

 

 

Posted
  On 6/7/2021 at 9:24 PM, zuladabef said:

Can RegEx accomplish this?

Yes

Posted
;~ ;Parse Ethnicity
Local $iUboundOfFile = _FileCountLines($sFilePath_DataBuild)
For $iCC = 1 To $iUboundOfFile -1 
    Local $sFileRead_LineByLine = FileReadLine($hFileOpen, $iCC) ; Read the line of the file using the handle returned by FileOpen.
    ;Find Ethinicity
    Local $aEthnicities = ["BLA ", "WHI ", "AMI ", " OTH ", "CHI ", "MEX "]
    For $xCC = 0 To UBound($aEthnicities) - 1 ;Loop through the ethnicities, one by one.  I may not need this, but not sure.
        ;If "BLA " is found, then replace all the leading spaces (between it and the string before it) with a TAB character.
        Local $iPositionNumberOfEthnicity = StringInStr($sFileRead_LineByLine, $aEthnicities[$xCC], 0, 1)
        If $iPositionNumberOfEthnicity == 0 Then ExitLoop
        $sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(^\s*" & $aEthnicities[$xCC] & ")" , $aEthnicities[$xCC] & @TAB, 1)
        ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $sOutPut_AfterStep1 = ' & $sOutPut_AfterStep1 & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console
    Next
Next

 

I tried adding a few lines, but it is not getting me there.  What do you suggest?

 

 

  • Solution
Posted (edited)
  On 6/7/2021 at 9:52 PM, zuladabef said:

What do you suggest?

In the future, especially when dealing with regular expressions, I would suggest that you provide accurate test data that covers as many different scenarios as possible.  I would also suggest that you show exactly what the expected result should look like for the data that you provide.

Since you provided no data, I used my own.  You can adapt the regular expression as necessary.  As with any language, including regular expressions, there are multiple ways to achieve the same result.  The example below is just one very simple way to accomplish what you described.  It is case-insensitive and only replaces the whitespace before the defined ethnicities when they appear as whole words (not as part of a bigger string of characters).

Const $TEST_DATA = "Date: 2021-06-07   Ethnicity: BLA  "     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: BLACK  "   & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: WHI"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: WHITE"     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: AMI"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: Oth"       & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: MEX  "     & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: CHI   "    & @CRLF & _
                   "Date: 2021-06-07   Ethnicity: ESK     "  & @CRLF

example()

Func example()

    ConsoleWrite(StringRegExpReplace($TEST_DATA, "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)

EndFunc

Console Output:

Date: 2021-06-07   Ethnicity:   BLA  
Date: 2021-06-07   Ethnicity: BLACK  
Date: 2021-06-07   Ethnicity:   WHI
Date: 2021-06-07   Ethnicity: WHITE
Date: 2021-06-07   Ethnicity:   AMI
Date: 2021-06-07   Ethnicity:   Oth
Date: 2021-06-07   Ethnicity:   MEX  
Date: 2021-06-07   Ethnicity:   CHI   
Date: 2021-06-07   Ethnicity: ESK
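
For reference, a minimal sketch of how the same pattern might be applied inside a line-by-line read, as in the original question. The file path and variable names here are hypothetical, and the sketch assumes the file can simply be read from top to bottom.

```autoit
#include <FileConstants.au3>

; Hypothetical path: substitute the real data file.
Local Const $sFilePath = @ScriptDir & "\data.txt"
; Same pattern as in the example above.
Local Const $sPattern = "(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b"

Local $hFile = FileOpen($sFilePath, $FO_READ)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sFilePath & @CRLF)
    Exit 1
EndIf

While True
    Local $sLine = FileReadLine($hFile) ; no line number given: reads the next line
    If @error Then ExitLoop             ; @error is set at end of file or on a read error
    ; Replace the run of horizontal whitespace before a listed ethnicity with one TAB.
    ConsoleWrite(StringRegExpReplace($sLine, $sPattern, @TAB & "\1") & @CRLF)
WEnd

FileClose($hFile)
```

Because the alternation in the pattern already covers every ethnicity, no inner loop over an array of ethnicities should be needed.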

 

Edited by TheXman
Posted

@TheXman Great point about providing test data, I will make sure to include that next time.

I am trying to get a better understanding of the switches you so graciously provided.  Let's see if I understand it correctly:

(?i)\h+\b(BLA|WHI|AMI|OTH|MEX|CHI)\b", @TAB & "\1") & @CRLF)
  • (?i) - this makes it NOT case sensitive?
  • \h+ - 1 or more horizontal whitespace characters?
  • \b - matches the empty string at the beginning or end of a word.  We need this at the start and end of the string(s) so that only the whole word matches?
  • \1 - I think this means to back reference whichever of the strings was found.  If so, this is super cool.

Do these seem accurate?
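
A small sketch that may help check the \b item above: it runs the pattern with and without the word boundaries on a made-up line (the sample text is an assumption, not real data).

```autoit
; Made-up sample line containing both BLACK and a standalone BLA.
Const $LINE = "Ethnicity: BLACK   Ethnicity: BLA"

; Without \b, the BLA inside BLACK also matches, so the space before BLACK becomes a TAB.
ConsoleWrite(StringRegExpReplace($LINE, "(?i)\h+(BLA)", @TAB & "\1") & @CRLF)

; With \b on both sides, only the standalone BLA matches; BLACK is left alone.
ConsoleWrite(StringRegExpReplace($LINE, "(?i)\h+\b(BLA)\b", @TAB & "\1") & @CRLF)
```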

Posted (edited)
  On 6/8/2021 at 3:01 PM, zuladabef said:

Do these seem accurate?

Yes, for the most part.

  On 6/8/2021 at 3:01 PM, zuladabef said:

\1 - I think this means to back reference whichever of the strings was found.  If so, this is super cool.

\1 is a back reference to the first capture group. 
Which in this case is: (BLA|WHI|AMI|OTH|MEX|CHI)
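
As an illustration of numbered back-references with more than one capture group, here is a minimal sketch that reorders the parts of a date. The date string is just sample input.

```autoit
; Three capture groups: \1 = year, \2 = month, \3 = day.
Local $sReordered = StringRegExpReplace("2021-06-07", "(\d{4})-(\d{2})-(\d{2})", "\2/\3/\1")
ConsoleWrite($sReordered & @CRLF) ; prints 06/07/2021
```

Groups are numbered by the position of their opening parenthesis, counting from the left.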

Edited by TheXman
Posted

@TheXman Okay, that makes sense, thank you!

Now that I understand this a bit better, I think I should probably rewrite some of my previous code from the same script.  Is it okay to post here, or should I create a new post?  I'll make sure to add some test data this time too.

Posted (edited)

So how would I modify the Regex pattern if I want to look for a nine-character string where the first seven characters are digits and the last two are either "LP" or "UP"?

For example:
1018003LP
1016001UP
1031002UP
1015004LP
 

I was thinking something like this would do the trick, but it's not working.

$sOutPut_AfterStep1 = StringRegExpReplace($sFileRead_LineByLine, "(?i)\h+\b(\d{7}\w{2})\b", @TAB & "\1" & @TAB)

 

*EDIT:

Actually, that does seem to be working now!
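
One thing worth noting: \w{2} accepts any two word characters, not only "LP" or "UP". A possible tighter variant, sketched against made-up sample lines (the "Unit" and "ok" text and the XY suffix are assumptions for illustration):

```autoit
; Made-up sample data; the third line ends in XY rather than LP or UP.
Const $SAMPLE = "Unit   1018003LP   ok" & @CRLF & _
                "Unit   1016001UP   ok" & @CRLF & _
                "Unit   1031002XY   ok" & @CRLF

; \w{2} accepts any two word characters, so 1031002XY is matched as well.
ConsoleWrite(StringRegExpReplace($SAMPLE, "(?i)\h+\b(\d{7}\w{2})\b", @TAB & "\1" & @TAB))

; [LU]P accepts only LP or UP, so the 1031002XY line is left unchanged.
ConsoleWrite(StringRegExpReplace($SAMPLE, "(?i)\h+\b(\d{7}[LU]P)\b", @TAB & "\1" & @TAB))
```

With (?i) in place, lowercase "lp" and "up" would match too; drop the flag if only uppercase suffixes are valid.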

Edited by zuladabef
Solved it, I think
