czardas

StringRegExpReplace Challenge

4 posts in this topic

#1 ·  Posted (edited)

Well, it's certainly a challenge to me. The following script makes four calls to StringRegExpReplace(), and I would like to combine the patterns, or use better ones, to reduce the number of calls. Even combining two of the regular expressions would be an improvement. The rules are very simple. Given a decimal string, strip out non-essential zeros (or add a leading zero where one is missing) so that the string can be tested for equality with the result of executing it. The outcome gives strong evidence of whether numeric conversion is accurate, inaccurate or simply inappropriate. The string may begin with a minus sign, and a single decimal point may appear anywhere thereafter. No other symbols appear in the input. The string can be any length.

I have added the array output at the end of the code. The code requires ArrayWorkshop (in my sig). I think the script is easier to understand written this way. I have added comments to explain what I expect each regular expression to do. The digit 1 in the samples could be any non-zero digit 1-9.
 

; THREE RULES [... '\A\-?(\d*\.?\d+|\d+\.)\z' ...matches input]
; 1. The input string (digits) must contain at least one non-zero value. [1]
; 2. A single period may appear anywhere in the string. [.1]
; 3. The string can be negative. [-.1]

; The output string should be modified to become equal to itself after execution and then tested.
; No alpha characters are allowed. [!1.0e+19] ==> that's a different module

#include <Array.au3>
#include 'ArrayWorkshop.au3'

; column headers
Local $aTest = ['Sample', 'Strip leading zeros?', 'Prefix zero?', 'Strip trailing zeros?', 'Strip trailing period?', "Trust it?"]

; data to modify
Local $aSample =  _
    ['000.01000', _
    '-001', _
    '1.0', _
    '100', _
    '001.100', _
    '-001.100', _
    '1.', _
    '0001000', _
    '-0001000', _
    '00.001', _
    '.001', _
    '.000000001', _ ; edge case
    '11111111111111111111', _ ; out of bounds
    '-01.']

_PreDim($aTest, 2, True) ; set column headers
_ArrayAttach($aTest, $aSample) ; add the sample data

For $i = 1 To UBound($aTest) -1 ; here you can see the effect of each regular expression
    $aTest[$i][1] = StringRegExpReplace($aTest[$i][0], '(\A\-?)(0+)(.*\z)', '\1\3') ; strip leading zeros 000xx|-000xx ==> xx|-xx
    $aTest[$i][2] = StringRegExpReplace($aTest[$i][1], '(\A\-?)(\..*\z)', '${1}0\2') ; prefix zero [or not]? .xxx|-.xxx ==> 0.xxx|-0.xxx
    $aTest[$i][3] = StringRegExpReplace($aTest[$i][2], '(\A.+\.)(.*[^0])?(0+\z)', '\1\2') ; strip trailing zeros? x.xx000|-x.xx000 ==> x.xx|-x.xx
    $aTest[$i][4] = StringRegExpReplace($aTest[$i][3], '(\A.+)(\.\z)', '\1') ; strip trailing period? xxx. ==> xxx
    $aTest[$i][5] = (StringCompare(Execute($aTest[$i][4]), $aTest[$i][4]) = 0)  ; What does the interpreter make of it? / Do you trust the conversion?
Next

_ArrayDisplay($aTest)

#cs - RESULTS
Sample - Strip leading zeros? - Prefix zero? - Strip trailing zeros? - Strip trailing period? - Trust it?

000.01000 |.01000    |0.01000    |0.01       |0.01       |True
-001      |-1        |-1         |-1         |-1         |True
1.0       |1.0       |1.0        |1.         |1          |True
100       |100       |100        |100        |100        |True
001.100   |1.100     |1.100      |1.1        |1.1        |True
-001.100  |-1.100    |-1.100     |-1.1       |-1.1       |True
1.        |1.        |1.         |1.         |1          |True
0001000   |1000      |1000       |1000       |1000       |True
-0001000  |-1000     |-1000      |-1000      |-1000      |True
00.001    |.001      |0.001      |0.001      |0.001      |True
.001      |.001      |0.001      |0.001      |0.001      |True
.000000001|.000000001|0.000000001|0.000000001|0.000000001|False
11111111111111111111|11111111111111111111|11111111111111111111|11111111111111111111|11111111111111111111|False
-01.      |-1.       |-1.        |-1.        |-1         |True
#ce
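The "Trust it?" column can be looked at on its own. The sketch below is only an illustration: `_RoundTrips` is a hypothetical helper name, not part of the script above or of ArrayWorkshop. It prints what Execute() returns next to the original string, which should make it easier to see why the two flagged samples fail the comparison (most likely because the number is converted back to a string in a different notation, or is too large for an integer type).

```autoit
; Hypothetical helper (not from the original script): True when the
; interpreter's numeric reading of $sNumber converts back to exactly
; the same string.
Func _RoundTrips($sNumber)
    Return StringCompare(Execute($sNumber), $sNumber) = 0
EndFunc

; Already-normalized samples taken from the results table above.
Local $aCheck = ['0.001', '0.000000001', '11111111111111111111']

For $s In $aCheck
    ; Show the string, what Execute() makes of it, and the verdict.
    ConsoleWrite($s & ' -> ' & Execute($s) & ' : ' & _RoundTrips($s) & @CRLF)
Next
```

Strings that do not round-trip are the ones that would need to stay as strings rather than be converted.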

 

Edited by czardas

#2 ·  Posted (edited)

I'm not sure I understand the result you're expecting...

#include <Array.au3>
#include 'ArrayWorkshop.au3'

; column headers
Local $aTest = ['Sample', "Adding zeros", "Strip unnecessary zeros", "Trust it?"]

; data to modify
Local $aSample =  _
    ['000.01000', _
    '-001', _
    '1.0', _
    '100', _
    '001.100', _
    '-001.100', _
    '1.', _
    '0001000', _
    '-0001000', _
    '00.001', _
    '.001', _
    '.000000001', _ ; edge case
    '11111111111111111111', _ ; out of bounds
    '-01.']

_PreDim($aTest, 2, True) ; set column headers
_ArrayAttach($aTest, $aSample) ; add the sample data

For $i = 1 To UBound($aTest) -1 ; here you can see the effect of each regular expression
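    $aTest[$i][1] = StringRegExpReplace($aTest[$i][0], "^-?\K(?=\.)", "0") ; prefix a zero to a leading period: .xxx|-.xxx ==> 0.xxx|-0.xxx
    ; strip leading zeros, a trailing period (with any zeros after it) and trailing fractional zeros
    ; the final $ keeps interior zeros intact, e.g. 1.1001 stays 1.1001
    $aTest[$i][2] = StringRegExpReplace($aTest[$i][1], "^-?\K0+(?=[1-9]|0\.?)|\.0*$|\.\d*[1-9]\K0+$", "")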
    $aTest[$i][3] = (StringCompare(Execute($aTest[$i][2]), $aTest[$i][2]) = 0)
Next

_ArrayDisplay($aTest)

 

Edited by jguinch

#3 ·  Posted (edited)

Thanks jguinch, that's a big help. :) I think you've pretty much understood the idea and included some things I didn't think of, or haven't used before. The string will have already been checked to make sure it only contains digits and the two other symbols (minus sign and decimal point).
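For reference, the up-front check mentioned here could look something like the sketch below. It reuses the pattern quoted in the comments of the first post; `_IsDecimalString` is a hypothetical name chosen for illustration. Note that this pattern on its own also accepts all-zero strings such as '000', so rule 1 (at least one non-zero digit) would need a separate test.

```autoit
; Hypothetical helper (name assumed for illustration): True when $sInput
; is an optional minus sign followed by digits with at most one period.
; The pattern is the one quoted in the first post's comments.
Func _IsDecimalString($sInput)
    ; With the default flag, StringRegExp() returns 1 on a match, 0 otherwise.
    Return StringRegExp($sInput, '\A\-?(\d*\.?\d+|\d+\.)\z') = 1
EndFunc

ConsoleWrite(_IsDecimalString('-001.100') & @CRLF) ; sign, digits, one period
ConsoleWrite(_IsDecimalString('1.0e+19') & @CRLF)  ; contains alpha characters
ConsoleWrite(_IsDecimalString('.') & @CRLF)        ; a period with no digits
```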

I'm writing a numeric sort algorithm. The fewer strings there are, the faster processing goes, because numbers can be compared against one another very quickly. I want to be able to sort googol-sized integers if need be, so just using Number() is out of the question, and the method used will depend on which data types are being compared and their magnitude. The preprocessing above is relevant for all recognized strings. The ones that cannot easily be converted to numbers will be processed more slowly as strings, being compared in different ways. I hope that explains why I want this.

Edit: I'll be testing numbers against numbers, strings against strings and numbers against strings. Strings can be integers or floats, and numbers can be of any type. The method for each comparison is already worked out and requires strings first to be formatted as above.
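To show why the formatting step matters for the string-against-string case, here is a minimal sketch. It is not the comparison method referred to above, which is not shown in this thread. `_CompareIntStrings` is a hypothetical name, and the sketch assumes both inputs are non-negative integers that have already had their leading zeros stripped.

```autoit
; Hypothetical sketch: compare two normalized, non-negative integer strings
; of any length. Returns a negative value, 0, or a positive value.
; Assumes leading zeros were stripped beforehand; without that step the
; length test below would be meaningless.
Func _CompareIntStrings($sA, $sB)
    Local $iLenA = StringLen($sA), $iLenB = StringLen($sB)
    If $iLenA < $iLenB Then Return -1 ; fewer digits means a smaller value
    If $iLenA > $iLenB Then Return 1
    ; Same number of digits: character order matches numeric order.
    Return StringCompare($sA, $sB, 1)
EndFunc

ConsoleWrite(_CompareIntStrings('1000', '999') & @CRLF)
ConsoleWrite(_CompareIntStrings('11111111111111111111', '11111111111111111112') & @CRLF)
```

Negative values and fractional parts would need extra handling on top of this.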

Edited by czardas
