jwseek

finding numbers or words in strings...

8 posts in this topic

I couldn't find any scripts like this on the forum, so I figured posting these would be useful...

The following two functions are (hopefully) well commented. Both take a string: one pulls all the numbers out into a space-delimited string (for easy parsing), and the other pulls all the "words" (runs of consecutive letters) out into a space-delimited string.

I'm not sure exactly how effective these functions are, but they haven't failed me in my uses yet, so...

StringStripAlpha: creates a space-delimited string of the numbers found in the input string

#cs ---------------------
    Pull one character at a time from the string.
    If it's a digit [0-9], keep reading until the next character is not a digit,
    then add that whole number to the result, followed by a space.
    A number is skipped if the character directly before or after it is a letter.
    This turns the following string:
    error action="1" item="C:\Documents and Settings\LocalService\Local Settings\Application Data\Microsoft\Windows\UsrClass.dat" code="32" time="2009-07-21T21:22:17" />
    into:
    1 32 2009 07 22 17
    (the two 21s on either side of the T are skipped because each one touches a letter)
#ce ---------------------
Func stringStripAlpha($alphNumString)
    Local $tempString = "", $numberString = "", $sBefore = "", $iStart
    For $i = 1 To StringLen($alphNumString)
        $tempString = StringMid($alphNumString, $i, 1)
        If StringIsDigit($tempString) Then
            #cs ---------------------
             we need to read every digit of the number as it appears in the string
             so the final string has each number listed individually.
             this Do loop walks over those extra digits if they're there.
             also, some lines can have an entry like number3="4" or 3number="4".  We don't want the three
             there to be considered a number, so we skip the whole number if the character directly
             preceding or following it is a letter (alpha character)
            #ce ----------------------
            $iStart = $i
            $sBefore = ""
            If $i > 1 Then $sBefore = StringMid($alphNumString, $i - 1, 1)
            Do
                $i += 1
                $tempString = StringMid($alphNumString, $i, 1)
            Until Not StringIsDigit($tempString)
            ; $tempString is now the character right after the number ("" at the end of the string)
            If (Not StringIsAlpha($sBefore)) And (Not StringIsAlpha($tempString)) Then
                $numberString &= StringMid($alphNumString, $iStart, $i - $iStart) & " "
            EndIf
        EndIf
    Next
    ;remove the trailing space
    Return StringTrimRight($numberString, 1)
EndFunc

StringMakeAlpha: pulls all the "words" from the input string

#cs ---------------------
Very similar to stringStripAlpha except we're dumping everything but each
individual word in the string, e.g.
error action="1" item="C:\Documents and Settings\LocalService\Local Settings\Application Data\Microsoft\Windows\UsrClass.dat" code="32" time="2009-07-21T21:22:17" />
becomes:
error action item C Documents and Settings LocalService Local Settings Application Data Microsoft Windows UsrClass dat code time T21
Notes:
Numbers that end a word (e.g. number3 or test100) are attached to the word,
which is why the T21 from the timestamp shows up at the end.
Numbers that start a word (e.g. 123test) currently do not get attached; only "test" is kept.
#ce ---------------------
Func stringMakeAlpha($alphNumString)
    Local $tempString = "", $alphaString = ""
    For $i = 1 To StringLen($alphNumString)
        $tempString = StringMid($alphNumString, $i, 1)
        If StringIsAlpha($tempString) Then
            ; copy the whole run of letters
            Do
                $alphaString &= $tempString
                $i += 1
                $tempString = StringMid($alphNumString, $i, 1)
            Until Not StringIsAlpha($tempString)
            ; then any digits stuck directly onto the end of the word (number3, test100)
            If StringIsDigit($tempString) Then
                Do
                    $alphaString &= $tempString
                    $i += 1
                    $tempString = StringMid($alphNumString, $i, 1)
                Until Not StringIsDigit($tempString)
            EndIf
            $alphaString &= " "
            ; step back one so Next lands on the character that ended the word;
            ; without this, a letter right after the digits (the d in abc12def) is skipped
            $i -= 1
        EndIf
    Next
    Return StringTrimRight($alphaString, 1)
EndFunc

So are these useful? Are the loops too slow? I can't recall how to work out the O() complexity of loops, but these only go through the string once, so they can't be that slow...
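One rough way to answer the speed question in practice is to time a function on an input and then on an input twice as long: if the time roughly doubles, it is scaling linearly. A minimal sketch, assuming two hypothetical helpers, `_TimeFunc` and `_DigitsViaRegExp` (neither is from the thread). Because Call() only works with user-defined functions, the same helper can time "stringStripAlpha" or "stringMakeAlpha" by name once they are in the script.

```autoit
; Sketch: rough empirical timing (hypothetical helper names, not from the thread)

; Average time in milliseconds of one call to the user function named $sFunc
Func _TimeFunc($sFunc, $sInput, $iReps = 20)
    Local $hTimer = TimerInit()
    For $iRep = 1 To $iReps
        Call($sFunc, $sInput)   ; Call() can only reach user-defined functions
    Next
    Return TimerDiff($hTimer) / $iReps
EndFunc

; Example target: the (\d+) approach, wrapped in a user function so Call() can use it
Func _DigitsViaRegExp($sText)
    Return StringRegExp($sText, "(\d+)", 3)
EndFunc

; Build an input, then one twice as long
Local $sSample = 'code="32" time="2009-07-21T21:22:17" '
Local $sSmall = ""
For $iCopy = 1 To 500
    $sSmall &= $sSample
Next
Local $sLarge = $sSmall & $sSmall

ConsoleWrite("1x input: " & Round(_TimeFunc("_DigitsViaRegExp", $sSmall), 3) & " ms" & @CRLF)
ConsoleWrite("2x input: " & Round(_TimeFunc("_DigitsViaRegExp", $sLarge), 3) & " ms" & @CRLF)
```

The timings depend on the machine, so compare the two lines against each other rather than reading the absolute numbers.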




Here's another way, with regular expressions:

Func stringStripAlpha($alphNumString)
    Local $aItems=StringRegExp($alphNumString,"(\d+)",3)
    If @error Then Return ""
    Local $sRet=""
    For $i=0 To UBound($aItems)-1
        $sRet&=$aItems[$i]&' '
    Next
    Return StringTrimRight($sRet,1)
EndFunc

Func stringMakeAlpha($alphNumString)
    Local $aItems=StringRegExp($alphNumString,"(?:^|\W)([[:alpha:]]+\d*)",3)
    If @error Then Return ""
    Local $sRet=""
    For $i=0 To UBound($aItems)-1
        $sRet&=$aItems[$i]&' '
    Next
    Return StringTrimRight($sRet,1)
EndFunc

>_<
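Note that (\d+) grabs every run of digits, so it doesn't apply the number3 / 3number rule described in the first post. If you want that rule as well, lookarounds can express it. A sketch, with a hypothetical function name `_NumbersNotTouchingLetters`; the \d inside both lookarounds stops the engine from matching just the tail or head of a longer number.

```autoit
; Sketch (hypothetical name): digit runs whose immediate neighbours are not letters,
; mirroring the "skip number3 / 3number" rule from stringStripAlpha.
;   (?<![[:alpha:]\d])  -> previous character is not a letter or digit
;   (?![[:alpha:]\d])   -> next character is not a letter or digit
Func _NumbersNotTouchingLetters($sText)
    Local $aItems = StringRegExp($sText, "(?<![[:alpha:]\d])(\d+)(?![[:alpha:]\d])", 3)
    If @error Then Return ""
    Local $sRet = ""
    For $i = 0 To UBound($aItems) - 1
        $sRet &= $aItems[$i] & " "
    Next
    Return StringTrimRight($sRet, 1)
EndFunc

ConsoleWrite(_NumbersNotTouchingLetters('code="32" time="2009-07-21T21:22:17"') & @CRLF)
; prints: 32 2009 07 22 17
```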


Quite pretty, but I don't know regexp that well. Can you explain how those work?



Sure, I'll try. The first function uses this expression:

(\d+)

The parentheses in this one are optional. Basically this one's pretty simple: it finds 1 or more digits in a row (\d = digit, + = 1 or more). It stops at the first non-digit, puts that run of digits in an array element, then searches again from there for the next match. (This is the behavior of flag 3, global matches.)
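A quick way to see what flag 3 hands back is to print the array. A small sketch; the input is just a shortened version of the sample line:

```autoit
; Sketch: flag 3 = return every match as one array element
Local $aDigits = StringRegExp('code="32" time="2009-07-21"', "(\d+)", 3)
If @error Then
    ConsoleWrite("no digits found" & @CRLF)   ; flag 3 sets @error = 1 when nothing matches
Else
    For $i = 0 To UBound($aDigits) - 1
        ConsoleWrite("[" & $i & "] " & $aDigits[$i] & @CRLF)
    Next
EndIf
; prints [0] 32, [1] 2009, [2] 07, [3] 21 (one per line)
```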

The 2nd one uses:

(?:^|\W)([[:alpha:]]+\d*)

(?: ) means non-capturing group (the information is looked for, but not captured)

^ means start-of-string, | means 'or', \W means any non-word character, i.e. anything other than a letter, digit or underscore (see Help)

[ ] is a character class: it matches any one of the characters listed inside it. [:alpha:] inside the brackets means any alphabet character, and + again means 1 or more

\d again means digit, but this time with * it is optional (* means 0 or more: if none are found, no biggy; if some are found, grab them all)
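To see those pieces working together, here is a small sketch that prints what the second pattern captures from a shortened sample line. Only the captured group comes back in the array, because (?:^|\W) is non-capturing:

```autoit
; Sketch: which pieces (?:^|\W)([[:alpha:]]+\d*) captures on a shortened sample line
Local $sLine = 'item="UsrClass.dat" number3="4" time="2009-07-21T21:22:17"'
Local $aWords = StringRegExp($sLine, "(?:^|\W)([[:alpha:]]+\d*)", 3)
For $i = 0 To UBound($aWords) - 1
    ConsoleWrite($aWords[$i] & @CRLF)
Next
; prints item, UsrClass, dat, number3, time (one per line)
; "4" is skipped because a word has to start with a letter, and so is "T21":
; the T is preceded by "1", a word character, so (?:^|\W) cannot match in front of it
```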

Hope that helps. Otherwise, yeah, you gotta do some regular expression reading. They are hard to learn at first, but once you learn them, they are ADDICTING.

*edit: smiley popped up hehe

Edited by Ascend4nt


Here is another example.

;
local $string =  'error action="1" item="C:\Documents and Settings\LocalService\Local Settings\Application Data\Microsoft\Windows\UsrClass.dat" code="32" time="2009-07-21T21:22:17" />'

MsgBox(0, "Extract Numeric & Non-numeric", "From: " & $string & @CRLF & @CRLF & _
        StringStripWS(StringRegExpReplace($string, "[^\d ]", " "),7) & @CRLF & _ ; Extract all numbers
        StringStripWS(StringRegExpReplace($string, "[^[:alpha:] ]", " "),7))     ; Extract all letters
;

The way I learnt about regular expressions was by having the AutoIt help file open at StringRegExp, and changing the RE pattern of examples found on these forums to see what happens and why, logically. And I am still learning.


@Malkey, good job - I KNEW there had to be a way with RegExpReplace, but actually, for the second one, he wanted the numbers that are tacked on to the end of words to be kept. I'm too lazy to figure out a RegExpReplace for that >_<



The only reason I wanted that is in case you're parsing some type of file that has entries like:

Number1 = 15

Or

123test = Alpha4

The full information is lost if you only parse for letters or numbers.
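For the record, one way a single StringRegExpReplace could keep digits stuck onto the end of a word while dropping everything else is sketched below. It is not from the thread. The lookbehind is what keeps Number1 and Alpha4 whole while dropping standalone numbers like 15 and 2009; as with stringMakeAlpha, the leading 123 of 123test is dropped.

```autoit
; Sketch (not from the thread): keep letter runs plus any digits attached to their end,
; turn everything else into spaces.
;   (?<![[:alpha:]\d])\d+  -> a digit run that does NOT follow a letter or digit (15, 123, 2009)
;   [^[:alpha:]\d]+        -> any run of characters that are neither letters nor digits
Local $sLine = 'Number1 = 15' & @CRLF & '123test = Alpha4' & @CRLF & 'time="2009-07-21T21:22:17"'
Local $sWords = StringRegExpReplace($sLine, "(?<![[:alpha:]\d])\d+|[^[:alpha:]\d]+", " ")
$sWords = StringStripWS($sWords, 7)   ; 7 = strip leading, trailing and repeated spaces
ConsoleWrite($sWords & @CRLF)
; prints: Number1 test Alpha4 time T21
```

One difference from the loop version: a word with digits in the middle (abc12def) stays as one token here, whereas stringMakeAlpha splits it after the digits.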
