jennico

suggestion for a better _StringProper UDF

jennico

_StringProper is very slow. On large data sets (e.g. renaming MP3s) you could save more than 90% of the time by using a different algorithm.

Here is a comparison:

_StringProper: checks every character in the given text.

_StringProper1: splits the given text into words on the most common separators and "propers" whole words.

_StringProper2: splits the given text into words on all possible separators, found with StringRegExp, and "propers" whole words (slightly slower than _StringProper1).

The new suggestions save at least 70% of the time on small texts and more than 90% on large texts, especially if the text contains digits.
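Savings like these are easier to compare when each function is called many times and the elapsed time is averaged, since a single call on a short string is close to the timer's resolution. A minimal sketch of such a harness, assuming a hypothetical helper name (_TimeIt) and a stand-in function (_UpperOnly) that exists only so the snippet runs on its own:

```autoit
; Hypothetical helper: calls the function named $sFunc on $sText $iRuns times
; and returns the average time per call in milliseconds.
Func _TimeIt($sFunc, $sText, $iRuns = 1000)
    Local $hTimer = TimerInit()
    For $i = 1 To $iRuns
        Call($sFunc, $sText)
    Next
    Return TimerDiff($hTimer) / $iRuns
EndFunc   ;==>_TimeIt

; Stand-in so the sketch is self-contained; not one of the functions under discussion.
Func _UpperOnly($s)
    Return StringUpper($s)
EndFunc   ;==>_UpperOnly

Global $sSample = "01 - the quick brown fox (live 2008).mp3"
ConsoleWrite("_UpperOnly: " & Round(_TimeIt("_UpperOnly", $sSample), 4) & " ms per call" & @CRLF)
```

With the three functions from this post in the same script, "_UpperOnly" can be swapped for "_StringProper", "_StringProper1" or "_StringProper2". Call() adds some overhead of its own, but it is the same for every candidate.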

I would like to discuss the functions before making a feature request.

cheers j.

#AutoIt3Wrapper_AU3Check_Parameters= -d -w 1 -w 2 -w 3 -w 4 -w 5 -w 6

;#=#INDEX#==================================================================#
;#  Title .........: _StringProper compare
;#  Class..........: UDF + Example
;#  Date ..........: 3.11.08
;#  Author ........: jennico (jennicoattminusonlinedotde)
;#  Remarks .......: compares:
;#                   _StringProper (built-in original)
;#                   _StringProper1 (manual separators) min 4 times faster than original
;#                   _StringProper2 (all separators) min 4 times faster than original
;#==========================================================================#

; Declared up front because the AU3Check -d switch above requires declared variables.
Global $txt, $timer, $res, $res1, $res2, $t, $t1, $t2

While 1
    $txt = InputBox("Faster StringProper()", "Enter Text")
    If $txt = "" Then ExitLoop
    $timer = TimerInit()
    $res = _StringProper($txt)
    $t = Round(TimerDiff($timer), 3) ; elapsed milliseconds
    $timer = TimerInit()
    $res1 = _StringProper1($txt)
    $t1 = Round(TimerDiff($timer), 3)
    $timer = TimerInit()
    $res2 = _StringProper2($txt)
    $t2 = Round(TimerDiff($timer), 3)
    MsgBox(0, "Faster StringProper()", $res & " " & $t & @CRLF & _
        $res1 & " " & $t1 & @CRLF & $res2 & " " & $t2)
WEnd

#cs
    ;#=#Function#===========================================================#
    ;#  Name ..........: _StringProper($s_Str)
    ;#  Description....: Built-in original function
    ;#======================================================================#
#ce

Func _StringProper($s_Str)
    Local $CapNext = 1, $s_nStr, $s_CurChar
    For $iX = 1 To StringLen($s_Str)
        $s_CurChar = StringMid($s_Str, $iX, 1)
        If $CapNext = 1 Then
            If StringRegExp($s_CurChar, '[a-zA-ZÀ-ÿ]') Then
                $s_CurChar = StringUpper($s_CurChar)
                $CapNext = 0
            EndIf
        ElseIf StringRegExp($s_CurChar, '[a-zA-ZÀ-ÿ]') = 0 Then
            $CapNext = 1
        Else
            $s_CurChar = StringLower($s_CurChar)
        EndIf
        $s_nStr &= $s_CurChar
    Next
    Return ($s_nStr)
EndFunc   ;==>_StringProper

#cs
    ;#=#Function#===========================================================#
    ;#  Name ..........: _StringProper1($s)
    ;#  Description....: Changes a string to proper case, same as the =PROPER function in Excel
    ;#  Parameters.....: $s - Input string
    ;#  Return Value ..: Returns the proper-cased string.
    ;#  Remarks .......: This function will capitalize every character following a non-alpha character.
    ;#                   Alternative that works with the most common separators.
    ;#  Author ........: jennico (jennicoattminusonlinedotde)
    ;#  Date ..........: 3.11.08
    ;#  Example........: yes
    ;#======================================================================#
#ce

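Func _StringProper1($s)
    Local $t = StringSplit($s, " .,-()") ; list every separator character here
    For $i = 1 To $t[0]
        ; Caveat: StringReplace ignores case and replaces every occurrence,
        ; so a short word also changes longer words that contain it.
        If $t[$i] Then $s = StringReplace($s, $t[$i], StringUpper(StringLeft($t[$i], 1)) & _
            StringLower(StringMid($t[$i], 2)))
    Next
    Return $s
EndFunc   ;==>_StringProper1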

#cs
    ;#=#Function#===========================================================#
    ;#  Name ..........: _StringProper2($s)
    ;#  Description....: Changes a string to proper case, same as the =PROPER function in Excel
    ;#  Parameters.....: $s - Input string
    ;#  Return Value ..: Returns the proper-cased string.
    ;#  Remarks .......: This function will capitalize every character following a non-alpha character.
    ;#                   Alternative that works with all possible separators (slightly slower than _StringProper1).
    ;#  Author ........: jennico (jennicoattminusonlinedotde)
    ;#  Date ..........: 3.11.08
    ;#  Example........: yes
    ;#======================================================================#
#ce

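Func _StringProper2($s)
    Local $p = "", $t = StringRegExp($s, "[^0-9a-zA-ZÀ-ÿ]", 3) ; every non-alphanumeric character
    For $i = 0 To UBound($t) - 1
        If StringInStr($p, $t[$i]) = 0 Then $p &= $t[$i] ; collect each separator once
    Next
    ; Without separators, StringSplit($s, "") would split into single characters.
    If $p = "" Then Return StringUpper(StringLeft($s, 1)) & StringLower(StringMid($s, 2))
    $t = StringSplit($s, $p)
    For $i = 1 To $t[0]
        If $t[$i] Then $s = StringReplace($s, $t[$i], StringUpper(StringLeft($t[$i], 1)) & _
            StringLower(StringMid($t[$i], 2)))
    Next
    Return $s
EndFunc   ;==>_StringProper2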
Edited by jennico
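
One thing to watch in _StringProper1 and _StringProper2 above: by default StringReplace ignores case and replaces every occurrence, so a short word can also rewrite the inside of longer words ("cat a" appears to come out as "CAt A"). A minimal sketch of one way around that, assuming it is acceptable to rebuild the string from tokens instead of replacing in place. _StringProperTokens is a hypothetical name, not part of any UDF, and its speed has not been measured here:

```autoit
; Hypothetical variant: rebuilds the string token by token instead of using StringReplace.
Func _StringProperTokens($s)
    ; Flag 3 returns all matches in order: alternating runs of letters and non-letters.
    Local $aTok = StringRegExp($s, "[a-zA-ZÀ-ÿ]+|[^a-zA-ZÀ-ÿ]+", 3)
    If @error Then Return $s ; nothing matched, e.g. empty input
    Local $sOut = ""
    For $i = 0 To UBound($aTok) - 1
        ; Spaces, digits and punctuation are not affected by StringUpper/StringLower.
        $sOut &= StringUpper(StringLeft($aTok[$i], 1)) & StringLower(StringMid($aTok[$i], 2))
    Next
    Return $sOut
EndFunc   ;==>_StringProperTokens

ConsoleWrite(_StringProperTokens("cat a") & @CRLF) ; Cat A
```

Like the original _StringProper, this treats digits as separators, so a letter that follows a digit is capitalized.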

I actively support Wikileaks | Freedom for Julian Assange ! | Defend freedom of speech ! | Fight censorship ! | I will not silence. Don't forget this IP: 213.251.145.96

 

TheSaint

I'd probably have to agree with you so far, but would add (like I said years ago) that we need three functions not one (or maybe one with added options).

i.e. Titlecase - where every word is capitalised. Sentence case - where only the first letter of the first word and names (etc) are capitalised. Propercase - where only words like 'and', 'the', 'or', 'as', etc are not capitalised.

Also, when I create my functions, I always have to take into account names like McCaffrey and apostrophes as in it's (where the 'S' always gets capitalised). These are not the only examples of where the current function gets it wrong. Detecting case for more than one word, also does not work properly - you currently have to process every word individually ... that is not right!
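
The apostrophe and small-word cases could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name _StringTitleSketch and the default word list are hypothetical, and English rules are assumed.

```autoit
; Hypothetical sketch: apostrophes stay inside a word, listed small words stay lowercase.
Func _StringTitleSketch($s, $sSmall = "|a|an|and|as|of|or|the|")
    ; A word is a run of letters and apostrophes, so "it's" is a single token.
    Local $aTok = StringRegExp($s, "[a-zA-ZÀ-ÿ']+|[^a-zA-ZÀ-ÿ']+", 3)
    If @error Then Return $s ; nothing matched, e.g. empty input
    Local $sOut = "", $sLow
    For $i = 0 To UBound($aTok) - 1
        $sLow = StringLower($aTok[$i])
        If $i > 0 And StringInStr($sSmall, "|" & $sLow & "|") Then
            $sOut &= $sLow ; small word that is not the first token
        Else
            $sOut &= StringUpper(StringLeft($sLow, 1)) & StringMid($sLow, 2)
        EndIf
    Next
    Return $sOut
EndFunc   ;==>_StringTitleSketch

ConsoleWrite(_StringTitleSketch("it's a trap") & @CRLF) ; It's a Trap
ConsoleWrite(_StringTitleSketch("the lord of the rings") & @CRLF) ; The Lord of the Rings
```

Names like McCaffrey still come out as "Mccaffrey" here, and a word that begins with an apostrophe is left uncapitalised.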

:mellow:


AutoIt.4.Life Clubrooms - Life is like a Donut (secret key)

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.

What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)

jennico

You are right, but remember you are talking about language-specific problems. It is simply not possible to make sentence case work unless you define the rules of every language.

So the only thing you can do automatically is capitalize every word and leave it to the users to handle special cases themselves by altering the function.
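
Instead of altering the function, special spellings could also be re-applied afterwards from a list the caller supplies. A rough sketch under that assumption; _ApplySpellings and the sample list are hypothetical, and the entries are assumed to be plain words:

```autoit
; Hypothetical helper: restores caller-supplied spellings after proper-casing.
Func _ApplySpellings($s, $aFixed)
    For $i = 0 To UBound($aFixed) - 1
        ; (?i) ignores case, \b limits the match to whole words,
        ; \Q...\E treats the entry as literal text inside the pattern.
        ; Assumption: entries contain no "\" or "$", which have a meaning in the replacement.
        $s = StringRegExpReplace($s, "(?i)\b\Q" & $aFixed[$i] & "\E\b", $aFixed[$i])
    Next
    Return $s
EndFunc   ;==>_ApplySpellings

Global $aFixed[2] = ["McCaffrey", "iPod"]
ConsoleWrite(_ApplySpellings("Anne Mccaffrey", $aFixed) & @CRLF)
```

This keeps the language-specific knowledge in the caller's data. Note that \b may not treat accented letters as word characters, so entries with such letters at the edges would need a different boundary test.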

Detecting case for more than one word, also does not work properly - you currently have to process every word individually ... that is not right!

what do you mean by that ?

j.

Maybe I should clarify my intention: my suggestions do not change what _StringProper does, they only speed it up (of course, that is an improvement in a way). I don't want to say that _StringProper is buggy. I think it is well done because it takes foreign letters into account (at least in the western codepage), although I am sure it does not work for Cyrillic!

Edited by jennico

TheSaint

There is probably something in what you say about languages, but I don't fully understand the relevance ... but that would be my lack of understanding the whole language conversion thingy. :)

I know they call the function _StringProper, but it should really be called _StringTitlecase ... because that is what it truly does (however imperfectly). :(

I was talking about the _StringIsProper function ... try it out on more than one word (the lowercase and uppercase functions have the same issue, methinks). :mellow:

I'm aware that your improvement is only speed based, but I'm never one to leave an opportunity alone to promote further improvements. ;)

;)


jennico

Ah okay, but I cannot find a _StringIsProper function...

Regarding the name, the function description points out that it refers to the corresponding function in MS Excel.

j.

