Opened 8 years ago

Closed 8 years ago

#2173 closed Feature Request (Rejected)

_StringProper doesn't capitalize correctly

Reported by: BrewManNH
Owned by:
Milestone:
Component: Standard UDFs
Version:
Severity: None
Keywords: _StringProper, titlecase
Cc:

Description

The original _StringProper function doesn't correctly capitalize letters after some characters and numbers. If I use the original _StringProper on this string:

$String = "She's all that I,wAnt (10th place of 2)"

I get this as the output from it:

_StringProper($String) = She'S All That I,Want (10Th Place Of 2)

As you can see, the 2nd S in She's, the W in want and the T in 10th are all capitalized and they shouldn't be.

A simple addition to the RegEx in the function will correct this. By changing this

Select
	Case $CapNext = 1
		If StringRegExp($s_CurChar, '[a-zA-ZÀ-ÿšœžŸ]') Then
			$s_CurChar = StringUpper($s_CurChar)
			$CapNext = 0
		EndIf
	Case Not StringRegExp($s_CurChar, '[a-zA-ZÀ-ÿšœžŸ]')
		$CapNext = 1
	Case Else
		$s_CurChar = StringLower($s_CurChar)
EndSelect

To this:

Select
	Case $CapNext = 1
		If StringRegExp($s_CurChar, "[a-zA-ZÀ-ÿ'šœžŸ0-9,]") Then
			$s_CurChar = StringUpper($s_CurChar)
			$CapNext = 0
		EndIf
	Case Not StringRegExp($s_CurChar, "[a-zA-ZÀ-ÿ'šœžŸ0-9,]")
		$CapNext = 1
	Case Else
		$s_CurChar = StringLower($s_CurChar)
EndSelect

It will give you the output below:

_StringProper($String) = She's All That I,want (10th Place Of 2)

In case the code above does not render correctly on this page, I have also attached a script showing the changes.
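
The effect of the two character classes can be tried in isolation. What follows is a minimal sketch, not the code from String.au3: `_ProperWithClass` is a hypothetical helper that runs the same character-by-character loop with the "word character" class passed in as a parameter. The classes are written with `\xC0-\xFF` rather than the literal accented characters, which is an assumption made to sidestep source-file encoding issues and which leaves out š, œ, ž and Ÿ.

```autoit
Local $sTest = "She's all that I,wAnt (10th place of 2)"

; Class of the original function: letters only.
ConsoleWrite(_ProperWithClass($sTest, "[a-zA-Z\xC0-\xFF]") & @CRLF)
; Class proposed in the description: letters, apostrophe, digits and comma.
ConsoleWrite(_ProperWithClass($sTest, "[a-zA-Z\xC0-\xFF'0-9,]") & @CRLF)

; Hypothetical helper (not part of String.au3).
; $sWordClass is a regex character class naming the characters that belong to a word.
Func _ProperWithClass($sText, $sWordClass)
	Local $bCapNext = True, $sOut = "", $sChar
	For $i = 1 To StringLen($sText)
		$sChar = StringMid($sText, $i, 1)
		If StringRegExp($sChar, $sWordClass) Then
			If $bCapNext Then
				; First character of a word: upper-case it.
				$sChar = StringUpper($sChar)
				$bCapNext = False
			Else
				; Any later character of the same word: lower-case it.
				$sChar = StringLower($sChar)
			EndIf
		Else
			; A character outside the class ends the word.
			$bCapNext = True
		EndIf
		$sOut &= $sChar
	Next
	Return $sOut
EndFunc   ;==>_ProperWithClass
```

With the letters-only class, the apostrophe and the digits end the current word, so the next letter is capitalized ("She'S", "10Th"). With the extended class they count as part of the word, which is also why the comma in the class keeps the "w" of "I,want" in lower case.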

Attachments (3)

_StringProper.au3 (1.6 KB) - added by BrewManNH 8 years ago.
Modified version of _StringProper
___StringProper.au3 (950 bytes) - added by amarcruz 8 years ago.
_StringProper() enhanced version
StringProper_Test.rar (2.3 KB) - added by amarcruz 8 years ago.
_StringProper performance & functional tests.

Change History (13)

Changed 8 years ago by BrewManNH

Modified version of _StringProper

comment:1 Changed 8 years ago by BrewManNH

Correction to the code above: I added a comma to the RegEx that shouldn't be there.

Here's the corrected code:

Select
	Case $CapNext = 1
		If StringRegExp($s_CurChar, "[a-zA-ZÀ-ÿ'0-9]") Then
			$s_CurChar = StringUpper($s_CurChar)
			$CapNext = 0
		EndIf
	Case Not StringRegExp($s_CurChar, "[a-zA-ZÀ-ÿ'0-9]")
		$CapNext = 1
	Case Else
		$s_CurChar = StringLower($s_CurChar)
EndSelect

comment:2 Changed 8 years ago by amarcruz

Hi, BrewManNH
English is not my language, but I think "I,Was" is not correct.
Am I wrong?

Anyway, I wrote an enhanced version of _StringProper.
It is 40%-45% faster (I mean less slow :-)) and supports things like "O'Brain", "McCartney", "L'Amoure", Roman numerals (I, III, IV, etc.), and lowercase letters recognized by IsLower().

The prefix list is "D'", "O'", "L'", and "Mc", but you know more about this than I do; in Spanish (my language) these are not used.

Please try the code; I'm attaching performance and functional tests.

Changed 8 years ago by amarcruz

_StringProper() enhanced version

Changed 8 years ago by amarcruz

_StringProper performance & functional tests.

comment:3 Changed 8 years ago by trancexx

BrewManNH are you feature-requesting a bug fix? Or am I reading this wrong?

comment:4 Changed 8 years ago by BrewManNH

I'm not sure if it's a bug, or just a limitation of how it is currently written.

The help file currently describes it as "same as =Proper function in Excel". If you put the same sentence into Excel and apply =Proper to it, "she's" comes out as "She'S" and "10th" comes out as "10Th", the same as with _StringProper. So it looks like it's working as described in the help file, and that's why I put this in as a feature request: to enhance the function to actually capitalize only the first letter of each word.

There is a limitation that I didn't think about when I first wrote this, though: if there's a single-quoted string inside the string, the first letter of the quoted part won't get capitalized. I'm not sure how to go about fixing that, because _StringProper checks the string one character at a time and (currently) has no way to know whether the ' is at the beginning of a word or inside one.

BTW, the comma part that I mentioned in the initial post is incorrect; there's no reason for the W not to be capitalized, which is why I posted the corrected version with the comma removed from the RegEx.

comment:5 Changed 8 years ago by BrewManNH

I have figured out how to correctly capitalize a string that contains single- or double-quoted strings, so that the first letter after the opening quote mark is capitalized, while a quote mark inside a word (such as the apostrophe in "she's") does not trigger capitalization.

Here's the updated function and the original _StringProper function to compare the results.

$String = "'she's all 'that' I,wAnt(" & '1st "disk" of 2)'
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : __StringProper($String) = ' & __StringProper($String) & @crlf & '>Error code: ' & @error & @crlf) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : _StringProper($String) = ' & _StringProper($String) & @crlf & '>Error code: ' & @error & @crlf) ;### Debug Console

Func __StringProper($s_String) ; modified _StringProper function, correctly capitalizes after ' and numbers
	Local $iX = 0
	Local $CapNext = 1
	Local $s_nStr = ""
	Local $s_CurChar
	For $iX = 1 To StringLen($s_String)
		$s_CurChar = StringMid($s_String, $iX, 1)
		Select
			Case $CapNext = 1
				If StringRegExp($s_CurChar, "[a-zA-Z\xC0-\xFF0-9]") Then
					$s_CurChar = StringUpper($s_CurChar)
					$CapNext = 0
				EndIf
			Case Not StringRegExp($s_CurChar, "[a-zA-Z\xC0-\xFF'0-9]")
				$CapNext = 1
			Case Else
				$s_CurChar = StringLower($s_CurChar)
		EndSelect
		$s_nStr &= $s_CurChar
	Next
	Return $s_nStr
EndFunc   ;==>__StringProper
Func _StringProper($s_String)
	Local $iX = 0
	Local $CapNext = 1
	Local $s_nStr = ""
	Local $s_CurChar
	For $iX = 1 To StringLen($s_String)
		$s_CurChar = StringMid($s_String, $iX, 1)
		Select
			Case $CapNext = 1
				If StringRegExp($s_CurChar, '[a-zA-Z\xC0-\xFF]') Then
					$s_CurChar = StringUpper($s_CurChar)
					$CapNext = 0
				EndIf
			Case Not StringRegExp($s_CurChar, '[a-zA-Z\xC0-\xFF]')
				$CapNext = 1
			Case Else
				$s_CurChar = StringLower($s_CurChar)
		EndSelect
		$s_nStr &= $s_CurChar
	Next
	Return $s_nStr
EndFunc   ;==>_StringProper

The ONLY thing that needed to be changed from the corrected code I posted earlier was to remove the single quote mark from the RegEx in the first Case statement of the Select.

Here's the output from the script as it's written.

@@ Debug(2) : __StringProper($String) = 'She's All 'That' I,Want(1st "Disk" Of 2)
@@ Debug(3) : _StringProper($String) = 'She'S All 'That' I,Want(1St "Disk" Of 2)

As you can see, this string contains single and double quoted strings inside the string plus an apostrophe + s in the word She's, and as you can also see, this function correctly capitalizes everything that it should.
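
One way to read this change is that the loop now uses two different character classes: one for characters that may start a word, and one for characters that keep the current word going. The sketch below makes that explicit. `_TitleCaseSketch` and its two parameters are hypothetical names used only for illustration; the classes are the ones from the code above.

```autoit
Local $sTest = "'she's 'that'"

; Apostrophe in BOTH classes (the earlier corrected code): a leading quote
; counts as the first character of the word, so the letter after it stays lower case.
ConsoleWrite(_TitleCaseSketch($sTest, "[a-zA-Z\xC0-\xFF'0-9]", "[a-zA-Z\xC0-\xFF'0-9]") & @CRLF)

; Apostrophe only in the "continues a word" class (this version): a leading quote
; is skipped and the next letter is capitalized, while the quote in "she's" stays in-word.
ConsoleWrite(_TitleCaseSketch($sTest, "[a-zA-Z\xC0-\xFF0-9]", "[a-zA-Z\xC0-\xFF'0-9]") & @CRLF)

; Hypothetical helper, same control flow as the Select in __StringProper.
; $sStartClass: characters that may begin a word.
; $sWordClass:  characters that do not end the current word.
Func _TitleCaseSketch($sText, $sStartClass, $sWordClass)
	Local $bCapNext = True, $sOut = "", $sChar
	For $i = 1 To StringLen($sText)
		$sChar = StringMid($sText, $i, 1)
		Select
			Case $bCapNext
				; Waiting for a word to start: only a start character is consumed.
				If StringRegExp($sChar, $sStartClass) Then
					$sChar = StringUpper($sChar)
					$bCapNext = False
				EndIf
			Case Not StringRegExp($sChar, $sWordClass)
				; The word has ended; wait for the next start character.
				$bCapNext = True
			Case Else
				$sChar = StringLower($sChar)
		EndSelect
		$sOut &= $sChar
	Next
	Return $sOut
EndFunc   ;==>_TitleCaseSketch
```

The first call should print 'she's 'that' unchanged and the second 'She's 'That', which is the difference between the earlier corrected code and this version.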

Instead of "fixing" the _StringProper function, you could add this as a _StringTitleCase function to the String.au3 collection, so that the original _StringProper isn't affected because it is working as it's described. Just a thought.

comment:6 Changed 8 years ago by amarcruz

I finished another version of _StringProper.
Please, run this code and tell me what you think:

#include <String.au3>
AutoItSetOption("MustDeclareVars", 1)
Const $aTest[7][2] = [ _
["She'S All 'That' i,Want (1St ""Disk"" Of 2)", "She's All 'That' I,Want (1st ""Disk"" Of 2)"], _
["“thank god it's christmas” from greatest hits iii", "“Thank God It's Christmas” From Greatest Hits III"], _
["5o. o 5ºbatallón -sedena", "5o. O 5ºBatallón -Sedena"], _
["AT&T 70'S and 80'S", "At&T 70's And 80's"], _
["o'bRian/pÉrez-macmillan [mCgraw-hill]", "O'Brian/Pérez-MacMillan [McGraw-Hill]"], _
["»hello, i'm here, you're there", "»Hello, I'm Here, You're There"], _
["'boo@fear.com' of usa", "'Boo@Fear.Com' Of USA"] _
]
Local $s1, $s2, $s3, $sErr2, $sErr3, $nErrors = 0

For $ix = 0 To UBound($aTest)-1
    $s1 = $aTest[$ix][0]
    $s2 = _StringProper($s1)
    $s3 = _StringProper2($s1)
    If $s2 == $aTest[$ix][1] Then
        $sErr2 = ' >'
    Else
        $sErr2 = '!>'
    EndIf
    If $s3 == $aTest[$ix][1] Then
        $sErr3 = ' >'
    Else
        $sErr3 = '!>'
    EndIf
    _DebugOut('** Source string: '& $s1 & @CRLF _
      & $sErr2 &' _StringProper: '& $s2 & @CRLF _
      & $sErr3 &'_StringProper2: '& $s3)
    If $sErr3 == "!>" Then
        $nErrors += 1
        _DebugOut('- ERROR! must be: '& $aTest[$ix][1])
    EndIf
Next
_DebugOut("Done, "& $nErrors &" error(s) in _StringProper2")
Exit 0

Func _DebugOut($sx)
    ConsoleWrite($sx & @CRLF)
EndFunc

Func _StringProper2($sIn)
    Static Local $sRE = "(?:Mc|Mac|°|º)([[:lower:]])|U(sa)\b|[IVXLCDM]([ivxlcdm]+)\b"
    Local $bCapNext = True
    Local $sOut = ""
    Local $aStr = StringSplit(StringLower($sIn), "")
    Local $nLen = $aStr[0]
    Local $ix, $ch

    For $ix = 1 To $nLen
        $ch = $aStr[$ix]
        Select
            Case StringIsAlNum($ch)
                If $bCapNext Then
                    $ch = StringUpper($ch)
                    $bCapNext = False
                EndIf
            Case $ch == "'"
                $bCapNext = $bCapNext Or StringInStr("DOLM", StringRight($sOut,1))
            Case Else	; not StringIsAlnum($ch)
                $bCapNext = True
        EndSelect
        $sOut &= $ch
    Next

    ; Now process Mc*, Mac* and Roman numerals
    $aStr = StringRegExp($sOut, $sRE, 1)
    While @error = 0
        $ix = @extended
        $ch = $aStr[UBound($aStr)-1]
        If StringIsUpper($ch) Then
            $ch = StringLower($ch)
        Else
            $ch = StringUpper($ch)
        EndIf
        $sOut = StringLeft($sOut, $ix-(StringLen($ch)+1)) & $ch & StringMid($sOut, $ix)
        $aStr = StringRegExp($sOut, $sRE, 1, $ix)
    WEnd

    Return SetError(0, $nLen, $sOut)
EndFunc   ;==>_StringProper2

comment:7 Changed 8 years ago by jchd

My (simpler) take works for full Unicode range without locale specials:

_ConsoleWrite(_StringProper("o'connor is a common name") & @LF)
_ConsoleWrite(_StringProper("Ưňįĉōďę ŞŤŔĬŃĜ ḁḃḇḘḛ ზლსყຝມລຫ ύϑϔήΛΞ ijɨɗɛȡȘȌ ᾠᾍὤώᾄῆ ⍴⍵⍶ ⍺ ђфЯЙӨиа") & @LF)
_ConsoleWrite(__StringProper("o'connor is a common name") & @LF)
_ConsoleWrite(__StringProper("Ưňįĉōďę ŞŤŔĬŃĜ ḁḃḇḘḛ ზლსყຝມລຫ ύϑϔήΛΞ ijɨɗɛȡȘȌ ᾠᾍὤώᾄῆ ⍴⍵⍶ ⍺ ђфЯЙӨиа") & @LF)


; #FUNCTION# ====================================================================================================================
; Name...........: __StringProper
; Description ...: Changes a string to proper case, same as =Proper function in Excel
; Syntax.........: __StringProper($s_String)
; Parameters ....: $s_String - Input string
; Return values .: Success - Returns proper string.
;                  Failure - Returns "".
; Author ........: Jos van der Zande <jdeb at autoitscript dot com>
; Modified.......: jchd
; Remarks .......: This function will capitalize every character following a non-alpha character.
; Related .......:
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func __StringProper($s_String)
	Local $CapNext = 1
	Local $s_nStr
	Local $s_CurChar
	$s_String = StringLower($s_String)
	For $iX = 1 To StringLen($s_String)
		$s_CurChar = StringMid($s_String, $iX, 1)
		If StringIsAlpha($s_CurChar) Then
			If $CapNext Then
				$s_CurChar = StringUpper($s_CurChar)
				$CapNext = 0
			EndIf
		Else
			$CapNext = 1
		EndIf
		$s_nStr &= $s_CurChar
	Next
	Return $s_nStr
EndFunc   ;==>__StringProper

Func _ConsoleWrite($sString)
	Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", 65001, "dword", 0, "wstr", $sString, "int", -1, _
								"ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
	If @error Then Return SetError(1, @error, 0)
	Local $tText = DllStructCreate("char[" & $aResult[0] & "]")
	$aResult = DllCall("Kernel32.dll", "int", "WideCharToMultiByte", "uint", 65001, "dword", 0, "wstr", $sString, "int", -1, _
							"ptr", DllStructGetPtr($tText), "int", $aResult[0], "ptr", 0, "ptr", 0)
	If @error Then Return SetError(2, @error, 0)
	ConsoleWrite(DllStructGetData($tText, 1))
EndFunc

Set the SciTE console to Unicode (65001) for correct display.

If you embark on dealing with Mc* and Mac*, then you'll have to properly deal with the specifics of every language/script in the world (which is MUCH larger than US, UK, IE, AU, NZ, ...). Roman numerals are way too ambiguous: what if I'm really called Cliv Mill?
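
The ambiguity is easy to demonstrate. The snippet below is only an illustration: the pattern is an assumed, whole-word variant of a "capital followed by Roman-numeral letters" test, not the exact expression from any of the posted functions, and the word list is made up.

```autoit
; Assumed pattern for illustration: one capital followed only by lower-case
; Roman-numeral letters, anchored to the whole word.
Local $sRomanLike = "^[IVXLCDM][ivxlcdm]+$"
Local $aWords[6] = ["Iii", "Xiv", "Cliv", "Mill", "Civil", "Hits"]

For $sWord In $aWords
	If StringRegExp($sWord, $sRomanLike) Then
		; The word consists only of Roman-numeral letters.
		ConsoleWrite($sWord & " -> would become " & StringUpper($sWord) & @CRLF)
	Else
		ConsoleWrite($sWord & " -> left alone" & @CRLF)
	EndIf
Next
```

Only "Hits" is left alone here. "Cliv", "Mill" and "Civil" are made up entirely of Roman-numeral letters, so a purely pattern-based rule cannot tell them apart from "Iii" or "Xiv".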

Also folks, please note that titlecase != propercase

comment:8 Changed 8 years ago by BrewManNH

Your function is definitely better for Unicode characters. I have no experience with capitalization of Unicode, so I just took the original function and modified it.

I agree that mine was more in line with a Title Case function than with the Proper function, and mentioned that in my last update. If anyone wants to add it to the string.au3 as _StringTitleCase instead of replacing _StringProper, feel free to treat this as a feature request for that instead.

Although, in my opinion, the current version of _StringProper is rather limited in its usage because of its inferior output when dealing with real-world strings. After all, what's the point of using proper on a string like "she's all 'that' I, want" and having it come out the way that _StringProper outputs it? Just my 2 cents on it.

If there were some way to modify the version I posted here that would allow it to deal with unicode, then of course, that should be done.

comment:9 Changed 8 years ago by trancexx

What happened here?

Is this ticket still a request? If it is then what's it for? For changes to existing function or for adding a new one?

comment:10 Changed 8 years ago by trancexx

  • Resolution set to Rejected
  • Status changed from new to closed

Make up your mind next time. Before making a request.
