
proposed change to UDF '_StringTitleCase()'


iCode


Currently it looks like this:

; #FUNCTION# ====================================================================================================================
; Author ........: BrewManNH
; Modified ......:
; ===============================================================================================================================
Func _StringTitleCase($sString)
    Local $fCapNext = True, $sChr = "", $sReturn = ""
    For $i = 1 To StringLen($sString)
        $sChr = StringMid($sString, $i, 1)
        Select
            Case $fCapNext = True
                If StringRegExp($sChr, "[a-zA-Z\xC0-\xFF0-9]") Then
                    $sChr = StringUpper($sChr)
                    $fCapNext = False
                EndIf
            Case Not StringRegExp($sChr, "[a-zA-Z\xC0-\xFF'0-9]")
                $fCapNext = True
            Case Else
                $sChr = StringLower($sChr)
        EndSelect
        $sReturn &= $sChr
    Next
    Return $sReturn
EndFunc   ;==>_StringTitleCase

I am no expert on RegEx, but there is a problem with this line:

Case Not StringRegExp($sChr, "[a-zA-Z\xC0-\xFF'0-9]")

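It causes 's (as in that's) to become 'S when the apostrophe is U+2019 (’, the right single quotation mark), because that character is not in the class and is therefore treated as a word break.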

My suggested change is to simply add the missing character to the class, and possibly (*UCP):

Case Not StringRegExp($sChr, "(*UCP)[a-zA-Z\xC0-\xFF0-9'’]")
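For comparison, here is a minimal sketch of the proposed change, assuming the only goal is to stop U+2019 (’) from acting as a word break. The name `_StringTitleCaseEx` is hypothetical. The character is built with `ChrW(0x2019)` rather than typed into the pattern, so the script behaves the same whether it is saved as ANSI or UTF-8, and `(*UCP)` is left out because the class only uses explicit ranges.

```autoit
; Hypothetical variant of _StringTitleCase(): same logic as the original,
; except that U+2019 (’) is treated as part of a word, like the ASCII apostrophe.
Func _StringTitleCaseEx($sString)
    ; Characters that can start a word (and are capitalized when they do)
    Local Const $sWordStart = "[a-zA-Z\xC0-\xFF0-9]"
    ; Characters that continue a word; anything else ends it
    Local Const $sWordChar = "[a-zA-Z\xC0-\xFF'" & ChrW(0x2019) & "0-9]"
    Local $fCapNext = True, $sChr = "", $sReturn = ""
    For $i = 1 To StringLen($sString)
        $sChr = StringMid($sString, $i, 1)
        Select
            Case $fCapNext = True
                If StringRegExp($sChr, $sWordStart) Then
                    $sChr = StringUpper($sChr)
                    $fCapNext = False
                EndIf
            Case Not StringRegExp($sChr, $sWordChar)
                ; Separator: the next word character gets capitalized
                $fCapNext = True
            Case Else
                $sChr = StringLower($sChr)
        EndSelect
        $sReturn &= $sChr
    Next
    Return $sReturn
EndFunc   ;==>_StringTitleCaseEx
```

Tracing the loop by hand: `that’s it` now comes out as `That’s It`, while `o’connor` becomes `O’connor`, because every character after a word-internal apostrophe is lowercased.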

FUNCTIONS: WinDock (dock window to screen edge) | EditCtrl_ToggleLineWrap (line/word wrap for AU3 edit control) | SendEX (yet another alternative to Send( ) ) | Spell Checker (Hunspell wrapper) | SentenceCase (capitalize first letter of sentences)

CODE SNIPPETS: Dynamic tab width (set tab control width according to window width)


Hey!

Look, no replies!

Maybe if I reply then someone will do the same.

"Just be fred, all we gotta do, just be fred."  -Vocaliod

"That is a Hadouken. A KAMEHAMEHA would have taken him 13 days and 54 episodes to form." - Roden Hoxha

@tabhooked

Clock made of cursors ♣ Desktop Widgets ♣ Water Simulation


No, I saw it well before the reply.

I think the change would be to add the hex value instead, though I would need an SRE guru to confirm the fix.

UDF List:

 
_AdapterConnections() | _AlwaysRun() | _AppMon() | _AppMonEx() | _ArrayFilter/_ArrayReduce | _BinaryBin() | _CheckMsgBox() | _CmdLineRaw() | _ContextMenu() | _ConvertLHWebColor()/_ConvertSHWebColor() | _DesktopDimensions() | _DisplayPassword() | _DotNet_Load()/_DotNet_Unload() | _Fibonacci() | _FileCompare() | _FileCompareContents() | _FileNameByHandle() | _FilePrefix/SRE() | _FindInFile() | _GetBackgroundColor()/_SetBackgroundColor() | _GetConrolID() | _GetCtrlClass() | _GetDirectoryFormat() | _GetDriveMediaType() | _GetFilename()/_GetFilenameExt() | _GetHardwareID() | _GetIP() | _GetIP_Country() | _GetOSLanguage() | _GetSavedSource() | _GetStringSize() | _GetSystemPaths() | _GetURLImage() | _GIFImage() | _GoogleWeather() | _GUICtrlCreateGroup() | _GUICtrlListBox_CreateArray() | _GUICtrlListView_CreateArray() | _GUICtrlListView_SaveCSV() | _GUICtrlListView_SaveHTML() | _GUICtrlListView_SaveTxt() | _GUICtrlListView_SaveXML() | _GUICtrlMenu_Recent() | _GUICtrlMenu_SetItemImage() | _GUICtrlTreeView_CreateArray() | _GUIDisable() | _GUIImageList_SetIconFromHandle() | _GUIRegisterMsg() | _GUISetIcon() | _Icon_Clear()/_Icon_Set() | _IdleTime() | _InetGet() | _InetGetGUI() | _InetGetProgress() | _IPDetails() | _IsFileOlder() | _IsGUID() | _IsHex() | _IsPalindrome() | _IsRegKey() | _IsStringRegExp() | _IsSystemDrive() | _IsUPX() | _IsValidType() | _IsWebColor() | _Language() | _Log() | _MicrosoftInternetConnectivity() | _MSDNDataType() | _PathFull/GetRelative/Split() | _PathSplitEx() | _PrintFromArray() | _ProgressSetMarquee() | _ReDim() | _RockPaperScissors()/_RockPaperScissorsLizardSpock() | _ScrollingCredits | _SelfDelete() | _SelfRename() | _SelfUpdate() | _SendTo() | _ShellAll() | _ShellFile() | _ShellFolder() | _SingletonHWID() | _SingletonPID() | _Startup() | _StringCompact() | _StringIsValid() | _StringRegExpMetaCharacters() | _StringReplaceWholeWord() | _StringStripChars() | _Temperature() | _TrialPeriod() | _UKToUSDate()/_USToUKDate() | _WinAPI_Create_CTL_CODE() | _WinAPI_CreateGUID() | _WMIDateStringToDate()/_DateToWMIDateString() | Au3 script parsing | AutoIt Search | AutoIt3 Portable | AutoIt3WrapperToPragma | AutoItWinGetTitle()/AutoItWinSetTitle() | Coding | DirToHTML5 | FileInstallr | FileReadLastChars() | GeoIP database | GUI - Only Close Button | GUI Examples | GUICtrlDeleteImage() | GUICtrlGetBkColor() | GUICtrlGetStyle() | GUIEvents | GUIGetBkColor() | Int_Parse() & Int_TryParse() | IsISBN() | LockFile() | Mapping CtrlIDs | OOP in AutoIt | ParseHeadersToSciTE() | PasswordValid | PasteBin | Posts Per Day | PreExpand | Protect Globals | Queue() | Resource Update | ResourcesEx | SciTE Jump | Settings INI | SHELLHOOK | Shunting-Yard | Signature Creator | Stack() | Stopwatch() | StringAddLF()/StringStripLF() | StringEOLToCRLF() | VSCROLL | WM_COPYDATA | More Examples...

Updated: 22/04/2018


UCP isn't required for this to work. Reminder: UCP stands for Unicode Character Properties, and we don't rely on any property here. The beta (and hence the next release) has UCP support built in, but v3.3.8.1 lacks it.

Placing the apostrophe in the character class is fine (it requires a UTF-8 source file). The only catch, if you want to make that universal (hint: you can't), is that different languages use different apostrophes. Also, your "fix" will now turn O’Connor into O’connor.
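A quick check of the UCP point, assuming an AutoIt release newer than v3.3.8.1 (one that accepts `(*UCP)`): the option changes what shorthand classes such as `\w` match, but an explicit range like `a-z` or `\xC0-\xFF` still means exactly those code points. A letter past U+00FF, such as Polish `ł` (U+0142), falls outside the UDF's class with or without it. The test characters are built with `ChrW()` so the script does not depend on the source file's encoding.

```autoit
; é (U+00E9) lies inside \xC0-\xFF; ł (U+0142) lies outside it.
Local $sEAcute = ChrW(0xE9), $sLStroke = ChrW(0x142)

; Explicit ranges are unaffected by (*UCP): ł is rejected either way.
ConsoleWrite(StringRegExp($sLStroke, "[a-zA-Z\xC0-\xFF0-9]") & @CRLF)        ; 0
ConsoleWrite(StringRegExp($sLStroke, "(*UCP)[a-zA-Z\xC0-\xFF0-9]") & @CRLF)  ; 0
ConsoleWrite(StringRegExp($sEAcute, "(*UCP)[a-zA-Z\xC0-\xFF0-9]") & @CRLF)   ; 1

; (*UCP) does change shorthand classes: \w now matches any Unicode letter.
ConsoleWrite(StringRegExp($sLStroke, "(*UCP)\w") & @CRLF)                    ; 1
```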

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of the up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I am still not keen on adding the apostrophe directly, as I guess most people save their scripts as ANSI.

Edit: Maybe I should use ChrW()?

Edited by guinness


It's impossible to create a unique rule that would produce the correct (expected) result for this function. For that reason, a function like this shouldn't exist as an official UDF at all, at least not without a disclaimer.

AutoIt doesn't force users to use any particular language. Neither the rules of AutoIt syntax nor the rules of the English language can be applied to the strings this function accepts. Even if you say _StringTitleCase accepts and processes only English words, it would be impossible to create correct output without consulting both a dictionary and specific grammar rules. Plus, both syntax and semantics dictate the output. The best examples are apostrophes, acronyms, or ten other things that could be listed without much thought. If you pass a French, Spanish or Portuguese sentence, not to mention hundreds of other languages, you get a mess.

So either keep the algorithm very simple (title case) and add a disclaimer to the help file, or remove the damn function.

I think it's clear what I would do.

Edited by trancexx

♡♡♡

.

eMyvnE


So either keep the algorithm very simple (title case) and add a disclaimer to the help file, or remove the damn function.

I think it's clear what I would do.

 

:thumbsup: Keep it simple, which means including the common British English (Associated Press) lower-case exceptions:

a, an, and, as, at, but, by, for, from, in, into, nor, of, on, onto, or, per, so, the, to, up, via, with, yet

I believe this is the main set, but there may be more. :ermm:
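One way to layer this list on top of the existing function is a post-processing step over already title-cased text, sketched here as a hypothetical helper `_LowerMinorWords()`. It assumes words are separated by single spaces, never lowercases the first word, and leaves words with attached punctuation (e.g. `Of,`) alone. Like the list itself, it only makes sense for English titles.

```autoit
; Hypothetical helper: lowercases the listed minor words in a title-cased
; string, except the first word, which always keeps its capital.
Func _LowerMinorWords($sTitled)
    ; Pipe-delimited so "|word|" lookups cannot match inside longer words
    Local Const $sMinor = "|a|an|and|as|at|but|by|for|from|in|into|nor|of|on|onto|or|per|so|the|to|up|via|with|yet|"
    Local $aWords = StringSplit($sTitled, " ") ; $aWords[0] holds the count
    Local $sReturn = $aWords[1]
    For $i = 2 To $aWords[0]
        ; StringInStr() is case-insensitive by default, so "Of" matches "|of|"
        If StringInStr($sMinor, "|" & $aWords[$i] & "|") Then $aWords[$i] = StringLower($aWords[$i])
        $sReturn &= " " & $aWords[$i]
    Next
    Return $sReturn
EndFunc   ;==>_LowerMinorWords
```

For example, `_LowerMinorWords("The Lord Of The Rings")` returns `The Lord of the Rings`.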

Edited by czardas

@DatMCEyeBall - I think you started an avalanche :)

My script is ANSI, and I added the apostrophe without encoding it; it works perfectly.

I agree that UCP doesn't need to be enabled for English, but since the expression has a group with "a-zA-Z" in it, and UCP affects that, wouldn't it be the safer thing to do?

I'm really not sure, however, since I'm no expert on RegEx and certainly no expert on Unicode!


Somebody told me that, decades ago, there was talk of a universal language... WHERE THE HELL IS IT!


100,000 to 2,000,000 speakers doesn't look too "universal", does it? :)

That name does ring a bell, though.


It's impossible to create a unique rule that would produce the correct (expected) result for this function. For that reason, a function like this shouldn't exist as an official UDF at all, at least not without a disclaimer.

Remarks

This function will capitalize the first character of every word. Does not support unicode strings.

It already does have a disclaimer.

Without a thousand-page document to look up every possible variation in the use of apostrophes, there's no way a single function could ever create a perfectly title-cased sentence (and this isn't truly a title case function, it's actually a Start Case function). It does produce much better output than the older _StringProper function that's been in the String UDF for years, though.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files  -  _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -  Latin Square password generator


It already does have a disclaimer.

Without a thousand-page document to look up every possible variation in the use of apostrophes, there's no way a single function could ever create a perfectly title-cased sentence (and this isn't truly a title case function, it's actually a Start Case function). It does produce much better output than the older _StringProper function that's been in the String UDF for years, though.

Good point, and because of that disclaimer and the lack of Unicode support, I believe that strengthens my argument for incorporating the second type of apostrophe: the class already has one, and the second is just another form of the first.

Sorry if I overstepped my bounds; I know this is a dev forum, so I'll go away now :)


My script is ANSI, and I added the apostrophe without encoding it; it works perfectly.

Then you're not using the Unicode character U+2019 as you claim, but its ANSI counterpart (e.g. 0x92 in the Windows Western code page, Windows-1252). That only works on ANSI input.
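The byte-level difference can be checked with `StringToBinary()`: the same character U+2019 is three bytes in UTF-8 but a single byte in an ANSI file. The ANSI result assumes the system code page is Windows-1252 (Western); on other code pages it differs, or the character may not be representable at all.

```autoit
Local $sRsquo = ChrW(0x2019) ; right single quotation mark, ’

; UTF-8 (flag 4) is the same on every system: bytes E2 80 99
ConsoleWrite(String(StringToBinary($sRsquo, 4)) & @CRLF) ; 0xE28099

; ANSI (flag 1) depends on the system code page;
; under Windows-1252 it is the single byte 0x92
ConsoleWrite(String(StringToBinary($sRsquo, 1)) & @CRLF)
```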

I agree that UCP doesn't need to be enabled for English, but since the expression has a group with "a-zA-Z" in it, and UCP affects that, wouldn't it be the safer thing to do?

Again, UCP doesn't "affect that" at all.

:thumbsup: Keep it simple, which means including the common British English (Associated Press) lower-case exceptions:

Simple shouldn't mean UK-centric!

It already does have a disclaimer.

That disclaimer is nonsensical: every AutoIt string is Unicode. The disclaimer should mention "unaccented Latin" or something like that.

Good point, and because of that disclaimer and the lack of Unicode support, I believe that strengthens my argument for incorporating the second type of apostrophe: the class already has one, and the second is just another form of the first.

Sorry if I overstepped my bounds; I know this is a dev forum, so I'll go away now :)

The question is not about bounds, allowance or clearance, but rather about pragmatic thinking.

But in substance, what you say here is:

To overcome the lack of support beyond unaccented Latin (see my remark above about Unicode), you propose adding an apostrophe representation that will only work with ANSI input in one of the few Windows code pages that map this apostrophe to 0x92?

It doesn't work much better the other way round either: adding support for the Unicode representation of the apostrophe to overcome the lack of "Unicode support".

Mind you, there are many more people using accented Latin scripts (here "script" means written language) than people using English letters only. So if we wanted to cover the majority, we would have to support a whole bunch of complex, contradictory rules to please this Babel. The problem is that we just can't.


My comment was as well!
