
UDF for Title Case, Initial Caps, and Sentence Case


tcurran
This UDF is intended to improve on the existing _StringProper UDF by using the magic of regular expressions (don't worry, it protects the user from having to know anything about RegEx). In addition to avoiding some of the anomalies associated with _StringProper (such as a cap following each apostrophe--Susan'S Farm), it provides additional functionality:

  • Sentence Case: Only the first word of each sentence (following a period, question mark, exclamation point or colon) is capitalized.
  • Title Case: Initial caps for all words except articles (a, the) and some common conjunctions (and, but) and prepositions (in, on). While the actual rules for capitalizing authored works would require hundreds of lines of code and still not be perfect, this will suffice for most uses.
  • Capitalization Exceptions: Permits user-selectable exceptions to the default scheme. Mac (as in MacDonald), Mc, and O' are the defaults, but users can pass their own exceptions in an easy-to-use function parameter.

I've chosen not to use the term "Proper Case" in the function at all, because a) there are varying opinions about what it means, b) my equivalent (termed "Initial Caps") works somewhat differently (i.e. better :shifty: ), and c) "Proper Case" as used in other applications (e.g. Excel) works (or doesn't work) the same as _StringProper in AutoIt.

I'm posting _StringChooseCase here in hopes of getting some feedback and squashing any bugs I've missed prior to submitting it as a candidate for inclusion as a standard AutoIt UDF.

UPDATE (3 Jan 2013): I removed the hack noted below using a more bullet-proof method of marking capitalization exceptions, inspired by dany's _StringRegExpSplit function. Also added the colon character as sentence punctuation, and added II, III & IV as default cap exceptions.

UPDATE (9 Jan 2013): The code is a hair more efficient. #include-once and #include <array.au3> now appear where they're supposed to. "I" is now always capitalized (both as the first person pronoun and the Roman numeral one). Title Case further improved: It now has a more comprehensive list of lower-case words--mainly more prepositions--and the last word of a title will always be capitalized.

#include-once
#include <Array.au3> ;_ArrayToString UDF used in Return

; #FUNCTION# ====================================================================================================================
; Name...........: _StringChooseCase
; Description ...: Returns a string in the selected upper & lower case format: Initial Caps, Title Case, or Sentence Case
; Syntax.........: _StringChooseCase($sMixed, $iOption[, $sCapExcepts = "Mc^|Mac^|O'^|I|II|III|IV"])
;PROSPECTIVE: add param for Ignore mixed case input
; Parameters ....: $sMixed -           String to change capitalization of.
;                 $iOption -          1: Initial Caps: Capitalize Every Word;
;                                    2: Title Case: Use Standard Rules for the Capitalization of Work Titles;
;                                    3: Sentence Case: Capitalize as in a sentence.
;                 $sCapExcepts    -  [optional] Exceptions to capitalizing set by options, delimited by | character. Use the ^
;                                    character to cause the next input character (whatever it is) to be capitalized
; Return values .: Success - Returns the same string, capitalized as selected.
;                 Failure - ""
; Author ........: Tim Curran <tim at timcurran dot com>
; Remarks .......: Option 1 is similar to standard UDF _StringProper, but avoids anomalies like capital following an apostrophe
; Related .......: _StringProper, StringUpper, StringLower
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================


Func _StringChooseCase($sMixed, $iOption, $sCapExcepts = "Mc^|Mac^|O'^|I|II|III|IV")
    ;$sMixed is passed by value so the caller's variable is not lower-cased and literals can be passed
    Local $asSegments, $sTrimtoAlpha, $iCapPos = 1
    $sMixed = StringLower($sMixed)
    Switch $iOption
        Case 1, 2 ;Initial Caps or Title Case
            $asSegments = StringRegExp($sMixed, ".*?(?:\s|\Z)", 3) ;break by word
        Case 3 ;Sentence Case
            $asSegments = StringRegExp($sMixed, ".*?(?:\.\W*|\?\W*|\!\W*|\:\W*|\Z)", 3) ;break by sentence
        Case Else
            Return SetError(1, 0, "") ;unknown option
    EndSwitch
    Local $iLastWord = UBound($asSegments) - 2
    For $iIndex = 0 to $iLastWord ;Capitalize the first letter of each element in array
        $sTrimtoAlpha = StringRegExp($asSegments[$iIndex], "\w.*", 1)
        If @error = 0 Then $iCapPos = StringInStr($asSegments[$iIndex], $sTrimtoAlpha[0])
        If $iOption <> 2 Or $iIndex = 0 Then ;Follow non-cap rules for Title Case if option selected (including cap last word)
            $asSegments[$iIndex] = StringReplace($asSegments[$iIndex], $iCapPos, StringUpper(StringMid($asSegments[$iIndex], $iCapPos, 1)))
        ElseIf $iIndex = $iLastWord Or StringRegExp($asSegments[$iIndex], "\b(?:and|the|a|an|but|for|or|in|on|from|to|by|over|of|with|as|at)\b", 0) = 0 Then
            $asSegments[$iIndex] = StringReplace($asSegments[$iIndex], $iCapPos, StringUpper(StringMid($asSegments[$iIndex], $iCapPos, 1)))
        EndIf
        ;Capitalization exceptions
        $asSegments[$iIndex] = _CapExcept($asSegments[$iIndex], $sCapExcepts)
    Next
    Return _ArrayToString($asSegments, "")
EndFunc ;==> _StringChooseCase

Func _CapExcept($sSource, $sExceptions)
    Local $sRegExaExcept, $iMakeUCPos
    Local $avExcept = StringSplit($sExceptions, "|")
    For $iIndex = 1 to $avExcept[0]
        $sRegExaExcept = "(?i)\b" & $avExcept[$iIndex]
        $iMakeUCPos = StringInStr($avExcept[$iIndex], "^")
        If $iMakeUCPos <> 0 Then
            $sRegExaExcept = StringReplace($sRegExaExcept, "^", "")
        Else
            $sRegExaExcept &= "\b"
        EndIf
        $avExcept[$iIndex] = StringReplace($avExcept[$iIndex], "^", "") ;remove ^ from replacement text
        $sSource = StringRegExpReplace($sSource, $sRegExaExcept, $avExcept[$iIndex])
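        If $iMakeUCPos <> 0 Then
            ;Capitalize the letter after every match, not only the first (a sentence can contain several)
            Local $iSearchFrom = 1, $iNextUC, $iCapThis
            While 1
                $iNextUC = _StringRegExpPos($sSource, $sRegExaExcept, 1, $iSearchFrom)
                If @error Or $iNextUC = 0 Then ExitLoop
                $iCapThis = $iNextUC + $iMakeUCPos
                $sSource = StringLeft($sSource, $iCapThis - 2) & StringUpper(StringMid($sSource, $iCapThis - 1, 1)) & StringMid($sSource, $iCapThis)
                $iSearchFrom = $iCapThis ;continue after the letter just capitalized
            WEnd
        EndIf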
    Next
    Return $sSource
EndFunc ;==> _CapExcept

Func _StringRegExpPos($sTest, $sPattern, $iOcc = 1, $iStart = 1)
    Local $sDelim = "", $iHits
    If $iStart > StringLen($sTest) Then Return SetError(1, 0, 0)
    ;Delimiter creation snippet by dany from his version of _StringRegExpSplit
    For $i = 1 To 31
        $sDelim &= Chr($i)
        If Not StringInStr($sTest, $sDelim) Then ExitLoop
        If 31 = StringLen($sDelim) Then Return SetError(3, 0, 0) ;no unused delimiter available
    Next
    ;Mark each match with the delimiter, then locate the delimiter to get the match position
    Local $sResults = StringRegExpReplace(StringMid($sTest, $iStart + (StringLen($sDelim) * ($iOcc - 1))), "(" & $sPattern & ")", $sDelim & "$1")
    If @error = 2 Then Return SetError(2, @extended, 0)
    $iHits = @extended
    If $iHits = 0 Then Return 0
    If $iOcc > $iHits Then Return SetError(1, 0, 0)
    Local $iPos = StringInStr($sResults, $sDelim, 0, $iOcc)
    Return SetExtended($iHits, $iStart - 1 + $iPos) ;@extended holds the number of matches
EndFunc ;==> _StringRegExpPos

Here's a bit of sample code:

EDIT (16 Jan 2013): Corrected format of #include to use quotation marks instead of angle brackets.

#Include "_StringChooseCase.au3"

Global $test = "'abcdefghi now it's 'the time for all good men.' 'AND TWELVE MORE MACDONALD'S!'" & @CRLF & "The quick brown fox JUMPED over the lazy MacDonalds. The USA's Usain Bolt ran for the USA."
ConsoleWrite(_StringChooseCase($test, 1, "Mc^|Mac^|O'^|USA|FBI|Barack|Obama") & @CRLF)
ConsoleWrite(_StringChooseCase('"and the band played on"', 2) & @CRLF)

Previous downloads: 18

_StringChooseCase.au3

Edited by tcurran

Thanks for posting.

Oops, I forgot to include the colon character in the Sentence Case code. I'll put it in with the next revision.

Also, I neglected to point out one notable hack I employed: I marked the position of mid-word letters that need Capitalization Exceptions by following them with the very, very rarely used cedilla character ¸ (which looks like a comma, but isn't). If, for some freakish reason, your text includes the cedilla AND has a mid-word cap that's on the exception list (like the name McDougal), you may get unexpected results. Somehow I doubt this will ever happen to anyone, but you never know.

And by the way, this code may or may not work correctly in languages other than English. I'd like to hear about it if it doesn't.

Edited by tcurran

@tcurran - Excellent, Thanks for sharing. Seems a well organized UDF.

Don't know if you are aware of other similar attempts or discussions here?

was mine, though I've long since made improvements, which I must upload. You may glean something.

Been a pet topic of mine for some years it has ... with my function(s) evolving over time with use.

Another short discussion

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.

What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)


Don't know if you are aware of other similar attempts or discussions here?

Yes, indeed, I believe I read every message ever written on this forum about _StringProper before I started, and I gained inspiration from the previous attempts by you and jennico—although I obviously took my own approach.

I didn't feel it was worth the processing overhead to include a Roman numerals function, although I'm open to other views on that. In any case, I'm thinking in hindsight that it might be good to include II, III, and IV alongside Mc, Mac and O' to take care of all the most common cap exceptions that appear in American and Commonwealth surnames.


If one wants to be completest, there are many short names out there that need capitalization.

BBC, ABC, NBC, USA, UK, DVD, CD, USB, AC/DC (or AC-DC when a filename), etc etc

But it takes time to recall them all, and no doubt new ones also turn up.

This is where an add feature (registry?) would come in handy.

Of course, they require all the right checks - start with, end with, space/s, brackets, punctuation.


If one wants to be completest, there are many short names out there that need capitalization.

BBC, ABC, NBC, USA, UK, DVD, CD, USB, AC/DC (or AC-DC when a filename), etc etc

Exactly! The complete list would run to the thousands, which would create quite a bit of processing overhead. Checking every word in the source against every word in a comprehensive exception list would mean hundreds of thousands or even millions of runs through the exception function. You'd need a much better search algorithm than this simple brute force method.

But different scripters will use this function for different things. In some contexts, a user might not want to capitalize AM or PM, for instance. So for simplicity's sake, the defaults assume you'll be using it on a list of English-language names—period—and more by way of example than any effort to be comprehensive. But someone might use it to capitalize news copy. Someone else might use it on scanned legal documents. That's why there's a parameter to pass your own customized exception list. If you needed a really long list (and were willing to pay the performance price), it would be trivial to revise this code to load a text file into the $sCapExcepts parameter.
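
As a rough illustration of the text-file idea, here is a minimal sketch. The helper name _LoadCapExcepts and the file name cap_exceptions.txt are made up for this example and are not part of the UDF. It assumes the file holds one exception per line, written the same way as in the parameter (for example Mc^ or FBI), and that no entry contains regex metacharacters.

```autoit
#include "_StringChooseCase.au3"

; Hypothetical helper (not part of the UDF): builds a |-delimited exception
; list from a text file that holds one exception per line.
Func _LoadCapExcepts($sPath)
    Local $sList = FileRead($sPath)
    If @error Then Return SetError(1, 0, "") ; file missing or unreadable
    $sList = StringStripWS($sList, 3) ; trim leading and trailing whitespace
    ; Each run of line breaks (plus surrounding blanks) becomes one | delimiter
    Return StringRegExpReplace($sList, "\s*[\r\n]+\s*", "|")
EndFunc   ;==>_LoadCapExcepts

; Assumed file name, located next to the script
Local $sExcepts = _LoadCapExcepts(@ScriptDir & "\cap_exceptions.txt")
If Not @error Then ConsoleWrite(_StringChooseCase("the fbi and the usa", 1, $sExcepts) & @CRLF)
```

Every entry still costs one regex pass per word or sentence, so the performance caveat above applies in full.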

My brief here is coding. I'll leave dictionary building to someone else.

Edited by tcurran

Exactly! The complete list would run to the thousands, which would create quite a bit of processing overhead. Checking every word in the source against every word in a comprehensive exception list would mean hundreds of thousands or even millions of runs through the exception function. You'd need a much better search algorithm than this simple brute force method.

I very much doubt it would be anywhere near as large as that, as my list of names is probably about three times what I provide above, and something that I've built up over years. While I guess in a sense there is potential for the list to be huge, in reality I've found it to be small, but then it would depend on your general use I suppose. For instance, in my case (ha ha pun), I don't use it for letter writing or anything long or complex. I generally stick to Movie, Music & Book titles and similar, that are often path related.

While people can indeed employ your UDF for a whole range of things text related, I look at it from a general use perspective, that involves Media and File/Folder names, which is much more limited in range of replacements.

But different scripters will use this function for different things. In some contexts, a user might not want to capitalize AM or PM, for instance. So for simplicity's sake, the defaults assume you'll be using it on a list of English-language names—period—and more by way of example than any effort to be comprehensive. But someone might use it to capitalize news copy. Someone else might use it on scanned legal documents. That's why there's a parameter to pass your own customized exception list. If you needed a really long list (and were willing to pay the performance price), it would be trivial to revise this code to load a text file into the $sCapExcepts parameter.

Very true, and in my case, I would (as above) have something like a Media list that would be checked for that type of data use. I already do something similar with my current (main) function, that has several switches for processing all sorts of replacements, from foreign language characters/accents to Roman Numerals, illegal file name characters, ASCII alternatives, etc. Many of those are pretty standard and could be applied as parameter/switches in the function call.

My brief here is coding. I'll leave dictionary building to someone else.

That's perfectly ok, as you've provided a means and platform ... not that it's my place to judge you at all anyway. :D


@TheSaint

Well, you did say, 'to be completest.' I should have noticed your list suggested you were thinking of movie and CD titles. But a 'complete' list would have to include every organization known commonly by its initials (FBI, CIA, NAACP, AARP and hundreds if not thousands more); every locality abbreviated by its initials (SF, TX); every unusually capitalized surname (diGenova), technology (FTP, iPad, iOS, LaTeX, SciTE(!)) or company name (FedEx, M&M/Mars); medical abbreviations; and on and on and on. A 'complete' list really would run to the thousands.

Interestingly, while thinking about this, I came up with a method (using RegEx, of course) for feeding bulk texts with correct capitalization into an AutoIt script and getting out a file consisting of nothing but all the capitalization exceptions contained in the text. Feed in a few dictionaries and directories, and get back a comprehensive exception list! Not sure what you'd do with such a list, since—as I said—you'd need a much better search algorithm than the one in _StringChooseCase to run large texts against a large exception list efficiently. Anyhow, that's a project for another day.
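
The method itself isn't shown in this thread, so the following is only a guess at what such an extractor could look like, not the author's code. The helper name _ExtractCapExcepts is hypothetical. It treats any word with an upper-case letter after its first character as an exception, which catches FBI, McDougal and iPad but is ASCII-only and ignores context.

```autoit
; Hypothetical helper: returns a |-delimited list of the words in $sText
; that have an upper-case letter somewhere after their first character.
Func _ExtractCapExcepts($sText)
    ; StringRegExp is case-sensitive by default, so [A-Z] only matches capitals
    Local $aWords = StringRegExp($sText, "\b\w[\w']*?[A-Z]\w*", 3)
    If @error Then Return SetError(1, 0, "") ; nothing found
    Local $sList = "|"
    For $i = 0 To UBound($aWords) - 1
        ; Case-sensitive check so each spelling is listed only once
        If Not StringInStr($sList, "|" & $aWords[$i] & "|", 1) Then $sList &= $aWords[$i] & "|"
    Next
    Return StringTrimLeft(StringTrimRight($sList, 1), 1) ; drop the outer delimiters
EndFunc   ;==>_ExtractCapExcepts

ConsoleWrite(_ExtractCapExcepts("The FBI met McDougal. The FBI used an iPad.") & @CRLF)
```

Words found this way act as whole-word exceptions. Prefix rules such as Mc^ would still have to be written by hand.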

I'd love to see your title correction function. It might provide inspiration for improvements.


Well, you did say, 'to be completest.' I should have noticed your list suggested you were thinking of movie and CD titles. But a 'complete' list would have to include every organization known commonly by its initials (FBI, CIA, NAACP, AARP and hundreds if not thousands more); every locality abbreviated by its initials (SF, TX); every unusually capitalized surname (diGenova), technology (FTP, iPad, iOS, LaTeX, SciTE(!)) or company name (FedEx, M&M/Mars); medical abbreviations; and on and on and on. A 'complete' list really would run to the thousands.

Yes, it would be huge. I prefer my targeted approach and deliberately ignore mixed 'i' names like iPod, etc ... doesn't do to think too heavily about them ... headache material. With my music titles, all artist/group names are in capitals, to make life easy. For books/stories Authors are the same. Movie names, likewise, are also in uppercase or a combo of upper and title case (i.e. Star Wars - EMPIRE STRIKES BACK).

Interestingly, while thinking about this, I came up with a method (using RegEx, of course) for feeding bulk texts with correct capitalization into an AutoIt script and getting out a file consisting of nothing but all the capitalization exceptions contained in the text. Feed in a few dictionaries and directories, and get back a comprehensive exception list! Not sure what you'd do with such a list, since—as I said—you'd need a much better search algorithm than the one in _StringChooseCase to run large texts against a large exception list efficiently. Anyhow, that's a project for another day.

Intriguing to think about possible uses.

I'd love to see your title correction function. It might provide inspiration for improvements.

I'll see what I can dig up, but it may prove tricky to get a full comprehensive version, as I've tended to update various instances of my function as it exists in various programs I've made. Slack I know, but I've tended to update on the fly at need (usually in the midst of something else that is my primary focus) ... with the intention of course, to collate them all together when I get the chance one day. Some on their own are fairly comprehensive though ... like ones I use a lot - my Update Mp3 Artwork and CDIni Database programs. I'm definitely not as organized as I should be though ... which applies to most of my coding practices ... no longer religious with my snippets as I once used to be, as I now tend to just snatch & grab from my wide assortment of program scripts, with locations implanted as some kind of map in my mind. Luckily for me, most of the functions I've ever created tend to exist in my 19,453 line CDIni Database script or one of its addons.

My Update Mp3 Artwork program actually refers to line and pipe separated entries in a text file, when processing ID3 tags, as well as using my regular Titlecase function(s) with switches. A lot of case changes are personal preferences though.

I'm currently on my Netbook (primary downloading machine) at the moment, which I don't do any programming on, so AutoIt isn't even installed, so I don't have quick access to my scripts. I tend to always be on my Netbook when I visit the Forum these days, while waiting for a download to finish until I can organize the next one. My main PC (Laptop at the moment, while my main Desktop is awaiting rebirth) is usually flat out doing video conversion, music stuff and my occasional programming (updates to existing scripts more than anything new these days).

It shouldn't take too long to cobble something together ... though I assure you, you have a better handle on using RegEx than I do, so you're bound to be disappointed at my simplistic, coding-for-myself, backyard hobby programmer approach. o:)


Not all Roman Numerals are catered for. I and V are affected in that they are always capitalized. I was wondering if it might be an idea to leave them alone altogether. Some people write Roman Numerals in lower case:

$sString = "iii. third item, iv. fourth item"
MsgBox(0, "", _StringChooseCase($sString, 1))

I also see another conflict: MC may occur in McDonald, but it is also the Roman numeral for 1100. Just something you might want to consider.

Edited by czardas

@czardas

It's functionally impossible (without some kind of pretty good artificial intelligence) to develop a capitalization function that works in all cases. Even if you had a complete capitalization dictionary, the author's intentions and context come into play:

  • Is this a first-level roman numeral (IV) or second-level (iv)?
  • Is the input a title or a sentence?
  • Is the word a noun (charity) or a person's name (Charity)?
  • Is the word a 'capitonym' (August, august; Cancer, cancer; Catholic, catholic), where the intended definition dictates whether a word is capitalized?
  • etc etc etc

In practical terms, the best a UDF scripter can do is make it easy for developers to provide their own exception list, appropriate for the use of the function in their scripts. Because capitalizing a list of names is a common use case, and because it's helpful to have an in-built example, the _StringChooseCase default assumes the function will be used for proper name capitalization. If you want to use it for something else (e.g. putting an outline into title case), you could and probably should provide a different exception list, following the format of the default list.

So, in the case of your example, I would re-write it this way:

$sString = "iii. third item, iv. fourth item"
MsgBox(0, "", _StringChooseCase($sString, 1, "i|ii|iii|iv|v|vi|vii|viii|ix|x|xi|xii")) ;include Roman numerals up to max in outline

Not the most elegant approach, I admit. But the alternatives aren't much prettier: Some Roman numerals are capitalized, some aren't--how do you distinguish?; "I" will always be ambiguous as to whether it is the Roman numeral one or the first-person pronoun; should there be some special regex code to distinguish a Roman number like MCMIX (1909) from a British name like McMillan? It starts to get silly. So I'm still pretty sure the best approach is letting the scripter decide the exception list based on the intended use of the function.


I agree with you. I am currently attempting to create a Unicode-compatible version that will automatically ignore words that already contain capitals, digits or web links, since they are not legitimate words and are possibly case-sensitive.
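
To make the filter concrete, here is a small sketch of the test described above. The helper name _SkipWord is invented for this example, and the character classes are ASCII-only, so a real Unicode-aware version would need a different pattern.

```autoit
; Hypothetical helper: True if $sWord should be left untouched because it
; already contains a capital letter or a digit, or looks like a web link.
; ASCII-only sketch, not a Unicode-aware implementation.
Func _SkipWord($sWord)
    Return StringRegExp($sWord, "[A-Z0-9]|://|^www\.") = 1
EndFunc   ;==>_SkipWord

; StringSplit flag 2 returns the words without a count in element 0
For $sWord In StringSplit("iPad costs 500 dollars at www.example.com", " ", 2)
    ConsoleWrite($sWord & " -> " & _SkipWord($sWord) & @CRLF)
Next
```

A check like this would run on the original input, before any lower-casing, since it relies on the capitals the author typed.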

Edited by czardas

This sounds like a good project. Just remember that one frequent use for such a function is to convert input from ALL CAPS to sentence or title case. Wouldn't an ALL CAPS word be ignored by your function (since it already contains capitals)? Or do you envision completely different uses between your function and mine?


After some research, I have made a list of what I consider to be the most important exceptions to capitalization in British English titles. Someone may want to comment about, or add to, this list. I think these lower case words will be correct most of the time.

a, an, and, as, at, but, by, for, from, in, into, nor, of, on, onto, or, per, so, the, to, up, via, with, yet
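
One way to put the list to work, sketched under assumptions: the words are stored as a |-delimited string so they can be dropped straight into a regular expression. The names $g_sSmallWords and _IsSmallWord are made up for this example.

```autoit
; The list above, stored as a regex alternation (assumed storage format)
Global Const $g_sSmallWords = "a|an|and|as|at|but|by|for|from|in|into|nor|of|on|onto|or|per|so|the|to|up|via|with|yet"

; Hypothetical helper: True if $sWord is one of the words kept in lower case.
Func _IsSmallWord($sWord)
    ; ^ and $ force a whole-word match; (?i) ignores the case of the input
    Return StringRegExp($sWord, "(?i)^(?:" & $g_sSmallWords & ")$") = 1
EndFunc   ;==>_IsSmallWord

ConsoleWrite(_IsSmallWord("with") & @CRLF) ; True
ConsoleWrite(_IsSmallWord("Wind") & @CRLF) ; False
```

The first and last words of a title would still need to be capitalized separately, whatever the list says.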

I have been trying out a few more ideas. :)

Edited by czardas

@czardas - If you want to use lower case for Roman Numerals, that's fine, the problem is, that when _Propercase gets applied, they get a leading capital. So if you need to code for one, you need to code for the other, which can just be a preference switch.

Have you looked at my code for Roman Numerals?

It works quite well, and the few errors it might make, are quite acceptable, especially compared to the errors without, which occur more regularly.

You just can't cater for all scenarios, so you go for the one that saves editing the most.

As far as MC goes, you need to set a judicious preference.

The whole idea of improvements to _Propercase, like Titlecase, is that way too many errors currently crop up, so you are aiming at reducing them to the point they don't happen most of the time or often. This is what so many fail to see. I rarely need to edit now, and half the time when I do, it's because I overlooked something in my code, which henceforward gets even better with fewer errors each time.

Those title exceptions you list are fine in a sentence or a description, but any true title is capitals for each and every word.

I ever so hate it when I come across something like a song, book or movie title that has uncapitalized words like - and, with, for ... etc. They just don't look right. They look like a sentence, etc. A title is something that should stand out as a complete whole.

The moment you break a title up with lower case words, it no longer looks whole.

For instance, imagine the following book title.

I Went To The Market By Lofty Waters

which is a whole other thing if it is

I Went To The Market by Lofty Waters

You should be left in no doubt by the second version.


The way I interpret the example you gave is that the author is not part of the title. The title is:

I Went to the Market

I believe this would be considered correct in British English: ie not my English or your English.

Lower case Roman Numerals are impossible to distinguish from words. There are differences of opinion regarding the capitalization of certain words: see link. I learned that generally words of more than three letters should be capitalized regardless of word type. The four letter variants above are consistent with some of the two letter words in the above set.

It's not...'Gone With The Wind'

It's...'Gone with the Wind'

Edited by czardas