
UDF for Title Case, Initial Caps, and Sentence Case


tcurran

@czardas

Wow, reading that link made my head swim. I can't imagine there will ever be a function able to capitalize any sentence or name without a lookup table and a language reference book beside you. :)

Just looking at how the same name sections are capitalized differently depending upon where the person is from or where you're at would make it nearly impossible to make a one-size-fits-all function.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


@tcurran: I don't mean to hijack your thread, it's very inspirational. Thank you!

@BrewmanNH: Yeah, that's why I decided on the minimal-changes approach. All I need to do is add some options to include or exclude certain parameters, now that I have fixed the initial bugs. It should be quite straightforward.

Edit: It was. B)

Edited by czardas

Those title exceptions you list are fine in a sentence or a description, but any true title is capitals for each and every word.

I ever so hate it when I come across something like a song, book or movie title that has uncapitalized words like - and, with, for ... etc. They just don't look right. They look like a sentence, etc.

@TheSaint

I can appreciate that, in a sense, this is a matter of taste. However, there are rules for title capitalization (see The Chicago Manual of Style, Strunk & White's Elements of Style and many other reference works), and they do dictate that, in general, articles, prepositions and conjunctions are lower case (except when the first word in a title, or a meaningful, principal word).

The actual rules are very complex and run to about a page in The Chicago Manual of Style. They'd be impossible to code for because they depend on context and value judgements. That's why Title Case mode in _StringChooseCase uses a subset of articles, prepositions and conjunctions that are most likely to be lower-case in a title (that follows the rules of capitalization).

@czardas @TheSaint

The correct British and American version of the title would be:

"I Went to the Market by Lofty Waters" (by Charles Smith)

I think what @TheSaint was getting at was that "by Lofty Waters" was somehow ambiguous as to whether that was part of the title or the name of the author, and that all initial caps somehow clears it up. But in fact, the rules of formal English dictate that "by" is supposed to be lower case, and the ambiguity is supposed to be cleared up with the use of quotation marks (i.e. inverted commas) or italics.

Actually, @TheSaint is sort of right. Under Chicago Manual of Style rules, his example probably is a rare case where "By" should be capitalized, not for the reason he gives, but because it's what the book calls a "stressed" word. The authors use the example of "A River Runs Through It," where "Through" is capitalized, even though it's a preposition, because it's stressed.

Edited by tcurran

Go back far enough, and there wasn't any concept of capitalization. We have to deal with today, and no UDF will ever be produced that capitalizes everything the exact way everyone wants it to be; there are just too many opinions as to what is "right". You have to go with a consensus and/or your own opinions on what "right" should be and hope someone likes it.


Well, rules are rules I guess, but it sure looks & sounds stupid to me, and I'd like to know their reasoning.

I gave my reasoning, and it all seems quite logical to me, especially from a visual standpoint, and especially where you want things to stand out, which is something a title is supposed to do.

I don't believe in the rules in this instance, and so will disregard what seems to me just some smartarse preference that doesn't make sense. We Aussies are well known for bucking the system when it's built on a tradition and doesn't make sense.

Pushing the boundaries, growth, making things smarter & better ... all that stuff!

Just guess I'm being my renegade alter ego!

But then I always did see rules as being more like guidelines ... less robotic and more human that way.

EDIT

No doubt, that makes me sound like some kind of illiterate thug, but really I'm just getting rid of all that class nonsense, which is what many traditional things are based on. Don't believe in lesser or greater words, they are all equal in my mind. Simple is best, without a pressing reason to add complexity. I know they had their reasons in the past to be oh so clever, trade secrets, etc, etc, but now it's time to move on from that nonsense. None of this clear as mud bullshit.

Edited by TheSaint

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.

Spoiler

What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)


@TheSaint

I totally agree. You should capitalize however you want. My function even caters for you: You can pick mode 1 and capitalize every word. Or if someday you need the function to approximate the Formal Rules for Title Capitalization, you can choose mode 2.

My only concern as the person who started this thread was not leaving as the final word your view that "true titles" had "each and every word" capitalised. That's a valid viewpoint, but 'technically' or 'formally' incorrect.

BTW, I tried to do some research on how title capitalization got to be the way it is in English (including Australian English, incidentally; I did learn that much). But I had no success, other than learning that it is the product of evolution and fashion, which varied a lot and was not formalized until the appearance of grammar references and dictionaries in the 19th century.


I like the traditional system because it allows the more important words to stand out in a title. The only reason I see for not liking this system is that the rules appear to be ambiguous and therefore hard to code for.

After adding the (optional) common British English exceptions I listed previously, I'm now thinking of adding another option - common international name exceptions. I have come up with the following list:

af, av, da, de, de la, de los, del, di, van de, van der, von

I think this should cover most, although not all, scenarios, eg:

Leonardo da Vinci, Wernher von Braun, Paco de Lucía

I think many of these non English words or word combinations are generally written as lower case in titles regardless of whether they form part of a name or not. Unfortunately the above list will not cater for Vincent van Gogh because 'van' is also an English word. Are there any more I'm missing?
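To make the idea concrete, here is a minimal sketch of how the particle list above could be applied as a post-processing pass after Initial Caps. The function name _LowerNameParticles is hypothetical and not part of any UDF in this thread, and requiring whitespace on both sides of a particle is my own assumption about where a particle should be lowered.

```autoit
; Hypothetical helper for illustration only; not part of any UDF posted in this thread.
Func _LowerNameParticles($sText)
    ; Particle list taken from the post above. Longer entries come first for readability;
    ; the order does not change the result.
    Local $aParticles = StringSplit("van der|van de|de los|de la|del|af|av|da|de|di|von", "|")
    For $i = 1 To $aParticles[0]
        ; (?i) ignores case. The lookarounds demand whitespace on both sides,
        ; so a particle that starts or ends the string keeps its capital.
        $sText = StringRegExpReplace($sText, "(?i)(?<=\s)" & $aParticles[$i] & "(?=\s)", $aParticles[$i])
    Next
    Return $sText
EndFunc   ;==>_LowerNameParticles

ConsoleWrite(_LowerNameParticles("Leonardo Da Vinci And Wernher Von Braun") & @CRLF)
```

With that input the call should print "Leonardo da Vinci And Wernher von Braun". As noted above, a lone 'van' is deliberately absent from the list, so "Vincent Van Gogh" is left alone.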

Edited by czardas

_StringChooseCase has now been

UPDATE (9 Jan 2013): The code is a hair more efficient. #include-once and #include <array.au3> now appear where they're supposed to. "I" is now always capitalized (both as the first person pronoun and the Roman numeral one). Title Case further improved: It now has a more comprehensive list of lower-case words--mainly more prepositions--and the last word of a title will always be capitalized.

@czardas

Just as I've inspired you, you've inspired me. I'm thinking about adding a parameter which would permit users to tell the function to ignore any word (or sentence, when using the sentence-case option) that has mixed cases within it. So "apple," "street," and "boot" would all be capitalized, but "House," "iPod," "VanderVeldon," and "SCRAM!" would all be left exactly as input (edit: SCRAM wouldn't be left alone, because it's actually all upper-case). This seems like a useful addition... but is it really? Isn't such a function mostly used for text that is ALL CAPS, or all lower case, or entirely messed up in some other way? How often is it necessary to change capitalization where some text is correctly capitalized and some isn't?

Edited by tcurran

I think the best option for this is to ignore all words which contain capitals and only capitalize lower case. This leaves upper case Roman numerals and abbreviations intact. You don't want BBC to become Bbc, unless you actually do want proper case. This is the way I approached it.
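A rough sketch of that rule, assuming words are separated by single spaces: a word is capitalized only when it contains no upper-case letters at all. The name _CapLowerWordsOnly is made up for this example and is not the code either of us posted.

```autoit
; Hypothetical helper for illustration only.
Func _CapLowerWordsOnly($sText)
    Local $aWords = StringSplit($sText, " ")
    Local $sOut = ""
    For $i = 1 To $aWords[0]
        ; == is the case-sensitive comparison: true only if the word has no capitals.
        If $aWords[$i] == StringLower($aWords[$i]) Then
            $aWords[$i] = StringUpper(StringLeft($aWords[$i], 1)) & StringMid($aWords[$i], 2)
        EndIf
        $sOut &= $aWords[$i]
        If $i < $aWords[0] Then $sOut &= " " ; put the separating spaces back
    Next
    Return $sOut
EndFunc   ;==>_CapLowerWordsOnly

ConsoleWrite(_CapLowerWordsOnly("the BBC and iPod show") & @CRLF)
```

This should print "The BBC And iPod Show". A word that begins with punctuation, such as an opening quotation mark, is not handled here because only its first character is upper-cased.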

I just found a pretty good link for Associated Press lower title case exceptions => Grammar Girl

Edited by czardas

What do you do when the word has been incorrectly written in all caps? Or a word that should be all caps is written in all lower case? Good luck figuring out the edge cases, pun intended.


It's impossible to fix all the user's mistakes, and a large database of abbreviations isn't a particularly watertight solution. If you plan on capitalizing a long list of titles, there will nearly always be some inconsistencies, whatever method you choose. If the method parses over 95% of international names correctly, that's got to be worth the effort. Also if you happen to know which abbreviations are present, you can simply run StringReplace afterwards.
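As a sketch of that clean-up pass: the version below swaps plain StringReplace for StringRegExpReplace with word boundaries, so that "Bbc" is only touched when it stands alone as a word. The function name and the pipe-delimited list are assumptions made for the example.

```autoit
; Hypothetical helper for illustration only.
; $sAbbrevs is a pipe-delimited list written the way each abbreviation should appear.
Func _RestoreAbbreviations($sText, $sAbbrevs)
    Local $aAbbrevs = StringSplit($sAbbrevs, "|")
    For $i = 1 To $aAbbrevs[0]
        ; \Q...\E treats the abbreviation literally; \b keeps the match to whole words.
        ; Assumes the abbreviations contain no $ or \ characters, which are special in the replacement.
        $sText = StringRegExpReplace($sText, "(?i)\b\Q" & $aAbbrevs[$i] & "\E\b", $aAbbrevs[$i])
    Next
    Return $sText
EndFunc   ;==>_RestoreAbbreviations

ConsoleWrite(_RestoreAbbreviations("Bbc News On Usa Tv", "BBC|USA|TV") & @CRLF)
```

This should print "BBC News On USA TV". It only helps when you already know which abbreviations to expect, which is the limitation described above.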

Edit

The next post is very much to the point. If you want to fix bbC, then use tcurran's _StringChooseCase :)

Edited by czardas

What do you do when the word has been incorrectly written in all caps? Or a word that should be all caps is written in all lower case? Good luck figuring out the edge cases, pun intended.

Whatever you do, you are always going to eventually come across a scenario where you get it wrong.

The all important thing though, is the regular scenario where you mostly get it right, and thus reduce the manual editing.

Most things like BBC are easy to code for and build into a database and, as tcurran said earlier and supports, users can keep their own databases to match their preferences ... like the Media one I would use for that type of material/program.

If necessary, you can also build in prompting, like I have in some of my programs, where a third 'pipe' parameter indicates whether the user should be prompted. As it's still far easier to click yes or no, rather than manual typing/editing. This would be particularly relevant in the MC scenario that czardas mentioned earlier, so that if it's in a Roman Numeral scenario, you would click YES to all uppercase, whereas if it's McDonalds, you would click NO, and get the Mc result.

I find with the prompting, that I get pretty close to 100% accuracy ... as close as you are ever gonna get anyway.

Edited by TheSaint


Here's an example from my Update Mp3 Artwork program, which I use at the click of a CASE button for the Track Title (ID3 Tag) field. In the Settings window, is another CASES button that allows editing the following text file in Notepad.

Cases.txt

; Specify any case changes or replacements you want here, with each item
; using a single line and the parts separated by a pipe '|' character.
; If you want to have a 'confirm the change' prompt, then add 'query'
; after another pipe (i.e. COMEDY|comedy|query). No blank lines.
; Remember that spaces can be important to prevent errors.
;
(Bonus Track)| (bonus)|query
Remix)| remix)
(Early Version)| (early version)
(Instrumental| (instrumental
(Bbc | (BBC
Session)| session)
(Retake)| (retake)
(Backing Track | (backing track
(Guitar & Vocal | (guitar & vocal
(Mono)| (mono)
(Hd |(HD
(Take |(take
(Fast |(fast
Live)| live)
CD| - CD
Take)| take)
Acoustic Take)|acoustic take)
(Early | (early |query
Single)| single)
(Single | (single
(Non-Album | (non-album
(Piano | (piano
(Queen With | (with
(With | (with
Recording)| recording)
(Second | (second |query
(Duet)| (duet)
, Etc|, etc|query
Rpm | rpm |query
_| |query
(Vocals | (vocals
Piano version)| piano version)
(Feat. | (feat.
Im | I'm
(Previously Unreleased|(unreleased bonus
(Version One)|(version 1)|query
(Version Two)|(version 2)|query
(Version Four)|(version 4)|query
(Version |(version
(Long version)|(long version)
Demo)| demo)
(Rare)| (rare)

Here's the simple code I use to process it, remembering that I code with speed not perfection ... for personal use only.

I do my programs like I needed them yesterday, which in most cases I do.

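#include <File.au3> ; needed for _FileCountLines

; Applies the replacements listed in Cases.txt to $tagtxt and returns the result.
; $casefle is assumed to be a global holding the path to Cases.txt, set elsewhere in the program.
Func CaseFile($tagtxt)
    Local $lines, $file, $line, $part, $ans
    If FileExists($casefle) Then
        $lines = _FileCountLines($casefle)
        If $lines > 0 Then
            $file = FileOpen($casefle, 0)
            For $l = 1 To $lines
                $line = FileReadLine($file, $l)
                If StringLeft($line, 1) <> ";" Then ; skip comment lines
                    $part = StringSplit($line, "|")
                    If $part[0] > 1 Then ; need at least 'find|replace'
                        If StringInStr($tagtxt, $part[1]) > 0 Then
                            If $part[0] = 2 Then
                                $tagtxt = StringReplace($tagtxt, $part[1], $part[2])
                            ElseIf $part[3] = "query" Then
                                ; 262177 = topmost + question icon + OK/Cancel buttons
                                $ans = MsgBox(262177, "Confirm Case Change or Replacement", _
                                    "Change --> " & $part[1] & @LF & _
                                    "To --> " & $part[2] & @LF & @LF & _
                                    "NOTE - This will apply to every instance." & @LF & @LF & _
                                    "Do you want to proceed?", 0)
                                If $ans = 1 Then ; OK was clicked
                                    $tagtxt = StringReplace($tagtxt, $part[1], $part[2])
                                EndIf
                            EndIf
                        EndIf
                    EndIf
                EndIf
            Next
            FileClose($file)
        EndIf
    EndIf
    Return $tagtxt ; returned unchanged when the file is missing or empty
EndFunc ;=> CaseFile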

Very simplistic for sure, but does the job well, and fast enough processing for my needs.

NOTE - Because I manually transferred the code from my other PC via a text file, all the tabbing got lost when I pasted here, so I've manually & painfully spaced everything. Which appears to gradually get lost after every edit I do.

Edited by TheSaint


  • 1 year later...

Working with AutoIt for about four months now (needed an automation tool for a project - found this, loved it, been using it for lots of things since) - first post on the forum (though I've been using it a lot for reference material!)

Decided to port over a set of tools I've developed in .php after multiple 'security' methods failed to work on localhosted machines (I know, there are no ways to 'secure' anything from anyone that wants it bad enough......) - AutoIt so far has proven a good fit (at least it puts locks on all-glass windows).

Anyway, as part of the tools, I had to develop some language rules similar to this UDF, and, of course, ran into all the same things being discussed here.

What I decided to do was provide the user with various exception files (asking a user to make changes in the code is, IMHO, 'wrong' on many levels) that allows them to make their own choice on just how complex the language rulings will be (and, making sure to give them disclaimers on how a long list will affect processing time, etc.).  

Of course, it gets a bit cumbersome for the user as well as coding interfaces (AutoIt certainly is simpler for that) to edit the files, but once they set it up, changes are very rare.  I also put in some defaults to cover as many things as I can (roman numerals being one).

What it boiled down to is somewhat similar to this UDF (which works great for what it does, btw), though adds 'exceptions to the exceptions' files as well by having an "ALWAYS CAPS" (in UDF), "never caps" (not in UDF) as well as "Mixed Caps Words" (much like the exceptions in this UDF, however, the UDF is much better than my solution), and also an exception-to-the-exception file for "Exception to Caps Words".  

The 'never caps' and the last file ("Exception to Caps Words") is what I feel is missing in this UDF.

Why?  Well, for the 'never caps', it allows the UDF to work with more than English languages.  While nothing will cover everything, having this option gives the user some additional control over their local rules.  In Spanish, for instance (I'm no expert but I live in Ecuador right now and so these things come up and I asked the Administrator of the school where I'm teaching English about the rules), 'de', 'la', 'y' and 'el' should NEVER be caps (though acceptable when the first word in a sentence or in an Every Word situation, they should not be caps).  None of these interfere with English rules, so easy enough to have them as defaults.  English has no 'standard' words/names/titles like this, though some Authors, Artists and even product names choose such a preference, so having this option (with a user-controlled list) makes the UDF more powerful.

Exception to Caps Words - when I was developing my .php tool in this area, I got a heavy slap-in-the-face reminder from my sister (who does a lot of testing/documentation for me) about our typing teacher in school (we had the same one, though my sister is 16 years my senior) and the capitalization of "Mac" words.  Yes, there are a lot of them, though enough 'exceptions' to that 'exception' rule to merit attention.

The one that really hit me was when she reminded me of this 'test' from our typing teacher.  Spell out  M C D O N A L D  orally and ask someone what it spells.   Then  M C H E N R Y.   Then  M A C H I N E R Y.  (no, the answer is NOT MacHinery!)

Now, run this through the UDF........

Mack Macaw & The Machismo Maculating Macaroni Macaroon Machine

(ok, that is not a highly popular book, song nor group, but you get the idea... - the UDF needs an 'exception-to-the-exception' rule!)

Being new to AutoIt and just getting a reasonable start on RegEx (though coding in .php for nearly 20 years - yeah, I worked around RegEx 'long hand' - though now I am starting to use it and like it...), I'm a bit lost on how to recommend coding changes to the UDF, so, for now, in my porting over of my tool, I'm going to use this UDF and do a blanket ignore on strings that contain the exception-to-the-exceptions and give the user a report on those found during the search (into the standard exceptions file/report that we pop out).  As I get better with understanding the various AutoIt commands used in this UDF (or the author sees fit to take these suggestions to heart....! ;) I'll offer more direct changes.

Thanks to the authors of AutoIt and all the UDFs - I'm learning a lot about GUI programs and have found a new 'world' to play in!


OK, after hours of 'messing around', I have what I think is an actual contribution to this UDF, not simply a comment (it is nicer to actually offer suggestions [and working code!] that really mean something!).
 
To that end, I offer this;
 
1.
There is a BUG with the Sample code 
ConsoleWrite(_StringChooseCase('"and the band played on"', 2) & @CRLF)

Gives an Error "ERROR: _StringChooseCase() called with Const or expression on ByRef-param(s)."

 
Suggest replacing it with 
$test = '"and the band played on"'
ConsoleWrite(_StringChooseCase($test, 2) & @CRLF)
 
2. Below is my modified version of the UDF, with original code in place (commented out and new items commented to show what is going on)
 
NEW FEATURES IN THIS VERSION: 
  1. Avoids anomalies with improper capitalization of words conflicting with the ^ exception rules (MacHinery, MacK, O'Clock, etc.)
  2. Support for non-English exception rules (a few Spanish and French words tested [a basic list to show it is possible])
  3. Allows for multiple exception rule matches to be treated within the same sentence (option 3)
  4. Implements an 'exception-to-exceptions' option that provides more flexibility in creating complex rules (for other languages and/or suitability to various purposes)
  5. Adds support for @CRLF, @CR and @LF end-of-line characters (useful for various 'lists' that are not actually sentences - perhaps loaded from files, etc.)
HOW TO TEST THE DIFFERENCES
Use this same $test string for both versions
Global $test = "it is now " & @HOUR & " o'clock" & @CRLF & "In The USA Mack McHenry watches BBC on his iPhone y el iPad and macintosh machine on the internet.  The quick brown fox JUMPED over the lazy MacDonalds and the MacDougals visited Pont l'Évêque at 3 o'clock with the O'Reilly and O'Conners. The USA's Usain Bolt ran for the USA."
 

IN THE CURRENT UDF use this exceptions rule (same as current UDF)

ConsoleWrite(_StringChooseCase($test, 3, "Mc^|Mac^|O'^|USA|FBI|Barack|Obama") & @CRLF)
RESULT (Current UDF - Option 1)
It Is Now 18 O'Clock
In The USA MacK McHenry Watches Bbc On His Iphone Y El Ipad And MacIntosh MacHine On The Internet.  The Quick Brown Fox Jumped Over The Lazy MacDonalds And The MacDougals Visited Pont L'évêque At 3 O'Clock With The O'Reilly And O'Conners. The USA's Usain Bolt Ran For The USA.
 
RESULT (Current UDF - Option 3) (note missing time reference)
In the USA MacK McHenry watches bbc on his iphone y el ipad and Macintosh Machine on the internet.  The quick brown fox jumped over the lazy MacDonalds and the Macdougals visited pont l'évêque at 3 O'Clock with the O'reilly and O'conners. The USA's usain bolt ran for the USA.
 
 
IN MY VERSION (below) use this exceptions rule
ConsoleWrite(_StringChooseCase($test, 3, "Mc^|Mac^|O'^|l'^|FBI|Barack|Obama|USA|BBC|I|II|III|IV|V|VI|VII|VIII|IX|X|de|la|el|y|eBook|iPhone|iPad|Internet|!|o'clock|machine|machismo|mack|machismo|maculating|macaroni|macaroon") & @CRLF)

(Note the |!| delimiter between the 'exceptions' and 'exception-to-exceptions' rules)

The 'exception-to-exceptions' rules will honor the Options (meaning that they only are exceptions to the ^ forced caps type exceptions and will follow the options properly - to force them to 'ALL CAPS', 'no caps' or 'aSiTyPeDthEm', put them on the front side of the rule as before)

RESULT (my version - Option 1)
It Is Now 18 O'clock
In The USA Mack McHenry Watches BBC On His iPhone y el iPad And MacIntosh Machine On The Internet.  The Quick Brown Fox Jumped Over The Lazy MacDonalds And The MacDougals Visited Pont l'ÉVêque At 3 O'clock With The O'Reilly And O'Conners. The USA's Usain Bolt Ran For The USA.
 
RESULT (my version - Option 3)
It is now 18 o'clock
In the USA mack McHenry watches BBC on his iPhone y el iPad and MacIntosh machine on the Internet.  The quick brown fox jumped over the lazy MacDonalds and the MacDougals visited pont l'ÉVêque at 3 o'clock with the O'Reilly and O'Conners. The USA's usain bolt ran for the USA.
 
#include-once
#include <Array.au3> ;_ArrayToString UDF used in Return

; #FUNCTION# ====================================================================================================================
; Name...........: _StringChooseCase
; Description ...: Returns a string in the selected upper & lower case format: Initial Caps, Title Case, or Sentence Case
; Syntax.........: _StringChooseCase($sMixed, $iOption[, $sCapExcepts = "Mc^|Mac^|O'^|I|II|III|IV"])
;PROSPECTIVE: add param for Ignore mixed case input
; Parameters ....: $sMixed -           String to change capitalization of.
;                 $iOption -          1: Initial Caps: Capitalize Every Word;
;                                    2: Title Case: Use Standard Rules for the Capitalization of Work Titles;
;                                    3: Sentence Case: Capitalize as in a sentence.
;                 $sCapExcepts    -  [optional] Exceptions to capitalizing set by options, delimited by | character. Use the ^
;                                    character to cause the next input character (whatever it is) to be capitalized
; Return values .: Success - Returns the same string, capitalized as selected.
;                 Failure - ""
; Author ........: Tim Curran <tim at timcurran dot com> - modified by TechCoder (added various features including exception-to-exception option)
; Remarks .......: Option 1 is similar to standard UDF _StringProper, but avoids anomalies like capital following an apostrophe
; Related .......: _StringProper, StringUpper, StringLower
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================


; EXAMPLE
; Global $test = "it is now " & @HOUR & " o'clock" & @CRLF & "In The USA Mack McHenry watches BBC on his iPhone y el iPad and macintosh machine on the internet.  The quick brown fox JUMPED over the lazy MacDonalds and the MacDougals visited Pont l'Évêque at 3 o'clock with the O'Reilly and O'Conners. The USA's Usain Bolt ran for the USA."
; ConsoleWrite(_StringChooseCase($test, 3, "Mc^|Mac^|O'^|l'^|FBI|Barack|Obama|USA|BBC|I|II|III|IV|V|VI|VII|VIII|IX|X|de|la|el|y|eBook|iPhone|iPad|Internet|!|o'clock|machine|machismo|mack|machismo|maculating|macaroni|macaroon") & @CRLF)



Func _StringChooseCase(ByRef $sMixed, $iOption, $sCapExcepts = "Mc^|Mac^|O'^|I|II|III|IV")
    Local $asSegments, $sTrimtoAlpha, $iCapPos = 1
    $sMixed = StringLower($sMixed)
    Switch $iOption
        Case 1 ;Initial Caps
            $asSegments = StringRegExp($sMixed, ".*?(?:\s|\Z)", 3) ;break by word
        Case 2 ;Title Case
            $asSegments = StringRegExp($sMixed, ".*?(?:\s|\Z)", 3) ;break by word
        Case 3 ;Sentence Case
            $asSegments = StringRegExp($sMixed, ".*?(?:\.\W*|\?\W*|\!\W*|\:\W*|[\r\n|\r|\n]|\Z)", 3) ;break by sentence
    EndSwitch
    Local $iLastWord = UBound($asSegments) - 2
    For $iIndex = 0 To $iLastWord ;Capitalize the first letter of each element in array
        $sTrimtoAlpha = StringRegExp($asSegments[$iIndex], "\w.*", 1)
        If @error = 0 Then $iCapPos = StringInStr($asSegments[$iIndex], $sTrimtoAlpha[0])
        If $iOption <> 2 Or $iIndex = 0 Then ;Follow non-cap rules for Title Case if option selected (including cap last word)
            $asSegments[$iIndex] = StringReplace($asSegments[$iIndex], $iCapPos, StringUpper(StringMid($asSegments[$iIndex], $iCapPos, 1)))
        ElseIf $iIndex = $iLastWord Or StringRegExp($asSegments[$iIndex], "\band\b|\bthe\b|\ba\b|\ban\b|\bbut\b|\bfor\b|\bor\b|\bin\b|\bon\b|\bfrom\b|\bto\b|\bby\b|\bover\b|\bof\b|\bto\b|\bwith\b|\bas\b|\bat\b", 0) = 0 Then
            $asSegments[$iIndex] = StringReplace($asSegments[$iIndex], $iCapPos, StringUpper(StringMid($asSegments[$iIndex], $iCapPos, 1)))
        EndIf
        ;Capitalization exceptions
        $asSegments[$iIndex] = _CapExcept($asSegments[$iIndex], $sCapExcepts)
    Next
    Return _ArrayToString($asSegments, "")
EndFunc   ;==>_StringChooseCase

Func _CapExcept($sSource, $sExceptions)
    Local $sRegExaExcept, $iMakeUCPos
    ; ******* ORIGINAL LINE
    ;       Local $avExcept = StringSplit($sExceptions, "|")
    ; ********** REPLACED BY THE FOLLOWING LINES ************
    If StringInStr($sExceptions, "|!|") Then
        $sExceptions = StringSplit($sExceptions, "|!|", 1) ; split the 'exceptions' list to make [1] as the 'rules' and [2] as the 'exceptions-to-exceptions rules'
        Local $avExcept = StringSplit($sExceptions[1], "|") ; keeps the same array as original code
        Local $avExceptExcept = StringSplit($sExceptions[2], "|") ; adds an array for 'exception-to-exception'
    Else
        Local $avExcept = StringSplit($sExceptions, "|")
        Local $avExceptExcept[1] = [0] ; no exceptions-to-exceptions supplied, so the count in [0] is zero
    EndIf

    For $iIndex = 1 To $avExcept[0]
        $sRegExaExcept = "(?i)\b" & $avExcept[$iIndex]
        $iMakeUCPos = StringInStr($avExcept[$iIndex], "^")
        If $iMakeUCPos <> 0 Then
            $sRegExaExcept = StringReplace($sRegExaExcept, "^", "")
        Else
            $sRegExaExcept &= "\b"
        EndIf
        $avExcept[$iIndex] = StringReplace($avExcept[$iIndex], "^", "") ;remove ^ from replacement text

        ; **************************************************************************
        ; MOVE this line down (doesn't matter where it is done, so we move it to allow the exception-to-exception check)
        ; $sSource = StringRegExpReplace($sSource, $sRegExaExcept, $avExcept[$iIndex])
        ; **************************************************************************

        If $iMakeUCPos <> 0 Then
            Local $iNextUC = _StringRegExpPos($sSource, $sRegExaExcept)
            Local $iMatches = @extended
            ; *******************  next line moved into the loop
            ;   Local $iCapThis = $iNextUC + $iMakeUCPos

            For $x = 1 To $iMatches
                ; ***********  ADDING THE NEXT TWO LINES TAKES CARE OF MULTIPLE INSTANCES OF MC^, ETC. ****************
                $iNextUC = _StringRegExpPos($sSource, $sRegExaExcept, $x)
                Local $iCapThis = $iNextUC + $iMakeUCPos

                ; *********** 'LOOK AHEAD' TO SEE THE WORD AND CHECK THE EXCEPTION-TO-EXCEPTIONS LIST *********
                Local $dont_change = False ; presume we want to change things
                For $y = 1 To $avExceptExcept[0]
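                    ; compare at the match position itself, so a repeated exception word is still caught after its first occurrence
                    If $avExceptExcept[$y] <> "" And StringMid($sSource, $iNextUC, StringLen($avExceptExcept[$y])) = $avExceptExcept[$y] Then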
                        ; found the exception-to-exception string in the same location as iNextUC - SKIP THIS ONE
                        $dont_change = True
                    EndIf
                Next
                If $dont_change <> True Then
                    ; ******************  this line changes slightly to reinsert the exception (as typed by the user)
                    ; $sSource = StringLeft($sSource, $iCapThis - 2) & StringUpper(StringMid($sSource, $iCapThis - 1, 1)) & StringMid($sSource, $iCapThis)
                    $sSource = StringLeft($sSource, $iCapThis - 1 - $iMakeUCPos) & $avExcept[$iIndex] & StringUpper(StringMid($sSource, $iCapThis - 1, 1)) & StringMid($sSource, $iCapThis)
                EndIf
            Next
        Else
            ; replace the StringRegExpReplace
            $sSource = StringRegExpReplace($sSource, $sRegExaExcept, $avExcept[$iIndex])
        EndIf
    Next
    Return $sSource
EndFunc   ;==>_CapExcept

Func _StringRegExpPos($sTest, $sPattern, $iOcc = 1, $iStart = 1)
    Local $sDelim, $iHits
    If $iStart > StringLen($sTest) Then Return SetError(1)
    ;Delimiter creation snippet by dany from his version of _StringRegExpSplit
    For $i = 1 To 31
        $sDelim &= Chr($i)
        If Not StringInStr($sTest, $sDelim) Then ExitLoop
        If 32 = StringLen($sDelim) Then Return SetError(3, 0, 0)
    Next
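    ; Mark every match with the delimiter. The search string is not trimmed for later occurrences,
    ; because trimming can cut into a match that begins at $iStart, and then the wrong occurrence is found.
    Local $aResults = StringRegExpReplace(StringMid($sTest, $iStart), "(" & $sPattern & ")", $sDelim & "$1")
    If @error = 2 Then Return SetError(2, @extended, 0)
    $iHits = @extended

    If $iHits = 0 Then Return 0
    If $iOcc > $iHits Then Return SetError(1)
    ; Each earlier match added one delimiter in front of this one, so subtract their combined length
    Local $iPos = StringInStr($aResults, $sDelim, 0, $iOcc) - StringLen($sDelim) * ($iOcc - 1)
    SetExtended($iHits)
    Return $iStart - 1 + $iPos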
EndFunc   ;==>_StringRegExpPos
 

Of course, the mods I did aren't the be-all and end-all either (there never will be one of those!), but they suit my purpose for now, and I hope the result helps someone else.

 
Comments and suggestions on how to do this better are most welcome. I'm very new to AutoIt and RegEx, so there are likely better ways; if you spot anything, please let me know.
Link to comment
Share on other sites

Working with certain lists (in this case, a Title/Artist list), you may want to change the rules a bit:

        Case 1 ;Initial Caps
            $asSegments = StringRegExp($sMixed, ".*?(?:\-|\.|\s|\Z)", 3) ;break by word
        Case 2 ;Title Case
            $asSegments = StringRegExp($sMixed, ".*?(?:\-|\.|\s|\Z)", 3) ;break by word
        Case 3 ;Sentence Case
            $asSegments = StringRegExp($sMixed, ".*?(?:\-|\.|\.\W*|\?\W*|\!\W*|\:\W*|\r\n|\r|\n|\Z)", 3) ;break by sentence

This lets you correctly capitalize names written with periods and no spaces, and those with dashes between letters, such as:

B.B. King, Y.M.C.A. and D-I-V-O-R-C-E

I can't think of any time that these additions would be a problem and I plan to leave them in my rules for now and test some more.

I first tried the 'simple' way of just adding to the exception rules, which would give nice flexibility. But when I added "-^|.^|" to the exception rules passed to the UDF, it came back with some very unexpected results (try it for yourself; it's kind of fun to see the output!). The likely cause is that each exception is inserted into the regular expression unescaped, so the "." matches any character.
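
To see what the extra break characters do, the two word-break patterns can be run side by side. This is only a small sketch: the sample text and variable names are made up for illustration, and it needs nothing beyond the standard Array.au3 include.

```autoit
#include <Array.au3>

; Sample text chosen for illustration only (already lower-cased, as the UDF does first)
Local $sText = "b.b. king sings d-i-v-o-r-c-e"

; Word-break pattern from the UDF as posted: breaks on whitespace only
Local $aOriginal = StringRegExp($sText, ".*?(?:\s|\Z)", 3)

; Modified pattern: also breaks after every dash and period
Local $aModified = StringRegExp($sText, ".*?(?:\-|\.|\s|\Z)", 3)

; Show each segment between square brackets so trailing spaces are visible
ConsoleWrite("Original: [" & _ArrayToString($aOriginal, "][") & "]" & @CRLF)
ConsoleWrite("Modified: [" & _ArrayToString($aModified, "][") & "]" & @CRLF)
```

With the modified pattern each letter of "d-i-v-o-r-c-e" ends up in a segment of its own, so the main loop capitalizes every one of them.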

Link to comment
Share on other sites

Here's a GUI that makes things a bit easier to test (works well with the version I posted earlier - untested with the original posted by tcurran).
 
#include <GUIConstantsEx.au3>
#include "_StringChooseCase.au3"
Local $guiwidth = @DesktopWidth * .35
Local $guiheight = @DesktopHeight * .7
Local $margin = 20
Local $verticalspace = 5, $horizontalspace = 10
Local $horizontal = 20, $vertical = $margin
GUICreate("Test Capitalization Rules Using _StringChooseCase", $guiwidth, $guiheight)
GUICtrlCreateGroup("Words to output AS TYPED (single line or separate with '|')", $horizontal, $vertical, $guiwidth - ($margin * 2), $guiheight * .3)
Local $wysiwyg = GUICtrlCreateEdit("BBC|ABC|NBC|CBS|MTV|VH1"&@CRLF&"DVD|CD|USB"&@CRLF&"AC-DC|REM|REO|ZZ|SHeDAISY"&@CRLF&"FBI|CIA|NAACP|AARP" & _
@CRLF & "AL|AK|AZ|AR|CA|CO|CT|DE|FL|GA|HI|ID|IL|IN|IA|KS|KY|LA|ME|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VT|VA|WA|WV|WI|WY"& _
@CRLF&"Barack|Obama" & @CRLF & "USA|BBC|UK|MC" & @CRLF & "FedEx|UPS|USPS" & @CRLF & "I|II|III|IV|V|VI|VII|VIII|IX|X" & @CRLF & "de|la|el|y" & @CRLF & _
"eBook" & @CRLF & "iPhone|iPad|Internet", $horizontal + 20, $vertical + 20, $guiwidth - ($margin * 4), $guiheight * .3 - ($margin * 2))
$vertical += $verticalspace + $guiheight * .3
Local $box_size = ($guiwidth * .5) - ($horizontalspace / 2) - $margin
$horizontal = $margin
GUICtrlCreateGroup("Capitalization Rules (^ for CAP letter)", $horizontal, $vertical, $box_size, $guiheight * .3)
Local $capsrules = GUICtrlCreateEdit("Mc^|Mac^" & @CRLF & "O'^" & @CRLF & "l'^", $horizontal + 20, $vertical + 20, $box_size - ($margin * 2), $guiheight * .3 - ($margin * 2))
$horizontal += $horizontalspace + $box_size
GUICtrlCreateGroup("Exceptions to Capitalization Rules", $horizontal, $vertical, $box_size, $guiheight * .3)
Local $exceptions = GUICtrlCreateEdit("machine|machismo|mack|machismo|maculating|macaroni|macaroon" & @CRLF & "o'clock", $horizontal + 20, $vertical + 20, $box_size - ($margin * 2), $guiheight * .3 - ($margin * 2))
$vertical += $guiheight * .3 + $verticalspace * 2
$horizontal = $margin
Local $test = GUICtrlCreateInput("Sentence To Test", $horizontal, $vertical, $guiwidth - ($margin * 2), 30)
$vertical += 30 + $verticalspace
Local $radio1 = GUICtrlCreateRadio("1 Capitalize Every Word", $horizontal, $vertical, $guiwidth / 3 - $horizontalspace, 30)
$horizontal += $guiwidth / 3
Local $radio2 = GUICtrlCreateRadio("2 Work Title Rules", $horizontal, $vertical, $guiwidth / 3 - $horizontalspace, 30)
$horizontal += $guiwidth / 3
Local $radio3 = GUICtrlCreateRadio("3 Sentence Case", $horizontal, $vertical, $guiwidth / 3 - $horizontalspace, 30)
GUICtrlSetState($radio3, $GUI_CHECKED)
$vertical += 30
$horizontal = $margin
Local $output = GUICtrlCreateEdit("OUTPUT - Set rules and press 'Process' to update", $horizontal, $vertical, $guiwidth - $margin * 2, 50)
$vertical += 50 + $verticalspace
$horizontal = $margin
Local $process = GUICtrlCreateButton("Process", $margin, $vertical, $guiwidth - $margin * 2, 30)
GUICtrlSetState($test, $GUI_FOCUS)
GUISetState()
While 1
    $msg = GUIGetMsg()
    Select
        Case $msg = $GUI_EVENT_CLOSE
            ExitLoop
        Case $msg = $process
            Local $test_string = GUICtrlRead($test)
            Local $result = _StringChooseCase($test_string, _Option(), StringReplace(GUICtrlRead($capsrules) & "|" & GUICtrlRead($wysiwyg) & "|!|" & GUICtrlRead($exceptions), @CRLF, "|"))
            ConsoleWrite($result & @CRLF)
            GUICtrlSetData($output, $result)
    EndSelect
WEnd
Func _Option()
    Switch $GUI_CHECKED
        Case GUICtrlRead($radio1)
            Return 1
        Case GUICtrlRead($radio2)
            Return 2
        Case GUICtrlRead($radio3)
            Return 3
    EndSwitch
EndFunc   ;==>_Option
I put in the 'defaults' as (most of) those discussed in the previous posts, plus various additional ones I knew or thought of (50 US state abbreviations, singers with unusual caps, etc.). Keeping each group of rules on a single row makes them easy to keep track of, as they are (somewhat) 'categorized', though there is no checking of one rule against another.
 
You can have 'doubles' in the lists, though they create some odd results.
 
Test this sentence; it works:
mcintire contains mc, though mc is also roman numeral 1100
 
However, mix it around and it doesn't:
mc is part of mcintire, though mc is also roman numeral 1100
 
(This seems to be due to the search going through the rules from the top down instead of as an array with pattern matching. If it is a serious issue for you, it is possible to change that, but it is a bit of work!)
 
Another 'anomaly' (though easily corrected in most cases) is the very rare situation of using the given rules on this particular song title:
Y.M.C.A. will result in y.M.C.A. (due to the Spanish rules in the code, where 'y' is an "always lower case" word. Remove that entry if you are using only English words/phrases to avoid this particular issue.)
 
You can change the rules as you like in the code or, with a bit of modification, read a file/database into the controls (I will make that mod for my own use, and the 'Process' button will then also save the data back to the file/database).
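
Loading and saving the rules could look roughly like the sketch below. `_LoadRules`, `_SaveRules` and the file name in the usage note are hypothetical names invented for illustration; `FileRead`, `FileOpen`, `FileWrite`, `GUICtrlSetData` and `GUICtrlRead` are standard AutoIt functions.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: fill an edit control with the contents of a rules file
Func _LoadRules($idEdit, $sFile)
    Local $sRules = FileRead($sFile)
    If @error Then Return SetError(1, 0, False) ; file missing or unreadable
    GUICtrlSetData($idEdit, $sRules)
    Return True
EndFunc   ;==>_LoadRules

; Hypothetical helper: write the current contents of an edit control back to the file
Func _SaveRules($idEdit, $sFile)
    Local $hFile = FileOpen($sFile, $FO_OVERWRITE)
    If $hFile = -1 Then Return SetError(1, 0, False) ; could not open the file for writing
    FileWrite($hFile, GUICtrlRead($idEdit))
    FileClose($hFile)
    Return True
EndFunc   ;==>_SaveRules
```

In the test GUI, one way to use these would be to call `_LoadRules($capsrules, @ScriptDir & "\caps_rules.txt")` before the message loop and `_SaveRules($capsrules, @ScriptDir & "\caps_rules.txt")` inside the `$process` case, with one file per edit control.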
 
This tool makes it much simpler for users to 'create your own' rules and/or test various things against your mods to the UDF.
Link to comment
Share on other sites

  • 2 years later...

@tcurran - Thanks for writing this UDF; I hope you are still monitoring this thread and are up for a short revival of it.

I've found the UDF very useful in correctly capitalizing titles from a music database.  Since I have no control over what goes into the database, I can't simply ensure it is done correctly when input.  In trying to correct a couple of strange yet common conditions, I ended up writing external processing because I wasn't able to pass exceptions for them to _StringChooseCase.  Would you please provide some help?

Example Input:  "band, the - a song title"

Resulting Output: "Band, the - a Song Title"

Desired Output: "Band, The - A Song Title"

From what I can tell, the exceptions I tried did not work because they would need to contain whitespace.  Any suggestions?
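
For reference, external post-processing of this kind can be fairly small. The sketch below is one possible approach and not part of the UDF: `_CapAfterSeparator` and its default separator list are hypothetical, and it simply upper-cases whatever character directly follows each separator.

```autoit
; Hypothetical helper: upper-case the character that follows each separator.
; $sSeparators is a "|"-delimited list; the defaults cover ", " and " - ".
Func _CapAfterSeparator($sText, $sSeparators = ", | - ")
    Local $aSeps = StringSplit($sSeparators, "|")
    For $i = 1 To $aSeps[0]
        Local $iLen = StringLen($aSeps[$i])
        Local $iPos = StringInStr($sText, $aSeps[$i])
        While $iPos > 0
            Local $iLetter = $iPos + $iLen ; position of the first character after the separator
            $sText = StringLeft($sText, $iLetter - 1) & StringUpper(StringMid($sText, $iLetter, 1)) & StringMid($sText, $iLetter + 1)
            ; continue searching after the character that was just changed
            $iPos = StringInStr($sText, $aSeps[$i], 0, 1, $iLetter)
        WEnd
    Next
    Return $sText
EndFunc   ;==>_CapAfterSeparator

; Applied to the output of _StringChooseCase from the example above
ConsoleWrite(_CapAfterSeparator("Band, the - a Song Title") & @CRLF) ; Band, The - A Song Title
```

Running it after `_StringChooseCase` keeps the UDF's exception list untouched; the trade-off is that every separator is treated the same way, whether or not the following word would normally stay lower case.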
