[SOLVED] Need help _StringTitleCase() _StringProper() not working on Vietnamese

12 posts in this topic

#1 ·  Posted (edited)

Do _StringTitleCase() and _StringProper() not support Vietnamese?

#include <String.au3> ; provides _StringTitleCase() and _StringProper()

Global Const $Vietnamese_stringlower_List = "á|à|ả|ã|ạ|â|ấ|ầ|ẩ|ẫ|ậ|ă|ắ|ằ|ẳ|ẵ|ặ|đ|ê|é|è|ẻ|ẽ|ẹ|ế|ề|ể|ễ|ệ|ú|ù|ủ|ũ|ụ|ư|ứ|ừ|ử|ữ|ự|ó|ò|ỏ|õ|ọ|ô|ố|ồ|ổ|ỗ|ộ|ơ|ớ|ờ|ở|ỡ|ợ|í|ì|ỉ|ĩ|ị|ý|ỳ|ỷ|ỹ|ỵ"
Global Const $Vietnamese_STRINGUPPER_List = "Á|À|Ả|Ã|Ạ|Â|Ấ|Ầ|Ẩ|Ẫ|Ậ|Ă|Ắ|Ằ|Ẳ|Ẵ|Ặ|Đ|Ê|É|È|Ẻ|Ẽ|Ẹ|Ế|Ề|Ể|Ễ|Ệ|Ú|Ù|Ủ|Ũ|Ụ|Ư|Ứ|Ừ|Ử|Ữ|Ự|Ó|Ò|Ỏ|Õ|Ọ|Ô|Ố|Ồ|Ổ|Ỗ|Ộ|Ơ|Ớ|Ờ|Ở|Ỡ|Ợ|Í|Ì|Ỉ|Ĩ|Ị|Ý|Ỳ|Ỷ|Ỹ|Ỵ"
; Each row is [ASCII letter, "|"-separated accented variants]
Global Const $Vietnamese_to_ASCII[14][2] = [["a", "á|à|ả|ã|ạ|â|ấ|ầ|ẩ|ẫ|ậ|ă|ắ|ằ|ẳ|ẵ|ặ"], ["A", "Á|À|Ả|Ã|Ạ|Â|Ấ|Ầ|Ẩ|Ẫ|Ậ|Ă|Ắ|Ằ|Ẳ|Ẵ|Ặ"], ["d", "đ"], ["D", "Đ"], ["e", "ê|é|è|ẻ|ẽ|ẹ|ế|ề|ể|ễ|ệ"], ["E", "Ê|É|È|Ẻ|Ẽ|Ẹ|Ế|Ề|Ể|Ễ|Ệ"], ["u", "ú|ù|ủ|ũ|ụ|ư|ứ|ừ|ử|ữ|ự"], ["U", "Ú|Ù|Ủ|Ũ|Ụ|Ư|Ứ|Ừ|Ử|Ữ|Ự"], ["o", "ó|ò|ỏ|õ|ọ|ô|ố|ồ|ổ|ỗ|ộ|ơ|ớ|ờ|ở|ỡ|ợ"], ["O", "Ó|Ò|Ỏ|Õ|Ọ|Ô|Ố|Ồ|Ổ|Ỗ|Ộ|Ơ|Ớ|Ờ|Ở|Ỡ|Ợ"], ["i", "í|ì|ỉ|ĩ|ị"], ["I", "Í|Ì|Ỉ|Ĩ|Ị"], ["y", "ý|ỳ|ỷ|ỹ|ỵ"], ["Y", "Ý|Ỳ|Ỷ|Ỹ|Ỵ"]]

Global Const $Vietnamese_Capitalize_Text = "Đây Là Dòng Chữ Tiếng_Việt Chuẩn"
Global Const $Vietnamese_stringlower_text = "đây là dòng chữ tiếng_việt chuẩn"
Global Const $Vietnamese_STRINGUPPER_text = "ĐÂY LÀ DÒNG CHỮ TIẾNG_VIỆT CHUẨN"

ConsoleWrite("- " & (StringLower($Vietnamese_STRINGUPPER_text) == $Vietnamese_stringlower_text) & @CRLF) ; True -> OK
ConsoleWrite("- " & (StringLower($Vietnamese_Capitalize_Text) == $Vietnamese_stringlower_text) & @CRLF) ; True -> OK

ConsoleWrite("- " & (StringUpper($Vietnamese_stringlower_text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; True -> OK
ConsoleWrite("- " & (StringUpper($Vietnamese_Capitalize_Text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; True -> OK

ConsoleWrite("- " & (_StringTitleCase($Vietnamese_stringlower_text) == $Vietnamese_Capitalize_Text) & @CRLF) ; False -> not working in Vietnamese
ConsoleWrite("- " & (_StringTitleCase($Vietnamese_STRINGUPPER_text) == $Vietnamese_Capitalize_Text) & @CRLF) ; False -> not working in Vietnamese
ConsoleWrite("- " & (_StringTitleCase($Vietnamese_stringlower_text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; False -> expected: title case never equals all-uppercase text
ConsoleWrite("- " & (_StringTitleCase($Vietnamese_Capitalize_Text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; False -> expected: title case never equals all-uppercase text
ConsoleWrite("- " & (_StringTitleCase($Vietnamese_Capitalize_Text) == $Vietnamese_stringlower_text) & @CRLF) ; False -> expected: title case never equals all-lowercase text
ConsoleWrite("- " & (_StringTitleCase($Vietnamese_STRINGUPPER_text) == $Vietnamese_stringlower_text) & @CRLF) ; False -> expected: title case never equals all-lowercase text

ConsoleWrite("- " & (_StringProper($Vietnamese_stringlower_text) == $Vietnamese_Capitalize_Text) & @CRLF) ; False -> not working in Vietnamese
ConsoleWrite("- " & (_StringProper($Vietnamese_STRINGUPPER_text) == $Vietnamese_Capitalize_Text) & @CRLF) ; False -> not working in Vietnamese
ConsoleWrite("- " & (_StringProper($Vietnamese_stringlower_text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; False -> expected: proper case never equals all-uppercase text
ConsoleWrite("- " & (_StringProper($Vietnamese_Capitalize_Text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; False -> expected: proper case never equals all-uppercase text
ConsoleWrite("- " & (_StringProper($Vietnamese_Capitalize_Text) == $Vietnamese_stringlower_text) & @CRLF) ; False -> expected: proper case never equals all-lowercase text
ConsoleWrite("- " & (_StringProper($Vietnamese_STRINGUPPER_text) == $Vietnamese_stringlower_text) & @CRLF) ; False -> expected: proper case never equals all-lowercase text

 

Edited by Trong
[SOLVED]


#2 ·  Posted (edited)

I have very little time to look at this in any detail right now. However, here is a quick modification to _StringProper().

MsgBox(0, "", _StringProper2("đây là dòng chữ tiếng_việt chuẩn"))

Func _StringProper2($sString)
    Local $bCapNext = True, $sChr = "", $sReturn = ""
    For $i = 1 To StringLen($sString)
        $sChr = StringMid($sString, $i, 1)
        Select
            Case $bCapNext = True
                If StringRegExp($sChr, '(*UCP)[\w]') Then
                    $sChr = StringUpper($sChr)
                    $bCapNext = False
                EndIf
            Case Not StringRegExp($sChr, '(*UCP)[\w]')
                $bCapNext = True
            Case Else
                $sChr = StringLower($sChr)
        EndSelect
        $sReturn &= $sChr
    Next
    Return $sReturn
EndFunc   ;==>_StringProper2

You, or someone else, might be able to improve on this. Look at the change in the RegExp from the original function. You might also want to capitalize words which are joined with underscore.

Func _StringProper2($sString)
    Local $bCapNext = True, $sChr = "", $sReturn = ""
    For $i = 1 To StringLen($sString)
        $sChr = StringMid($sString, $i, 1)
        Select
            Case $bCapNext = True
                If StringRegExp($sChr, '(*UCP)[\w]') Then
                    $sChr = StringUpper($sChr)
                    $bCapNext = False
                EndIf
            Case Not StringRegExp($sChr, '(*UCP)[\w]') Or StringRegExp($sChr, "[_0-9]") ; now underscore/numbers also trigger capitalization
                $bCapNext = True
            Case Else
                $sChr = StringLower($sChr)
        EndSelect
        $sReturn &= $sChr
    Next
    Return $sReturn
EndFunc   ;==>_StringProper2
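
The change that matters in both versions is the (*UCP) prefix: without it, \w only matches ASCII word characters, so accented Vietnamese letters are treated as word separators. Here is a minimal sketch of the difference. ChrW(0x0111) stands in for a literal đ so the result does not depend on how the script file is encoded; the variable name is only for illustration.

```autoit
; Compare ASCII-only \w with Unicode-aware (*UCP)\w on a Vietnamese letter.
Local $sChar = ChrW(0x0111) ; U+0111, LATIN SMALL LETTER D WITH STROKE (đ)

; StringRegExp() in its default mode returns 1 on a match and 0 otherwise.
ConsoleWrite("plain \w : " & StringRegExp($sChar, "\w") & @CRLF)       ; should print 0 (no match)
ConsoleWrite("(*UCP)\w : " & StringRegExp($sChar, "(*UCP)\w") & @CRLF) ; should print 1 (match)
```

The same prefix also changes how \b, \d and \s behave, which is why the later one-line versions in this thread rely on it too.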

_StringTitleCase() will also need looking at. I don't have time today.

Edit : Modified 2nd version.

Edited by czardas
(?i) removed from regexp - not needed
#3 ·  Posted

Thanks for your help and suggestions! 



#4 ·  Posted

I'm happy if it helps you. I'm not exactly sure what the difference is between _StringProper() and _StringTitleCase().


#5 ·  Posted

These bugs have escaped my attention. The cure is good enough but can be streamlined:

MsgBox(0, "", _StringProper3("đây là dòng chữ tiếng_việt chuẩn"))
MsgBox(0, "", _StringTitleCase3("đây là dòng chữ tiếng_việt chuẩn"))

Func _StringProper3($s)
    Return(Execute("'" & StringRegExpReplace(StringLower($s), "(*UCP)\b(\p{Ll})", "' & StringUpper('$1') & '") & "'"))
EndFunc

Func _StringTitleCase3($s)
    Return(Execute("'" & StringRegExpReplace(StringLower($s), "(*UCP)(?<=\PL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"))
EndFunc

 


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


#6 ·  Posted

Thanks, I figured you could improve it and I'm glad you did. :)

Share this post


Link to post
Share on other sites

#7 ·  Posted

This has been already discussed in Trac, for instance: https://www.autoitscript.com/trac/autoit/ticket/2914


#8 ·  Posted (edited)

@jchd I just took a look at this. You have created a kind of 'camel' title case: I don't think that was quite your intention. The first word is not being capitalized. I struggled to find a good regular expression myself. I was expecting the output to be like this (patch seems to be working).

MsgBox(0, "", _StringTitleCase3("đây là dòng chữ tiếng_việt chuẩn"))

Func _StringTitleCase3($s)
    Return StringTrimLeft(Execute("'" & StringRegExpReplace(" " & StringLower($s), "(*UCP)(?<=\PL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"), 1)
EndFunc

 

Edited by czardas

#9 ·  Posted (edited)

Oops, sorry, I didn't have enough time to check carefully before leaving. The negation was misplaced:

Func _StringTitleCase3($s)
    Return(Execute("'" & StringRegExpReplace(StringLower($s), "(*UCP)(?<!\pL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"))
EndFunc
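
One caveat with the Execute() approach above: an apostrophe in the input would likely end the generated string literal early and make Execute() fail. The sketch below doubles apostrophes first, which is how AutoIt escapes a quote inside a single-quoted string. The function name _StringTitleCaseQuoted is hypothetical, and the regular expression is the corrected one from this post.

```autoit
; Hypothetical variant of _StringTitleCase3(): same regex, but apostrophes are
; doubled first so the generated expression stays a valid AutoIt string literal.
Func _StringTitleCaseQuoted($s)
    ; Inside '...' a doubled '' stands for one literal apostrophe.
    Local $sEscaped = StringReplace(StringLower($s), "'", "''")
    ; Wrap every lowercase letter that is not preceded by a letter in StringUpper('...').
    Local $sExpr = "'" & StringRegExpReplace($sEscaped, "(*UCP)(?<!\pL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"
    Return Execute($sExpr)
EndFunc   ;==>_StringTitleCaseQuoted

ConsoleWrite(_StringTitleCaseQuoted("rock 'n' roll") & @CRLF) ; Rock 'N' Roll
```

Note that a letter directly after an apostrophe is capitalized as well, because the lookbehind only checks that the preceding character is not a letter.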

Titlecase differs from uppercasing the first "letter" in several instances, for example when applied to some digraphs. At least these characters fall into this bucket:
 

Codepoint    Character    Upper    Lower    Fold
U+01C5       ǅ            Ǆ        ǆ        ǆ
U+01C8       ǈ            Ǉ        ǉ        ǉ
U+01CB       ǋ            Ǌ        ǌ        ǌ
U+01F2       ǲ            Ǳ        ǳ        ǳ

Unicode (read: "human scripts") makes subtle distinctions between lowercase and foldcase, and between propercase and titlecase. For most human scripts that makes no difference, but for some it matters.

EDIT: unfortunately, I don't recall Windows natively offering a primitive for Title or such, but maybe things have changed since I last looked.

Edited by jchd

#10 ·  Posted (edited)

This may take some study. In English I don't think there is an official titlecase: the nearest definition appears to be AP titlecase (Associated Press). This falls under the category of writing style. How things are in other languages is a whole new universe (to me). :whistle:

Edited by czardas


#11 ·  Posted



#12 ·  Posted

Thanks. I have deadlines to meet, but I'll take a good look at it later in the week. :)
