
[SOLVED] Need help _StringTitleCase() _StringProper() not working on Vietnamese



Do _StringTitleCase() and _StringProper() not support Vietnamese?

#include <String.au3> ; provides _StringProper() and _StringTitleCase()

Global Const $Vietnamese_stringlower_List = "á|à|ả|ã|ạ|â|ấ|ầ|ẩ|ẫ|ậ|ă|ắ|ằ|ẳ|ẵ|ặ|đ|ê|é|è|ẻ|ẽ|ẹ|ế|ề|ể|ễ|ệ|ú|ù|ủ|ũ|ụ|ư|ứ|ừ|ử|ữ|ự|ó|ò|ỏ|õ|ọ|ô|ố|ồ|ổ|ỗ|ộ|ơ|ớ|ờ|ở|ỡ|ợ|í|ì|ỉ|ĩ|ị|ý|ỳ|ỷ|ỹ|ỵ"
Global Const $Vietnamese_STRINGUPPER_List = "Á|À|Ả|Ã|Ạ|Â|Ấ|Ầ|Ẩ|Ẫ|Ậ|Ă|Ắ|Ằ|Ẳ|Ẵ|Ặ|Đ|Ê|É|È|Ẻ|Ẽ|Ẹ|Ế|Ề|Ể|Ễ|Ệ|Ú|Ù|Ủ|Ũ|Ụ|Ư|Ứ|Ừ|Ử|Ữ|Ự|Ó|Ò|Ỏ|Õ|Ọ|Ô|Ố|Ồ|Ổ|Ỗ|Ộ|Ơ|Ớ|Ờ|Ở|Ỡ|Ợ|Í|Ì|Ỉ|Ĩ|Ị|Ý|Ỳ|Ỷ|Ỹ|Ỵ"
; Each row is [ASCII letter, pipe-separated accented variants]
Global Const $Vietnamese_to_ASCII[14][2] = [["a", "á|à|ả|ã|ạ|â|ấ|ầ|ẩ|ẫ|ậ|ă|ắ|ằ|ẳ|ẵ|ặ"], ["A", "Á|À|Ả|Ã|Ạ|Â|Ấ|Ầ|Ẩ|Ẫ|Ậ|Ă|Ắ|Ằ|Ẳ|Ẵ|Ặ"], ["d", "đ"], ["D", "Đ"], ["e", "ê|é|è|ẻ|ẽ|ẹ|ế|ề|ể|ễ|ệ"], ["E", "Ê|É|È|Ẻ|Ẽ|Ẹ|Ế|Ề|Ể|Ễ|Ệ"], ["u", "ú|ù|ủ|ũ|ụ|ư|ứ|ừ|ử|ữ|ự"], ["U", "Ú|Ù|Ủ|Ũ|Ụ|Ư|Ứ|Ừ|Ử|Ữ|Ự"], ["o", "ó|ò|ỏ|õ|ọ|ô|ố|ồ|ổ|ỗ|ộ|ơ|ớ|ờ|ở|ỡ|ợ"], ["O", "Ó|Ò|Ỏ|Õ|Ọ|Ô|Ố|Ồ|Ổ|Ỗ|Ộ|Ơ|Ớ|Ờ|Ở|Ỡ|Ợ"], ["i", "í|ì|ỉ|ĩ|ị"], ["I", "Í|Ì|Ỉ|Ĩ|Ị"], ["y", "ý|ỳ|ỷ|ỹ|ỵ"], ["Y", "Ý|Ỳ|Ỷ|Ỹ|Ỵ"]]

Global Const $Vietnamese_Capitalize_Text = "Đây Là Dòng Chữ Tiếng_Việt Chuẩn"
Global Const $Vietnamese_stringlower_text = "đây là dòng chữ tiếng_việt chuẩn"
Global Const $Vietnamese_STRINGUPPER_text = "ĐÂY LÀ DÒNG CHỮ TIẾNG_VIỆT CHUẨN"

ConsoleWrite("- "  & (StringLower($Vietnamese_STRINGUPPER_text) == $Vietnamese_stringlower_text) & @CRLF) ;True -> OK
ConsoleWrite("- "  & (StringLower($Vietnamese_Capitalize_Text) == $Vietnamese_stringlower_text) & @CRLF) ; True -> OK

ConsoleWrite("- "  & (StringUpper($Vietnamese_stringlower_text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ;True -> OK;
ConsoleWrite("- "  & (StringUpper($Vietnamese_Capitalize_Text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ; True -> OK

ConsoleWrite("- "  & (_StringTitleCase($Vietnamese_stringlower_text) == $Vietnamese_Capitalize_Text) & @CRLF) ;False -> Not working in Vietnamese
ConsoleWrite("- "  & (_StringTitleCase($Vietnamese_STRINGUPPER_text) == $Vietnamese_Capitalize_Text) & @CRLF) ;False -> Not working in Vietnamese
ConsoleWrite("- "  & (_StringTitleCase($Vietnamese_stringlower_text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ;False -> expected even when working (title case vs. all-upper text)
ConsoleWrite("- "  & (_StringTitleCase($Vietnamese_Capitalize_Text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ;False -> expected even when working (title case vs. all-upper text)
ConsoleWrite("- "  & (_StringTitleCase($Vietnamese_Capitalize_Text) == $Vietnamese_stringlower_text) & @CRLF) ;False -> expected even when working (title case vs. all-lower text)
ConsoleWrite("- "  & (_StringTitleCase($Vietnamese_STRINGUPPER_text) == $Vietnamese_stringlower_text) & @CRLF) ;False -> expected even when working (title case vs. all-lower text)

ConsoleWrite("- "  & (_StringProper($Vietnamese_stringlower_text) == $Vietnamese_Capitalize_Text) & @CRLF) ;False -> Not working in Vietnamese
ConsoleWrite("- "  & (_StringProper($Vietnamese_STRINGUPPER_text) == $Vietnamese_Capitalize_Text) & @CRLF) ;False -> Not working in Vietnamese
ConsoleWrite("- "  & (_StringProper($Vietnamese_stringlower_text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ;False -> expected even when working (proper case vs. all-upper text)
ConsoleWrite("- "  & (_StringProper($Vietnamese_Capitalize_Text) == $Vietnamese_STRINGUPPER_text) & @CRLF) ;False -> expected even when working (proper case vs. all-upper text)
ConsoleWrite("- "  & (_StringProper($Vietnamese_Capitalize_Text) == $Vietnamese_stringlower_text) & @CRLF) ;False -> expected even when working (proper case vs. all-lower text)
ConsoleWrite("- "  & (_StringProper($Vietnamese_STRINGUPPER_text) == $Vietnamese_stringlower_text) & @CRLF) ;False -> expected even when working (proper case vs. all-lower text)

 

Edited by Trong
[SOLVED]


I have very little time to look at this in any detail right now. However, here is a quick modification to _StringProper().

MsgBox(0, "", _StringProper2("đây là dòng chữ tiếng_việt chuẩn"))

Func _StringProper2($sString)
    Local $bCapNext = True, $sChr = "", $sReturn = ""
    For $i = 1 To StringLen($sString)
        $sChr = StringMid($sString, $i, 1)
        Select
            Case $bCapNext = True
                If StringRegExp($sChr, '(*UCP)[\w]') Then
                    $sChr = StringUpper($sChr)
                    $bCapNext = False
                EndIf
            Case Not StringRegExp($sChr, '(*UCP)[\w]')
                $bCapNext = True
            Case Else
                $sChr = StringLower($sChr)
        EndSelect
        $sReturn &= $sChr
    Next
    Return $sReturn
EndFunc   ;==>_StringProper2

You, or someone else, might be able to improve on this. Look at the change in the RegExp from the original function. You might also want to capitalize words which are joined with underscore.

Func _StringProper2($sString)
    Local $bCapNext = True, $sChr = "", $sReturn = ""
    For $i = 1 To StringLen($sString)
        $sChr = StringMid($sString, $i, 1)
        Select
            Case $bCapNext = True
                If StringRegExp($sChr, '(*UCP)[\w]') Then
                    $sChr = StringUpper($sChr)
                    $bCapNext = False
                EndIf
            Case Not StringRegExp($sChr, '(*UCP)[\w]') Or StringRegExp($sChr, "[_0-9]") ; now underscore/numbers also trigger capitalization
                $bCapNext = True
            Case Else
                $sChr = StringLower($sChr)
        EndSelect
        $sReturn &= $sChr
    Next
    Return $sReturn
EndFunc   ;==>_StringProper2

_StringTitleCase() will also need looking at. I don't have time today.

Edit : Modified 2nd version.

Edited by czardas
(?i) removed from regexp - not needed
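
For anyone wondering what the (*UCP) prefix in the modified RegExp buys: as I understand the AutoIt StringRegExp documentation, \w only covers ASCII letters, digits and underscore unless (*UCP) switches it to Unicode properties. The snippet below is just a quick probe, not part of the original function; ChrW(0x0111) is the letter đ, written that way so the script's file encoding doesn't matter. I'd expect the first line to report 0 and the other two 1, but run it on your own build to confirm.

```autoit
; Quick probe of \w with and without (*UCP). Illustrative only.
Local $sChr = ChrW(0x0111) ; Vietnamese lower-case d with stroke

; StringRegExp with the default flag returns 1 on a match, 0 otherwise.
ConsoleWrite("Without (*UCP): " & StringRegExp($sChr, "\w") & @CRLF)
ConsoleWrite("With (*UCP):    " & StringRegExp($sChr, "(*UCP)\w") & @CRLF)

; Underscore is a word character either way, which is why the second
; version of _StringProper2 needs the extra [_0-9] test.
ConsoleWrite("Underscore:     " & StringRegExp("_", "(*UCP)\w") & @CRLF)
```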

These bugs have escaped my attention. The cure is good enough but can be streamlined:

MsgBox(0, "", _StringProper3("đây là dòng chữ tiếng_việt chuẩn"))
MsgBox(0, "", _StringTitleCase3("đây là dòng chữ tiếng_việt chuẩn"))

Func _StringProper3($s)
    Return(Execute("'" & StringRegExpReplace(StringLower($s), "(*UCP)\b(\p{Ll})", "' & StringUpper('$1') & '") & "'"))
EndFunc

Func _StringTitleCase3($s)
    Return(Execute("'" & StringRegExpReplace(StringLower($s), "(*UCP)(?<=\PL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"))
EndFunc

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


This has already been discussed on Trac, for instance: https://www.autoitscript.com/trac/autoit/ticket/2914


@jchd I just took a look at this. You have created a kind of 'camel' title case: I don't think that was quite your intention. The first word is not being capitalized. I struggled to find a good regular expression myself. I was expecting the output to be like this (patch seems to be working).

MsgBox(0, "", _StringTitleCase3("đây là dòng chữ tiếng_việt chuẩn"))

Func _StringTitleCase3($s)
    Return StringTrimLeft(Execute("'" & StringRegExpReplace(" " & StringLower($s), "(*UCP)(?<=\PL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"), 1)
EndFunc

 

Edited by czardas

Oops, sorry, I didn't have enough time to check carefully before leaving. The negation was misplaced:

Func _StringTitleCase3($s)
    Return(Execute("'" & StringRegExpReplace(StringLower($s), "(*UCP)(?<!\pL)(\p{Ll})", "' & StringUpper('$1') & '") & "'"))
EndFunc
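
A side note on the Execute() trick: it builds an AutoIt expression out of the input, so a single quote in the text (say "it's") would appear to break the generated expression. Below is a minimal sketch of the same rule (upper-case a letter that does not follow another letter, lower-case the rest) written as a plain loop. The name _StringTitleCase4 is made up for this example, and it assumes precomposed (NFC) text, since a combining mark would count as a non-letter here.

```autoit
; Hypothetical Execute-free variant; not from the original posts.
Func _StringTitleCase4($s)
    Local $sOut = "", $sChr = "", $bCapNext = True
    For $i = 1 To StringLen($s)
        $sChr = StringMid($s, $i, 1)
        If StringRegExp($sChr, "(*UCP)\pL") Then
            ; A letter: upper-case it only if the previous character was not a letter.
            If $bCapNext Then
                $sOut &= StringUpper($sChr)
            Else
                $sOut &= StringLower($sChr)
            EndIf
            $bCapNext = False
        Else
            ; Anything else (space, underscore, digit, quote) starts a new word.
            $sOut &= $sChr
            $bCapNext = True
        EndIf
    Next
    Return $sOut
EndFunc   ;==>_StringTitleCase4

MsgBox(0, "", _StringTitleCase4("đây là dòng chữ tiếng_việt chuẩn"))
```

Like the regexp version, this capitalizes the letter after an apostrophe, so whether that is acceptable depends on the text you feed it.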

Titlecase differs from uppercasing the first "letter" in several instances, for example when applied to some digraphs. At least these characters fall into this bucket:
 

Codepoint    Character    Upper    Lower    Fold
U+01C5       ǅ            Ǆ        ǆ        ǆ
U+01C8       ǈ            Ǉ        ǉ        ǉ
U+01CB       ǋ            Ǌ        ǌ        ǌ
U+01F2       ǲ            Ǳ        ǳ        ǳ

Unicode (read: "human scripts") makes subtle distinctions between lowercase and foldcase, and between propercase and titlecase. For most human scripts that makes no difference, but for some it matters.

EDIT: unfortunately, I don't recall Windows natively offering a primitive for Title or such, but maybe things have changed since I last looked.

Edited by jchd
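
On the question of a native Windows primitive: the Win32 function LCMapStringEx documents an LCMAP_TITLECASE flag from Windows 7 onward, so something along the lines below might be worth trying. This is an untested sketch, not a verified answer. The wrapper name _TitleCaseViaWinAPI and the "vi-VN" default locale are my own assumptions, and I have not checked how the API treats underscores or the digraphs in the table above.

```autoit
; Sketch only: wraps the Win32 LCMapStringEx API (Windows 7 or later).
; _TitleCaseViaWinAPI and its default locale are hypothetical choices.
Func _TitleCaseViaWinAPI($sText, $sLocale = "vi-VN")
    Local Const $LCMAP_TITLECASE = 0x00000300 ; value from winnls.h
    Local $iLen = StringLen($sText)
    If $iLen = 0 Then Return ""

    ; First call with a null buffer asks for the required length in characters.
    Local $aRet = DllCall("kernel32.dll", "int", "LCMapStringEx", _
            "wstr", $sLocale, "dword", $LCMAP_TITLECASE, _
            "wstr", $sText, "int", $iLen, _
            "ptr", 0, "int", 0, "ptr", 0, "ptr", 0, "lparam", 0)
    If @error Or $aRet[0] = 0 Then Return SetError(1, 0, $sText)

    ; Second call fills the buffer; one extra wchar keeps it null-terminated.
    Local $iNeeded = $aRet[0]
    Local $tBuf = DllStructCreate("wchar[" & ($iNeeded + 1) & "]")
    $aRet = DllCall("kernel32.dll", "int", "LCMapStringEx", _
            "wstr", $sLocale, "dword", $LCMAP_TITLECASE, _
            "wstr", $sText, "int", $iLen, _
            "struct*", $tBuf, "int", $iNeeded, "ptr", 0, "ptr", 0, "lparam", 0)
    If @error Or $aRet[0] = 0 Then Return SetError(2, 0, $sText)
    Return DllStructGetData($tBuf, 1)
EndFunc   ;==>_TitleCaseViaWinAPI

MsgBox(0, "", _TitleCaseViaWinAPI("đây là dòng chữ tiếng_việt chuẩn"))
```

On failure the function hands back the input unchanged and sets @error, so it can be dropped in next to the regexp versions for comparison.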


This may take some study. In English I don't think there is an official title case: the nearest definition appears to be AP (Associated Press) title case, which falls under the category of writing style. How things are in other languages is a whole new universe (to me).

Edited by czardas
