
UTF-16 strings


jchd

Hi everybody,

I routinely need to manipulate UTF-16 strings. Conversions between ANSI + codepage, UTF-8 and UTF-16 can be done more or less efficiently, but when it comes to performing string operations on UTF-16 strings, I'm at a loss with the current AutoIt.

Of course it's still possible to wrap calls to UTF-16 string functions from system or other DLLs, but I suspect this has already been done and something likely already exists.

Or is there a more obvious way that I'm overlooking?

Thank you in advance for any hint.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


How do you get your UTF-16-strings? And what do you want to do with them?

AFAIK, AutoIt 3.3.0.0 works internally with Unicode / UTF-16-strings, so this shouldn't be a problem.
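
A minimal sketch of what that means in practice, assuming the StringToBinary flags documented in the help file (2 = UTF-16 LE, 4 = UTF-8). The sample text and variable names are arbitrary, and ChrW is used so the result does not depend on how the script file is saved:

```autoit
; Build a two-character native string: U+00E9 (e acute) and U+20AC (euro sign).
Local $sText = ChrW(0xE9) & ChrW(0x20AC)

; Encode the same native string in two different ways.
Local $bUtf16 = StringToBinary($sText, 2) ; flag 2 = UTF-16 little endian
Local $bUtf8 = StringToBinary($sText, 4)  ; flag 4 = UTF-8

ConsoleWrite("Characters:      " & StringLen($sText) & @CRLF)  ; 2
ConsoleWrite("UTF-16 LE bytes: " & BinaryLen($bUtf16) & @CRLF) ; 4
ConsoleWrite("UTF-8 bytes:     " & BinaryLen($bUtf8) & @CRLF)  ; 5
```

StringLen counts UTF-16 code units of the native string, so it matches the character count here because both characters are in the Basic Multilingual Plane.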

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs] My UDFs: [_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP] UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes


I'm developing "glue" applications in AutoIt for business management. I have a Pervasive database (in ANSI) on one side (I handle it through the ODBC layer, no problem) and two distinct SQLite3 databases, one in UTF-8 and one in UTF-16, on the other. Hence my question about UTF-16 support from within AutoIt.

I'm a little amazed to find that, according to what you wrote, string functions _do_ handle UTF-16 correctly.

But, OK, I had a hard time discovering that AutoIt handles strings correctly. I was (and still am) completely confused by the fact that _WinAPI_WideCharToMultiByte and _WinAPI_MultiByteToWideChar use a _structure_ to hold the input or output UTF-16 instead of the "wstr" type, while ANSI or UTF-8 use the "str" type directly.

Now the problem boils down to the conversion routines. The two routines that convert multi-byte to multi-byte work fine, but I can't manage to get the UTF-16 strings out correctly. I find this surprising because the UTF-16 "structures" work fine in back-to-back calls to kernel32.dll routines. It's [again] certainly a simple error.

CODE
#include <WinAPI.au3>

; string conversion: ANSI (default or given code page) --> UTF-8
Func _AnsiToUtf8($AnsiString, $CodePage = 0)
    Local $struct = _WinAPI_MultiByteToWideChar($AnsiString, $CodePage)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($struct), 65001)
EndFunc

; string conversion: ANSI (default or given code page) --> UTF-16
; (returns the DllStruct holding the UTF-16 data, not a string)
Func _AnsiToUtf16($AnsiString, $CodePage = 0)
    Local $struct = _WinAPI_MultiByteToWideChar($AnsiString, $CodePage)
    Return $struct
EndFunc

; string conversion: UTF-8 --> ANSI (default or given code page)
Func _Utf8ToAnsi($Utf8String, $CodePage = 0)
    Local $struct = _WinAPI_MultiByteToWideChar($Utf8String, 65001)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($struct), $CodePage)
EndFunc

; string conversion: UTF-8 --> UTF-16
; (returns the DllStruct holding the UTF-16 data, not a string)
Func _Utf8ToUtf16($Utf8String)
    Local $struct = _WinAPI_MultiByteToWideChar($Utf8String, 65001)
    Return $struct
EndFunc

; string conversion: UTF-16 --> ANSI (default or given code page)
Func _Utf16ToAnsi($Utf16Struct, $CodePage = 0)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($Utf16Struct), $CodePage)
EndFunc

; string conversion: UTF-16 --> UTF-8
Func _Utf16ToUtf8($Utf16Struct)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($Utf16Struct), 65001)
EndFunc
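
One possible way to read such a structure back as a native string, sketched under the assumption that the structure holds null-terminated UTF-16 LE data. _Utf16StructToString is a hypothetical helper name, not part of WinAPI.au3, and the demo buffer is filled by hand so the snippet does not depend on the functions above:

```autoit
; Hypothetical helper: view a UTF-16 structure as a native AutoIt string.
Func _Utf16StructToString($tUtf16)
    ; Number of 16-bit code units that fit in the structure.
    Local $iUnits = Int(DllStructGetSize($tUtf16) / 2)
    ; Map a wchar array onto the same memory instead of copying it.
    Local $tWide = DllStructCreate("wchar[" & $iUnits & "]", DllStructGetPtr($tUtf16))
    ; Reading a wchar array element returns a string rather than binary data.
    Return DllStructGetData($tWide, 1)
EndFunc

; Demo buffer (an assumption for illustration): "éA" plus a null terminator
; stored as raw UTF-16 LE bytes in a byte array.
Local $tBytes = DllStructCreate("byte[6]")
DllStructSetData($tBytes, 1, Binary("0xE90041000000"))

Local $sText = _Utf16StructToString($tBytes)
ConsoleWrite("Length: " & StringLen($sText) & @CRLF)
ConsoleWrite("Upper:  " & StringUpper($sText) & @CRLF)
```

The idea is only to reinterpret the same memory as wchar data; whether the structure returned by _WinAPI_MultiByteToWideChar is declared as a byte or wchar array depends on the AutoIt version in use.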



I'm a little amazed that, according to what you wrote, string functions handle UTF-8/16 correctly. That doesn't seem to be the case to me. I may be completely wrong, but the following code proves that UTF-16 isn't that native. Even StringLen() doesn't get the number of _characters_ right when it's passed a UTF-8 string: it returns the byte length of the string. Computing its character length is painful and utterly inefficient.

CODE
#include <WinAPI.au3>

;; ANSI encoding (I use Latin1)

; string conversion: ANSI (default or given code page) --> UTF-8
Func _AnsiToUtf8($AnsiString, $CodePage = 0)
    Local $struct = _WinAPI_MultiByteToWideChar($AnsiString, $CodePage)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($struct), 65001)
EndFunc

; string conversion: ANSI (default or given code page) --> UTF-16
Func _AnsiToUtf16($AnsiString, $CodePage = 0)
    Local $struct = _WinAPI_MultiByteToWideChar($AnsiString, $CodePage)
    Return $struct
EndFunc

; string conversion: UTF-8 --> ANSI (default or given code page)
Func _Utf8ToAnsi($Utf8String, $CodePage = 0)
    Local $struct = _WinAPI_MultiByteToWideChar($Utf8String, 65001)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($struct), $CodePage)
EndFunc

; string conversion: UTF-8 --> UTF-16
Func _Utf8ToUtf16($Utf8String)
    Local $struct = _WinAPI_MultiByteToWideChar($Utf8String, 65001)
    Return $struct
EndFunc

; string conversion: UTF-16 --> ANSI (default or given code page)
Func _Utf16ToAnsi($Utf16Struct, $CodePage = 0)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($Utf16Struct), $CodePage)
EndFunc

; string conversion: UTF-16 --> UTF-8
Func _Utf16ToUtf8($Utf16Struct)
    Return _WinAPI_WideCharToMultiByte(DllStructGetPtr($Utf16Struct), 65001)
EndFunc

Global $s = "éèçàùôÎÄ", $len, $u16, $u16s, $u8, $a

MsgBox(0, "An ANSI string", "The initial string: " & $s & @LF & "As binary: " & StringToBinary($s) & @LF & "Its length is " & StringLen($s) & " characters.")

; ANSI --> UTF-16: the result is a DllStruct, so read its first element
$u16 = _AnsiToUtf16($s)
$u16s = DllStructGetData($u16, 1)
MsgBox(0, "Ansi to Utf16", "We obtain a hex binary string instead of a UTF16le string!" & @LF & $u16s)
MsgBox(0, "StringUpper result", StringUpper($u16s))

; UTF-16 --> UTF-8
$u8 = _Utf16ToUtf8($u16)
MsgBox(0, "Utf16 to Utf8", $u8 & @LF & "Its length is " & StringLen($u8) & @LF & "This is the _byte_ length of the UTF-8 string!")

; UTF-8 --> ANSI, then compare with the initial string
$a = _Utf8ToAnsi($u8)
MsgBox(0, "Utf8 to Ansi", $a)
MsgBox(0, "Round trip check", $a == $s)
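
A possible explanation for the byte count, sketched with built-in functions only: if UTF-8 bytes are stored in a string one byte per character, StringLen can only count bytes. The sample bytes are an assumption for illustration (the UTF-8 encoding of U+00E9 followed by U+20AC), and the BinaryToString flags are the ones documented in the help file (1 = ANSI, 4 = UTF-8):

```autoit
; UTF-8 bytes for two characters: C3 A9 (U+00E9) and E2 82 AC (U+20AC).
Local $bUtf8 = Binary("0xC3A9E282AC")

; Treat every byte as one ANSI character, which is roughly what a "str"
; buffer filled with UTF-8 data looks like to the string functions.
Local $sBytewise = BinaryToString($bUtf8, 1)

; Decode the same bytes as UTF-8 into a native string instead.
Local $sDecoded = BinaryToString($bUtf8, 4)

ConsoleWrite("Byte count:          " & BinaryLen($bUtf8) & @CRLF)
ConsoleWrite("Length, byte-wise:   " & StringLen($sBytewise) & @CRLF)
ConsoleWrite("Length, decoded:     " & StringLen($sDecoded) & @CRLF)
```

On a single-byte ANSI code page such as Windows-1252 the byte-wise length should equal the byte count (5), while the decoded string has 2 characters.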
