oceanwaves Posted January 10, 2014

Hi guys,

I have a small question I would like to understand. Suppose a char array, "char[100]": it is 100 bytes, and every element is 1 byte. In C/C++, if I store some Unicode characters in this array (assume every Unicode character needs 2 bytes), I know that every 2 elements can represent one Unicode character, and C/C++ can output them normally.

But in AutoIt, when I call the DllStruct* functions with the code below:

    $a = DllStructCreate("char text[2]") ; this array is only 2 bytes
    DllStructSetData($a, "text", "你") ; this character is Chinese
    $b = DllStructGetData($a, 1)
    MsgBox(4096, "", $b) ; the output is normal, no problem

This code works. BUT if I change it as below:

    $a = DllStructCreate("char text[10]") ; it has 10 bytes
    DllStructSetData($a, "text", "你好") ; 2 Chinese characters
    $b = DllStructGetData($a, 1)
    MsgBox(4096, "", $b) ; the output is "你?"

Why?

Maybe I have a little obsessive-compulsive disorder... Anyway, thanks in advance for your help.
jchd Posted January 10, 2014

You can't just assume every Unicode codepoint is 2 bytes and expect correct operation. For instance, the Euro character is 3 bytes in UTF-8. A codepoint may need from 1 to 4 bytes in UTF-8.

To store a Unicode string in a DllStruct, you need to convert it to UTF-8, determine the actual length of the converted string (in bytes), and allocate the char[N] array in the structure accordingly. Of course, the API must expect a UTF-8 string of bytes.

Alternatively, for APIs which expect a UTF-16 string, just use wchar[N], but you must remember that AutoIt merely handles UCS-2, the restriction of UTF-16 to the BMP (plane 0, roughly 64K codepoints). The UTF-16 representation of codepoints in higher planes is not guaranteed to work correctly with all AutoIt built-in string functions.
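The UTF-8 route described above can be sketched roughly as follows. This is only an illustration, assuming the receiving API wants a null-terminated UTF-8 byte string; the variable names are made up for the example, and the string is built with ChrW() so the result does not depend on the encoding of the script file. A byte[] element is used here so that AutoIt stores the bytes without any string conversion.

```autoit
; A minimal sketch, assuming the callee expects a null-terminated UTF-8 string.
; All variable names are hypothetical.
Local $sText = ChrW(0x4F60) & ChrW(0x597D) ; "你好" built from its two codepoints

; Convert the AutoIt (UTF-16) string to UTF-8 bytes; flag 4 selects UTF-8.
Local $bUtf8 = StringToBinary($sText, 4)
Local $iBytes = BinaryLen($bUtf8) ; actual length in bytes, not in characters

; Allocate exactly what is needed, plus one byte for the terminating null.
Local $tBuf = DllStructCreate("byte text[" & ($iBytes + 1) & "]")
DllStructSetData($tBuf, "text", $bUtf8)         ; copies the UTF-8 bytes
DllStructSetData($tBuf, "text", 0, $iBytes + 1) ; explicit null terminator

; Round trip: read the raw bytes back and decode them as UTF-8.
Local $bBack = BinaryMid(DllStructGetData($tBuf, "text"), 1, $iBytes)
MsgBox(4096, "", BinaryToString($bBack, 4))
```

Both characters in this example lie between U+0800 and U+FFFF, so each takes 3 bytes in UTF-8 and $iBytes comes out as 6. Sizing the array from BinaryLen() rather than from StringLen() is the point of the exercise.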
oceanwaves Posted January 10, 2014

Replying to jchd (1/10/2014, 6:15 PM):

Hi jchd,

Thanks for your reply. Yes, you are right: not every Unicode character needs only 2 bytes. But in my example code, if I put each Unicode character into the 2-byte array separately, the output is OK, so I think that, at least for these 2 characters, 2 bytes per character is enough. But when I put them together into the 10-byte array (note that I allocated 10 bytes for them), only the first character is output correctly and the second becomes "?". So I think this question may not be related to how many bytes a Unicode character needs.
jchd Posted January 11, 2014

When you set data as an AutoIt string (Unicode) to a structure element defined as char or char*, AutoIt converts (read: emasculates) the Unicode string to ANSI, and this is what the callee will see. To pass a Unicode string verbatim, you must either pass the string to a wchar or wchar* element, or first convert your string to UTF-8 and pass that to a byte, char or char* element.
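For the wchar option, a short sketch could look like the one below. It assumes the callee expects a null-terminated UTF-16 string; the names are again hypothetical, and the text is limited to BMP characters, where one character corresponds to one wchar.

```autoit
; A minimal sketch, assuming the callee expects a null-terminated UTF-16 string.
; All variable names are hypothetical.
Local $sText = ChrW(0x4F60) & ChrW(0x597D) ; "你好", both characters are in the BMP

; One wchar (2 bytes) per BMP character, plus one for the terminating null.
Local $tWide = DllStructCreate("wchar text[" & (StringLen($sText) + 1) & "]")

; No ANSI conversion takes place for wchar elements.
DllStructSetData($tWide, "text", $sText)

; Reading the element back returns an AutoIt string.
MsgBox(4096, "", DllStructGetData($tWide, "text"))
```

With a wchar element the string should come back unchanged, whereas the same data written to a char element goes through the ANSI conversion described above.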