char array in C/C++ and in DllStruct* function

oceanwaves · January 10, 2014

Hi Guys,

I have a small question I'd like to understand:

Suppose I have a char array, "char[100]": it is 100 bytes and every element is 1 byte. In C/C++, if I put some Unicode characters into this array (assume every Unicode character needs 2 bytes), I know that every 2 elements can represent one Unicode character and C/C++ can output them normally.

But in AutoIt, when I call the DllStruct* functions as in the code below:

$a = DllStructCreate("char text[2]") ; this array is only 2 bytes
DllStructSetData($a, "text", "你") ; this character is Chinese
$b = DllStructGetData($a, 1)
MsgBox(4096, "", $b) ; it outputs normally, no problem

This code works, BUT... if I change it as below:

$a = DllStructCreate("char text[10]") ; it has 10 bytes
DllStructSetData($a, "text", "你好") ; 2 Chinese characters
$b = DllStructGetData($a, 1)
MsgBox(4096, "", $b) ; the output is "你?"

Why?

:ermm: Maybe I have a little obsessive-compulsive disorder...

Anyway, thanks in advance for your help. :zorro:

jchd · January 10, 2014

You can't just assume every Unicode codepoint is 2 bytes and expect correct operation. For instance, the Euro character is 3 bytes in UTF-8. UTF-8 needs from 1 to 4 bytes per codepoint.
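To make the 1-to-4-byte rule concrete, here is a minimal C sketch that computes the UTF-8 length of a single codepoint from the standard UTF-8 ranges. The function name `utf8_len` is only for illustration. Note that the Chinese characters from the question also take 3 bytes each in UTF-8, not 2.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs to encode one Unicode codepoint (1 to 4). */
int utf8_len(uint32_t cp) {
    /* only valid Unicode scalar values: no surrogates, nothing above U+10FFFF */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80)    return 1; /* ASCII */
    if (cp < 0x800)   return 2; /* e.g. Latin accents, Greek, Cyrillic */
    if (cp < 0x10000) return 3; /* rest of the BMP, incl. the Euro sign and CJK */
    return 4;                   /* supplementary planes, e.g. emoji */
}
```

For example, `utf8_len(0x20AC)` (the Euro sign) and `utf8_len(0x4F60)` ("你") both return 3.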

To store a Unicode string in a DllStruct, you need to convert it to UTF-8, determine the actual length of the string (in bytes) and allocate the char[N] array in the structure accordingly. Of course the API must expect a UTF-8 string of bytes.
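A minimal C sketch of that measure-then-allocate approach, assuming the source text is a NUL-terminated UTF-16 string (as a wchar buffer would hold it). The first pass only counts UTF-8 bytes, the second encodes into a buffer of exactly that size plus a terminator; that byte count is the N you would size the char array with. The helper names are hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode one codepoint from UTF-16 at *p and advance *p.
   Unpaired surrogates are replaced by U+FFFD. */
static uint32_t next_cp(const uint16_t **p) {
    uint32_t u = *(*p)++;
    if (u >= 0xD800 && u <= 0xDBFF && **p >= 0xDC00 && **p <= 0xDFFF)
        return 0x10000 + ((u - 0xD800) << 10) + (uint32_t)(*(*p)++ - 0xDC00);
    if (u >= 0xD800 && u <= 0xDFFF)
        return 0xFFFD;
    return u;
}

/* Write the UTF-8 encoding of cp to out (skipped if out is NULL);
   return the number of bytes it takes. */
static size_t put_utf8(uint32_t cp, unsigned char *out) {
    assert(cp <= 0x10FFFF);
    size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (out) {
        static const unsigned char lead[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
        for (size_t i = n - 1; i > 0; i--) { /* continuation bytes, last first */
            out[i] = (unsigned char)(0x80 | (cp & 0x3F));
            cp >>= 6;
        }
        out[0] = (unsigned char)(lead[n] | cp);
    }
    return n;
}

/* Pass 1 measures, pass 2 encodes. Returns a malloc'd, NUL-terminated
   UTF-8 string (caller frees) and stores its length in bytes in *len. */
char *utf16_to_utf8(const uint16_t *s, size_t *len) {
    size_t n = 0;
    for (const uint16_t *p = s; *p; )
        n += put_utf8(next_cp(&p), NULL);
    unsigned char *buf = malloc(n + 1);
    if (!buf) return NULL;
    size_t off = 0;
    for (const uint16_t *p = s; *p; )
        off += put_utf8(next_cp(&p), buf + off);
    buf[n] = 0;
    if (len) *len = n;
    return (char *)buf;
}
```

For "你好" (U+4F60 U+597D) this reports 6 bytes, so a char[6] holds the text exactly and char[7] also fits a terminator. In AutoIt itself the same idea is usually written with StringToBinary($s, 4) (flag 4 = UTF-8) and BinaryLen() to size a byte[N] element; check the help file for the exact flag values.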

Alternatively, for APIs which expect a UTF-16 string, just use wchar[N], but remember that AutoIt only handles UCS-2, i.e. UTF-16 restricted to the BMP (plane 0, roughly 64K codepoints). The UTF-16 representation of codepoints in higher planes is not guaranteed to work correctly with all AutoIt built-in string functions.
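The UCS-2 limitation comes from how UTF-16 stores codepoints above U+FFFF: as two 16-bit surrogate units instead of one. Code that assumes one unit per character will miscount such characters or cut the pair in half. A minimal C sketch of the encoding step (the name `utf16_encode` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Encode one codepoint as UTF-16. Returns the number of 16-bit units
   written: 1 inside the BMP, 2 (a surrogate pair) above U+FFFF. */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {                         /* BMP: one unit, same as UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, "你" (U+4F60) is the single unit 0x4F60, while U+1F600 becomes the pair 0xD83D 0xDE00, which a UCS-2-only function sees as two separate "characters".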

oceanwaves · January 10, 2014

Hi Jchd,

Thanks for your reply. Yes, you are right, not every Unicode character needs only 2 bytes. But in my example code, if I put just one Unicode character into the array[2] on its own, the output is OK, so I think that at least for these 2 characters, 2 bytes per character is enough. But when I put them together into the array[10] (note that I allocated 10 bytes of space for them), only the first character is output and the second one becomes "?". So I think this question may not be related to how many bytes a Unicode character uses.

jchd · January 11, 2014

When you set data as an AutoIt string (Unicode) to a structure element defined as char or char*, AutoIt converts (read: emasculates) the Unicode string to ANSI, and this is what the callee will see.
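Why that conversion is lossy can be shown with a toy C stand-in, assuming Latin-1 (ISO-8859-1) as the target "ANSI" code page: any UTF-16 unit it cannot represent is replaced by '?'. Real ANSI code pages (CP1252, or the double-byte CP936 on Chinese Windows) cover different characters, but anything outside the active code page is lost in the same way, so a "?" in the output is consistent with this explanation. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy Unicode-to-"ANSI" conversion with Latin-1 as the code page:
   units above 0xFF have no single-byte equivalent and become '?'.
   Writes at most cap-1 bytes plus a NUL; returns the bytes written. */
size_t to_latin1_lossy(const uint16_t *in, char *out, size_t cap) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    if (cap == 0) return 0;
    for (; *in && n + 1 < cap; in++)
        out[n++] = (char)(*in <= 0xFF ? *in : '?');
    out[n] = '\0';
    return n;
}
```

Converting {'H', 0xE9, 0x4F60} yields "H", the byte 0xE9 ("é"), and then "?" for "你", which Latin-1 cannot represent.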

To pass a Unicode string verbatim, you must either pass the string to a wchar or wchar* element, or first convert your string to UTF-8 and pass that to a byte, char or char* element.
