unicode .au3 files question - script and includes

iCode · November 15, 2013

Just noticed this in the help file:

"The recommended script format is UTF-8 with BOM. ANSI formats are not recommended for languages other than English as they can cause problems when run on machines with different locales."

I understand that to mean that the actual .au3 scripts to be compiled and used on non-English systems should be UTF-8 with BOM, correct?

If I am correct, then why are all of the include files I checked encoded in ANSI?

czardas · November 15, 2013

One of the best questions I've heard for a while.

The first 128 characters (0 - 127, plain ASCII) are encoded identically in ANSI and in UTF-8, so there should be no conflicts. Characters 128 - 255 of the win-1252 code page are a different story: ANSI stores each one as a single byte whose meaning depends on the system's code page, while UTF-8 stores them as two or more bytes, and this will cause conflicts between different systems. In other words, you are safe to use characters 0 - 127 in both encoding systems. I hope this answers part of your question.
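A quick way to see this at the byte level is a short Python sketch (Python is used only because it makes the bytes easy to inspect; it is not AutoIt). It assumes Windows-1252 as the "ANSI" code page, which is typical on Western European systems but varies by locale.

```python
# Compare how the same text is stored in Windows-1252 ("ANSI" on a
# Western European system, an assumption) and in UTF-8.
ascii_text = "ConsoleWrite(1)"
accented = "é"

# Characters 0-127: identical single bytes in both encodings.
print(ascii_text.encode("cp1252") == ascii_text.encode("utf-8"))  # True

# Above 127: one byte in Windows-1252, two bytes in UTF-8.
print(accented.encode("cp1252").hex())  # e9
print(accented.encode("utf-8").hex())   # c3a9
```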

I believe UTF-8 is recommended for the web, and I think UTF-16 is more associated with Windows. I don't understand the need for a BOM; I think the BOM may be a misunderstanding or unresolved issue between developers (I'm probably wrong, but personally I think it's unfortunate). Hopefully someone will have further information to add.
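For reference, a BOM is simply the character U+FEFF written in the file's own encoding at the very start of the file. In UTF-8 byte order is irrelevant, so those three bytes act purely as an "I am UTF-8" signature; in UTF-16 they also tell the reader which byte order is used. A small Python sketch (Python's `codecs` constants and its `utf-8-sig` codec are real; the sample text is arbitrary):

```python
import codecs

# The BOM is U+FEFF encoded in the file's own encoding.
print(codecs.BOM_UTF8.hex())      # efbbbf
print(codecs.BOM_UTF16_LE.hex())  # fffe (UTF-16 little-endian, as used on Windows)

# Python's "utf-8-sig" codec prepends the UTF-8 BOM when encoding
# and strips it again when decoding.
data = "x = 1".encode("utf-8-sig")
print(data[:3] == codecs.BOM_UTF8)  # True
print(data.decode("utf-8-sig"))     # x = 1
```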

Edited November 15, 2013 by czardas

jchd · November 15, 2013

A BOM is a necessary evil. This is a consequence of the sad fact that every UTF-8 file without a BOM is a valid but erroneous ANSI file in almost all(*) variants of so-called ANSI.
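Why the BOM matters can be sketched with a hypothetical source reader. This is not AutoIt's actual detection logic (the thread does not describe it); it only assumes a reader that trusts a BOM when present and otherwise falls back to the local ANSI code page, taken here to be Windows-1252.

```python
import codecs

def decode_source(data: bytes, ansi_codepage: str = "cp1252") -> str:
    """Hypothetical source-file reader: trust a BOM if present,
    otherwise fall back to the local ANSI code page (assumed cp1252)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: nothing in the bytes says "UTF-8", so ANSI is assumed.
    return data.decode(ansi_codepage)

utf8_bytes = "café".encode("utf-8")
print(decode_source(codecs.BOM_UTF8 + utf8_bytes))  # café
print(decode_source(utf8_bytes))                    # cafÃ©
```

Without the BOM the UTF-8 bytes decode without any error, just to the wrong text, which is exactly the "valid but erroneous" case.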

To convince yourself, compare these two readings of the exact same script, whose meaning differs depending on whether you interpret it as UTF-8 without a BOM:

ConsoleWrite("Nous avons demandé au vendeur d'expédier l'objet. Connectez-vous à votre compte PayPal pour consulter les détails de la transaction." & @LF)

or, emasculated, as Windows Western (Windows-1252, often loosely called Latin-1):

ConsoleWrite("Nous avons demandÃ© au vendeur d'expÃ©dier l'objet. Connectez-vous Ã  votre compte PayPal pour consulter les dÃ©tails de la transaction." & @LF)
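The two readings can be reproduced in a few lines of Python (for illustration only; it assumes the "ANSI" code page is Windows-1252). The bytes never change, only the decoding applied to them does, which is why re-encoding the mangled text as Windows-1252 and decoding it as UTF-8 gives the original back.

```python
sentence = ("Nous avons demandé au vendeur d'expédier l'objet. "
            "Connectez-vous à votre compte PayPal pour consulter "
            "les détails de la transaction.")

raw = sentence.encode("utf-8")   # what a UTF-8 file without BOM contains

as_utf8 = raw.decode("utf-8")    # correct reading
as_ansi = raw.decode("cp1252")   # same bytes read as Windows-1252

print(as_utf8 == sentence)  # True
print(as_ansi)              # starts "Nous avons demandÃ© au vendeur d'expÃ©dier ..."

# Undo the damage: get the original bytes back, then decode them correctly.
restored = as_ansi.encode("cp1252").decode("utf-8")
print(restored == sentence)  # True
```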

(*) A number of non-UTF encodings widely used in Asia use a double-byte representation in which not all binary combinations (hi-lo) are valid, so UTF-8 bytes can fail to decode there instead of silently turning into wrong characters.

EDIT: I forgot to mention that I strongly advocate that the whole AutoIt tool chain recognize and process only UTF-8 + BOM files, #includes included. That would definitely settle all questions about source encodings and promote universal non-ambiguity.
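A check in the spirit of this proposal could look like the following Python sketch. The function name and the rule it enforces (BOM present, rest strictly valid UTF-8) are hypothetical; nothing here is part of the AutoIt tool chain.

```python
import codecs

def is_utf8_with_bom(data: bytes) -> bool:
    """Hypothetical pre-build check: accept a source file only if it
    starts with the UTF-8 BOM and the rest is well-formed UTF-8."""
    if not data.startswith(codecs.BOM_UTF8):
        return False
    try:
        data[len(codecs.BOM_UTF8):].decode("utf-8")  # strict by default
    except UnicodeDecodeError:
        return False
    return True

print(is_utf8_with_bom("é".encode("utf-8-sig")))                 # True
print(is_utf8_with_bom("é".encode("utf-8")))                     # False: no BOM
print(is_utf8_with_bom(codecs.BOM_UTF8 + "é".encode("cp1252")))  # False: BOM, but ANSI bytes
```

To match "#includes included", such a tool would run the same check on the main script and on every included file.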

Edited November 15, 2013 by jchd

czardas · November 15, 2013

Okay thanks for that explanation. I wasn't too far wrong. >_<

Let's go with the necessary evil of including byte order marks.

Edited November 15, 2013 by czardas

iCode · November 15, 2013

Thanks for the answers.

That explains why the include files are ANSI and also, I think, why my script size did not change substantially when I converted one to UTF-8.
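The small size change follows from the byte counts: every ASCII character stays one byte, so converting from Windows-1252 (assumed as the ANSI code page) to UTF-8 with BOM adds 3 bytes for the BOM plus one extra byte per accented Latin letter such as é (characters like € cost two extra). A minimal Python sketch; `size_growth` is a hypothetical helper:

```python
def size_growth(text: str) -> int:
    """Extra bytes when saving `text` as UTF-8 with BOM instead of
    Windows-1252 (assumed ANSI code page)."""
    return len(text.encode("utf-8-sig")) - len(text.encode("cp1252"))

print(size_growth('ConsoleWrite("Hello" & @LF)'))  # 3 (just the BOM)
print(size_growth('ConsoleWrite("café" & @LF)'))   # 4 (BOM + 1 byte for é)
```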
