iCode

unicode .au3 files question - script and includes

iCode

just noticed this in the help file...

"The recommended script format is UTF-8 with BOM. ANSI formats are not recommended for languages other than English as they can cause problems when run on machines with different locales."

 

i understand that to mean that the actual au3 scripts to be compiled and used on non-EN systems should be UTF-8 with BOM, correct?

if i am correct, then why are all of the include files i checked encoded in ANSI?


czardas

One of the best questions I've heard for a while.

The first 128 characters (0 - 127) are encoded identically in ANSI and in UTF-8, so there should be no conflicts. The extended characters of the win-1252 code page (128 - 255) are stored as single bytes, whereas UTF-8 stores them as multi-byte sequences, and other ANSI code pages assign different characters to those same byte values, so this range will cause conflicts between different systems. In other words you are safe to use characters 0 - 127 in both encoding systems. I hope this answers part of your question.
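As a quick check of the point above, here is a minimal sketch in Python (used only for illustration, not AutoIt; the sample strings are made up). It compares the bytes produced by Windows-1252 and UTF-8 for an ASCII-only line and for a single accented character, then reads the ANSI byte under a different code page.

```python
# Compare how Windows-1252 ("ANSI" on Western systems) and UTF-8 encode text.
ascii_text = 'ConsoleWrite("hello" & @LF)'   # only characters 0 - 127
accented = "é"                               # U+00E9, outside the 0 - 127 range

# ASCII-only text gives identical bytes in both encodings.
same_for_ascii = ascii_text.encode("cp1252") == ascii_text.encode("utf-8")

# An accented character does not: one byte in cp1252, two bytes in UTF-8.
cp1252_bytes = accented.encode("cp1252")
utf8_bytes = accented.encode("utf-8")

# The same ANSI byte read on a machine using the Cyrillic code page (cp1251).
reread = cp1252_bytes.decode("cp1251")

print(same_for_ascii)       # True
print(cp1252_bytes.hex())   # e9
print(utf8_bytes.hex())     # c3a9
print(reread)               # й
```

The last line is the locale problem the help file warns about: the byte that means "é" on a Western system is a different letter elsewhere.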

I believe UTF-8 is recommended for the web and I think UTF-16 is more associated with Windows. I don't understand the need for a BOM - I think the BOM may be a misunderstanding or unresolved issue between developers (I'm probably wrong but personally I think it's unfortunate). Hopefully someone will have further information to add.

jchd

A BOM is a necessary evil. This is a consequence of the sad fact that every UTF-8 file without a BOM is also a valid, but wrongly read, ANSI file in almost all(*) variants of so-called ANSI.

To convince yourself, compare these two readings of the exact same script, whose meaning differs depending on whether you interpret it as UTF-8 without BOM

ConsoleWrite("Nous avons demandé au vendeur d'expédier l'objet. Connectez-vous à votre compte PayPal pour consulter les détails de la transaction." & @LF)

or emasculated as Windows western (latin-1)

ConsoleWrite("Nous avons demandé au vendeur d'expédier l'objet. Connectez-vous à  votre compte PayPal pour consulter les détails de la transaction." & @LF)

(*) a number of non-UTF encodings widely used in Asia use double-byte representation where not all binary combinations (hi-lo) are valid.
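The two readings above can be reproduced with a short Python sketch (Python is used here only for illustration; the script line is shortened from the example). It encodes the text as UTF-8 without a BOM, then decodes the same bytes as Windows-1252, which is what a reader assuming ANSI would do.

```python
# One set of bytes, two interpretations.
source = 'ConsoleWrite("Nous avons demandé au vendeur d\'expédier l\'objet." & @LF)'

raw = source.encode("utf-8")       # bytes as saved on disk, no BOM
as_utf8 = raw.decode("utf-8")      # reader assumes UTF-8: text round-trips
as_ansi = raw.decode("cp1252")     # reader assumes Windows-1252: no error, wrong text

print(as_utf8)
print(as_ansi)
```

Note that the second decode raises no error here. The bytes are a perfectly valid Windows-1252 string, so nothing in the file itself tells the reader which interpretation was intended.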

EDIT: forgot to mention that I strongly advocate for the whole AutoIt tool chain to only recognize and process UTF8 + BOM files, #includes included. That would definitely solve all questions about source encodings and promote universal non-ambiguity.
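To make the proposal concrete, here is a minimal sketch of a strict loader in Python. It is purely hypothetical and is not how the AutoIt tool chain works; `decode_script` is an invented name. It accepts a file only if it starts with the UTF-8 BOM (bytes EF BB BF) and rejects everything else.

```python
import codecs

def decode_script(raw: bytes) -> str:
    """Hypothetical strict loader: accept only UTF-8 with BOM."""
    if not raw.startswith(codecs.BOM_UTF8):      # BOM bytes are EF BB BF
        raise ValueError("script is not UTF-8 with BOM")
    # Drop the 3-byte BOM, then decode the rest as UTF-8.
    return raw[len(codecs.BOM_UTF8):].decode("utf-8")

text = "MsgBox(0, 'é', 'ok')"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

print(decode_script(with_bom))       # decodes normally
try:
    decode_script(without_bom)
except ValueError as err:
    print("rejected:", err)
```

With a rule like this there is nothing to guess: the marker is either present and the file is UTF-8, or the file is refused.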

czardas

Okay thanks for that explanation. I wasn't too far wrong. >_<

Let's go with the necessary evil of including byte order marks.

iCode

thanks for the answers

that explains why the include files are ANSI and also, i think, why my script size did not change substantially when i converted one to UTF-8
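The size observation is easy to check with a small Python sketch (for illustration only; the one-line script is made up). For text that uses only characters 0 - 127, saving as UTF-8 with BOM adds just the 3-byte marker.

```python
# Size of an ASCII-only script saved as ANSI versus UTF-8 with BOM.
script = 'MsgBox(0, "Title", "Plain English text")\n'

ansi_size = len(script.encode("cp1252"))         # one byte per character
utf8_bom_size = len(script.encode("utf-8-sig"))  # "utf-8-sig" prepends the BOM

print(ansi_size, utf8_bom_size, utf8_bom_size - ansi_size)
```

The difference only grows when the script contains characters above 127, since each of those takes two or more bytes in UTF-8.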

