kyo

AutoIt3Wrapper character encoding problem


Download SciTE.exe and SciLexer.dll from this location: https://www.autoitscript.com/autoit3/scite/download/beta_SciTE4AutoIt3/

Jos

Edited by Jos


Same here: I just downloaded SciTE.exe and SciLexer.dll and I have exactly the same problems as cetipabo, which are the ones I described yesterday.


I uninstalled AutoIt and SciTE4AutoIt3, then manually deleted the remaining folders to make sure everything was gone.

I reinstalled the latest versions of both from autoitscript.com.

I'm still getting the same problem: whatever I do, the encoding defaults to Code Page Property.

Not sure if it's relevant, but I'm using:
Windows 10 Pro 64-bit
Norton Symantec Endpoint antivirus, which flags around a hundred suspicious items when I try to uninstall SciTE4AutoIt3... very strange.

Edited by cetipabo


Windows 7 Pro 64-bit here, with Microsoft Security Essentials (no worries, the machine is on a closed network), in case that helps.


From SciTE, Options > Open Global Options File.

Could something be wrong here?

# Internationalisation
#NewFileEncoding=CodePage/UTF8BOM/UTF8/UTF16BE/UTF16LE         # Only available in SciTE4AutoIt3 version
# Japanese input code page 932 and ShiftJIS character set 128
#code.page=932
#character.set=128
# Unicode
#code.page=65001
code.page=0
#character.set=204
#command.discover.properties=python /home/user/FileDetect.py "$(FilePath)"
# Required for Unicode to work on GTK+:
#LC_CTYPE=en_US.UTF-8
if PLAT_GTK
    output.code.page=65001
if PLAT_MAC
    output.code.page=65001

As suggested in another post on this forum, I changed it to this:

code.page=65001
#code.page=0

The file encoding is still shown as Code Page Property, but the characters are displayed correctly.
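Why changing code.page fixes the display without touching the file: the setting only changes how SciTE interprets the bytes already on disk. A minimal Python sketch, assuming the system ANSI code page is Windows-1252 (cp1252, typical on French or English Windows) and using "café crème" as hypothetical sample text, decodes the same UTF-8 bytes both ways:

```python
# The UTF-8 bytes of "café crème", exactly as a no-BOM UTF-8 file stores them.
raw = "café crème".encode("utf-8")

# code.page=0: bytes are read with the system ANSI code page (assumed cp1252 here),
# so each two-byte UTF-8 sequence shows up as two separate characters.
as_ansi = raw.decode("cp1252")

# code.page=65001: bytes are read as UTF-8 and the accents come back intact.
as_utf8 = raw.decode("utf-8")

print(as_ansi)  # cafÃ© crÃ¨me
print(as_utf8)  # café crème
```

The bytes never change; only the interpretation does, which matches "still Code Page Property, but displayed correctly".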


And what does Options > Open User Options File look like?

Also something worth noting: if you change encoding in SciTE, you need to actually change the file content (add a space anywhere then delete it) then save for the encoding to really change.

Can you all check which encoding the offending file(s) actually use by opening them in, say, Notepad++ or PSPad (try its hex mode to be sure)?
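If no hex viewer is at hand, a short Python sketch can do the same check: it reports whether a file starts with a UTF-8 or UTF-16 byte-order mark and dumps its first bytes in hex. The function names are illustrative only, and UTF-32 BOMs are deliberately ignored.

```python
import codecs

# Byte-order marks an editor may write at the start of a file.
# None of these three is a prefix of another, so the order does not matter.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),   # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),    # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),    # FE FF
]

def describe_bom(data: bytes) -> str:
    """Name the BOM at the start of `data`, or report that there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM (ANSI or UTF-8 without BOM)"

def hex_head(data: bytes, count: int = 16) -> str:
    """Hex dump of the first `count` bytes, like an editor's hex mode."""
    return " ".join(f"{b:02X}" for b in data[:count])

def inspect_file(path: str) -> None:
    """Print the BOM verdict and the first bytes of the file at `path`."""
    with open(path, "rb") as f:
        head = f.read(64)
    print(describe_bom(head))
    print(hex_head(head))
```

Call inspect_file() with the path of the offending .au3 file. "No BOM" alone cannot tell ANSI from UTF-8 without BOM; the hex dump of any accented characters can (C3 xx pairs suggest UTF-8, single bytes such as E9 suggest cp1252).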

One more thing: does the encoding still revert to code page if the file contains non-ANSI chars? In other words, does this UTF-8 source get corrupted or behave unexpectedly? It should display the text verbatim:

MsgBox(0, "", "온라인카지노 ӘҔҖҬӢӴԐӹ ℂℍℕℙℚℝℤ")

"Funny chars" selected as such because they can't all be displayed correctly in any single ANSI codepage. So if you're seeing the text verbatim in both the msgbox and the source, then Unicode is at work.

Edited by jchd
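The claim about the test string can be checked mechanically. This Python sketch tries to encode it in a handful of common Windows ANSI code pages (the list is an illustrative choice, not exhaustive) and shows that none of them can hold every character, while UTF-8 can:

```python
TEST = "온라인카지노 ӘҔҖҬӢӴԐӹ ℂℍℕℙℚℝℤ"

# A few common Windows ANSI code pages: Western, Central European, Cyrillic,
# Japanese, Simplified Chinese, Korean and Traditional Chinese.
ANSI_CODE_PAGES = ["cp1252", "cp1250", "cp1251", "cp932", "cp936", "cp949", "cp950"]

def fits_code_page(text: str, codepage: str) -> bool:
    """True if every character of `text` can be encoded in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

for cp in ANSI_CODE_PAGES:
    print(cp, fits_code_page(TEST, cp))  # False for every entry

print("utf-8", fits_code_page(TEST, "utf-8"))  # True
```

Korean code pages lack the extended Cyrillic letters, and Cyrillic or Western code pages lack Hangul, so seeing the whole line intact really does mean Unicode is in use.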

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Notepad says UTF-8 without BOM, while SciTE says Code Page Property.

I keep getting the problem even when I change a character in the file before changing the encoding and saving it.

The file doesn't seem to be corrupted: when we open the file with the weird characters, switching the encoding to UTF-8 visually fixes the problem. But as I said above, after switching to UTF-8 and saving (even after adding a space or changing a character), the file is back to Code Page Property when reopened.

I don't know whether the problem happens when the file is opened or when it is saved.

Edited by cetipabo


I run the latest release, whose SciTE is v3.5.4, and I don't experience the issue. Is that the version of SciTE you run (Help > About SciTE)?

IIUC, it looks like the latest SciTE (v3.5.5) doesn't (correctly?) double-check the encoding of no-BOM UTF-8 files. Indeed, any no-BOM UTF-8 file is also a valid ANSI file (for almost any codepage you use), but the converse isn't true, and it's pretty hard (read: definitely impossible) to ascertain 100% that a given no-BOM text file is UTF-8 rather than some (more or less exotic) ANSI codepage.
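That asymmetry is what any no-BOM detector has to live with. A minimal Python sketch of the usual heuristic (a generic one, not SciTE's actual code): pure ASCII is ambiguous, non-ASCII bytes that pass strict UTF-8 decoding are taken as UTF-8, and anything else falls back to ANSI. The last line shows why it can never be 100% reliable: cp1252 text that happens to read "Ã©" is stored as C3 A9, which is also valid UTF-8.

```python
def classify_no_bom(data: bytes) -> str:
    """Guess the encoding of a text file that has no BOM.

    Returns "ascii" (valid as both UTF-8 and ANSI), "utf-8" or "ansi".
    """
    if data.isascii():
        # 7-bit bytes read the same in UTF-8 and in practically every ANSI code page.
        return "ascii"
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Invalid UTF-8 sequences: fall back to the system ANSI code page.
        return "ansi"
    # Non-ASCII bytes that form valid UTF-8 sequences: very likely UTF-8.
    return "utf-8"

print(classify_no_bom(b'MsgBox(0, "", "hello")'))  # ascii
print(classify_no_bom("é".encode("utf-8")))        # utf-8
print(classify_no_bom("é".encode("cp1252")))       # ansi
print(classify_no_bom("Ã©".encode("cp1252")))      # utf-8 (wrong: it was ANSI)
```

Requires Python 3.7+ for bytes.isascii().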

People who tried saving the file as UTF-16 don't seem to report the issue anymore, and that's certainly because detecting UTF-16 is much easier and more failsafe than detecting UTF-8, provided you don't expect 0x00 and similar unusual control chars in a text file.
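To see why UTF-16 is so much easier to recognise: Latin-script text stored as UTF-16 has a 0x00 byte in every other position, which ordinary text files practically never contain. A rough Python sketch of that check (the 30% threshold is an arbitrary assumption, and a BOM, when present, decides immediately):

```python
import codecs
from typing import Optional

def guess_utf16(data: bytes, threshold: float = 0.3) -> Optional[str]:
    """Return "utf-16-le" or "utf-16-be" if `data` looks like UTF-16, else None."""
    # A BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    pairs = len(data) // 2
    if pairs == 0:
        return None
    # Count 0x00 bytes at even and odd offsets. Latin-script UTF-16 LE puts the
    # zero high byte at odd offsets, UTF-16 BE at even offsets; ANSI/UTF-8 text has none.
    zeros_even = sum(1 for i in range(0, 2 * pairs, 2) if data[i] == 0)
    zeros_odd = sum(1 for i in range(1, 2 * pairs, 2) if data[i] == 0)
    if zeros_odd / pairs >= threshold and zeros_even / pairs < threshold:
        return "utf-16-le"
    if zeros_even / pairs >= threshold and zeros_odd / pairs < threshold:
        return "utf-16-be"
    return None

print(guess_utf16("MsgBox".encode("utf-16-le")))  # utf-16-le
print(guess_utf16("MsgBox".encode("utf-8")))      # None
```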

I expect the problem lies in the new SciTE failing to detect no-BOM UTF-8 with the heuristic it used in previous versions. Again, no heuristic you come up with can be completely foolproof, but it should work in "most" cases, for a large range of "most" values.



I was also using SciTE v3.5.4 until Jos asked me to download the beta version from his repository: https://www.autoitscript.com/autoit3/scite/download/beta_SciTE4AutoIt3/

So now I'm on 3.6.0 (Aug 4 2015 17:34:27).

I don't do anything special; I just install the available versions from this site, so I don't understand why you can't reproduce the problem.

I'm using a fresh Windows 10 installation, and I never installed any AutoIt version other than the latest one.


Well, I did another test:

When I right-click > create a new .au3 file and open it with Notepad++, it says UTF-8 without BOM.

When I open the same file in SciTE, it says Encoding: Code Page Property.

EDIT:
If I add a space and save the file, SciTE says Code Page Property but Notepad++ says UTF-8 without BOM.

But if I add an accented char like "é à è" and save the file with SciTE, Notepad then says ANSI.

Edited by cetipabo


I think Jos was using the routines I wrote for AutoIt. Maybe something got left out in a recent scite/merge?

No, I have used the routine as published in SciTE_RU.

EDIT:

If I add a space and save the file, SciTE says Code Page Property but Notepad++ says UTF-8 without BOM.

But if I add an accented char like "é à è" and save the file with SciTE, Notepad then says ANSI.

This makes sense and is the way it should be when on code page property.
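The observation can be reproduced in a few lines of Python, assuming that in Code Page Property mode SciTE writes the system ANSI code page (taken here to be cp1252): an ASCII-only script has identical bytes in both encodings, but once accented characters are saved as cp1252 the file is no longer valid UTF-8, so the other editor falls back to ANSI.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

ascii_script = 'MsgBox(0, "", "hello")'
accented_script = 'MsgBox(0, "", "é à è")'

# ASCII only: the bytes are the same whichever encoding is used to save.
print(ascii_script.encode("cp1252") == ascii_script.encode("utf-8"))  # True

# Accented chars saved in Code Page Property mode (cp1252 assumed): E9 20 E0 20 E8
print(is_valid_utf8(accented_script.encode("cp1252")))  # False -> reported as ANSI
print(is_valid_utf8(accented_script.encode("utf-8")))   # True
```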

Let me do some checking.

Edit:

Could you add these to your SciTEUser.properties and try again?

NewFileEncoding=UTF8
utf8.auto.check=4

Jos

Edited by Jos


Ah! Jos, you're cheating by using a new magic wand :o

Let's see how things go...



I remember that when I implemented this option (detect ANSI special characters, otherwise use UTF-8 without BOM), I didn't want to make it the default behaviour yet, and then forgot to make that change in SciTEGlobal.properties.

Honestly, that first option, which has been available for a long time, is not needed any more.

Jos


With these two lines added to SciTEUser.properties, it seems to be working.

I modified an option in AutoIt3Wrapper and this time the encoding did change, from Code Page Property to UTF-8 without BOM, and the file was saved with my French characters intact.

I tried the same trick with SciTE 3.5.4, which I have on another machine, and it works the same way.


Could you add these to your SciTEUser.properties and try again?

NewFileEncoding=UTF8
utf8.auto.check=4

Jos

Ahh, that works better now.


This is from the SciTE history page, which describes this setting. I will add the option to SciTEGlobal.properties in the next release:

*** Merged the SciTE v 3.5.4 by Neil Hodgson with our own version of SciTE. (Jos)
    Added  utf8.auto.check which will autodetect UTF8 encoded files without BOM and files containing Highvalue ASCII characters and setting the correct encoding.
        We have set the default to 4 which means that the encoding is set to UTF8 without BOM for any script containing normal ASCII characters.
        #~ Enhance function of auto checking utf8: providing two methods
        #~ utf8.auto.check=1: detect utf8 and add BOM automatically
        #~ utf8.auto.check=2: detect utf8 and do not add BOM
        #~ utf8.auto.check=3: detect ascii high characters and if none found set default encoding to UTF8 and add BOM
        utf8.auto.check=4: detect ascii high characters and if none found set default encoding to UTF8 and do not add BOM
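To make the four modes easier to compare, here is a hypothetical Python model of the changelog text above. It is not SciTE's implementation. In particular, what modes 3 and 4 do when high characters are found, and what modes 1 and 2 do with pure-ASCII files, are assumptions (valid UTF-8 is treated as UTF-8, otherwise the code page is kept).

```python
from typing import Tuple

def utf8_auto_check(data: bytes, mode: int) -> Tuple[str, bool]:
    """Hypothetical model of utf8.auto.check; returns (encoding, write_bom)."""
    has_high_chars = not data.isascii()  # any byte >= 0x80
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    if mode in (1, 2):
        # 1/2: detect UTF-8 content; 1 adds a BOM, 2 does not.
        # Assumption: pure-ASCII files keep the code page setting.
        if has_high_chars and valid_utf8:
            return "utf-8", mode == 1
        return "code page", False

    if mode in (3, 4):
        # 3/4: no high chars found -> default to UTF-8; 3 adds a BOM, 4 does not.
        # Assumption: files with high chars are UTF-8 if they decode as UTF-8.
        if not has_high_chars or valid_utf8:
            return "utf-8", mode == 3
        return "code page", False

    # Setting absent or unknown: assumed to keep the code page setting.
    return "code page", False

print(utf8_auto_check(b'MsgBox(0, "", "hi")', 4))  # ('utf-8', False)
```

Under this model, mode 4 is what turns a fresh ASCII-only script into UTF-8 without BOM, which is the behaviour reported after adding the two lines to SciTEUser.properties.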

@kyo: I am not sure whether you are telling me everything is correct now or not?

Thanks for testing and reporting this!

Jos


FYI, there are several SciTEUser.properties files:

C:\Users\xxxx\AppData\Local\AutoIt v3\SciTE

C:\Program Files (x86)\AutoIt3\SciTE

Looks like the correct one is the first?
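The file SciTE actually reads is the one opened by Options > Open User Options File, so that menu entry is the reliable answer. As a convenience only, this Python sketch builds the two candidate paths listed above from the LOCALAPPDATA and ProgramFiles(x86) environment variables (an assumption about how those folders map on a given machine) and reports which of them exist:

```python
import os
from pathlib import Path, PureWindowsPath
from typing import List, Mapping, Optional

def candidate_user_properties(env: Optional[Mapping[str, str]] = None) -> List[PureWindowsPath]:
    """Return the SciTEUser.properties locations mentioned in this thread."""
    env = os.environ if env is None else env
    candidates = []
    local = env.get("LOCALAPPDATA")  # e.g. C:\Users\xxxx\AppData\Local
    if local:
        candidates.append(PureWindowsPath(local, "AutoIt v3", "SciTE", "SciTEUser.properties"))
    program_files = env.get("ProgramFiles(x86)")  # e.g. C:\Program Files (x86)
    if program_files:
        candidates.append(PureWindowsPath(program_files, "AutoIt3", "SciTE", "SciTEUser.properties"))
    return candidates

def report(env: Optional[Mapping[str, str]] = None) -> None:
    """Print each candidate path and whether it exists on this machine."""
    for p in candidate_user_properties(env):
        print(p, "exists" if Path(str(p)).exists() else "missing")
```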


@kyo: I am not sure whether you are telling me everything is correct now or not?

Yes, it is working as I expected, the same way it worked with the previous versions, even though I never fiddled with the encoding of my source files.

If all it took was these two lines, I'm happy!

Thanks Jos, it's very much appreciated!

Edited by kyo
English not my native language, sometimes it shows!
