File encoding reverts automatically to "Code Page Property"


I recently reinstalled AutoIt and noticed that every time I close one of my files and reopen it, or compile it, the Encoding under File reverts to "Code Page Property". This only happens with one file. Other AutoIt files keep their UTF-8 encoding as expected.

I found another topic here describing a similar situation. There the issue was resolved by adding the following to SciTEUser.properties:

NewFileEncoding=UTF8
utf8.auto.check=4

I have tried this with no effect. Also, the second line already exists in SciTEGlobal.properties.

The file is too big to post here. I would appreciate it if someone could advise me how to turn off automatic checking and keep the encoding on UTF-8.


Pretty strange!

Can you check if this particular file only contains valid UTF8?

#include <FileConstants.au3>

Local $sFile = "bad.txt"
Local $hFile = FileOpen($sFile, $FO_ANSI)
Local $aTxt = FileReadToArray($hFile)
Local $iLines = @extended
FileClose($hFile)

Local Const $sRegExprUTF8 = _
    "^(" & _
        "[\x00-\x7E]|" & _
        "[\xC2-\xDF][\x80-\xBF]|" & _
        "\xE0[\xA0-\xBF][\x80-\xBF]|" & _
        "[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|" & _
        "\xED[\x80-\x9F][\x80-\xBF]|" & _
        "\xF0[\x90-\xBF][\x80-\xBF]{2}|" & _
        "[\xF1-\xF3][\x80-\xBF]{3}|" & _
        "\xF4[\x80-\x8F][\x80-\xBF]{2}" & _
    ")*$"

For $i = 0 To $iLines - 1
    If StringRegExp($aTxt[$i], $sRegExprUTF8) = 0 Then
        ConsoleWrite("Invalid UTF8 in line " & $i + 1 & " : " & $aTxt[$i] & @LF)
    EndIf
Next

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


  • Developers

This is the SciTE version that came with the AutoIt3 installer... right?

Please install the full, separate SciTE version, which has many changes, including ones related to this topic.

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

2 hours ago, jchd said:

Pretty strange!

Can you check if this particular file only contains valid UTF8?

Sure it is strange.

I compiled your script as x64 and ran it. Nothing happened.

BTW, I don't know if it makes any sense, but including the line below in this script did not revert the encoding; it remained UTF-8. That suggests the line and the content of my script interact somehow.

#AutoIt3Wrapper_UseX64=y

Edit: according to the script there are many lines with invalid UTF-8 characters. But in fact they are valid non-English characters. I guess the reason is that your script opens the file as ANSI.

Edited by Factfinder

52 minutes ago, Jos said:

This is the scite version that came with the AutoIt3 installer...right?

Please install the full separate SciTE version which has many changes also related to this topic.

SciTE was installed separately. I had installed AutoIt with the latest AutoIt3 installer, then installed the latest version of SciTE. I did the same again this morning, with no result.


  • Developers

OK, could you share a file that shows this issue? I have made many more changes in the current Beta and would like to test your file to see whether it shows the same issue or works properly in that version.


My problem is resolved, but I can't say what exactly triggered the issue. I went back through my backups and found the file in which the issue first appeared. I replaced some scripts and the issue was resolved.

I understand that a test file is needed, but unfortunately I cannot provide one. If I put the lines I replaced, together with the parameter I mentioned, in a new file, it doesn't recreate the issue. So there must be several things that interact, and I replaced one of them. I will keep an eye on it, and in case it comes back, I will try to put a script together for testing.

Thank you for your time. :)


  • Developers
19 minutes ago, Factfinder said:

My problem is resolved and I can't say what exactly triggered the issue.

sure..


2 hours ago, Factfinder said:

according to the script there are many lines with invalid UTF-8 characters. But in fact they are valid non-English characters. I guess the reason is that your script opens the file as ANSI

My script checks for valid UTF8 encoding of Unicode characters. It goes beyond the AutoIt charset (UCS2), as it also checks for the upper Unicode planes (which can't be natively processed by vanilla AutoIt).

Yes, I open the file in ANSI mode, which means I read it as a bunch of 8-bit characters. Then the For loop checks whether it finds invalid UTF8 sequences.

Since it does find many invalid sequences, that can only mean your file is NOT UTF8. As @Jos said, post the file either publicly or privately and I'll tell you what the problem is.


43 minutes ago, jchd said:

My script checks for valid UTF8 encoding of Unicode characters. It goes beyond the AutoIt charset (UCS2), as it also checks for the upper Unicode planes (which can't be natively processed by vanilla AutoIt).

Yes, I open the file in ANSI mode, which means I read it as a bunch of 8-bit characters. Then the For loop checks whether it finds invalid UTF8 sequences.

Since it does find many invalid sequences, that can only mean your file is NOT UTF8. As @Jos said, post the file either publicly or privately and I'll tell you what the problem is.

Thank you for your detailed reply.

The characters detected by your script are part of German words. They display correctly when the encoding is set to UTF-8 and are garbled when it is set to Code Page Property.
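
For readers following along, here is a minimal sketch of why a German character looks garbled when UTF-8 bytes are interpreted through a single-byte code page. It is not from any post in this thread; it only uses the standard StringToBinary and BinaryToString functions, and the character "ä" is just an illustrative choice. The character count in the last line assumes a Western European (Windows-1252) system and may differ under other ANSI code pages.

```autoit
#include <StringConstants.au3>

; ChrW(0xE4) is "ä"; it is built from its code point so that the
; encoding of this script file itself does not influence the result.
Local $sChar = ChrW(0xE4)

; UTF-8 stores this single character as two bytes.
Local $dUtf8 = StringToBinary($sChar, $SB_UTF8)
ConsoleWrite("UTF-8 bytes: " & $dUtf8 & @LF) ; 0xC3A4

; Decoding the same two bytes with the ANSI code page treats each byte
; as a character of its own, which is what the editor shows as garbage.
Local $sWrong = BinaryToString($dUtf8, $SB_ANSI)
ConsoleWrite("Characters when read as ANSI: " & StringLen($sWrong) & @LF) ; 2 on Windows-1252

; Decoding them as UTF-8 gives back the original single character.
Local $sRight = BinaryToString($dUtf8, $SB_UTF8)
ConsoleWrite("Characters when read as UTF-8: " & StringLen($sRight) & @LF) ; 1
```

So the bytes on disk can be perfectly valid UTF-8 and still look wrong; only the interpretation chosen by the editor changes.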


Sorry for getting back to you this late.

My bad, this code just can't work.

Let me explain why: long ago I requested a change in PCRE regex support in AutoIt, so that the implementation relies on strings being in native AutoIt string encoding, that is UCS2 (a subset of Unicode limited to the Basic Multilingual Plane, namely 0x0000-0xFFFF).

So a string passed to StringRegExp can't be examined byte by byte: in this implementation the PCRE engine examines the string one 16-bit unit at a time.

To check for valid UTF8 encoding we must read the file as binary and check, without using a regexp, whether every byte sequence matches a valid UTF8 encoding sequence, similar to the ranges in the regexp in my post.

Apologies for misleading you. I lazily copied the code from a program in another language and didn't realize it was doomed to fail in AutoIt.
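
A minimal sketch of the binary check described above, offered as one possible approach rather than the poster's own code. It reads the file in binary mode and walks the bytes using the same ranges as the regexp in the earlier post. The function name _FirstInvalidUtf8Offset and its return convention are hypothetical, chosen only for this illustration; the byte-by-byte loop is simple rather than fast, so expect it to be slow on very large files.

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not from the thread).
; Returns 0 if $sPath contains only well-formed UTF-8,
; otherwise the 1-based offset of the first byte of the bad sequence.
; Returns -1 and sets @error if the file cannot be opened.
Func _FirstInvalidUtf8Offset($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, -1)
    Local $dData = FileRead($hFile)
    FileClose($hFile)

    Local $iLen = BinaryLen($dData)
    If $iLen = 0 Then Return 0

    ; Copy the data into a byte array so each byte can be read as an integer.
    Local $tBuf = DllStructCreate("byte[" & $iLen & "]")
    DllStructSetData($tBuf, 1, $dData)

    Local $i = 1, $b, $iNeed, $iLo, $iHi
    While $i <= $iLen
        $b = DllStructGetData($tBuf, 1, $i)
        ; Allowed range for the first continuation byte (narrowed below).
        $iLo = 0x80
        $iHi = 0xBF
        If $b <= 0x7F Then
            $iNeed = 0 ; plain ASCII
        ElseIf $b >= 0xC2 And $b <= 0xDF Then
            $iNeed = 1
        ElseIf $b >= 0xE0 And $b <= 0xEF Then
            $iNeed = 2
            If $b = 0xE0 Then $iLo = 0xA0 ; reject overlong forms
            If $b = 0xED Then $iHi = 0x9F ; reject UTF-16 surrogates
        ElseIf $b >= 0xF0 And $b <= 0xF4 Then
            $iNeed = 3
            If $b = 0xF0 Then $iLo = 0x90 ; reject overlong forms
            If $b = 0xF4 Then $iHi = 0x8F ; nothing above U+10FFFF
        Else
            Return $i ; 0x80-0xC1 and 0xF5-0xFF cannot start a sequence
        EndIf

        If $i + $iNeed > $iLen Then Return $i ; sequence cut off at end of file
        For $k = 1 To $iNeed
            $b = DllStructGetData($tBuf, 1, $i + $k)
            If $b < $iLo Or $b > $iHi Then Return $i
            ; Only the first continuation byte has a narrowed range.
            $iLo = 0x80
            $iHi = 0xBF
        Next
        $i += $iNeed + 1
    WEnd
    Return 0
EndFunc   ;==>_FirstInvalidUtf8Offset
```

Calling it as `Local $iBad = _FirstInvalidUtf8Offset("bad.txt")` and then checking @error and $iBad would report whether the file is well-formed, and if not, where the first problem starts. A leading UTF-8 BOM (EF BB BF) passes this check, since it is itself a valid UTF-8 sequence.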


It's always embarrassing to publicly post wrong, defective and untested code, even with the laudable intent to help people!


On 3/18/2024 at 7:27 PM, Factfinder said:

My problem is resolved, but I can't say what exactly triggered the issue. I went back through my backups and found the file in which the issue first appeared. I replaced some scripts and the issue was resolved.

I understand that a test file is needed, but unfortunately I cannot provide one. If I put the lines I replaced, together with the parameter I mentioned, in a new file, it doesn't recreate the issue. So there must be several things that interact, and I replaced one of them. I will keep an eye on it, and in case it comes back, I will try to put a script together for testing.

Thank you for your time. :)

 

Based on that, I think there was some problem or mistake with the BOM (a few invisible binary bytes at the start of a file that identify its UTF encoding; a BOM is not mandatory for UTF files).
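
To see whether a given file actually starts with a BOM, something like the following sketch could be used. It is only an illustration: the function name _DetectBom is hypothetical, and it only looks for the UTF-8 and UTF-16 marks (a UTF-32 LE file would be reported as UTF-16 LE because its first two bytes are the same).

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not from the thread).
; Returns "UTF-8", "UTF-16 LE", "UTF-16 BE" or "none" depending on the
; first bytes of $sPath. Returns "" and sets @error if the file cannot be opened.
Func _DetectBom($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $dHead = FileRead($hFile, 3) ; the longest mark checked here is 3 bytes
    FileClose($hFile)

    ; String() renders binary data as a hex string such as "0xEFBBBF".
    If String(BinaryMid($dHead, 1, 3)) = "0xEFBBBF" Then Return "UTF-8"
    If String(BinaryMid($dHead, 1, 2)) = "0xFFFE" Then Return "UTF-16 LE"
    If String(BinaryMid($dHead, 1, 2)) = "0xFEFF" Then Return "UTF-16 BE"
    Return "none"
EndFunc   ;==>_DetectBom
```

For example, `ConsoleWrite(_DetectBom(@ScriptFullPath) & @LF)` prints which mark, if any, the running script file carries. A result of "none" on a file containing non-ASCII characters means the editor has to guess the encoding when opening it.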


1 hour ago, Zedna said:

 

Based on that, I think there was some problem or mistake with the BOM (a few invisible binary bytes at the start of a file that identify its UTF encoding; a BOM is not mandatory for UTF files).

If I open the AutoIt script I have made and choose UTF-8 (not UTF-8 with BOM) under File -> Encoding, everything looks okay, including the German characters. But when I close the file and reopen it, or compile it, the Encoding under File reverts to "Code Page Property" and the German characters are garbled.

BTW, I thought it was related to the x64 wrapper, as I had this issue with the x64 version of a file. But yesterday I had the same issue with the x86 version of the same script, and I had to make some changes to get the encoding to stay on UTF-8. I can't make head nor tail of it.

Could you please explain why there was a problem with BOM?


Put the following lines in SciTEUser.properties:

code.page=65001
output.code.page=65001
NewFileEncoding=UTF8
utf8.auto.check=4
font.quality=4

Then create a new file and copy the contents of the old file into it.

When I run a script, the console says:
+>15:46:13 Starting AutoIt3Wrapper (21.316.1639.1) from:SciTE.exe (4.4.6.0)  Keyboard:00000409  OS:WIN_10/2009  CPU:X64 OS:X64  Environment(Language:0409)  CodePage:65001  utf8.auto.check:4

This assumes you have installed SciTE4AutoIt3.exe.

Edited by ioa747

I know that I know nothing

