HurleyShanabarger

FileOpen in ANSI


HurleyShanabarger

Hi guys,

I have a large number of files that are encoded in UTF-8, but for backwards compatibility I need to convert them to ANSI. No big deal, I thought, but when I do that with AutoIt it fails for me: FileGetEncoding does not report the encoding the file was written with. I tested it with 3.3.14.2 and the current beta 3.3.15.0. Did I miss something?

#include <FileConstants.au3>

; Write "test" once with each encoding flag from 2^5 (32) to 2^11 (2048)
; and compare the flag used with the encoding FileGetEncoding detects.
For $iExp = 5 To 11
    _FileWrite_Mode(2 ^ $iExp)
Next

Func _FileWrite_Mode($iMode)
    Local $sFile = @DesktopDir & "\test.txt"
    FileDelete($sFile)
    Local $hFile = FileOpen($sFile, $iMode + $FO_OVERWRITE)
    FileWrite($hFile, "test")
    FileClose($hFile)
    ConsoleWrite("----------------------------" & @CRLF)
    ConsoleWrite("Expected:" & @TAB & $iMode & @CRLF)
    ConsoleWrite("Detected:" & @TAB & FileGetEncoding($sFile) & @CRLF)
EndFunc   ;==>_FileWrite_Mode

 

AspirinJunkie

If there is no BOM, FileGetEncoding has to guess the encoding heuristically.
In your case the string "test" does not contain enough information to distinguish ANSI from UTF-8 without BOM.
That's because a file containing the text "test" is byte-for-byte identical in both encodings.
So FileGetEncoding has to guess in this case.
But that doesn't mean the file wasn't written in ANSI.
It only means FileGetEncoding never had a chance to tell the two encodings apart reliably.

Add, for example, a "€" sign to your string and you will see that FileGetEncoding now detects the encoding correctly.
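To see this at the byte level, here is a small sketch (not from the thread) that prints the bytes StringToBinary produces for the same string in ANSI and in UTF-8. `_ShowBytes` is a hypothetical helper; the ANSI result depends on the system code page, and the "€" byte 0x80 mentioned below assumes Windows-1252.

```autoit
#include <StringConstants.au3>

_ShowBytes("test", "test")
; ChrW(0x20AC) is "€"; using ChrW avoids depending on how this script file is saved
_ShowBytes("test + euro sign", "test" & ChrW(0x20AC))

; Hypothetical helper: prints the raw bytes of $sText in ANSI and in UTF-8
Func _ShowBytes($sLabel, $sText)
    ConsoleWrite($sLabel & @CRLF)
    ConsoleWrite("  ANSI:  " & String(StringToBinary($sText, $SB_ANSI)) & @CRLF)
    ConsoleWrite("  UTF-8: " & String(StringToBinary($sText, $SB_UTF8)) & @CRLF)
EndFunc   ;==>_ShowBytes
```

For "test" both lines show 0x74657374, so there is nothing for FileGetEncoding to distinguish. With the euro sign added, the UTF-8 bytes end in E282AC, while on a Windows-1252 system the ANSI bytes end in the single byte 80, which is not valid UTF-8 on its own.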

 

HurleyShanabarger

Thank you. That basically means that if I convert the files using $FO_ANSI, they will be written as ANSI and it is just the detection that fails. If I want to check whether a file has already been converted and FileGetEncoding returns $FO_UTF8_NOBOM, I can just append "€" to the file and recheck it: if the result is $FO_ANSI, the file has already been converted. Correct?

AspirinJunkie
HurleyShanabarger said:

That basically means that if I convert the files using $FO_ANSI, they will be written as ANSI and it is just the detection that fails.

Yes

HurleyShanabarger said:

If I want to check whether a file has already been converted and FileGetEncoding returns $FO_UTF8_NOBOM, I can just append "€" to the file and recheck it: if the result is $FO_ANSI, the file has already been converted. Correct?

No. If you mix encodings across different write operations, the result can be misleading.
For example:

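#include <FileConstants.au3>

Local $s_File = @ScriptDir & "\Test.txt"

; write the file as UTF-8 without BOM
Local $hFile = FileOpen($s_File, $FO_OVERWRITE + $FO_UTF8_NOBOM)
FileWriteLine($hFile, "This is a test with some special chars like € or @ or ÄÖÜ")
FileClose($hFile)
ConsoleWrite(FileGetEncoding($s_File) & @CRLF) ; print the detected encoding

; append "€" in ANSI:
$hFile = FileOpen($s_File, $FO_APPEND + $FO_ANSI)
FileWrite($hFile, "€")
FileClose($hFile)
ConsoleWrite(FileGetEncoding($s_File) & @CRLF) ; now reported as ANSI

; open the file in the default editor to see the result
ShellExecute($s_File)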

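So in this case FileGetEncoding ends up reporting ANSI, even though the text written before is still UTF-8 encoded. The appended "€" is a single ANSI byte (0x80 on a Windows-1252 system) that is not valid UTF-8, so the file as a whole no longer passes as UTF-8.
This leads to display errors when the file is viewed in a text editor.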
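For the original task of converting many files, this suggests writing each file once, entirely in the target encoding, instead of appending to it. The following is a minimal sketch of that approach, not code from the thread. It assumes the source files are UTF-8 without a BOM; the folder `@ScriptDir & "\to_convert"`, the `*.txt` filter and the helper `_ConvertUtf8ToAnsi` are hypothetical placeholders. Characters that have no equivalent in the system ANSI code page cannot be represented and will be lost.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Hypothetical folder and filter: replace with the real location of the files.
Local $sDir = @ScriptDir & "\to_convert"
Local $aFiles = _FileListToArray($sDir, "*.txt", $FLTA_FILES, True)
If @error Then
    ConsoleWrite("No files found in " & $sDir & @CRLF)
Else
    For $i = 1 To $aFiles[0]
        If Not _ConvertUtf8ToAnsi($aFiles[$i]) Then ConsoleWrite("Failed: " & $aFiles[$i] & @CRLF)
    Next
EndIf

; Hypothetical helper: reads the whole file as UTF-8 (no BOM assumed) and
; overwrites it in place, in a single write, as ANSI (system code page).
; Characters missing from the ANSI code page cannot be represented.
Func _ConvertUtf8ToAnsi($sFile)
    Local $hIn = FileOpen($sFile, $FO_READ + $FO_UTF8_NOBOM)
    If $hIn = -1 Then Return False
    Local $sText = FileRead($hIn)
    FileClose($hIn)

    Local $hOut = FileOpen($sFile, $FO_OVERWRITE + $FO_ANSI)
    If $hOut = -1 Then Return False
    FileWrite($hOut, $sText)
    FileClose($hOut)
    Return True
EndFunc   ;==>_ConvertUtf8ToAnsi
```

Because each file is read completely before it is rewritten, the ANSI output never ends up mixed with leftover UTF-8 bytes, which is the situation that made FileGetEncoding misleading in the example above.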
