Opened 9 years ago
Closed 4 years ago
#3137 closed Bug (Fixed)
FileRead() treats count parameter as bytes instead of characters for UTF-8 files
Reported by: | miraged | Owned by: | |
---|---|---|---|
Milestone: | | Component: | AutoIt |
Version: | 3.3.14.2 | Severity: | None |
Keywords: | | Cc: | |
Description
When reading from UTF-8 files (with or without a BOM), the count parameter is treated as the number of bytes rather than the number of characters. UTF-16 and ANSI files work as expected. In the best case you get fewer characters than expected; in the worst case the returned string ends in a partial multi-byte character. Tested on 3.3.10.2, 3.3.14.2 and 3.3.15.0.
Example is attached.
Attachments (1)
Change History (4)
Changed 9 years ago by miraged
comment:1 Changed 6 years ago by BrewManNH
Running that test file shows me something different.
It looks like StringLen is at fault here, not FileRead. If you do a ConsoleWrite right after the FileRead and output @extended, you will see that it always reads 7 characters/bytes as it is supposed to, but StringLen reports the wrong information. Look at the starting and ending offsets: they are identical between the first and second tests.
comment:2 Changed 6 years ago by jchd18
Furthermore, the poster uses files with a BOM, which shifts the bytes-read count.
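To make the BOM point concrete, here is a minimal sketch (not taken from the attached repro) that writes the same three ASCII characters to two files, one opened with `$FO_UTF8` and one with `$FO_UTF8_NOBOM`, and compares their sizes on disk. The file names are hypothetical scratch names chosen for illustration, and error checking on FileOpen is omitted for brevity.

```autoit
#include <FileConstants.au3>

; Hypothetical scratch file names, used only for this illustration.
Local $sWithBom = @TempDir & "\bom_yes.txt"
Local $sNoBom = @TempDir & "\bom_no.txt"

; Write "abc" as UTF-8 with a BOM.
Local $h = FileOpen($sWithBom, $FO_OVERWRITE + $FO_UTF8)
FileWrite($h, "abc")
FileClose($h)

; Write "abc" as UTF-8 without a BOM.
$h = FileOpen($sNoBom, $FO_OVERWRITE + $FO_UTF8_NOBOM)
FileWrite($h, "abc")
FileClose($h)

; Report both sizes in bytes so the difference is visible.
ConsoleWrite("With BOM:    " & FileGetSize($sWithBom) & " bytes" & @LF)
ConsoleWrite("Without BOM: " & FileGetSize($sNoBom) & " bytes" & @LF)

FileDelete($sWithBom)
FileDelete($sNoBom)
```

The two sizes should differ by the three bytes of the UTF-8 BOM (EF BB BF), which is the kind of offset shift any byte-based accounting in a test script has to allow for.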
comment:3 Changed 4 years ago by jchd18
- Resolution set to Fixed
- Status changed from new to closed
Current release/beta versions of AutoIt work correctly; the "repro" code is wrong.
This simple code
    Local $f = "len.txt"
    FileWrite($f, "€€€")
    Local $s = FileRead($f)
    ConsoleWrite(@extended & @TAB & StringLen($s) & @LF)
    FileDelete($f)
correctly yields
9 3
since '€' takes 3 bytes in UTF-8.
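The byte/character relationship can also be checked without touching the file system. The following is a minimal sketch, assuming only the standard `StringToBinary`, `BinaryLen` and `StringLen` functions; it builds the string with `ChrW(0x20AC)` so the result does not depend on the encoding of the script file itself.

```autoit
#include <StringConstants.au3>

; ChrW(0x20AC) is the euro sign; building it this way avoids relying
; on how the .au3 file itself is encoded.
Local $s = ChrW(0x20AC) & ChrW(0x20AC) & ChrW(0x20AC)

; Encode the string as UTF-8 and keep the raw bytes.
Local $bUtf8 = StringToBinary($s, $SB_UTF8)

ConsoleWrite("Characters:  " & StringLen($s) & @LF)
ConsoleWrite("UTF-8 bytes: " & BinaryLen($bUtf8) & @LF)
```

For three euro signs this should report 3 characters and 9 UTF-8 bytes, matching the `9 3` output above.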
Repro