Opened 4 years ago

Last modified 12 months ago

#3137 new Bug

FileRead() treats count parameter as bytes instead of characters for UTF-8 files

Reported by: miraged
Owned by:
Milestone:
Component: AutoIt
Version: 3.3.14.2
Severity: None
Keywords:
Cc:

Description

When reading from UTF-8 files (with or without a BOM), the count parameter is treated as a number of bytes rather than a number of characters. UTF-16 and ANSI files work as expected. In the best case you get fewer characters than expected; in the worst case the returned string ends in a partial multi-byte sequence. Tested on 3.3.10.2, 3.3.14.2 and 3.3.15.0.
Example is attached.
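
For readers without the attachment, the byte-versus-character distinction described above can be sketched in Python. This only illustrates how UTF-8 behaves; it is not AutoIt's implementation and not the attached repro. The sample string and the helper names `read_by_chars` and `read_by_bytes` are hypothetical.

```python
# Illustration only: models the reported symptom, not AutoIt's internals.

def read_by_chars(text: str, count: int) -> str:
    """What the reporter expects: the first `count` characters."""
    return text[:count]


def read_by_bytes(data: bytes, count: int) -> str:
    """What the report describes: the first `count` bytes, then decoded.

    errors="replace" turns a truncated multi-byte sequence into U+FFFD
    so the damage is visible instead of raising an exception.
    """
    return data[:count].decode("utf-8", errors="replace")


text = "héllo wörld"            # hypothetical sample; é and ö take 2 bytes each
encoded = text.encode("utf-8")  # 11 characters, 13 bytes

# Best case: the cut lands on a character boundary, but fewer characters come back.
print(repr(read_by_chars(text, 7)))
print(repr(read_by_bytes(encoded, 7)))

# Worst case: the cut lands inside the two-byte sequence for "é".
print(repr(read_by_bytes(encoded, 2)))
```

With a count of 7, the character-based read returns seven characters, while the byte-based read returns six because "é" consumes two of the seven bytes. With a count of 2, the byte-based read splits "é" and the result ends in a replacement character.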

Attachments (1)

FileRead_UTF-8_Bug.au3 (1.7 KB) - added by miraged 4 years ago.
Repro


Change History (3)

Changed 4 years ago by miraged

Repro

comment:1 Changed 12 months ago by BrewManNH

Running that test file shows me something different.

It looks like StringLen is at fault here, not FileRead. If you add a ConsoleWrite right after the FileRead and output @extended, you will see that it always reads 7 characters/bytes as it is supposed to, but StringLen reports the wrong length. Look at the starting and ending offsets: they are identical between the first and second tests.

comment:2 Changed 12 months ago by jchd18

Furthermore, the poster uses files with a BOM, which shifts the byte read count.
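
The BOM point can be sketched in Python as well. The UTF-8 BOM is three bytes (EF BB BF), so if those bytes count toward the requested size, three fewer bytes of actual text are returned. Whether AutoIt counts the BOM this way is an assumption made for illustration; the ticket does not confirm it. The sample text is hypothetical.

```python
import codecs

# Illustration only: assumes the BOM bytes are included in the requested count.
payload = "héllo wörld".encode("utf-8")  # hypothetical sample text
with_bom = codecs.BOM_UTF8 + payload     # BOM_UTF8 is b"\xef\xbb\xbf"

count = 7
raw = with_bom[:count]                   # first 7 bytes of the file on disk

# Drop the BOM if present to see how much real text was read.
if raw.startswith(codecs.BOM_UTF8):
    text_bytes = raw[len(codecs.BOM_UTF8):]
else:
    text_bytes = raw

print(len(codecs.BOM_UTF8))              # size of the BOM in bytes
print(len(text_bytes))                   # bytes of text left after the BOM
print(repr(text_bytes.decode("utf-8")))
```

Under that assumption, a 7-byte read of a file with a BOM yields only 4 bytes of text, so offsets measured in bytes differ by three from those of the same file saved without a BOM.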
