Opened 12 years ago

Closed 12 years ago

#2019 closed Bug (No Bug)

File functions: Three help file issues and one bug

Reported by: Aru
Owned by:
Milestone:
Component: AutoIt
Version: 3.3.6.1
Severity: None
Keywords: FileOpen FileWrite BOM
Cc:

Description

Not really a bug, but I couldn't think of where else to put these. Issues in the help file.

1) "16 = Force binary mode (See Remarks)." is on the FileOpen page, but there's nothing about binary mode in the remarks.

There are such remarks on the pages for FileRead and FileWrite, but they should probably be on the FileOpen page too, since that is the page that actually has the binary mode flag.

2) The FileRead page doesn't mention where the read will start, and doesn't say whether it uses the current 'position' or something else.

3) In the help file, the FileOpen flags 32, 64 and 128 say that "Reading does not override existing BOM." which doesn't make any sense. Reading doesn't overwrite anything anyway. Pretty sure it should say writing, and should probably specify the second write flag (as the first is append and shouldn't be anywhere near the BOM anyway).

4) This one's a bug. I'm not sure whether the FileOpen 128 flag (versus the 256) is supposed to keep the write position from moving lower than 3, or cause write flag 2 to have an automatic protected BOM inserted in front of it, but it seems to do neither.

FileOpen sets the initial position to 3 for a UTF-8 file that has a BOM when you use a 128 flag, a 256 flag, or no flag (auto detection). In all cases, FileSetPos(, 0, 0) sets that position to 0, which is before the BOM. I *think* that's probably intentional, using that same constant regardless of the presence of a BOM.

If you FileOpen a UTF-8 .txt that has a BOM with the 130 flag (128+2), and then FileWrite, the BOM stays. But if you call FileSetPos(, 0, 0) and then FileWrite, the BOM disappears, meaning the 128 flag is not doing anything at all here. It doesn't restrict the write position in write mode 2 and it doesn't prefix a BOM (if you return to the beginning of the file).
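The byte-level effect described above can be sketched outside AutoIt. The following Python snippet is only an analogy, not AutoIt's implementation: it assumes a file opened for in-place update (so, unlike write flag 2, nothing is erased on open), and the helper names `write_at`, `read_bytes` and `make_bom_file` are made up for illustration. It shows that a raw byte offset of 0 sits before the UTF-8 BOM, so a write started there replaces it, while a write started at offset 3 leaves it intact.

```python
import os
import tempfile

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark (3 bytes)


def write_at(path, offset, payload):
    """Open an existing file for in-place update, seek to a raw byte offset, write."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)


def read_bytes(path):
    """Return the whole file as raw bytes."""
    with open(path, "rb") as f:
        return f.read()


def make_bom_file(body=b"hello"):
    """Create a temporary file that starts with a UTF-8 BOM (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(BOM + body)
    return path


path = make_bom_file()

write_at(path, len(BOM), b"HELLO")  # start after the BOM: the BOM survives
after_bom = read_bytes(path)

write_at(path, 0, b"Text")          # start at byte 0: the BOM is overwritten
at_zero = read_bytes(path)

os.remove(path)
```

Whether a given open flag should protect the BOM from this is a design decision for the library; the sketch only shows what an unprotected byte offset does.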

If you FileSetPos(, >2, 0) then flag 130 prefixes the BOM and then pads with spaces to get to the right position before beginning the write.

None of this behavior is mentioned or detailed in the help file, so I can't tell what's intended. "Append a text/data to the end of a previously opened file." is the description of FileWrite, but it seems to be a lie if you FileOpened with write flag 2 instead of 1. And all that's said of 2 is "2 = Write mode (erase previous contents)" (under FileOpen), which implies to me that it writes from the beginning of the file and position doesn't matter, not that it a) pads to the current position with spaces first, and b) starts that padding with an (overwritable) BOM instead of spaces if flag 128 is included/detected AND the position is >2. Even with the 128 flag, if the position isn't >2, part of the BOM is overwritten. If you use position 1, you get 'ïText': the first UTF-8 BOM byte, with the other two overwritten.
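The 'ïText' result is what one would expect from a partially overwritten BOM. As a hedged illustration in Python (an in-memory analogy with a hypothetical `overwrite` helper, not a reproduction of AutoIt itself): writing at byte offset 1 keeps only the first BOM byte, 0xEF, which a viewer using Windows-1252 displays as 'ï', and the result is no longer valid UTF-8.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def overwrite(buf, offset, payload):
    """Return buf with payload written at a raw byte offset (extends buf if needed)."""
    out = bytearray(buf)
    out[offset:offset + len(payload)] = payload
    return bytes(out)


# Start with just the BOM, then write "Text" from byte offset 1.
result = overwrite(BOM, 1, b"Text")

# How a Windows-1252 viewer would display the bytes.
shown = result.decode("cp1252")

# The lone 0xEF lead byte is not followed by continuation bytes any more.
try:
    result.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```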

Attachments (0)

Change History (4)

comment:1 Changed 12 years ago by Valik

Your wall of text is difficult to get through. You seem to be making some assumptions about behavior on various points, so I think in some cases you are confusing "bug" with "doesn't know what is happening". You also show a lack of critical thinking. In particular, where else would FileRead() read from other than the current position? The end? There's nothing there. The beginning? Presumably you already did that if the current position isn't at the beginning. Maybe you used FileSetPos() to change the current position, in which case, if you want to read from somewhere else, you already know how to get there. It makes me dubious of your entire report.

The biggest issue, however, is this: You did not provide a script to reproduce any of the problems in your so-called "bug".

comment:2 Changed 12 years ago by Mat

Part 1 was reported here.

Part 3: Override and Overwrite are two very different things. The behaviour is that if a flag is specified to read and write a file as UTF-16 when it has a UTF-8 BOM, then it will still be read as UTF-8. This applies only to reading; writing will overwrite the BOM.
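The override-versus-overwrite distinction can be sketched as follows. This is a minimal Python illustration under stated assumptions, not AutoIt's actual code: the function names `decode_for_read` and `encode_for_write` are hypothetical, and only three BOM signatures are checked. On read, an existing BOM wins over the requested encoding; on write, the requested encoding is used and its BOM replaces whatever was there.

```python
import codecs

# BOM signatures checked in this sketch (an assumption; real detection may cover more).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_for_read(data, requested):
    """Decode bytes; a BOM found in the data takes priority over `requested`."""
    for bom, enc in _BOMS:
        if data.startswith(bom):
            return enc, data[len(bom):].decode(enc)
    return requested, data.decode(requested)


def encode_for_write(text, requested):
    """Encode text as `requested`, prefixed with that encoding's own BOM."""
    bom_for = {enc: bom for bom, enc in _BOMS}
    return bom_for[requested] + text.encode(requested)


utf8_file = codecs.BOM_UTF8 + "héllo".encode("utf-8")

# Reading: UTF-16 was requested, but the UTF-8 BOM is not overridden.
used, text = decode_for_read(utf8_file, "utf-16-le")

# Writing: the old BOM is gone, replaced by the UTF-16 LE one.
rewritten = encode_for_write(text, "utf-16-le")
```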

comment:3 Changed 12 years ago by jchd

I know this isn't the best place for discussion, but let me add this:
Valik addressed part 2) and Mat addressed parts 1) and 3). Let's look at part 4).

Any text file not using one of the single-byte codepages is a structured file: it contains a series of entities, sometimes varying in byte length. So either you rely on the file-to-Unicode reading function calls, thereby leaving the interpretation of entity lengths to the low-level underlying functions, or you treat the file as a series of bytes which you then interpret yourself.

Expecting FileSetPos to match any entity boundary (BOM, character or grapheme) is exactly the same as expecting it will magically match a line boundary. In short your "bug" amounts to cheating with the file position.

AutoIt doesn't offer FileReadNextUTF8Char and FileSetUTF8Pos functions. That would first require defining what one wants a "Unicode character" to mean (codepoint, sort entity, grapheme, localized grapheme) and also defining which normalization form to use. Don't hold your breath; in the meantime avoid using FileSetPos randomly.
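To make the point about byte positions and entity boundaries concrete, here is a small Python sketch (an illustration only; `is_char_boundary` is a hypothetical helper, and it checks code point boundaries, not graphemes). In UTF-8, a byte of the form 10xxxxxx is a continuation byte, so an arbitrary byte offset can land in the middle of a character.

```python
def is_char_boundary(data, offset):
    """True if `offset` does not land on a UTF-8 continuation byte (10xxxxxx)."""
    if offset <= 0 or offset >= len(data):
        return True
    return (data[offset] & 0xC0) != 0x80


# The 'ï' in "naïve" is encoded as two bytes (C3 AF), so the string is 6 bytes long.
data = "naïve".encode("utf-8")

# Byte offsets at which a read or write could safely start.
boundaries = [o for o in range(len(data) + 1) if is_char_boundary(data, o)]

# Offset 3 falls inside 'ï'; decoding from there fails.
try:
    data[3:].decode("utf-8")
    mid_char_ok = True
except UnicodeDecodeError:
    mid_char_ok = False
```

A byte-oriented seek has no way of knowing this on its own, which is why the choice is between letting the decoding layer track positions or handling the raw bytes yourself.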

comment:4 Changed 12 years ago by Valik

  • Resolution set to No Bug
  • Status changed from new to closed

Seems we all covered this. Closing as no bug.
