Ahmedmb

UTF-8 TEXT?


Ahmedmb

How do I write a UTF-8 text file?

:)

somh

How do I write a UTF-8 text file?

;)

My question too.

peethebee

Hi!

I don't think that it is possible :-(.

peethebee


German Forums: http://www.autoit.de
German Help File: http://autoit.de/hilfe

footswitch

Where is the UTF-8 text coming from? a file? I could demonstrate some binary file read/write... if I knew some specifics.

Lar.

Hello. In my case, the UTF-8 text comes from .txt files saved from the standard Windows Notepad in UTF-8 format. I need this format so that Macromedia Flash reads the variables correctly.

A UTF-8 file read with FileReadLine() returns strange characters in place of the first letters of the first line, and every character outside the a...z, A...Z, 0...9 range comes out wrong.

I use characters like à À Á á é í ó ú â ã ...

Can you provide that binary file read/write demonstration, please?

Thanks in advance for your support.
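
The symptoms described here fit a UTF-8 file that starts with a byte order mark (EF BB BF) and is then read as single-byte text. Here is a minimal sketch of that effect. It is written in Python rather than AutoIt, only to inspect the bytes. The sample line and the choice of Windows-1252 as the ANSI code page are assumptions, not details from the post.

```python
import codecs

# Hypothetical first line of a Notepad "UTF-8" file: BOM followed by text.
sample = "\u00e0 \u00e9 \u00ed \u00f3 \u00fa"  # à é í ó ú
raw = codecs.BOM_UTF8 + sample.encode("utf-8")

# Read as single-byte Windows-1252 text (assumed ANSI code page):
# the BOM turns into three stray characters at the start, and each
# accented letter turns into two wrong characters.
as_ansi = raw.decode("cp1252")

# Read as UTF-8 with BOM handling: the text comes back intact.
as_utf8 = raw.decode("utf-8-sig")

print(repr(as_ansi))
print(repr(as_utf8))
```

If this is indeed the cause, skipping the first three bytes and decoding the rest as UTF-8 would be enough to recover the text.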


___Kevin___

I would like to know how to read UTF-8 text, too...

Kevin

___Kevin___

Any news on that? Is it still not possible to read UTF-8 encoded text files?

Thanks,

Kevin.

svkhtn

I am also interested in this, but haven't found any solution yet.

1. I created a UTF-8 text file, test.txt, with Notepad. The file contains this line: "Kiểm tra Tiếng Việt."

2. Read it in raw mode.

3. Used ControlSetText to send it back to the Notepad window that has test.txt open, but the text appeared wrong.

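$strFile = "test.txt"
$file = FileOpen($strFile, 4) ; mode 4: the raw read mode from step 2
If $file = -1 Then Exit ; FileOpen returns -1 if the file cannot be opened

; read the whole file in one call, then release the handle
$text = FileRead($file, FileGetSize($strFile))
FileClose($file)

; put what was read into Notepad's edit control
ControlSetText("test.txt", "", "Edit1", $text)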

Do I need to convert $text before sending it back to Notepad by ControlSetText?

If so, how can I convert?

Many thanks!!!

sulfurious

UTF-8, friend or foe?

Here is an ASCII string "I am not"

Here is the ASCII hex

49 20 61 6D 20 6E 6F 74

Now, UTF-8, being a variable-length encoding, uses 8 to 32 bits (1 to 4 bytes) per character.

Since the above string is pure ASCII, its UTF-8 hex is exactly the same.

Now, UTF-16 uses at least 16 bits per character, so the above would look like this (low byte first)

49 00 20 00 61 00 6D 00 20 00 6E 00 6F 00 74 00

The problem here is that UTF-8 only uses 8 bits for plain ASCII characters; an extended character takes two or more bytes. Consider the sample string "Kiểm tra Tiếng Việt". In UTF-16, it would be

4B 00 69 00 C3 1E 6D 00 20 00 74 00 72 00 61 00 20 00 54 00 69 00 BF 1E 6E 00 67 00 20 00 56 00 69 00 C7 1E 74 00

Look at the same string in UTF-8, and see

4B 69 E1 BB 83 6D 20 74 72 61 20 54 69 E1 BA BF 6E 67 20 56 69 E1 BB 87 74

Meanwhile, ASCII has no extended characters, so you would see the string as "Ki?m tra Ti?ng Vi?t" and hex as

4B 69 3F 6D 20 74 72 61 20 54 69 3F 6E 67 20 56 69 3F 74
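
The three dumps above can be reproduced with a short script. This is a sketch in Python, not AutoIt. The helper name `hex_dump` is made up for illustration, and the UTF-16 variant is assumed to be little-endian without a BOM, which is what the dump shows.

```python
# "Kiểm tra Tiếng Việt", written with escapes so the exact code points are unambiguous.
text = "Ki\u1ec3m tra Ti\u1ebfng Vi\u1ec7t"


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Two bytes per character for this string, low byte first.
utf16 = hex_dump(text.encode("utf-16-le"))

# One byte per ASCII character, three bytes each for ể, ế and ệ.
utf8 = hex_dump(text.encode("utf-8"))

# ASCII cannot represent the three Vietnamese letters, so they become '?' (3F).
ascii_ = hex_dump(text.encode("ascii", errors="replace"))

print(utf16)
print(utf8)
print(ascii_)
```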

So, is UTF-8 friend or foe?

In the world of AutoIt, I would consider it a foe. Reason? Because AutoIt has no true Unicode functionality. With a UTF-16 file, you can check for extended characters because you will definitely see them in hex. Example? FF would be the highest value for the first byte of a UTF-16 character, with the second byte always being 00 for non-extended characters. Whereas anything above FF, meaning the second byte is NOT 00, indicates an extended character.

Let me ask you, how are you going to tell a parsing routine when to "know" that a certain hex value is extended? Look at UTF-8 in a hex editor. Short of knowing what the text is, or seeing the text representation on the side, you would not know. There is no marker that I know of that you could use to change your script logic.

UTF-16 at least gives you the capability to check bits.

later,

Sul
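
On the question of a marker: the UTF-8 bytes in the dumps above do carry one, in their top bits. A byte below 80 is plain ASCII, a byte from 80 to BF continues a character, and a byte from C0 upward starts a multi-byte character. The sketch below is in Python, not AutoIt. The function name `classify` is invented for illustration, and it assumes the input is already valid UTF-8, so it does no validation.

```python
def classify(byte: int) -> str:
    """Name the role of one UTF-8 byte from its high bits (no validation)."""
    if byte < 0x80:
        return "ascii"         # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"  # 10xxxxxx
    if byte < 0xE0:
        return "lead-2"        # 110xxxxx, starts a 2-byte character
    if byte < 0xF0:
        return "lead-3"        # 1110xxxx, starts a 3-byte character
    return "lead-4"            # 11110xxx, starts a 4-byte character


# "Kiểm" as UTF-8 bytes, taken from the dump above.
data = bytes.fromhex("4B 69 E1 BB 83 6D")
roles = [classify(b) for b in data]
print(roles)
```

The same comparisons work on plain integers in any language that can read a file byte by byte, so the check does not depend on Unicode support in the language itself.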

sulfurious

Hmm. Here is a UTF-8 test string

I am not going to give in to Kiểm tra Tiếng Việt

à À Á á é í ó ú â ã

And here is the hex for that
49 20 61 6D 20 6E 6F 74 20 67 6F 69 6E 67 20 74 6F 20 67 69 76 65 20 69 6E 20 74 6F 20 4B 69 E1 BB 83 6D 20 74 72 61 20 54 69 E1 BA BF 6E 67 20 56 69 E1 BB 87 74 0D 0A C3 A0 20 C3 80 20 C3 81 20 C3 A1 20 C3 A9 20 C3 AD 20 C3 B3 20 C3 BA 20 C3 A2 20 C3 A3

After looking at it some more, I am not sure how you could convert it. It looks like the character ể takes 3 bytes, namely E1 BB 83.

I thought maybe there was a marker somewhere, but I don't see one. Convert it to UTF-16 and then you can manipulate it.
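
For what it is worth, the three bytes E1 BB 83 can be unpacked by hand into the UTF-16 value. This is a sketch in Python, not AutoIt, and `decode3` is a made-up helper name. It only handles the 3-byte form (1110xxxx 10xxxxxx 10xxxxxx) and assumes the bytes are valid.

```python
def decode3(b0: int, b1: int, b2: int) -> int:
    """Combine a 3-byte UTF-8 sequence into one code point."""
    # Keep 4 payload bits from the lead byte and 6 from each continuation byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode3(0xE1, 0xBB, 0x83)

# For the non-surrogate code points that a 3-byte sequence can encode,
# UTF-16 LE is simply the 16-bit value with the low byte first.
utf16_le = code_point.to_bytes(2, "little")

print(f"U+{code_point:04X}")
print(" ".join(f"{b:02X}" for b in utf16_le))
```

The result matches the C3 1E pair that appears for ể in the UTF-16 dump of the same string.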

later,

Sul

svkhtn

Hi sulfurious,

Thank you very much for your explanation.

By the way, if I just want to read it from a raw file (either UTF-8 or UTF-16) and use ControlSetText to put that Unicode text into Notepad, can the Unicode text be displayed correctly? I just want to read and set the text (no need to change or manipulate it).

I tried but it didn't show properly using ControlSetText.

Any idea please?

svkhtn

I think the main reason is that AutoIt is not able to handle Unicode text properly at the moment.

As in another thread, I also asked something similar about Unicode: http://www.autoitscript.com/forum/index.ph...03&hl=UTF-8

In the current version of AutoIt, I think you can only read the Unicode text from a raw file, manipulate it as a binary string, and write it back to another file.

I tried to put the Unicode text on the clipboard, or to use ControlSetText to set the text in Notepad (which can obviously display Unicode text properly), without any success so far :whistle:. I think AutoIt somehow mangles the Unicode text before it is assigned to the clipboard, a text box, or any other control.

I hope the next version of AutoIt will handle Unicode better.
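
The raw read, binary manipulation and write-back route can be sketched as follows. This is Python, not AutoIt, and is meant only to show the order of the steps. The function name `utf8_file_to_utf16`, the file names and the sample text are assumptions for illustration.

```python
import codecs
import os
import tempfile


def utf8_file_to_utf16(src_path: str, dst_path: str) -> None:
    """Read raw bytes, drop a UTF-8 BOM if present, rewrite as UTF-16 LE."""
    with open(src_path, "rb") as src:
        raw = src.read()
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    with open(dst_path, "wb") as dst:
        # Write a UTF-16 LE BOM first so editors can recognise the encoding.
        dst.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


# Demo with a throwaway file standing in for the Notepad test.txt.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "test.txt")
dst_path = os.path.join(tmp_dir, "test_utf16.txt")
with open(src_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "Ki\u1ec3m tra".encode("utf-8"))

utf8_file_to_utf16(src_path, dst_path)
with open(dst_path, "rb") as f:
    result = f.read()
print(" ".join(f"{b:02X}" for b in result))
```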

xian7479

AutoIt cannot handle Unicode or UTF-8 by itself, but it can call system DLLs that can (the OS has to be Windows 2000 or later, which supports Unicode). This works as long as you convert the Unicode or UTF-8 text into ASCII before you process it at the non-binary level, and convert it back before you write it in binary mode.

But I don't think you can put Unicode text directly into the clipboard or Notepad, because AutoIt does not support Unicode at the non-binary level.
