Antiec

Cyrillic/Japanese encoding standards to UTF-8

3 posts in this topic

Hi there!

I have a little script which reads files in hex mode, picks up some data, and converts it to text with _HexToString(). It worked fine until I happened to bump into a file whose text uses some special encoding like ISO-JIS or something (not UTF or ANSI). These are files I can't edit, so I'd need a way to re-encode the text. I can also read the encoding information from the file.

So, is it possible to convert some random encoding to UTF or ANSI?

Thx,

Lassi

#2 ·  Posted (edited)

Hello.

So, is it possible to convert some random encoding to UTF or ANSI?

Well, you will need to find the correct translation tables, and you will need to know which encoding the file currently uses. Then it's basically a search and replace.
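To sketch what "translation table plus search/replace" means in practice: here is a minimal example in Python (shown for illustration only; the thread is about AutoIt, where the same loop could be built with StringMid/StringReplace). The table below is a tiny hypothetical excerpt, not a complete JIS X 0201 table, and it ignores quirks such as 0x5C mapping to the yen sign in that standard.

```python
# Hypothetical excerpt of a JIS X 0201 -> Unicode translation table.
# In JIS X 0201, bytes 0xA1-0xDF are half-width katakana.
JIS_X0201_TABLE = {
    0xA6: "\uFF66",  # HALFWIDTH KATAKANA LETTER WO
    0xB1: "\uFF71",  # HALFWIDTH KATAKANA LETTER A
    0xB2: "\uFF72",  # HALFWIDTH KATAKANA LETTER I
}

def transcode(raw: bytes) -> str:
    """Replace each byte via the table; plain ASCII passes through."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))                            # ASCII as-is
        else:
            out.append(JIS_X0201_TABLE.get(b, "\uFFFD"))  # unknown -> U+FFFD
    return "".join(out)

print(transcode(b"abc\xb1\xb2"))
```

Once you have the Unicode string, writing it back out as UTF-8 is just a matter of encoding it.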

You could try writing a program that analyses the relative frequency of characters in your files to guess the encoding...
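A very rough sketch of that guessing idea (again in Python for illustration): count which byte ranges occur and score them against ranges characteristic of each candidate encoding. The thresholds and labels here are made up for the example; a real detector would weigh many more statistics.

```python
from collections import Counter

def guess_encoding(raw: bytes) -> str:
    """Crude heuristic: classify by where the high-bit bytes fall."""
    counts = Counter(raw)
    high = sum(v for b, v in counts.items() if b >= 0x80)
    if high == 0:
        return "ascii"                      # no high-bit bytes at all
    # JIS X 0201 half-width katakana occupy 0xA1-0xDF
    kana = sum(v for b, v in counts.items() if 0xA1 <= b <= 0xDF)
    if kana == high:
        return "jis_x0201 (likely)"
    return "unknown"

print(guess_encoding(b"abc\xb1\xb2"))
```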

Regards, Rudi.

Edited by rudi

Earth is flat, pigs can fly, and Nuclear Power is SAFE!


Hi,

I'm not sure I understood what you said. But I can read the current encoding from the file, like "ISO_IR 13", which is "JIS_X0201" (yeah, I'm reading DICOM files).

So do you mean that if I want to re-encode the file from JIS_X0201, I have to search for some gibberish symbols and replace them with the corresponding UTF-8 characters? That would mean searching for thousands of characters :). I hoped there was an easier way to do it. Maybe we'll just have to leave it be.
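For what it's worth, you shouldn't have to replace thousands of characters by hand: most platforms ship the translation tables already. In Python, for example, the bundled "shift_jis" codec is a superset that covers the JIS X 0201 single-byte katakana range, so the whole conversion is one decode plus one encode. (Python shown only as an illustration; in AutoIt you would presumably go through a Windows conversion API such as MultiByteToWideChar with the Shift-JIS codepage, but check that against the UDF docs.)

```python
# The codec already contains the translation table, so conversion is
# a single call. Bytes 0xA1-0xDF decode to half-width katakana.
raw = b"ABC \xb1\xb2\xb3"        # ASCII plus three JIS X 0201 katakana bytes
text = raw.decode("shift_jis")   # decode to a Unicode string
utf8 = text.encode("utf-8")      # re-encode as UTF-8 for output
print(text)
```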

Thx, Lassi.

