Antiec

Cyrillic/Japanese encoding standards to UTF-8

3 posts in this topic

Hi there!

I have a little script which reads files in hex mode, picks up some data, and converts it to text with _HexToString(). It worked fine until I happened to bump into a file whose text uses some special encoding like ISO-JIS or something (not UTF or ANSI). These are files I can't edit, so I'd need a way to re-encode the text. I can also read the encoding information from the file.

So, is it possible to convert some random encoding to UTF or ANSI?

Thx,

Lassi

#2 ·  Posted (edited)

Hello.

So, is it possible to convert some random encoding to UTF or ANSI?

Well, you will need to find the correct translation tables, and you will need to know which encoding the file currently uses. Then it's basically a search and replace.
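To sketch what "translation table plus search/replace" means in practice: here is a minimal example in Python (shown for illustration only; the thread is about AutoIt, where the same loop could be built with StringMid/StringReplace). The table below is a tiny hypothetical excerpt, not a complete JIS X 0201 table, and it ignores quirks such as 0x5C mapping to the yen sign in that standard.

```python
# Hypothetical excerpt of a JIS X 0201 -> Unicode translation table.
# In JIS X 0201, bytes 0xA1-0xDF are half-width katakana.
JIS_X0201_TABLE = {
    0xA6: "\uFF66",  # HALFWIDTH KATAKANA LETTER WO
    0xB1: "\uFF71",  # HALFWIDTH KATAKANA LETTER A
    0xB2: "\uFF72",  # HALFWIDTH KATAKANA LETTER I
}

def transcode(raw: bytes) -> str:
    """Replace each byte via the table; plain ASCII passes through."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))                            # ASCII as-is
        else:
            out.append(JIS_X0201_TABLE.get(b, "\uFFFD"))  # unknown -> U+FFFD
    return "".join(out)

print(transcode(b"abc\xb1\xb2"))
```

Once you have the Unicode string, writing it back out as UTF-8 is just a matter of encoding it.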

You could try writing a program that analyses the relative frequency of characters in your files to guess the encoding...
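A very rough sketch of that guessing idea (again in Python for illustration): count which byte ranges occur and score them against ranges characteristic of each candidate encoding. The thresholds and labels here are made up for the example; a real detector would weigh many more statistics.

```python
from collections import Counter

def guess_encoding(raw: bytes) -> str:
    """Crude heuristic: classify by where the high-bit bytes fall."""
    counts = Counter(raw)
    high = sum(v for b, v in counts.items() if b >= 0x80)
    if high == 0:
        return "ascii"                      # no high-bit bytes at all
    # JIS X 0201 half-width katakana occupy 0xA1-0xDF
    kana = sum(v for b, v in counts.items() if 0xA1 <= b <= 0xDF)
    if kana == high:
        return "jis_x0201 (likely)"
    return "unknown"

print(guess_encoding(b"abc\xb1\xb2"))
```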

Regards, Rudi.

Edited by rudi

Earth is flat, pigs can fly, and Nuclear Power is SAFE!


Hi,

I'm not sure I understood what you said. But I can read the current encoding from the file, like "ISO_IR 13", which is "JIS_X0201" (yeah, I'm reading DICOM files).

So do you mean that if I want to re-encode the file from JIS_X0201, I have to search for some gibberish symbols and replace them with the corresponding UTF-8 characters? That would mean searching for thousands of characters :). I hoped there was an easier way to do it. Maybe we'll just have to leave it be.
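For what it's worth, you shouldn't have to replace thousands of characters by hand: most platforms ship the translation tables already. In Python, for example, the bundled "shift_jis" codec is a superset that covers the JIS X 0201 single-byte katakana range, so the whole conversion is one decode plus one encode. (Python shown only as an illustration; in AutoIt you would presumably go through a Windows conversion API such as MultiByteToWideChar with the Shift-JIS codepage, but check that against the UDF docs.)

```python
# The codec already contains the translation table, so conversion is
# a single call. Bytes 0xA1-0xDF decode to half-width katakana.
raw = b"ABC \xb1\xb2\xb3"        # ASCII plus three JIS X 0201 katakana bytes
text = raw.decode("shift_jis")   # decode to a Unicode string
utf8 = text.encode("utf-8")      # re-encode as UTF-8 for output
print(text)
```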

Thx, Lassi.

