turbox

Ansi to unicode

18 posts in this topic

Does anybody know of a script, or how to write one, that converts a file from ANSI to Unicode?

You can easily convert your script from ANSI to Unicode using the AutoIt compiler.

It can be found in:

C:\Program Files\AutoIt3\Aut2Exe

Aut2Exe.exe is for Unicode and Aut2ExeA.exe is for ANSI...


GSM Expert but not AutoIt :D Proud to be Admin Of: http://www.gsmhosting.net/ Visit my Forum... http://www.gsmhosting.net/vbb/index.php
$Life = "Happy"
If @Error Then
$Life = "Risk"

I want to convert a file to Unicode.

Maybe the UDF function _WinAPI_MultiByteToWideChar?

Actually, it is a .dat file.

I tried FileOpen(x, 32) but it doesn't work: it reads only half of the file.

But if I save the file as Unicode, then it reads all of it.
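The "reads only half" symptom is what you would expect if ANSI bytes are decoded as UTF-16: with mode 32, FileOpen() treats every two bytes as one character. A minimal sketch to see this, assuming the FileOpen() mode values 0 (read), 2 (erase previous contents), 16 (binary) and 32 (UTF-16 LE) from the AutoIt help file; the scratch file name is hypothetical:

```autoit
; Hypothetical scratch file in the script's folder.
Local $sFile = @ScriptDir & "\half_demo.dat"

; Write four raw ANSI bytes ("ABCD") in binary mode, so no BOM or re-encoding is added.
Local $hFile = FileOpen($sFile, 2 + 16)        ; 2 = erase previous contents, 16 = binary
FileWrite($hFile, StringToBinary("ABCD", 1))   ; flag 1 = ANSI, one byte per character
FileClose($hFile)

; Mode 0: read as ANSI, one byte per character.
$hFile = FileOpen($sFile, 0)
Local $sAnsi = FileRead($hFile)
FileClose($hFile)

; Mode 32: read as UTF-16 LE, two bytes per character.
$hFile = FileOpen($sFile, 32)
Local $sUtf16 = FileRead($hFile)
FileClose($hFile)

ConsoleWrite("Read with mode 0:  " & StringLen($sAnsi) & " characters" & @LF)
ConsoleWrite("Read with mode 32: " & StringLen($sUtf16) & " characters" & @LF)
FileDelete($sFile)
```

With mode 32 the character count should come out at half the byte count, which matches the symptom described above.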

#8 ·  Posted (edited)

If you are trying to convert an ANSI file to Unicode, then you wouldn't open it in Unicode mode; you'd open it with mode 0, or just use FileRead().

http://www.autoitscript.com/forum/index.ph...ic=21815&hl may help you.

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Try this:

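; Read the ANSI file (mode 0 = read)
$FILE_OPEN = FileOpen(@ScriptDir & "\SETTINGS.TXT", 0)
$DATA = FileRead($FILE_OPEN)
FileClose($FILE_OPEN)
; Write the same text as Unicode (mode 32 = UTF-16 LE, + 2 = erase previous contents)
$FILE_WRITE = FileOpen(@ScriptDir & "\Unicode_settings.txt", 32 + 2)
FileWrite($FILE_WRITE, $DATA)
FileClose($FILE_WRITE)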
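To confirm that a file written with mode 32 + 2, as in the snippet above, really is UTF-16 LE, you can look at its first two bytes: a UTF-16 LE file normally starts with the byte order mark FF FE, and mode 32 is expected to write it. A small sketch, where _IsUtf16Le() and the demo file name are hypothetical, not part of AutoIt:

```autoit
; Returns True if $sFile starts with the UTF-16 LE byte order mark FF FE.
; _IsUtf16Le is a hypothetical helper name, not an AutoIt built-in.
Func _IsUtf16Le($sFile)
    Local $hFile = FileOpen($sFile, 16)       ; 16 = binary: raw bytes, no decoding
    If $hFile = -1 Then Return SetError(1, 0, False)
    Local $bBom = FileRead($hFile, 2)         ; in binary mode the count is in bytes
    FileClose($hFile)
    Return String($bBom) = "0xFFFE"
EndFunc

; Usage: write a small file with mode 32 + 2, then check its first two bytes.
Local $sDemo = @ScriptDir & "\bom_check_demo.txt"   ; hypothetical file name
Local $hOut = FileOpen($sDemo, 32 + 2)              ; 32 = UTF-16 LE, 2 = erase previous contents
FileWrite($hOut, "Hello")
FileClose($hOut)

Local $bFound = _IsUtf16Le($sDemo)
ConsoleWrite("UTF-16 LE BOM found: " & $bFound & @LF)
FileDelete($sDemo)
```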

When the words fail... music speaks

It reads until Ύ and then stops. Only when I save it as Unicode can it be read completely.

If there are any null characters in the ANSI version, then you will have to read it in binary, remove or replace the nulls, then you can do BinaryToString() and save it in any format you want.

Here is a similar situation where nulls had to be removed: Demo

;)
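The steps described above (binary read, remove the nulls, BinaryToString(), save) can be sketched as follows. This minimal version simply drops the nulls, using the pairwise StringRegExpReplace() pattern that appears later in this thread; _AnsiFileToUnicode() and the file names are hypothetical, and the mode values 2, 16 and 32 are those documented for FileOpen():

```autoit
; Sketch: convert an ANSI file that may contain Chr(0) bytes to UTF-16 LE, dropping the nulls.
; _AnsiFileToUnicode is a hypothetical helper name, not an AutoIt built-in.
Func _AnsiFileToUnicode($sIn, $sOut)
    ; 16 = binary mode: read the raw bytes, nulls included.
    Local $hIn = FileOpen($sIn, 16)
    If $hIn = -1 Then Return SetError(1, 0, False)
    Local $bData = FileRead($hIn)
    FileClose($hIn)

    ; Work on the hex text two digits (one byte) at a time and drop each 00 byte.
    ; The "0x" prefix is itself two characters, so the byte pairs stay aligned.
    Local $sHex = StringRegExpReplace($bData, "(00)|(.{2})", "\2")

    ; Decode the remaining bytes as ANSI text (flag 1).
    Local $sText = BinaryToString($sHex, 1)

    ; 32 = UTF-16 LE, 2 = erase previous contents.
    Local $hOut = FileOpen($sOut, 32 + 2)
    If $hOut = -1 Then Return SetError(2, 0, False)
    FileWrite($hOut, $sText)
    FileClose($hOut)
    Return True
EndFunc

; Usage with hypothetical file names: build a small file "A", null, "B" and convert it.
Local $sIn = @ScriptDir & "\nulls_demo.dat"
Local $sOut = @ScriptDir & "\nulls_demo_unicode.dat"
Local $hDemo = FileOpen($sIn, 2 + 16)        ; 2 = erase previous contents, 16 = binary
FileWrite($hDemo, StringToBinary("A" & Chr(0) & "B", 1))
FileClose($hDemo)

Local $sResult = ""
If _AnsiFileToUnicode($sIn, $sOut) Then $sResult = FileRead($sOut) ; the BOM marks it as UTF-16 LE
ConsoleWrite("Converted text: " & $sResult & @LF)
FileDelete($sIn)
FileDelete($sOut)
```

Note that dropping the nulls does change the file's content; if they matter, replace them with a marker instead of removing them.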


Valuater's AutoIt 1-2-3, Class... Is now in Session! For those who want somebody to write the script for them: RentACoder. "Any technology distinguishable from magic is insufficiently advanced." (Geek's corollary to Clarke's law)

Hi.

I've read your demo, thanks.

I'm just wondering: why is Chr(0) possible/allowed in ANSI, but not in Unicode? Or, asked differently: what's the equivalent of Chr(0) in Unicode?

And: by replacing Chr(0) with "<null>", isn't the content of the file basically being altered unduly? (I don't know, I'm just asking.)

Regards, Rudi.


Earth is flat, pigs can fly, and Nuclear Power is SAFE!

It has nothing to do with being "allowed" in ANSI. When AutoIt encounters Chr(0) in a string, many functions treat it like EOF (End Of File), so they stop processing the string (or file) at that point, even though StringLen() still counts the characters after it. Reading the file in binary mode avoids this issue and gets the whole file into memory, so that you can deal with the nulls before continuing with string processing. Demo:
$sStart = ":Start:"
$sEnd = ":End:"
$sString = $sStart & Chr(0) & $sEnd
ConsoleWrite("Length = " & StringLen($sString) & @LF); 13 characters long
ConsoleWrite("$sString = " & $sString & @LF); String seems to end at the null
ConsoleWrite(@LF); Extra LF because the previous one got cut off by the null

;)
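On the question of what Chr(0) looks like in Unicode: Unicode has a null character too, U+0000. In ANSI it is the single byte 00, and in UTF-16 LE (the encoding FileOpen() mode 32 uses) it is the two bytes 00 00. A small sketch to show the bytes, assuming StringToBinary() flags 1 (ANSI) and 2 (UTF-16 LE); the ANSI line should show 0x410042 and the UTF-16 LE line 0x410000004200:

```autoit
; "A", a null, then "B".
Local $sText = "A" & Chr(0) & "B"

; Flag 1 = ANSI: one byte per character, so the null is the single byte 00.
Local $bAnsi = StringToBinary($sText, 1)

; Flag 2 = UTF-16 LE: two bytes per character, so U+0000 is stored as 00 00.
Local $bUtf16 = StringToBinary($sText, 2)

; Concatenating a binary value with a string shows it as 0x... hex text.
ConsoleWrite("ANSI:      " & $bAnsi & @LF)
ConsoleWrite("UTF-16 LE: " & $bUtf16 & @LF)
```

This is also why stripping every 00 byte only makes sense on ANSI data: in UTF-16 LE text, every plain ASCII character carries a 00 byte as well.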


Just to reiterate what salty is saying:

Local $s_string = "I am a " & Chr(0) & "string with " & Chr(0) & "nulls"
MsgBox(64, "Info", "$s_string = " & $s_string) ; the display stops at the first null

; Convert to hex text ("0x..."), then handle it two hex digits (one byte) at a time
Local $s_binary = StringToBinary($s_string)
Local $s_strip_nulls = StringRegExpReplace($s_binary, "(00)|(.{2})", "\2") ; drop 00 bytes, keep all others
Local $s_convert_non_null = BinaryToString($s_strip_nulls)
MsgBox(64, "Info", _
    "$s_binary = " & $s_binary & @CRLF & @CRLF & _
    "$s_strip_nulls = " & $s_strip_nulls & @CRLF & @CRLF & _
    "$s_convert_non_null = " & $s_convert_non_null)


Hi.

1.) OK, I can see that AutoIt treats even longer strings as "NULL-terminated".

2.) When a given ANSI file includes Chr(0) characters and is to be converted to Unicode, I would have expected these Chr(0) characters to be "converted" to Unicode as well ;) Stripping them out of the file will alter the file's content, won't it?

3.) I don't really get the regex:

StringRegExpReplace($s_binary, "(00)|(.{2})", "\2")
I can see that it works ;) so let me try to understand it:

"(00)" seems to represent Chr(0). From the help file, I thought the syntax should be "\x##", which would give "\x00"?

"|" means or. (OK)

"(.{2})" I don't get that one: "." = any character, {2} = repeated exactly 2 times?

And why "\2"? That's a back-reference to the #2 match, isn't it?

Honestly, you lose me here :D

Regards, Rudi.


Yes, if you wanted to preserve file formatting with nulls, you would have to substitute a marker for the nulls, convert to Unicode, then put the nulls back. This would actually be easier with StringSplit($sString, Chr(0)). You just put the null back when you reassemble the string from the array with _ArrayToString().

The trick is that it works on hex digits rather than characters. "(00)|(.{2})" means: match 00, or else any two hex digits (one byte).

If it matches 00, the match is captured in "\1", and "\2" is empty because the second alternative is not tried once the first one matches. Therefore, 00 gets replaced with nothing.

In the case where there is something else there (e.g. 58 for 'X'), the first part doesn't match, but the second part does, because .{2} means any two characters. So "\1" is empty and "\2" is 58, and 58 gets replaced with 58.

That last may seem like a waste of time, but consider the string 'P' & @LF, which is 500A. If you don't make sure everything is handled two digits at a time, the 00 in the middle gets removed and you wind up with 5A ('Z').

SmOke_N is such a geek...

;)
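The 'P' & @LF case can be checked directly. This sketch compares a plain StringReplace() of "00" with the pairwise pattern; the naive version should turn 0x500A into 0x5A ('Z'), while the pairwise version leaves it unchanged. The variable names are just for illustration:

```autoit
; 'P' is 0x50 and @LF is 0x0A, so the hex text is "0x500A".
Local $sHex = String(StringToBinary("P" & @LF, 1))

; Naive: removes the "00" made of the last digit of 50 and the first digit of 0A.
Local $sNaive = StringReplace($sHex, "00", "")

; Pairwise: "0x", "50" and "0A" are each consumed as one two-character unit,
; so only a whole 00 byte could be removed, and here there is none.
Local $sPaired = StringRegExpReplace($sHex, "(00)|(.{2})", "\2")

ConsoleWrite("Original: " & $sHex & @LF)
ConsoleWrite("Naive:    " & $sNaive & " = '" & BinaryToString($sNaive) & "'" & @LF)
ConsoleWrite("Paired:   " & $sPaired & @LF)
```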


Back from a few days of :lmao: holidays :cheer: (France, from Colmar down to the Côte d'Azur), I found your reply: really well explained ;) Thanks.

BTW: even though I prefer freeware when it's available, I've spent a few bucks on RegexBuddy (it's just a pity that such a brilliant tool doesn't seem to be available for free, at least not yet ;) ).

It makes it very easy to understand regex examples found here and elsewhere. I'm now working through several examples, even complex ones, with ease :D

Regards, and thanks again, Rudi.

