torels

UTF-16 (little endian) file to ANSI file

Hi there.

I already searched the forum for UTF-16 and nothing useful came up.

My problem is that I have a file encoded in UTF-16 little endian and AutoIt just doesn't read it... it stops after the first 5 characters.

Here is an example of the file:

mhbd¼   Ê[    %      f„`uAûÓ
 c¥-ÿ"@|š                         itGÇ  "3F‹                                                                             &ÿÿ 
                     mhsd`   J‰                                                                                    mhlt\   S                                                                                  mhitH  H     
      A4M    ôkµÂ    - ˆm        Ï  €     D¬            >
                          ÛkµÂ    ¶¦ü‹ ñi
      ÿÿ         D,G    3              ®Rà   ¶¦ü‹ ñi
       @   ø>         À                                                                                     ¥ü‹ ñi
    -     €€€€€€                                                                                                                                                                      
                                                                                                    mhod   F                           " H i g h w a y   B l u e s " mhod   ž                 v          M a r c   S e a l e s ,   c o m p o s e r .   N e w   S t o r i e s .   E r n i e   W a t t s ,   s a x o p h o n e . mhod   @                           S p e a k i n '   O u t mhod   0                           J a z z mhod   D                           F i l e   a u d i o   A A C mhod   h                 @          : i P o d _ C o n t r o l : M u s i c : F 2 4 : R X R U . m 4 a

To read the file with AutoIt I had to open it in Notepad, choose "Save As...", and change the Encoding to ANSI in the Save As dialog.

Otherwise, both with FileRead() alone and with FileOpen() using the little-endian flag followed by FileRead(), all I get is

mhbd¼

Can anyone help?

thanks in advance :)


Some Projects:[list][*]ZIP UDF using no external files[*]iPod Music Transfer [*]iTunes UDF - fully integrate iTunes with au3[*]iTunes info (taskbar player hover)[*]Instant Run - run scripts without saving them before :)[*]Get Tube - YouTube Downloader[*]Lyric Finder 2 - Find Lyrics to any of your song[*]DeskBox - A Desktop Extension Tool[/list]indifference will ruin the world, but in the end... WHO CARES :P---------------http://torels.altervista.org

#2 ·  Posted (edited)

use FileOpen with the proper flag (32 = UTF16-LE) :)

Edited by ProgAndy

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs]My UDFs:[_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP]UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes

as I said... it only gives me the first 5 chars :)


Then there are nulls in the file, interpreted as EOF, and you'll have to work around that with binary mode.
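A minimal sketch of the binary-mode workaround mentioned here. The path in $sPath is a hypothetical placeholder; the idea is only to read the raw bytes and compare the byte count with the size on disk, so you can confirm that nothing is lost by the read itself.

```autoit
; Sketch only: read the whole file as raw bytes so embedded nulls
; cannot cut the data short.
; $sPath is a hypothetical placeholder; point it at your own file.
Local $sPath = @ScriptDir & "\Test1.txt"

Local $hFile = FileOpen($sPath, 16) ; 16 = binary mode, read
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sPath & @LF)
    Exit 1
EndIf
Local $bData = FileRead($hFile) ; binary data when the handle is in binary mode
FileClose($hFile)

; Compare the number of bytes read with the size on disk.
ConsoleWrite("Bytes read:   " & BinaryLen($bData) & @LF)
ConsoleWrite("Size on disk: " & FileGetSize($sPath) & @LF)
```

If the two numbers match, the whole file was read and the truncation only happens later, when the data is treated as a null-terminated string.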

:)


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

I also tried 16+32... with no result whatsoever :)

Is it possible for some character to terminate the file read?

How am I supposed to work around this?


#6 ·  Posted (edited)

It doesn't matter what you can or can't see displayed (PsaltyDS explained why you get only 5 chars in your case).

The rest is there too. If you want to process the strings you can, even though you can't pass the data to anything that expects null-terminated strings.
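A small sketch of that point, assuming an AutoIt version whose strings can hold null characters (which is what this thread relies on). The sample string is made up for illustration.

```autoit
; Sketch only: a string with an embedded null should still hold
; everything after the null, even if displaying it stops early.
Local $sData = "mhbd" & Chr(0) & "mhsd" ; made-up sample data

; StringLen() should count every character, the null included.
ConsoleWrite("Length: " & StringLen($sData) & @LF)

; StringMid() can reach the text that follows the null (position 6 onward).
ConsoleWrite("Tail:   " & StringMid($sData, 6) & @LF)

; Writing the whole string may stop at the null, depending on the consumer.
ConsoleWrite("Whole:  " & $sData & @LF)
```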

Edited by trancexx

♡♡♡

.

eMyvnE

Wanted to make sure it worked as I thought, so here is a basic demo:

- Create a UTF-16 LE file of text

- Display unmodified file

- Use binary modes to insert a null in the file

- Display the modified file, demonstrating truncation

- Use binary modes to remove null(s) from the file

- Display the re-modified file, demonstrating all text is still there

Global $sUtf16File = @ScriptDir & "\Test1.txt", $hFile
Global $iChar = 16 ; No. of 2-byte units before the null (the BOM counts as one)
Global $vRead, $vWrite

; Create basic file
$hFile = FileOpen($sUtf16File, 32 + 2) ; UTF16 LE, overwrite
For $n = 1 To 10
    FileWrite($hFile, "0123456789abcdefghijklmnopqrstuvwxyz" & @CRLF)
Next
FileClose($hFile)

; Test basic file
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile) ; read through the handle opened above
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)

; Insert null
$hFile = FileOpen($sUtf16File, 16) ; Binary, read
$vRead = FileRead($hFile)
FileClose($hFile)
$vWrite = BinaryMid($vRead, 1, $iChar * 2) ; 2 bytes per char (UTF16)
$vWrite &= Binary("0x0000") ; 2 byte null
$vWrite &= BinaryMid($vRead, ($iChar * 2) + 1)
$hFile = FileOpen($sUtf16File, 16 + 2) ; Binary, overwrite
FileWrite($hFile, $vWrite)
FileClose($hFile)

; Test modified file with null
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile) ; read through the handle opened above
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)
ConsoleWrite(@LF & @LF)

; Remove null
$hFile = FileOpen($sUtf16File, 16) ; Binary, read
$vRead = FileRead($hFile)
FileClose($hFile)
$vWrite = BinaryMid($vRead, 1, 2) ; Copy BOM (byte order mark)
For $n = 3 To BinaryLen($vRead) Step 2
    If BinaryMid($vRead, $n, 2) = Binary("0x0000") Then
        ContinueLoop ; Skip null char
    Else
        $vWrite &= BinaryMid($vRead, $n, 2) ; Copy non-null char
    EndIf
Next
$hFile = FileOpen($sUtf16File, 16 + 2) ; Binary, overwrite
FileWrite($hFile, $vWrite)
FileClose($hFile)

; Test file with nulls removed
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile) ; read through the handle opened above
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)

You can change $iChar at the top to change where the null goes. Set $iChar = 5 and it should look just like your symptoms.

:)


#8 ·  Posted (edited)

Thanks, PsaltyDS. This is what I was looking for... but what if I have to work with a ~2 MB file?

It takes ages to read a file that size :)

Is there any other method?

Or should I stick to a script that opens the file in Notepad and saves it under another name?

Thanks a lot for the help :)

P.S. I tried some string replacements, but with no positive results.

Edited by torels

Well, the read doesn't take long, but the loop to remove nulls does take a while.

Is there only the one null early in the file, or many nulls throughout the file? If it's only the one, the loop can write out the rest of the binary and exit once it's dealt with, so the rest of the loop doesn't have to run. That would save a lot of time.

:)


no, it's full of nulls in random positions :)

Pff :)


You don't need a loop at all. I said that in my post above, just in different words.

Anyway, to remove nulls (based on PsaltyDS's):

; Remove null
$hFile = FileOpen($sUtf16File, 32)
$vRead = FileRead($hFile)
FileClose($hFile)

$vWrite = StringReplace($vRead, Chr(0), "") ;<- This!

$hFile = FileOpen($sUtf16File, 32 + 2)
FileWrite($hFile, $vWrite)
FileClose($hFile)

; Test
$hFile = FileOpen($sUtf16File, 32)
$vRead = FileRead($hFile) ; read through the handle opened above
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)

I did think of that, and there was a discussion some time ago about StringReplace() being special in that it wouldn't null-terminate when processing the string. The problem I assumed (but didn't test) was that, since this is UTF-16, that method would remove individual bytes of the two-byte characters where they happened to be 0. For example, ANSI "A" is 0x41, and UTF-16 "A" would be 0x0041. If you remove Chr(0) it could change 0x0041 to 0x41 and scramble the data.

Need to test that, and even if it's a problem, does just changing the match to ChrW(0) fix it?
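To make the byte-level picture concrete, here is a short sketch using StringToBinary(), with flag 1 for ANSI and flag 2 for UTF-16 little endian. "A" is code point 0x41, so the ANSI form is the single byte 41 and the UTF-16 LE form is the two bytes 41 00. Those zero bytes exist only in the encoded form, not in the character string that StringReplace() works on.

```autoit
; Sketch only: show the bytes behind the same character in two encodings.
Local $bAnsi = StringToBinary("A", 1)  ; flag 1 = ANSI
Local $bUtf16 = StringToBinary("A", 2) ; flag 2 = UTF-16 little endian

; String() turns binary data into its hex text form for display.
ConsoleWrite("ANSI:      " & String($bAnsi) & @LF)  ; expected: 0x41
ConsoleWrite("UTF-16 LE: " & String($bUtf16) & @LF) ; expected: 0x4100
```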

:)


If I can do this:

$vWrite = StringReplace($vRead, "A", "")

then I don't see why that "A" would be 0x0041 while Chr(0) would be 0x00.

"A" is as much ANSI as Chr(0).

Do you see my (actually their) point?


Yes, and it works! :)

My mistake was reading the file in binary mode, and then confusing that together with your (character-based) operation.
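Putting the thread together, a hedged sketch of the original goal (UTF-16 LE file in, ANSI file out) might look like this. Both file names are hypothetical placeholders. It also assumes that FileOpen() mode 2 writes ANSI by default, which should hold for AutoIt versions of this thread's era; newer releases document a dedicated ANSI flag, so check the FileOpen() documentation for your version.

```autoit
; Sketch only: convert a UTF-16 LE file to ANSI, dropping null characters.
; Both paths are hypothetical placeholders.
Local $sInFile = @ScriptDir & "\Test1.txt"
Local $sOutFile = @ScriptDir & "\Test1_ansi.txt"

Local $hIn = FileOpen($sInFile, 32) ; 32 = UTF-16 LE, read
If $hIn = -1 Then Exit 1
Local $sText = FileRead($hIn)
FileClose($hIn)

; Character-based replace, as suggested in the thread: no loop needed.
$sText = StringReplace($sText, Chr(0), "")

; Assumption: mode 2 (overwrite) uses ANSI when no encoding flag is given.
Local $hOut = FileOpen($sOutFile, 2)
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sText)
FileClose($hOut)
```

Characters that have no equivalent in the ANSI code page cannot survive this conversion, so treat the output as lossy for anything outside that range.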

:)

