torels Posted October 7, 2009

Hi there. I already searched the forum for UTF-16, and nothing useful came up.

My problem is that I have a file encoded in UTF-16 little endian, and AutoIt just doesn't read it: it stops at the 5th character. Here is an example of the file:

mhbd¼ Ê[ % f„`uAûÓ c¥-ÿ"@|š itGÇ "3F‹ &ÿÿ mhsd` J‰ mhlt\ S mhitH H A4M ôkµÂ - ˆm Ï € D¬ > ÛkµÂ ¶¦ü‹ ñi ÿÿ D,G 3 ®Rà ¶¦ü‹ ñi @ ø> À ¥ü‹ ñi - €€€€€€ mhod F " H i g h w a y B l u e s " mhod ž v M a r c S e a l e s , c o m p o s e r . N e w S t o r i e s . E r n i e W a t t s , s a x o p h o n e . mhod @ S p e a k i n ' O u t mhod 0 J a z z mhod D F i l e a u d i o A A C mhod h @ : i P o d _ C o n t r o l : M u s i c : F 2 4 : R X R U . m 4 a

To read the file with AutoIt, I have to open it in Notepad, choose "Save As...", and change the encoding to ANSI in the Save As dialog. Otherwise I get only "mhbd¼", whether I call FileRead() directly or call FileOpen() with the little-endian flag and then FileRead().

Can anyone help? Thanks in advance.

Some Projects: ZIP UDF using no external files, iPod Music Transfer, iTunes UDF - fully integrate iTunes with au3, iTunes info (taskbar player hover), Instant Run - run scripts without saving them before :), Get Tube - YouTube Downloader, Lyric Finder 2 - Find Lyrics to any of your song, DeskBox - A Desktop Extension Tool
indifference will ruin the world, but in the end... WHO CARES :P
---------------
http://torels.altervista.org
ProgAndy Posted October 7, 2009 (edited)

Use FileOpen() with the proper flag (32 = UTF-16 LE).

*GERMAN* [note: you are not allowed to remove author / modified info from my UDFs]
My UDFs: [_SetImageBinaryToCtrl] [_TaskDialog] [AutoItObject] [Animated GIF (GDI+)] [ClipPut for Image] [FreeImage] [GDI32 UDFs] [GDIPlus Progressbar] [Hotkey-Selector] [Multiline Inputbox] [MySQL without ODBC] [RichEdit UDFs] [SpeechAPI Example] [WinHTTP]
UDFs included in AutoIt: FTP_Ex (as FTPEx), _WinAPI_SetLayeredWindowAttributes
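The Python sketch below shows what "UTF-16 LE" means at the byte level. It is Python rather than AutoIt, and the four-character sample is made up. The file starts with the byte-order mark FF FE and then stores two bytes per character, low byte first. If the same bytes are read one byte per character, each ASCII letter is followed by a 0x00 byte. That is most likely why the strings in the example file show up with gaps, as in "H i g h w a y".

```python
# Hypothetical sample: the first four characters of the file, preceded by a BOM.
text = "mhbd"
data = b"\xff\xfe" + text.encode("utf-16-le")  # BOM FF FE, then 2 bytes per character

print(data.hex(" "))          # ff fe 6d 00 68 00 62 00 64 00

# The "utf-16" codec reads the BOM to choose the byte order, then drops it.
print(data.decode("utf-16"))  # mhbd

# Reading one byte per character instead (Latin-1 as a stand-in for an ANSI
# code page) leaves a NUL after every letter.
as_ansi = data.decode("latin-1")
print(repr(as_ansi))
```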
torels Posted October 7, 2009 (author)

As I said, it only gives me the first 5 characters.
PsaltyDS Posted October 7, 2009

> As I said, it only gives me the first 5 characters.

Then there are nulls in the file, which are being interpreted as the end of the string, and you'll have to work around that with binary mode.

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law
torels Posted October 7, 2009 (author)

I also tried 16 + 32, with no result whatsoever. Is it possible for some character to stop the file reading? How am I supposed to work around it?
trancexx Posted October 7, 2009 (edited)

It doesn't matter what you do or don't see (PsaltyDS explained why you get only 5 characters in your case). The rest of the data is there too. You can still process the strings, even though you can't pass them to anything that deals with null-terminated strings.

♡♡♡ . eMyvnE
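This minimal Python sketch (Python, not AutoIt) illustrates trancexx's point. A string that contains U+0000 is still complete in memory; only code that treats U+0000 as a terminator stops early. The sample string is hypothetical, with U+0000 as its sixth character to mirror the five-character symptom. The split at the first U+0000 stands in for what a null-terminated display would show.

```python
# Hypothetical sample: five characters, a U+0000, then more text.
data = "mhbd\u00bc\x00rest of the record".encode("utf-16-le")

text = data.decode("utf-16-le")    # decoding keeps every character, NULs included
shown = text.split("\x00", 1)[0]   # what a NUL-terminated consumer would display

print(len(text))   # 24
print(shown)       # mhbd¼
```

The full string is 24 characters long. Only the view that stops at the terminator is cut down to 5.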
PsaltyDS Posted October 7, 2009

I wanted to make sure it worked the way I thought, so here is a basic demo:
- Create a UTF-16 LE text file
- Display the unmodified file
- Use binary mode to insert a null into the file
- Display the modified file, demonstrating truncation
- Use binary mode to remove null(s) from the file
- Display the re-modified file, demonstrating that all the text is still there

```autoit
Global $sUtf16File = @ScriptDir & "\Test1.txt", $hFile
Global $iChar = 16 ; No. of chars before null
Global $vRead, $vWrite

; Create basic file
$hFile = FileOpen($sUtf16File, 32 + 2) ; UTF16 LE, overwrite (file starts with a 2-byte BOM)
For $n = 1 To 10
    FileWrite($hFile, "0123456789abcdefghijklmnopqrstuvwxyz" & @CRLF)
Next
FileClose($hFile)

; Test basic file
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile) ; read through the handle opened above
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)

; Insert null
$hFile = FileOpen($sUtf16File, 16) ; Binary, read
$vRead = FileRead($hFile)
FileClose($hFile)
$vWrite = BinaryMid($vRead, 1, ($iChar + 1) * 2) ; BOM + $iChar chars, 2 bytes each (UTF16)
$vWrite &= Binary("0x0000") ; 2 byte null
$vWrite &= BinaryMid($vRead, (($iChar + 1) * 2) + 1) ; rest of the data
$hFile = FileOpen($sUtf16File, 16 + 2) ; Binary, overwrite
FileWrite($hFile, $vWrite)
FileClose($hFile)

; Test modified file with null
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile)
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)
ConsoleWrite(@LF & @LF)

; Remove null
$hFile = FileOpen($sUtf16File, 16) ; Binary, read
$vRead = FileRead($hFile)
FileClose($hFile)
$vWrite = BinaryMid($vRead, 1, 2) ; Copy BOM
For $n = 3 To BinaryLen($vRead) Step 2 ; one 2-byte char at a time
    If BinaryMid($vRead, $n, 2) = Binary("0x0000") Then
        ContinueLoop ; Skip null char
    Else
        $vWrite &= BinaryMid($vRead, $n, 2) ; Copy non-null char
    EndIf
Next
$hFile = FileOpen($sUtf16File, 16 + 2) ; Binary, overwrite
FileWrite($hFile, $vWrite)
FileClose($hFile)

; Test file with nulls removed
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile)
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)
```

You can change $iChar at the top to change where the null goes. Set $iChar = 5 and the output should look just like your symptoms.
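The Python sketch below (Python, not AutoIt) reproduces the insert/remove steps of the demo above at the byte level. It assumes, as the demo does, that the data starts with a 2-byte BOM followed by 2-byte code units. The function names insert_null and remove_nulls are hypothetical. The sketch compares aligned 2-byte units rather than single bytes. That is what keeps a character such as U+0100 (bytes 00 01) intact.

```python
BOM = b"\xff\xfe"  # assumed UTF-16 LE byte-order mark at the start of the data

def insert_null(data: bytes, n_chars: int) -> bytes:
    """Insert one U+0000 code unit (two zero bytes) after the first n_chars characters."""
    cut = len(BOM) + 2 * n_chars
    return data[:cut] + b"\x00\x00" + data[cut:]

def remove_nulls(data: bytes) -> bytes:
    """Drop every 2-byte-aligned 0x0000 unit after the BOM; keep all other units."""
    body = data[len(BOM):]
    units = (body[i:i + 2] for i in range(0, len(body), 2))
    return data[:len(BOM)] + b"".join(u for u in units if u != b"\x00\x00")

original = BOM + "0123456789abcdefghijklmnopqrstuvwxyz\r\n".encode("utf-16-le")
damaged = insert_null(original, 5)
print(damaged.decode("utf-16").split("\x00")[0])  # 01234
print(remove_nulls(damaged) == original)          # True
```

Like the AutoIt loop, remove_nulls visits every code unit, so its running time grows with the size of the file.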
torels Posted October 8, 2009 (edited)

Thanks, PsaltyDS, this is what I was looking for... but what if I have to work with a ~2 MB file? It takes ages to read a file of that size. Is there any other method, or should I stick to a script that opens the file in Notepad and saves it under another name?

Thanks a lot for the help.

P.S. I tried some string replacements, but with no positive results.
PsaltyDS Posted October 8, 2009

> ...but what if I have to work with a ~2 MB file? It takes ages to read a file of that size.

Well, the read doesn't take long, but the loop that removes the nulls does take a while.

> Is there any other method, or should I stick to a script that opens the file in Notepad and saves it under another name? [...] P.S. I tried some string replacements, but with no positive results.

Is there only the one null early in the file, or are there many nulls throughout the file? If there is only one, the loop can write out the rest of the binary and exit once it has dealt with that null, so the rest of the loop doesn't have to run. That would save a lot of time.
torels Posted October 8, 2009 (author)

No, it's full of nulls in random positions. Pff.
trancexx Posted October 9, 2009

You don't need a loop at all. I said that in my post above, just in different words. Anyway, to remove the nulls (based on PsaltyDS's code):

```autoit
; Remove null
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile)
FileClose($hFile)
$vWrite = StringReplace($vRead, Chr(0), "") ;<- This!
$hFile = FileOpen($sUtf16File, 32 + 2) ; UTF16 LE, overwrite
FileWrite($hFile, $vWrite)
FileClose($hFile)

; Test
$hFile = FileOpen($sUtf16File, 32) ; UTF16 LE, read
$vRead = FileRead($hFile) ; read through the handle opened above
FileClose($hFile)
ConsoleWrite("$vRead = " & $vRead & @LF)
```
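The same character-level idea can be sketched in Python (not AutoIt): decode the UTF-16 LE data, delete U+0000 characters with a single replace, and encode the result again. This is an illustration under stated assumptions, not trancexx's code. It assumes the data starts with the FF FE byte-order mark and has an even number of bytes, and strip_nulls_utf16le is a hypothetical name. The surrogatepass error handler is included because arbitrary binary data, such as the file in the first post, may contain unpaired surrogates that a strict decode would reject.

```python
BOM_LE = b"\xff\xfe"

def strip_nulls_utf16le(data: bytes) -> bytes:
    """Remove U+0000 characters from UTF-16 LE bytes that start with a BOM."""
    if data[:2] != BOM_LE:
        raise ValueError("expected a UTF-16 LE byte-order mark (FF FE)")
    # surrogatepass lets unpaired surrogates from binary data round-trip
    # instead of raising UnicodeDecodeError / UnicodeEncodeError.
    text = data[2:].decode("utf-16-le", errors="surrogatepass")
    cleaned = text.replace("\x00", "")  # one call, no per-character loop
    return BOM_LE + cleaned.encode("utf-16-le", errors="surrogatepass")

sample = BOM_LE + "Jazz\x00\x00Blues".encode("utf-16-le")  # hypothetical sample
print(strip_nulls_utf16le(sample).decode("utf-16"))         # JazzBlues
```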
PsaltyDS Posted October 9, 2009

> You don't need a loop at all. [...] $vWrite = StringReplace($vRead, Chr(0), "") ;<- This!

I did think of that, and there was a discussion at some point in the past about StringReplace() being special in that it doesn't stop at a null when processing the string. The problem I assumed (but didn't test) was that, since this is UTF-16, that method would remove individual bytes of two-byte characters wherever they happened to be 0. For example, ANSI "A" is 0x41, and UTF-16 "A" would be 0x0041. If you remove Chr(0), it could change 0x0041 to 0x41 and scramble the data. I need to test that. And even if it is a problem, does just changing the match to ChrW(0) fix it?
trancexx Posted October 10, 2009

> [...] If you remove Chr(0), it could change 0x0041 to 0x41 and scramble the data. I need to test that. And even if it is a problem, does just changing the match to ChrW(0) fix it?

If I can do this:

$vWrite = StringReplace($vRead, "A", "")

then I don't see why that "A" would be 0x0041 and Chr(0) would be 0x00. "A" is as much ANSI as Chr(0). Do you see my (actually their) point?
PsaltyDS Posted October 11, 2009

> "A" is as much ANSI as Chr(0). Do you see my (actually their) point?

Yes, and it works! My mistake was reading the file in binary mode and then confusing that with your (character-based) operation.
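PsaltyDS's worry in this thread was that a null-removing replace would also delete the zero high byte of characters such as "A" (0x0041). The short Python sketch below (Python rather than AutoIt, with a made-up three-character sample) contrasts two operations. The first removes zero bytes from the raw UTF-16 LE data. The second removes U+0000 characters from the decoded text. Only the byte-level version scrambles the data, which agrees with the conclusion reached above.

```python
text = "A\x00B"                                  # hypothetical: 'A', U+0000, 'B'
data = text.encode("utf-16-le")                  # bytes 41 00 00 00 42 00

byte_level = data.replace(b"\x00", b"")          # drops every zero byte: 41 42
char_level = text.replace("\x00", "").encode("utf-16-le")  # 41 00 42 00

print(byte_level.decode("utf-16-le") == "\u4241")  # True: 41 42 pairs up as U+4241, not "AB"
print(char_level.decode("utf-16-le"))              # AB
```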