sulfurious Posted September 2, 2006 (edited)

I have been into lots of stuff concerning Unicode conversion lately, mostly for Regedit to INF file work. So, here is one function I wrote. A lot of it I had to figure out, and a lot came from ideas from this forum.

Simply pass any Unicode file as the source and set the output file. This example shows how to pass the data to the function and write it to an Ascii file. Normally, using Type would convert any Unicode characters (that is, anything above character 255 in decimal, or FF in hex) to unexpected results. With this method you still cannot see the correct character in Ascii, BUT it does keep both bytes of the hex, so you could convert it later.

Don't get confused: the Unicode data is written XX,XX, which is 16 bits for ONE character. Ascii data is written as XX, which is 8 bits for ONE character. When I say the Unicode data is kept, I mean both 8-bit parts that make up the one character are written.

Well, here is the code. Comments welcome.

#include <string.au3>
#include <file.au3>

$PrepFile = "file path \ file name" ; source Unicode file
Local $PrepFile1 = _TempFile()
Local $TempFile = FileOpen($PrepFile,4) ; mode 4 = raw read
$read = StringReplace(String(FileRead($TempFile,FileGetSize($PrepFile))),"0x","")
$read = StringTrimLeft($read,4)
FileWrite($PrepFile1,$read) ; strip Unicode file of FFFE & 0x
FileClose($TempFile)

; output to REGEDIT4 (Ascii)
$HexFileRead = FileRead($PrepFile1) ; read temp file which has hex values
$cal = _HexParse($HexFileRead)
$filout = FileOpen("file path \ file name",2) ; declare save file location
FileWrite($filout,$cal)
FileClose($filout)

Func _HexParse($Hxf)
    If StringMid($Hxf,1,4) = "FFFE" Then $Hxf = StringTrimLeft($Hxf,4)
    Local $HexTemp = $Hxf
    Local $HexReturn,$xCR,$HXa,$HXi,$HexLine
    $HXa = 1
    Do ; step through hex file, converting to Ascii or leaving as 16bit (which COULD be converted later for Unicode)
        If StringMid($HexTemp,$HXa,4) = "0d00" Then ; find carriage return
            If StringMid($HexTemp,$HXa,8) = "0d000a00" Then ; find both carriage return & line feed
                $xCR = $HXa + 7 ; position of the last hex digit of the crlf
                $HexLine = StringLeft($HexTemp,$xCR) ; the entire line of hex values, including the crlf
                $HexTemp = StringTrimLeft($HexTemp,$xCR) ; the remainder of the hex values
                For $HXi = 1 to StringLen($HexLine) Step 2 ; step through hex line, concatenate depending on decimal value
                    If Dec(StringMid($HexLine,$HXi,2)) < 255 Then
                        If StringMid($HexLine,$HXi,2) <> "00" Then
                            $HexReturn&=Chr(Dec(StringMid($HexLine,$HXi,2))) ; convert to text
                        EndIf
                    Else
                        $HexReturn&=StringMid($HexLine,$HXi,2) ; only reached for byte FF, which is written as its two hex digits
                    EndIf
                Next
                $HXa = 1 ; reset the variable for next cycle through remaining hex values
            Else
                $HXa = $HXa + 2 ; value not found, mark for next hex value to check
            EndIf
        Else
            $HXa = $HXa + 2 ; value not found, mark for next hex value to check
        EndIf
    Until StringLen($HexTemp) < 1 ; EOF
    Return $HexReturn ; all converted lines of Ascii/Hex
EndFunc

later, Sul

Edit: My mistake. This function returns chr 0 - 255. Any Unicode characters display themselves as their character counterparts. At least they don't auto-convert to something you can't work with, like the Type method does.

Edited September 2, 2006 by sulfurious
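To make the XX,XX versus XX point concrete, here is a minimal sketch of how a little-endian Unicode hex dump maps to characters. It is written in Python rather than AutoIt purely for illustration and is not the function above; the name `hex_to_ascii` and the `<XXXX>` escape format are assumptions invented for this example.

```python
def hex_to_ascii(hex_text):
    """Decode a UTF-16LE hex dump, keeping anything above 255 as hex."""
    # Drop the byte order mark (FF FE) if the dump starts with one.
    if hex_text[:4].upper() == "FFFE":
        hex_text = hex_text[4:]
    out = []
    # Each character is 4 hex digits: low byte first, then high byte.
    for i in range(0, len(hex_text) - 3, 4):
        low = int(hex_text[i:i + 2], 16)
        high = int(hex_text[i + 2:i + 4], 16)
        if high == 0:
            out.append(chr(low))  # fits in one byte (0-255)
        else:
            # Hypothetical escape: keep both bytes so nothing is lost.
            out.append("<%02X%02X>" % (high, low))
    return "".join(out)


# BOM, "A", "=", euro sign U+20AC, CR, LF
sample = "FFFE41003D00AC200D000A00"
print(repr(hex_to_ascii(sample)))
```

Reading the dump four hex digits at a time keeps the low and high byte of each character together, which is the "both 8-bit parts" idea described in the post.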
sulfurious Posted September 2, 2006 (edited)

Here is a new version. This one lets you read a Unicode file and choose which form to output to: Unicode or Ascii. You can even manipulate strings from Unicode and pass them back as Unicode, using Ascii terms for the string manipulation logic.

#include <string.au3>
#include <file.au3>

Global $UniOut
$reg5style = True ; set parameter for which output
$PrepFile = "c:\epfull.reg" ; source Unicode file
Local $TempFile=FileOpen($PrepFile,4) ; mode 4 = raw read
$read = StringReplace(String(FileRead($TempFile,FileGetSize($PrepFile))),"0x","")
$read = StringTrimLeft($read,4)
FileClose($TempFile)

; depending on output parameter, start processing Unicode file
If $reg5style = True Then
    $UniOut = FileOpen("c:\epfull.txt",2) ; declare file out location
    _UFWL($UniOut,"FFFE")
    _HexParse($read)
    FileClose($UniOut)
Else
    $cal = _HexParse($read)
    $AsciiOut = FileOpen("c:\epfull.txt",2) ; declare file out location
    FileWrite($AsciiOut,$cal)
    FileClose($AsciiOut)
EndIf

Func _HexParse($Hxf)
    If StringMid($Hxf,1,4) = "FFFE" Then $Hxf = StringTrimLeft($Hxf,4)
    Local $HexTemp = $Hxf
    Local $HexReturn,$xCR,$HXa,$HXi,$HexLine,$ExtChrSet,$HexCont,$HexCVal,$HexParsed,$NewVal
    $ExtChrSet = False
    $HXa = 1
    Do
        If StringMid($HexTemp,$HXa,4) = "0d00" Then ; find carriage return
            If StringMid($HexTemp,$HXa,8) = "0d000a00" Then ; find both carriage return & line feed
                $xCR = $HXa + 7 ; position of the last hex digit of the crlf
                $HexLine = StringLeft($HexTemp,$xCR) ; the entire line of hex values, including crlf
                $HexTemp = StringTrimLeft($HexTemp,$xCR) ; the remaining hex values
                ; FIRST PORTION - EITHER PASS LINE, OR CONCATENATE UNICODE PIECES TOGETHER
                If $reg5style = True Then ; proceed with strip \crlf & save multi hex lines as single long line
                    If $HexCont = True Then ; stripping in progress, concat this line to $HexCVal
                        If StringRight($HexLine,12) = "5C000D000A00" Then ; for my use, find \crlf (reg concatenation)
                            $HexCont = True ; still equals true, next line exists
                            ; strip "space"; note this also removes any "2000" that straddles two characters
                            If StringInStr($HexLine,"2000") Then $HexLine = StringReplace($HexLine,"2000","")
                            $HexCVal&=StringTrimRight($HexLine,12) ; concat and set value of stripped line
                            $HXa = 1
                            ContinueLoop ; go to start of Do
                        Else
                            $HexCont = False ; this is last line
                            If StringInStr($HexLine,"2000") Then $HexLine = StringReplace($HexLine,"2000","") ; strip "space"
                            $HexCVal = $HexCVal & $HexLine
                        EndIf
                    Else ; stripping NOT in progress - examine line to see if stripping should begin
                        If StringRight($HexLine,12) = "5C000D000A00" Then
                            $HexCont = True ; stripping should begin
                            $HexCVal&=StringTrimRight($HexLine,12) ; set value of stripped hex line
                            $HXa = 1
                            ContinueLoop ; go to start of Do
                        Else
                            $HexCVal = $HexLine ; no continuation, use the line as is
                        EndIf
                    EndIf
                EndIf
                $ExtChrSet = False ; set value each loop
                If $HexCVal <> "" Then $HexLine = $HexCVal ; stripped & concatenated hex line exists, change value of $HexLine
                If $reg5style = True Then ; Unicode handling
                    For $HXi = 1 to StringLen($HexLine) Step 4 ; stepping over each 16bit hex
                        If StringMid($HexLine,$HXi+2,2) <> "00" Then ; there is Unicode present in this line
                            $ExtChrSet = True
                        EndIf
                    Next
                    If $ExtChrSet = False Then ; can use Ascii characters to manipulate this data
                        For $HXi = 1 to StringLen($HexLine) Step 4 ; stepping over 16bit hex
                            If Dec(StringMid($HexLine,$HXi,2)) < 255 Then ; examine each 8bit piece
                                $HexParsed&=Chr(Dec(StringMid($HexLine,$HXi,2))) ; convert to Ascii
                            EndIf
                        Next
                        ; pass to Ascii handler (manipulate here); called once per line, after the loop
                        $NewVal = _PassParse($HexParsed)
                    EndIf
                Else ; Ascii only handling
                    For $HXi = 1 to StringLen($HexLine) Step 2 ; outputs to Ascii, >255 chr's written as hex?
                        If Dec(StringMid($HexLine,$HXi,2)) < 255 Then ; standard & extended Ascii character set
                            If StringMid($HexLine,$HXi,2) <> "00" Then ; first 8 bits checked
                                $HexReturn&=Chr(Dec(StringMid($HexLine,$HXi,2))) ; place character (concatenated)
                            EndIf
                        Else
                            $HexReturn&=StringMid($HexLine,$HXi,2) ; only reached for byte FF, written as its two hex digits
                        EndIf
                    Next
                EndIf
                If $reg5style = True Then ; write as raw data
                    If $ExtChrSet = False Then
                        For $HXi = 1 to StringLen($NewVal)
                            _UFWL($UniOut,Hex(Asc(StringMid($NewVal,$HXi,1)),2) & "00")
                        Next
                    Else
                        _UFWL($UniOut,$HexLine)
                    EndIf
                EndIf
                $HexParsed = "" ; clear value before next loop
                $HexCVal = "" ; clear value before next loop
                $HexCont = False ; clear value before next loop
                $HXa = 1 ; reset variable for next loop
            Else
                $HXa = $HXa + 2 ; add for next loop
            EndIf
        Else
            $HXa = $HXa + 2 ; add for next loop
        EndIf
    Until StringLen($HexTemp) < 1 ; EOF
    Return $HexReturn ; if output to Ascii, the text to write including crlf characters
EndFunc

Func _PassParse($HxV)
    If StringInStr($HxV,"=") Then MsgBox(0,"do something","like change the text, and pass it back") ; comment
    If StringInStr($HxV,"=") Then $HxV = StringReplace($HxV,"=","***ZZZ***") ; some string manipulation
    Return $HxV ; pass handled string back to be written
EndFunc

Func _UFWL($File,$Hexxed) ; Unicode File Write Line (the crlf is still part of the hex passed in)
    If IsInt(StringLen($Hexxed) / 2) = 0 Then Return (-1) ; check for proper hex format
    Local $HXs
    For $HXs = 1 To StringLen($Hexxed) Step 2 ; step over each 8bit hex
        FileWrite($File,Chr(Dec(StringMid($Hexxed,$HXs,2)))) ; write raw data (Unicode)
    Next
EndFunc

later, Sul

Edit: oops.

Edited September 2, 2006 by sulfurious
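The "\crlf" handling in the script joins registry values that an export wraps over several lines with a trailing backslash. A rough sketch of the same idea on already-decoded text follows. It is Python rather than AutoIt, the name `join_continuations` is made up for this example, and it assumes continuation lines are indented with spaces only.

```python
def join_continuations(text):
    """Merge lines ending in a backslash with the line(s) that follow."""
    joined = []
    pending = ""
    continuing = False
    for line in text.splitlines():
        # Only the indentation of a continued line is dropped;
        # spaces inside a value are left alone.
        piece = line.lstrip(" ") if continuing else line
        if piece.endswith("\\"):
            pending += piece[:-1]  # keep everything before the backslash
            continuing = True
        else:
            joined.append(pending + piece)
            pending = ""
            continuing = False
    if continuing:
        joined.append(pending)  # input ended in the middle of a continuation
    return joined


raw = '"Val"=hex:01,02,\\\r\n  03,04\r\n"Other"="a b"\r\n'.encode("utf-16")
print(join_continuations(raw.decode("utf-16")))
```

Working on decoded text and trimming only leading indentation avoids touching spaces that belong to a value, which a blanket replace of "2000" in the hex string cannot distinguish.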
sulfurious Posted September 3, 2006

A note on the whole business of passing converted Unicode strings to Ascii handling and back to Unicode. The function I built actually leaves the crlf on each line, and it writes out to a Unicode file fine. However, I have found that stripping a portion that includes the EOF fails when using the Dec() function on the stripped portion. I am not sure yet whether it is repeatable or an anomaly determined by the data.

The fix was simple enough though. Use _StringToHex() and look for the 8-bit values 0d0a and trim them, then use _HexToString() to go back to Ascii and continue the handling, or use a similar stripping method. After you have manipulated your line, you need to concatenate @crlf on the end; the FileWrite method used to write in raw mode needs that.

Pure 16-bit handling is only possible, AFAIK, by stepping through the single hex line, looking for your marker character (i.e. an = character, or 3d00 in hex), and then setting a value that marks the line position. Now you can strip the hex on either side and use some more logic to change it.

later, Sul
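The pure 16-bit approach described above (step through the hex line, find the 3d00 marker, change what sits on one side of it, put the crlf back) could look roughly like this. It is a Python sketch rather than AutoIt, and `find_marker`, `replace_value` and the `CRLF` constant are names invented for the illustration.

```python
CRLF = "0D000A00"  # CR and LF as 16-bit little-endian hex


def find_marker(hex_line, marker="3D00"):
    """Return the offset of a 16-bit marker on a 4-digit boundary, or -1."""
    hex_line = hex_line.upper()
    for pos in range(0, len(hex_line) - 3, 4):
        if hex_line[pos:pos + 4] == marker:
            return pos
    return -1


def replace_value(hex_line, new_value):
    """Swap everything after the first '=' for new_value, keeping the crlf."""
    body = hex_line.upper()
    had_crlf = body.endswith(CRLF)
    if had_crlf:
        body = body[:-len(CRLF)]  # trim the line ending before editing
    pos = find_marker(body)
    if pos < 0:
        return hex_line.upper()  # no marker found: leave the line unchanged
    new_hex = new_value.encode("utf-16-le").hex().upper()
    # Re-append the crlf so the raw write still ends the line.
    return body[:pos + 4] + new_hex + (CRLF if had_crlf else "")


line = "61003d0062000d000a00"  # "a=b" followed by CR LF
print(replace_value(line, "Z"))
```

Stepping four hex digits at a time matters here: a plain substring search could match "3D00" across the boundary between two characters.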
blademonkey Posted September 9, 2006 (edited)

"I have been into lots of stuff concerning Unicode conversion lately, mostly for Regedit to INF file stuff. So, here is one function I wrote. A lot of it I had to figure out, and a lot came from ideas from this forum. Simply pass any Unicode file as the source, set the output file."

I'm writing a similar function, but it's not as intricate as yours. Instead of doing the Unicode conversion, I found that you can do a type redirection and the result will be in Ascii. Here's a snippet:

$file1 = "anyfile.reg"
$file2 = "anyfile.ini"
; paths are quoted so that file names containing spaces still work
RunWait(@ComSpec & ' /c type "' & $file1 & '" > "' & $file2 & '"', "", @SW_HIDE)

So the result is an Ascii INI file, which I parse using the IniRead functions. It's not done yet; I'm still figuring out how to handle the different registry entry types. I will post it here when I am done (or close to done) if you or anyone else is interested.

-Blademonkey

Edited September 9, 2006 by blademonkey

---"Educate the Mind, Make Savage the Body" -Mao Tse Tung
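As a rough picture of what a redirect like this costs, the sketch below reads a Unicode file and rewrites it in a single-byte code page, replacing anything that does not fit with a question mark. It is Python rather than AutoIt and only approximates the effect; it is not what cmd does internally. The function name `utf16_to_codepage` and the choice of cp437 as the target code page are assumptions for the example.

```python
import os
import tempfile


def utf16_to_codepage(src, dst, codepage="cp437"):
    """Rewrite a UTF-16 file (with BOM) as single-byte text; unmappable chars become '?'."""
    with open(src, "r", encoding="utf-16", newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=codepage, errors="replace", newline="") as fout:
        fout.write(text)
    return text


with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "anyfile.reg")
    dst = os.path.join(folder, "anyfile.ini")
    # Hypothetical content: one key with an accented letter and a euro sign.
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write('[Key]\r\n"Name"="caf\u00e9 \u20ac"\r\n')
    utf16_to_codepage(src, dst)
    with open(dst, "rb") as f:
        print(f.read())
```

Characters that exist in the target code page survive as one byte each, while the euro sign here comes out as a question mark and cannot be recovered afterwards, which is the trade-off against keeping both bytes as in the posts above.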