
Unicode to Ascii and/or Hex Function



I have been into lots of stuff concerning Unicode conversion lately, mostly for Regedit-to-INF file stuff. So here is one function I wrote. A lot of it I had to figure out, and a lot came from ideas from this forum. Simply pass any Unicode file as the source and set the output file.

This example shows how to pass the data to the function and write it to an Ascii file. Normally, using Type would turn any Unicode characters (that is, characters above 255 in decimal, or FF in hex) into unexpected results. With this method you still cannot see the correct character in Ascii, BUT it does keep both bytes of the hex, so you could convert it later. Don't get confused: the Unicode data is written as XX,XX, which is 16 bits for ONE character. Ascii data is written as XX, which is 8 bits for ONE character. When I say the Unicode data is kept, I mean both 8-bit halves that make up the one character are written.
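To make the XX,XX versus XX point concrete, here is a small sketch. It assumes a current AutoIt 3.3.x release (newer than the beta this code was written for), where StringToBinary() with flag 1 returns ANSI bytes and flag 2 returns UTF-16 little-endian bytes. It is only an illustration, not part of the function below.

```autoit
; Sketch, assuming a current AutoIt 3.3.x release (newer than the beta used in this thread).
; StringToBinary flag 1 = ANSI bytes, flag 2 = UTF-16 little endian bytes.
ConsoleWrite(String(StringToBinary("A", 1)) & @CRLF)          ; 0x41   -> one byte, XX
ConsoleWrite(String(StringToBinary("A", 2)) & @CRLF)          ; 0x4100 -> two bytes, XX,XX (high byte 00)
ConsoleWrite(String(StringToBinary(ChrW(0x4E2D), 2)) & @CRLF) ; 0x2D4E -> low byte first, high byte is not 00
```

A character like U+4E2D has a non-zero high byte, and that is exactly the case a plain Ascii conversion has no room for.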

Well, here is the code. Comments welcome.

#include <string.au3>
#include <file.au3>

$PrepFile = "file path \ file name" ; source Unicode file
Local $PrepFile1 = _TempFile()
Local $TempFile = FileOpen($PrepFile,4)

$read = StringReplace(String(FileRead($TempFile,FileGetSize($PrepFile))),"0x","")
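If StringLeft($read,4) = "FFFE" Then $read = StringTrimLeft($read,4) ; strip the FFFE byte-order mark only if the file has one
FileWrite($PrepFile1,$read) ; temp file now holds the bare hex digits (no 0x, no FFFE)
FileClose($TempFile)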

;output to REGEDIT4 (Ascii)
$HexFileRead = FileRead($PrepFile1) ; open temp file which has hex values
$cal = _HexParse($HexFileRead)
$filout = FileOpen("file path \ file name",2) ; declare save file location
FileWrite($filout,$cal)
FileClose($filout)
FileDelete($PrepFile1) ; clean up the temp file

Func _HexParse($Hxf)
    If StringMid($Hxf,1,4) = "FFFE" Then $Hxf = StringTrimLeft($Hxf,4)
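    Local $HexTemp = $Hxf
    If StringRight($HexTemp,8) <> "0d000a00" Then $HexTemp &= "0d000a00" ; make sure the last line ends in crlf, otherwise the loop below never finishes
    Local $HexReturn,$xCR,$HXa,$HXi,$HexLine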
    $HXa = 1
    Do ; step through hex file, converting to Ascii or leaving as 16bit (which COULD be converted later for Unicode)
        If StringMid($HexTemp,$HXa,4) = "0d00" Then ; find carriage return
            If StringMid($HexTemp,$HXa,8) = "0d000a00" Then ; find both carriage return & line feed
                $xCR = $HXa + 7 ; position of the last hex digit of the crlf, i.e. the length of the line including the crlf
                $HexLine = StringLeft($HexTemp,$xCR) ; the entire line of hex values, including the crlf
                $HexTemp = StringTrimLeft($HexTemp,$xCR) ; the remainder of the hex values
                For $HXi = 1 to StringLen($HexLine) Step 2 ; step through hex line, concatenate depending on decimal value
                    If Dec(StringMid($HexLine,$HXi,2)) < 255 Then
                        If StringMid($HexLine,$HXi,2) <> "00" Then
                            $HexReturn&=Chr(Dec(StringMid($HexLine,$HXi,2))) ; convert to text
                        EndIf
                    Else
                        $HexReturn&=StringMid($HexLine,$HXi,2) ; only reached for the byte FF, which is written out as the two letters "FF"
                    EndIf
                Next
                $HXa = 1 ; reset the variable for next cycle through remaining hex values
            Else
                $HXa = $HXa + 4 ; value not found, move on one 16bit character (4 hex digits) so a match never straddles two characters
            EndIf
        Else
            $HXa = $HXa + 4 ; value not found, move on one 16bit character (4 hex digits)
        EndIf
    Until StringLen($HexTemp) < 1 ; EOF
    Return $HexReturn ; one line of Ascii/Hex
EndFunc
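For comparison, a sketch of a plain Unicode-to-Ascii file conversion on a current AutoIt 3.3.x release, where FileOpen() detects the FF FE byte-order mark itself and mode 512 writes ANSI text. This is not the method above: the helper name _UnicodeFileToAnsi is made up for this example, and characters with no equivalent in the system code page are substituted by Windows instead of being kept as hex.

```autoit
; Sketch for current AutoIt 3.3.x (assumption: newer than the beta used above).
; _UnicodeFileToAnsi is a hypothetical helper, not part of any UDF.
Func _UnicodeFileToAnsi($sSrc, $sDst)
    Local $hIn = FileOpen($sSrc, 0)          ; 0 = read; an FF FE byte-order mark is detected automatically
    If $hIn = -1 Then Return SetError(1, 0, False)
    Local $sText = FileRead($hIn)            ; decoded text, without the byte-order mark
    FileClose($hIn)
    Local $hOut = FileOpen($sDst, 2 + 512)   ; 2 = overwrite, 512 = ANSI
    If $hOut = -1 Then Return SetError(2, 0, False)
    FileWrite($hOut, $sText)                 ; converted through the system ANSI code page
    FileClose($hOut)
    Return True
EndFunc

; e.g. _UnicodeFileToAnsi("c:\epfull.reg", "c:\epfull.txt")
```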

later,

Sul

Edit: My mistake. This function only returns characters 0 - 255. Any Unicode characters show up as their 8-bit character counterparts. At least they don't auto-convert into something you can't work with, like the Type method does.
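A stripped-down sketch of the byte-at-a-time idea shows why. The hypothetical _BytewiseDecode() below keeps every non-00 byte as its own character, like the inner For loop of _HexParse, and leaves out the crlf splitting.

```autoit
; Sketch only: hypothetical helper mirroring the inner loop of _HexParse (no crlf handling).
Func _BytewiseDecode($sHex)
    Local $sOut = "", $sByte
    For $i = 1 To StringLen($sHex) Step 2
        $sByte = StringMid($sHex, $i, 2)
        If $sByte <> "00" Then $sOut &= Chr(Dec($sByte)) ; every non-zero byte becomes its own character
    Next
    Return $sOut
EndFunc

ConsoleWrite(_BytewiseDecode("41004200") & @CRLF) ; "AB": the 00 high bytes are simply dropped
ConsoleWrite(_BytewiseDecode("2D4E") & @CRLF)     ; "-N": U+4E2D splits into two unrelated characters
```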

Edited by sulfurious

Here is a new version. This one lets you read a Unicode file and choose which form to output: Unicode or Ascii. You can even manipulate strings from Unicode and pass them back as Unicode, using Ascii terms for the string manipulation logic.

#include <string.au3>
#include <file.au3>
Global $UniOut

$reg5style = True ; set parameter for which output

$PrepFile = "c:\epfull.reg" ; source Unicode file

Local $TempFile=FileOpen($PrepFile,4)
$read = StringReplace(String(FileRead($TempFile,FileGetSize($PrepFile))),"0x","")
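FileClose($TempFile)
If StringLeft($read,4) = "FFFE" Then $read = StringTrimLeft($read,4) ; strip the FFFE byte-order mark only if the file has one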

; depending on output parameter, start processing Unicode file
If $reg5style = True Then
    $UniOut = FileOpen("c:\epfull.txt",2) ; declare file out location
    _UFWL($UniOut,"FFFE")
    _HexParse($read)
    FileClose($UniOut)
Else
    $cal = _HexParse($read)
    $AsciiOut = FileOpen("c:\epfull.txt",2) ; declare file out location
    FileWrite($AsciiOut,$cal)
    FileClose($AsciiOut)
EndIf

Func _HexParse($Hxf)
    If StringMid($Hxf,1,4) = "FFFE" Then $Hxf = StringTrimLeft($Hxf,4)
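    Local $HexTemp = $Hxf
    If StringRight($HexTemp,8) <> "0d000a00" Then $HexTemp &= "0d000a00" ; make sure the last line ends in crlf, otherwise the loop never finishes
    Local $HexReturn,$xCR,$HXa,$HXi,$HXk,$HexLine,$ExtChrSet,$HexCont,$HexCVal,$HexParsed,$HexNoSp,$NewVal
    $ExtChrSet = False
    $HexCont = False
    $HXa = 1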
    Do 
        If StringMid($HexTemp,$HXa,4) = "0d00" Then ; find carriage return
            If StringMid($HexTemp,$HXa,8) = "0d000a00" Then ; find both carriage return & line feed
                $xCR = $HXa + 7 ; position of the last hex digit of the crlf, i.e. the line length including the crlf
                $HexLine = StringLeft($HexTemp,$xCR) ; the entire line of hex values, including the crlf
                $HexTemp = StringTrimLeft($HexTemp,$xCR) ; the remaining hex values
                ; FIRST PORTION - EITHER PASS LINE, OR CONCATENATE UNICODE PIECES TOGETHER
                If $reg5style = True Then ; proceed with strip \crlf & save multi hex lines as single long line
                    If $HexCont = True Then ; stripping in progress, concat this line to $HexCVal
                        If StringRight($HexLine,12) = "5C000D000A00" Then ; for my use, find \crlf (reg concatenation)
                            $HexCont = True ; still equals true, next line exists
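                            $HexNoSp = "" ; strip "space" (2000), one 16bit character at a time so no match straddles two characters
                            For $HXk = 1 To StringLen($HexLine) Step 4
                                If StringMid($HexLine,$HXk,4) <> "2000" Then $HexNoSp &= StringMid($HexLine,$HXk,4)
                            Next
                            $HexLine = $HexNoSp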
                            $HexCVal&=StringTrimRight($HexLine,12) ; concat and set value of stripped line
                            $HXa = 1
                            ContinueLoop ; go to start of Do
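                        Else
                            $HexCont = False ; this is last line
                            $HexNoSp = "" ; strip "space" (2000), one 16bit character at a time so no match straddles two characters
                            For $HXk = 1 To StringLen($HexLine) Step 4
                                If StringMid($HexLine,$HXk,4) <> "2000" Then $HexNoSp &= StringMid($HexLine,$HXk,4)
                            Next
                            $HexLine = $HexNoSp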
                            $HexCVal = $HexCVal & $HexLine
                        EndIf
                    Else ; stripping NOT in progress - examine line to see if stripping should begin
                        If StringRight($HexLine,12) = "5C000D000A00" Then 
                            $HexCont = True ; stripping should begin
                            $HexCVal&=StringTrimRight($HexLine,12) ; set value of stripped hex line
                            $HXa = 1
                            ContinueLoop ; go to start of Do
                        Else
                            $HexCVal = $HexLine ; no continuation, this line stands on its own
                        EndIf
                    EndIf
                EndIf   
                $ExtChrSet = False ; set value each loop
                If $HexCVal <> "" Then $HexLine = $HexCVal ; stripped & concated hex line exists, change value of $HexLine
                If $reg5style = True Then ; Unicode handling
                    For $HXi = 1 to StringLen($HexLine) Step 4 ; stepping over each 16bit hex
                        If StringMid($HexLine,$HXi+2,2) <> "00" Then ; there is Unicode present in this line
                            $ExtChrSet = True
                        EndIf
                    Next
                    If $ExtChrSet = False Then ; can use Ascii characters to manipulate this data
                        For $HXi = 1 to StringLen($HexLine) Step 4 ; stepping over 16bit hex
                            $HexParsed&=Chr(Dec(StringMid($HexLine,$HXi,2))) ; low byte to Ascii (the high byte is 00 on this path)
                        Next
                        $NewVal = _PassParse($HexParsed) ; pass the whole line to the Ascii handler once (manipulate here)
                    EndIf
                Else ; Ascii only handling
                    For $HXi = 1 to StringLen($HexLine) Step 2 ; outputs to Ascii one byte at a time (00 bytes are dropped)
                        If Dec(StringMid($HexLine,$HXi,2)) < 255 Then ; standard & extended Ascii character set
                            If StringMid($HexLine,$HXi,2) <> "00" Then ; first 8 bits checked
                                $HexReturn&=Chr(Dec(StringMid($HexLine,$HXi,2))) ; place character (concatenated)
                            EndIf
                        Else
                            $HexReturn&=StringMid($HexLine,$HXi,2) ; only reached for the byte FF, which is written out as the two letters "FF"
                        EndIf
                    Next
                EndIf
                If $reg5style = True Then ; write as raw data
                    If $ExtChrSet = False Then
                        For $HXi = 1 to StringLen($NewVal)
                            _UFWL($UniOut,Hex(Asc(StringMid($NewVal,$HXi,1)),2) & "00")
                        Next
                    Else
                        _UFWL($UniOut,$HexLine)
                    EndIf
                EndIf
                $HexParsed = "" ; clear value before next loop
                $HexCVal = "" ; clear value before next loop
                $HexCont = False ; clear value before next loop
                $HXa = 1 ; reset variable for next loop
            Else
                $HXa = $HXa + 4 ; move on one 16bit character (4 hex digits) so a match never straddles two characters
            EndIf
        Else
            $HXa = $HXa + 4 ; move on one 16bit character (4 hex digits)
        EndIf
    Until StringLen($HexTemp) < 1 ; EOF
    Return $HexReturn ; if output to Ascii, the line to write including crlf characters
EndFunc

Func _PassParse($HxV)
    If StringInStr($HxV,"=") Then MsgBox(0,"do something","like change the text, and pass it back") ; comment
    If StringInStr($HxV,"=") Then $HxV = StringReplace($HxV,"=","***ZZZ***") ; some string manipulation
    Return $HxV ; pass handled string back to be written
EndFunc

Func _UFWL($File,$Hexxed) ; Unicode File Write Line: writes a hex string out as raw bytes (any crlf must already be in $Hexxed)
    If Mod(StringLen($Hexxed),2) <> 0 Then Return (-1) ; an odd number of hex digits cannot be whole bytes
    Local $HXs
    For $HXs = 1 To StringLen($Hexxed) Step 2 ; step over each 8bit hex
        FileWrite($File,Chr(Dec(StringMid($Hexxed,$HXs,2)))) ; write raw data (Unicode)
    Next
EndFunc
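The three steps the second version performs on each line can be pulled apart into small helpers. This is a sketch with made-up names, not a drop-in replacement for _HexParse: it checks that every 16-bit character has a 00 high byte, turns the low bytes into a string you can manipulate, and turns the result back into UTF-16 LE hex the same way the write loop in _HexParse does.

```autoit
; Sketch with hypothetical helper names (not part of the original script).
Func _Utf16HexIs8Bit($sHex) ; True if every 16bit character has a 00 high byte
    For $i = 1 To StringLen($sHex) Step 4
        If StringMid($sHex, $i + 2, 2) <> "00" Then Return False ; high byte set: real Unicode character
    Next
    Return True
EndFunc

Func _Utf16HexToAscii($sHex) ; keep only the low byte of each 16bit character
    Local $sText = ""
    For $i = 1 To StringLen($sHex) Step 4
        $sText &= Chr(Dec(StringMid($sHex, $i, 2)))
    Next
    Return $sText
EndFunc

Func _AsciiToUtf16Hex($sText) ; low byte, then a 00 high byte, per character
    Local $sHex = ""
    For $i = 1 To StringLen($sText)
        $sHex &= Hex(Asc(StringMid($sText, $i, 1)), 2) & "00"
    Next
    Return $sHex
EndFunc

Local $sLine = "61003D006200" ; "a=b" in UTF-16 LE hex
If _Utf16HexIs8Bit($sLine) Then
    Local $sNew = StringReplace(_Utf16HexToAscii($sLine), "=", "***ZZZ***") ; same manipulation as _PassParse
    ConsoleWrite(_AsciiToUtf16Hex($sNew) & @CRLF)
EndIf
```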

later,

Sul

Edit: oops.

Edited by sulfurious

A note on passing converted Unicode strings to the Ascii handler and back to Unicode: the function I built actually leaves the crlf on each line, and it prints out to the Unicode file fine. However, I have found that stripping a portion that includes the EOF fails when using the Dec() function on the stripped portion. I am not sure yet whether it is repeatable or an anomaly caused by the data. The fix was simple enough, though: use _StringToHex(), look for the 8-bit values 0d0a and trim them, then use _HexToString() to go back to Ascii and continue the handling (or use a similar stripping method). After you have manipulated your line, you need to concatenate @crlf on the end, because the FileWrite method used to write the raw data needs it.
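A minimal sketch of that workaround, using _StringToHex() and _HexToString() from String.au3. The helper name _TrimCrlfViaHex is hypothetical, and it only trims a 0D0A pair at the very end of the hex, so it cannot cut a byte in half.

```autoit
#include <String.au3>

; Hypothetical helper: remove a trailing CR LF by going through hex and back.
Func _TrimCrlfViaHex($sLine)
    Local $sHex = _StringToHex($sLine)                                        ; "a=b" & @CRLF -> "613D620D0A"
    If StringRight($sHex, 4) = "0D0A" Then $sHex = StringTrimRight($sHex, 4) ; trim only at the end
    Return _HexToString($sHex)
EndFunc

Local $sWork = _TrimCrlfViaHex("a=b" & @CRLF)
$sWork = StringReplace($sWork, "=", "***ZZZ***") ; manipulate the bare line
$sWork &= @CRLF                                  ; put the line ending back before writing raw
ConsoleWrite($sWork)
```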

As far as I know, pure 16-bit handling is only possible by stepping through the single hex line, looking for your marker character (i.e. an = character, or 3d00 in hex), and then setting a value that marks its position in the line. Then you can strip the hex on either side and use some more logic to change it.
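A sketch of that marker search, with a hypothetical helper _FindMarker16 that steps 4 hex digits (one 16-bit character) at a time and returns the hex position of the first 3D00, or 0 if there is none. Stepping by 4 instead of 2 is the design choice that matters: a match can never start in the middle of a character.

```autoit
; Hypothetical helper (not from the original post).
Func _FindMarker16($sHexLine, $sMarker = "3D00")
    For $i = 1 To StringLen($sHexLine) - 3 Step 4
        If StringMid($sHexLine, $i, 4) = $sMarker Then Return $i
    Next
    Return 0
EndFunc

Local $sHex = "61003D0062000D000A00" ; "a=b" & @CRLF in UTF-16 LE hex
Local $iPos = _FindMarker16($sHex)
If $iPos Then
    Local $sLeft = StringLeft($sHex, $iPos - 1) ; "6100", the hex before the marker
    Local $sRight = StringMid($sHex, $iPos + 4) ; "62000D000A00", the hex after the marker
    ConsoleWrite($sLeft & " | " & $sRight & @CRLF)
EndIf
```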

later,

Sul



I'm writing a similar function, but it's not as intricate as yours.

Instead of doing the Unicode conversion, I found that you can do a type redirection, and the result will be in Ascii.

Here's a snippet:

$file1 = "anyfile.reg"
$file2 = "anyfile.ini"

RunWait(@ComSpec & ' /c type "' & $file1 & '" > "' & $file2 & '"', "", @SW_HIDE) ; quotes protect paths with spaces

So the result is an Ascii INI, which I parse using the IniRead functions.
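A sketch of the parsing side, assuming the file produced by the type redirection above. The helper name _DumpRegIni is made up; each registry key shows up as an INI section, and entries are printed exactly as IniReadSection() returns them. INI parsing has no notion of the \ line continuation, so multi-line hex values will not be joined into one entry.

```autoit
; Hypothetical helper: list every key (section) and entry in the converted file.
Func _DumpRegIni($sIniFile)
    Local $aSections = IniReadSectionNames($sIniFile)
    If @error Then Return SetError(1, 0, 0)     ; file missing or no sections
    Local $aPairs
    For $i = 1 To $aSections[0]
        $aPairs = IniReadSection($sIniFile, $aSections[$i])
        If @error Then ContinueLoop             ; section could not be read or is empty
        For $j = 1 To $aPairs[0][0]
            ConsoleWrite($aSections[$i] & " : " & $aPairs[$j][0] & " = " & $aPairs[$j][1] & @CRLF)
        Next
    Next
    Return $aSections[0]                        ; number of keys found
EndFunc

; e.g. _DumpRegIni("anyfile.ini")
```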

It's not done yet; I'm still figuring out how to handle the different registry entry types. I will post it here when I am done (or close to done) if you or anyone else is interested.

-Blademonkey

Edited by blademonkey

