crashdemons

Decimal To UTF-8

11 posts in this topic

#1 ·  Posted (edited)

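This function converts the decimal value of a Unicode code point (the U+hex code) to the decimal value of its UTF-8 encoding, i.e. the encoded bytes read as a single number.

It supports code points from U+0000 to U+10FFFF; surrogates (U+D800 - U+DFFF) and out-of-range values are returned unchanged.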

Changes:

-Reread the article and thought about it a little more logically

-Attempted to Shorten the function

I needed some support for UTF-8 in some of my programs and I couldn't figure out a better way to do this, so I went the hard way. (Don't I always?)

Wikipedia: UTF-8 Description

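Func Dec_To_UTF8($i)
        ; split the code point into bit groups: w = bits 18-20, x = bits 12-17, y = bits 6-11, z = bits 0-5
        Local $z=$i
        Local $w=BitShift($z,18)
        $z-=$w*262144; 2^18
        Local $x=BitShift($z,12)
        $z-=$x*4096; 2^12
        Local $y=BitShift($z,6)
        $z-=$y*64; 2^6
        ; ASCII, surrogates and out-of-range values are returned unchanged
        If $i<128 Or ($i>55295 And $i<57344) Or $i>1114111 Then Return $i
        ; 2 bytes: 110yyyyy 10zzzzzz
        If $i>=128     Then        $z+=128       +($y*256     )
        If $i<=2047    Then Return $z+ 49152
        ; 3 bytes: 1110xxxx 10yyyyyy 10zzzzzz
        If $i>=2048    Then        $z+=32768     +($x*65536   )
        If $i<=65535   Then Return $z+ 14680064
        ; 4 bytes: 11110www 10xxxxxx 10yyyyyy 10zzzzzz
        If $i>=65536   Then        $z+=8388608   +($w*16777216)
        If $i<=1114111 Then Return $z+ 4026531840
EndFunc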
#cs
Decimal To UTF-8
    By Crash Daemonicus
==============================================
    $iUTF8 = Dec_To_UTF8($iUNICODE)
==============================================
Char Range Encoding 
    0-7F
        00000000
        +zzzzzzz
        =z
    80-7FF
        1100000000000000
        +  yyyyy00000000
        +       10000000
        +         zzzzzz
        =49152+(y*256)+128+z
    800-D7FF And E000-FFFF
        111000000000000000000000
        +   xxxx0000000000000000
        +       1000000000000000
        +         yyyyyy00000000
        +               10000000
        +                 zzzzzz
        =14680064+(x*65536)+32768+(y*256)+128+z
    10000-10FFFF
        11110000000000000000000000000000
        +    www000000000000000000000000
        +       100000000000000000000000
        +         xxxxxx0000000000000000
        +               1000000000000000
        +                 yyyyyy00000000
        +                       10000000
        +                         zzzzzz
        =4026531840+(w*16777216)+8388608+(x*65536)+32768+(y*256)+128+z
#ce
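The bit layouts in the comment block above can also be written with bit masks instead of subtraction. The following is only a sketch, not part of the original function: the name _CodePointToUTF8Hex is made up for illustration, and it returns the bytes as a hex string so they are easy to compare with a hex editor.

```autoit
; Hypothetical helper (not from the original post): returns the UTF-8 bytes
; of a code point as a space-separated hex string, e.g. "E2 89 A0".
Func _CodePointToUTF8Hex($iCP)
    ; reject surrogates and anything outside 0 - 10FFFF
    If $iCP < 0 Or $iCP > 0x10FFFF Or ($iCP >= 0xD800 And $iCP <= 0xDFFF) Then Return SetError(1, 0, "")
    ; continuation bytes: 10 followed by six payload bits
    Local $b1 = BitOR(0x80, BitAND($iCP, 0x3F))               ; 10zzzzzz
    Local $b2 = BitOR(0x80, BitAND(BitShift($iCP, 6), 0x3F))  ; 10yyyyyy
    Local $b3 = BitOR(0x80, BitAND(BitShift($iCP, 12), 0x3F)) ; 10xxxxxx
    If $iCP < 0x80 Then Return Hex($iCP, 2)
    If $iCP < 0x800 Then Return Hex(BitOR(0xC0, BitShift($iCP, 6)), 2) & " " & Hex($b1, 2)
    If $iCP < 0x10000 Then Return Hex(BitOR(0xE0, BitShift($iCP, 12)), 2) & " " & Hex($b2, 2) & " " & Hex($b1, 2)
    Return Hex(BitOR(0xF0, BitShift($iCP, 18)), 2) & " " & Hex($b3, 2) & " " & Hex($b2, 2) & " " & Hex($b1, 2)
EndFunc

ConsoleWrite(_CodePointToUTF8Hex(0x2260) & @CRLF)   ; E2 89 A0
ConsoleWrite(_CodePointToUTF8Hex(0x10FFFF) & @CRLF) ; F4 8F BF BF
```

Unlike Dec_To_UTF8, this sketch sets @error for surrogates and out-of-range input instead of returning the input unchanged; that is a design choice of the sketch, not something the original does.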

For Instance:

Hex --> UTF8

U+2260 --> E2 89 A0

Which is what my function needed to simulate.

Dec("2260")=8800

8800 --> 14846368

Examples:

$char1=Dec_To_UTF8(0x0020); Range: 0000 - 007F,  Input: 32, Output: 32 (20)
$char2=Dec_To_UTF8(0x06FC); Range: 0080 - 07FF,  Input: 1788, Output: 56252 (DB BC)
$char3=Dec_To_UTF8(0x2260); Range: 0800 - D7FF,  Input: 8800, Output: 14846368 (E2 89 A0)

An alternative is to use the Binary functions.

I didn't do this earlier because I wanted to understand exactly how UTF-8 works.

;for unicode char 0x2260
$UTF8_Value=StringToBinary(ChrW(0x2260),4)
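As a rough illustration of the built-in route, the sketch below prints the binary value and converts it back again. The variable names are placeholders; flag 4 selects UTF-8 in both StringToBinary and BinaryToString.

```autoit
; Sketch: encode U+2260 with the built-in function and decode it again.
Local $bUTF8 = StringToBinary(ChrW(0x2260), 4) ; flag 4 = UTF-8
ConsoleWrite(String($bUTF8) & @CRLF)           ; 0xE289A0
ConsoleWrite(BinaryLen($bUTF8) & @CRLF)        ; 3

Local $sBack = BinaryToString($bUTF8, 4)       ; decode the bytes as UTF-8
ConsoleWrite(Hex(AscW($sBack), 4) & @CRLF)     ; 2260
```

Note that ChrW only accepts values up to 65535, so this route covers the 1- to 3-byte ranges directly.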
Edited by crashdemons

My Projects - WindowDarken (Darken except the active window) Yahsmosis Chat Client (Discontinued) StarShooter Game (Red alert! All hands to battlestations!) YMSG Protocol Support (Discontinued) Circular Keyboard and OSK example. (aka Iris KB) Target Screensaver Drive Toolbar Thingy Rollup Pro (Minimize-to-Titlebar & More!) 2D Launcher physics example Ascii Screenshot AutoIt3 Quine Example ("Is a Quine" is a Quine.) USB Lock (Another system keydrive - with a toast.)




I may have grabbed the wrong end of the stick, but would this allow writing unicode characters to a file?

I haven't had any success with the FileOpen modes :\


My scripts:AppLauncherTRAY - Awesome app launcher that runs from the system tray NEW VERSION! | Run Length Encoding - VERY simple compression in pure autoit | Simple Minesweeper Game - Fun little game :)My website


Either method should work for getting the values of the individual bytes that make up each character. All that's left is to write those bytes to the file.

Processing them might be tricky, but BinaryToString with the UTF-8 flag (4) should output the correct string.

However, if you want to create a Unicode text file, you'll need to decide which encoding to use: UTF-16 pads ASCII characters with a null byte, and some formats expect a byte-order mark at the start of the file.
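To make the difference between the formats visible, here is a small sketch that prints the same characters under the StringToBinary encoding flags (2 = UTF-16 little endian, 3 = UTF-16 big endian, 4 = UTF-8). None of these outputs include a byte-order mark; whether one is written depends on how the file is opened.

```autoit
; Sketch: the same characters under different encodings.
ConsoleWrite(String(StringToBinary("A", 2)) & @CRLF)           ; 0x4100 (UTF-16 LE, note the null byte)
ConsoleWrite(String(StringToBinary("A", 3)) & @CRLF)           ; 0x0041 (UTF-16 BE)
ConsoleWrite(String(StringToBinary("A", 4)) & @CRLF)           ; 0x41   (UTF-8)
ConsoleWrite(String(StringToBinary(ChrW(0x2260), 2)) & @CRLF)  ; 0x6022 (UTF-16 LE)
ConsoleWrite(String(StringToBinary(ChrW(0x2260), 4)) & @CRLF)  ; 0xE289A0 (UTF-8)
```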



*rubs eyes a bit more*

Shouldn't have slept in, what you just said barely made any sense to my sleepy brain :)

Could you provide some examples of writing and reading unicode characters?



Say we want to write the UTF-8 encoding of the unicode Character U+2260...

$fn='test.dat'
$fh=FileOpen($fn, 2 + 16); open for overwrite in binary (byte) mode
FileWrite($fh, StringToBinary(ChrW(0x2260), 4)); convert the character U+2260 to a binary value using UTF-8 encoding (flag 4), then write it to the file
FileClose($fh)

Now, if we open 'test.dat' in a hex-editor we see the following:

e2 89 a0

Which is the UTF-8 equivalent of character U+2260.

This also matches the results listed in the first post for character 2260

0x2260=8800 (decimal)

8800 --> 14846368

0xE289A0=14846368 (decimal)

There's probably an easier way to write them to a file by using the '128' flag on FileOpen...

128 = Use Unicode UTF8 when writing text with FileWrite and FileWriteLine (default is ANSI)



$Open = FileOpen($File, 0)
$Read = FileRead($Open)

Am I doing something wrong? :)



#7 ·  Posted (edited)

Here.

$fn='test.dat'
$ucode=0x0111; U+0111,  "Latin Small Letter D With Stroke" in tahoma font.
$fh=FileOpen($fn, 2+128); write as UTF-8 using the UTF-8 Byte-Order mark (EF BB BF)
FileWrite($fh, ChrW($ucode))
FileClose($fh)



$fh=FileOpen($fn, 0+16); read as binary
$data=FileRead($fh)
$data=BinaryMid($data, 4); skip the 3-byte byte-order mark and keep the rest
$data=BinaryToString($data, 4)
$ucode=Hex(AscW($data), 4)
MsgBox(0, '','U+'&$ucode&@CRLF&'Character: '&$data); shows the U+ code and the Character
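If the file might not start with a byte-order mark, the stripping step can be made conditional. This is only a sketch under that assumption; the file path and variable names are illustrative, and error handling for FileOpen is left out for brevity.

```autoit
; Sketch only: the file name and variable names are illustrative.
Local $sPath = @TempDir & "\utf8_bom_sketch.txt"

; Write one character as UTF-8 text (2 = overwrite, 128 = UTF-8).
Local $hFile = FileOpen($sPath, 2 + 128)
FileWrite($hFile, ChrW(0x0111))
FileClose($hFile)

; Read the raw bytes back (16 = binary mode).
$hFile = FileOpen($sPath, 16)
Local $bData = FileRead($hFile)
FileClose($hFile)

; Strip EF BB BF only if the data really starts with it.
If BinaryLen($bData) >= 3 And String(BinaryMid($bData, 1, 3)) = "0xEFBBBF" Then
    $bData = BinaryMid($bData, 4)
EndIf

Local $sText = BinaryToString($bData, 4) ; decode the remaining bytes as UTF-8
ConsoleWrite("U+" & Hex(AscW($sText), 4) & @CRLF) ; should print U+0111
FileDelete($sPath)
```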
Edited by crashdemons


This may be more useful for saving data though

Func _StringUTF8_To_String($UTF8_String)
    Return BinaryToString(StringToBinary($UTF8_String   , 4), 1)
EndFunc
Func _String_To_StringUTF8($String)
    Return BinaryToString(StringToBinary($String        , 1), 4)
EndFunc

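This way you can take your Unicode string, convert it with _StringUTF8_To_String to a string holding its UTF-8 bytes (one ANSI character per byte), and save that as ANSI.

Or read the file as ANSI and convert it back with _String_To_StringUTF8.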
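A quick way to see what the two conversions do is a round trip on a short string. The sketch below inlines the same calls as the two functions above. It assumes every UTF-8 byte survives the trip through the system's ANSI code page, which may not hold on every system, so it is worth running this check before relying on the approach.

```autoit
; Sketch: round trip through the "UTF-8 bytes as ANSI characters" form.
Local $sOriginal = "x " & ChrW(0x2260) & " y"

; same call as _StringUTF8_To_String: encode as UTF-8, read the bytes as ANSI
Local $sBytesAsAnsi = BinaryToString(StringToBinary($sOriginal, 4), 1)

; same call as _String_To_StringUTF8: back to bytes, then decode as UTF-8
Local $sRestored = BinaryToString(StringToBinary($sBytesAsAnsi, 1), 4)

ConsoleWrite("Original length:    " & StringLen($sOriginal) & @CRLF)
ConsoleWrite("Byte-string length: " & StringLen($sBytesAsAnsi) & @CRLF)
ConsoleWrite("Round trip OK:      " & ($sRestored == $sOriginal) & @CRLF)
```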



#9 ·  Posted (edited)

So when writing a file, I pass the text that's going to be written through the _StringUTF8_To_String function?

And then when reading the file, I pass the text I read through the _String_To_StringUTF8 function?

And another question: do I write and read each character to and from the file individually, or as a whole string?

*QUICK EDIT*

I seem to have no problem reading Unicode text in .txt files; I think it's just the .dat extension that's ruining things :|

Edited by SxyfrG


#10 ·  Posted (edited)

Technically,

For writing, you should be able to pass your Unicode string to _StringUTF8_To_String and write the returned string to a file as ANSI (FileOpen mode 2).

For reading, you should be able to pass the ANSI text you read (FileOpen mode 0) to _String_To_StringUTF8 and use its return value as your Unicode string.

Note: since you aren't including any header or byte-order mark in your save file, this *may* break compatibility with Notepad, but at least your application will work.

If that fails, consider using binary-safe functions like the following to read and write the data.

Func FileOverwrite($file, $data)
    $fh = FileOpen($file, 2 + 16)
    $d = FileWrite($fh, StringToBinary($data))
    FileClose($fh)
    Return $d
EndFunc   ;==>FileOverwrite
Func FileReadFull($file)
    $fh = FileOpen($file, 16)
    $d = BinaryToString(FileRead($fh))
    FileClose($fh)
    Return $d
EndFunc   ;==>FileReadFull
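Since StringToBinary and BinaryToString default to ANSI, the two functions above would not preserve characters outside the ANSI code page on their own. A possible variant, sketched here with made-up names (_FileOverwriteUTF8, _FileReadFullUTF8), passes the UTF-8 flag (4) on both sides so the file holds plain UTF-8 bytes without a byte-order mark.

```autoit
; Hypothetical UTF-8 variants of the functions above (names are illustrative).
Func _FileOverwriteUTF8($sFile, $sData)
    Local $hFile = FileOpen($sFile, 2 + 16) ; overwrite + binary mode
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $iResult = FileWrite($hFile, StringToBinary($sData, 4)) ; 4 = UTF-8
    FileClose($hFile)
    Return $iResult
EndFunc

Func _FileReadFullUTF8($sFile)
    Local $hFile = FileOpen($sFile, 16) ; read + binary mode
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $sData = BinaryToString(FileRead($hFile), 4) ; decode bytes as UTF-8
    FileClose($hFile)
    Return $sData
EndFunc

; Usage: write a string containing U+2260 and read it back.
Local $sPath = @TempDir & "\utf8_sketch.dat"
_FileOverwriteUTF8($sPath, "not equal: " & ChrW(0x2260))
Local $sBack = _FileReadFullUTF8($sPath)
ConsoleWrite(Hex(AscW(StringRight($sBack, 1)), 4) & @CRLF) ; should print 2260
FileDelete($sPath)
```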

EDIT EDIT: feel free to use .txt if it works, it's your app :) .

Edited by crashdemons
