crashdemons
Posted April 8, 2008 (edited)

This function converts the decimal value of a Unicode code point (the hex digits of a U+ code) to the decimal value of its UTF-8 encoding, i.e. the encoded bytes read as one number.
It supports the range of characters from 0000 to 10FFFF. Surrogate values (D800 - DFFF) and anything above 10FFFF are returned unchanged.

Changes:
- Reread the article and thought about it a little more logically
- Attempted to shorten the function

I needed some support for UTF-8 in some of my programs and I couldn't figure out a better way to do this, so I went the hard way. (Don't I always?)

Wikipedia: UTF-8 Description

    Func Dec_To_UTF8($i)
        ; split the code point into bit groups: www xxxxxx yyyyyy zzzzzz
        $z=$i
        $w=BitShift($z,18)
        $z-=$w*262144 ; 2^18, to match the 18-bit shift above
        $x=BitShift($z,12)
        $z-=$x*4096
        $y=BitShift($z,6)
        $z-=$y*64
        ; ASCII, surrogates and out-of-range values are returned as-is
        If $i<128 Or ($i>55295 And $i<57344) Or $i>1114111 Then Return $i
        If $i>=128 Then $z+=128 +($y*256 )
        If $i<=2047 Then Return $z+ 49152
        If $i>=2048 Then $z+=32768 +($x*65536 )
        If $i<=65535 Then Return $z+ 14680064
        If $i>=65536 Then $z+=8388608 +($w*16777216)
        If $i<=1114111 Then Return $z+ 4026531840
    EndFunc

    #cs
        Decimal To UTF-8
        By Crash Daemonicus
        ==============================================
        $iUTF8 = Dec_To_UTF8($iUNICODE)
        ==============================================
        Char Range
            Encoding

        0-7F
            00000000 + zzzzzzz
            = z
        80-7FF
            1100000000000000 + yyyyy00000000 + 10000000 + zzzzzz
            = 49152+(y*256)+128+z
        800-D7FF And E000-FFFF
            111000000000000000000000 + xxxx0000000000000000 + 1000000000000000 + yyyyyy00000000 + 10000000 + zzzzzz
            = 14680064+(x*65536)+32768+(y*256)+128+z
        10000-10FFFF
            11110000000000000000000000000000 + www000000000000000000000000 + 100000000000000000000000 + xxxxxx0000000000000000 + 1000000000000000 + yyyyyy00000000 + 10000000 + zzzzzz
            = 4026531840+(w*16777216)+8388608+(x*65536)+32768+(y*256)+128+z
    #ce

For instance:
Hex --> UTF8
U+2260 --> E2 89 A0
Which is what my function needed to simulate.
Dec(2260)=8800
8800-->14846368

Examples:

    $char1=Dec_To_UTF8(0x0020); Range: 0000 - 007F, Input: 32, Output: 32 (20)
    $char2=Dec_To_UTF8(0x06FC); Range: 0080 - 07FF, Input: 1788, Output: 56252 (DB BC)
    $char3=Dec_To_UTF8(0x2260); Range: 0800 - D7FF, Input: 8800, Output: 14846368 (E2 89 A0)

An alternate way to do this is to use the Binary functions... I didn't do this earlier, because I wanted to understand exactly how UTF-8 worked.

    ;for unicode char 0x2260
    $UTF8_Value=StringToBinary(ChrW(0x2260),4)

Edited April 24, 2008 by crashdemons

My Projects - WindowDarken (Darken except the active window) | Yahsmosis Chat Client (Discontinued) | StarShooter Game (Red alert! All hands to battlestations!) | YMSG Protocol Support (Discontinued) | Circular Keyboard and OSK example. (aka Iris KB) | Target Screensaver | Drive Toolbar Thingy | Rollup Pro (Minimize-to-Titlebar & More!) | 2D Launcher physics example | Ascii Screenshot | AutoIt3 Quine Example ("Is a Quine" is a Quine.) | USB Lock (Another system keydrive - with a toast.)

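The number returned by Dec_To_UTF8 is easier to compare with a hex editor once it is split back into bytes. The following is only a minimal sketch of one way to do that; `_UTF8_PackedToHex` is a made-up helper name, not part of the original post, and the last line uses the built-in StringToBinary route from the post as a cross-check.

```autoit
; Sketch: unpack the number returned by Dec_To_UTF8 into its individual bytes.
; _UTF8_PackedToHex is a hypothetical helper name chosen for this example.
Func _UTF8_PackedToHex($iPacked)
    Local $sOut = ''
    Do
        ; take the lowest byte and put it in front of what has been collected so far
        $sOut = Hex(Int(Mod($iPacked, 256)), 2) & ' ' & $sOut
        ; drop that byte; Int() truncates, which is fine for non-negative input
        $iPacked = Int($iPacked / 256)
    Until $iPacked = 0
    Return StringStripWS($sOut, 2) ; 2 = strip trailing whitespace
EndFunc

; 14846368 is the value the post gives for U+2260
ConsoleWrite(_UTF8_PackedToHex(14846368) & @CRLF) ; E2 89 A0

; cross-check with the built-in encoder; String() turns the binary into "0x..." text
ConsoleWrite(String(StringToBinary(ChrW(0x2260), 4)) & @CRLF) ; 0xE289A0
```

Mod and division are used instead of BitAND and BitShift because four-byte results such as 4026531840 and above no longer fit in a signed 32-bit integer, which is what the Bit functions work on. Note also that ChrW only takes values up to 65535, so the StringToBinary cross-check as written covers the first three ranges only.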
SxyfrG
Posted May 26, 2008

I may have grabbed the wrong end of the stick, but would this allow writing unicode characters to a file? I haven't had any success with the FileOpen modes :\

My scripts: AppLauncherTRAY - Awesome app launcher that runs from the system tray NEW VERSION! | Run Length Encoding - VERY simple compression in pure autoit | Simple Minesweeper Game - Fun little game :)
My website

crashdemons (Author)
Posted May 27, 2008

SxyfrG said:
    I may have grabbed the wrong end of the stick, but would this allow writing unicode characters to a file?
    I haven't had any success with the FileOpen modes :\

Either method should be suitable for supplying the values of the individual bytes that make up each character. All that's left is to write those bytes to the file.
Processing them might be tricky, but BinaryToString with the right flag (4 for UTF-8) could probably output the correct string.
However, if you want to create a Unicode text file, you'll need to decide which format to use: some formats, such as UTF-16, store a null byte alongside every plain ASCII character, and some start with a special file header (a byte-order mark).

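To make the "which format" question concrete, here is a small sketch that encodes the same two characters three ways with StringToBinary. The sample string and the variable name `$sSample` are my own choices for illustration.

```autoit
; Sketch: the same two characters ("A" and U+2260) in three encodings.
; $sSample is an illustrative name, not something from the posts above.
; StringToBinary flags: 2 = UTF-16 little endian, 3 = UTF-16 big endian, 4 = UTF-8
$sSample = 'A' & ChrW(0x2260)

; String() is needed so the hex text is printed rather than the raw bytes
ConsoleWrite('UTF-16 LE: ' & String(StringToBinary($sSample, 2)) & @CRLF) ; 0x41006022
ConsoleWrite('UTF-16 BE: ' & String(StringToBinary($sSample, 3)) & @CRLF) ; 0x00412260
ConsoleWrite('UTF-8:     ' & String(StringToBinary($sSample, 4)) & @CRLF) ; 0x41E289A0
```

The 00 byte next to 41 in the UTF-16 lines is the null padding mentioned in the post, while UTF-8 keeps "A" as a single byte. StringToBinary does not add a file header; when one is wanted, the usual byte-order marks are FF FE (UTF-16 LE), FE FF (UTF-16 BE) and EF BB BF (UTF-8).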
SxyfrG
Posted May 27, 2008

*rubs eyes a bit more*
Shouldn't have slept in, what you just said barely made any sense to my sleepy brain.
Could you provide some examples of writing and reading unicode characters?

crashdemons (Author)
Posted May 27, 2008

Say we want to write the UTF-8 encoding of the Unicode character U+2260...

    $fn='test.dat'
    $fh=FileOpen($fn, 2 + 16); writing in forced-byte mode.
    FileWrite($fh, StringToBinary(ChrW(0x2260), 4)); converts the character U+2260 to a binary value using UTF-8 encoding, then writes it to the open handle.
    FileClose($fh); close the handle so the bytes are flushed to disk

Now, if we open 'test.dat' in a hex editor we see the following:

    e2 89 a0

Which is the UTF-8 equivalent of character U+2260.
This also matches the results listed in the first post for character 2260:

    0x2260=8800 (decimal)
    8800 --> 14846368
    0xE289A0=14846368 (decimal)

There's probably an easier way to write them to a file by using the '128' flag on FileOpen...

    128 = Use Unicode UTF8 when writing text with FileWrite and FileWriteLine (default is ANSI)

SxyfrG
Posted May 27, 2008

crashdemons said:
    Say we want to write the UTF-8 encoding of the unicode Character U+2260...

    $Open = FileOpen($File, 0)
    $Read = FileRead($Open)

Am I doing something wrong?

crashdemons (Author)
Posted May 27, 2008 (edited)

Here.

    $fn='test.dat'
    $ucode=0x0111; U+0111, "Latin Small Letter D With Stroke" in tahoma font.

    $fh=FileOpen($fn, 2+128); write as UTF-8 using the UTF-8 Byte-Order mark (EF BB BF)
    FileWrite($fh, ChrW($ucode))
    FileClose($fh)

    $fh=FileOpen($fn, 0+16); read as binary
    $data=FileRead($fh)
    FileClose($fh)
    $data=BinaryMid($data, 4); remove the 3-byte byte-order mark and keep everything after it
    $data=BinaryToString($data, 4)
    $ucode=Hex(AscW($data), 4)
    MsgBox(0, '','U+'&$ucode&@CRLF&'Character: '&$data); shows the U+ code and the Character

Edited May 27, 2008 by crashdemons

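The read half of the example above assumes the file always starts with a byte-order mark. As a rough sketch of a more forgiving reader, the helper below drops the mark only when it is actually present; `_ReadUTF8File` and its variable names are hypothetical, not something from the thread.

```autoit
; Sketch: read a whole file as UTF-8, with or without a leading byte-order mark.
; _ReadUTF8File is a hypothetical helper name chosen for this example.
Func _ReadUTF8File($sFile)
    Local $hFile = FileOpen($sFile, 16) ; 16 = binary mode, reading
    If $hFile = -1 Then Return SetError(1, 0, '') ; could not open the file
    Local $bData = FileRead($hFile)
    FileClose($hFile)

    ; compare the first three bytes as text ("0x" plus hex digits)
    If String(BinaryMid($bData, 1, 3)) = '0xEFBBBF' Then
        If BinaryLen($bData) = 3 Then Return '' ; nothing in the file but the mark
        $bData = BinaryMid($bData, 4) ; everything after the mark
    EndIf

    Return BinaryToString($bData, 4) ; 4 = decode the bytes as UTF-8
EndFunc

; usage, assuming test.dat was written by the example in the post above
$sText = _ReadUTF8File('test.dat')
If @error Then
    ConsoleWrite('Could not open test.dat' & @CRLF)
Else
    MsgBox(0, '', 'U+' & Hex(AscW($sText), 4) & @CRLF & 'Character: ' & $sText)
EndIf
```

Because the remainder of the file is decoded in one call, this also answers the "character by character or whole string" question: the whole string can be handled at once.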
crashdemons (Author)
Posted May 27, 2008

This may be more useful for saving data though:

    Func _StringUTF8_To_String($UTF8_String)
        Return BinaryToString(StringToBinary($UTF8_String , 4), 1)
    EndFunc
    Func _String_To_StringUTF8($String)
        Return BinaryToString(StringToBinary($String , 1), 4)
    EndFunc

This way you can take your Unicode string, convert it to a string of UTF-8 bytes that can be saved as ANSI, or read the file as ANSI and convert it back.

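A quick way to see what the two helpers do is to send a short string through both and compare lengths. This is only a sketch: the variable names are mine, the two functions are copied from the post so the script runs on its own, and the result depends on the system's ANSI code page.

```autoit
; Sketch: round trip through the two helpers from the post above.
; $sOriginal, $sPacked and $sBack are illustrative names.
$sOriginal = 'A' & ChrW(0x2260) & 'B'
$sPacked = _StringUTF8_To_String($sOriginal) ; UTF-8 bytes, read as ANSI text
$sBack = _String_To_StringUTF8($sPacked)     ; decoded back again

ConsoleWrite('Original length: ' & StringLen($sOriginal) & @CRLF)
ConsoleWrite('Packed length:   ' & StringLen($sPacked) & @CRLF)

; == is the case-sensitive string comparison
If $sBack == $sOriginal Then
    ConsoleWrite('Round trip OK' & @CRLF)
Else
    ConsoleWrite('Round trip changed the text' & @CRLF)
EndIf

; the two functions, copied from the post
Func _StringUTF8_To_String($UTF8_String)
    Return BinaryToString(StringToBinary($UTF8_String, 4), 1)
EndFunc

Func _String_To_StringUTF8($String)
    Return BinaryToString(StringToBinary($String, 1), 4)
EndFunc
```

On a single-byte code page such as Windows-1252 I would expect lengths 3 and 5, since U+2260 becomes the three bytes E2 89 A0 and each byte is treated as one character. On multi-byte ANSI code pages the byte-to-character mapping is not one to one, so the round trip may not hold there; the binary-mode file functions avoid that dependency.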
SxyfrG
Posted May 27, 2008 (edited)

crashdemons said:
    This may be more useful for saving data though

So when writing a file, I pass the text that's going to be written to the file through the _StringUTF8_To_String function? And then when reading the file, pass the text read from it through the _String_To_StringUTF8 function?
And another question, do I write and read each character to and from the file individually, or as a whole string?

*QUICK EDIT*
I seem to have no problem reading Unicode text in .txt files, I think it's just the .dat extension that's ruining things :|

Edited May 27, 2008 by SxyfrG

crashdemons (Author)
Posted May 27, 2008 (edited)

Technically:
For writing, you should be able to pass your Unicode string to _StringUTF8_To_String and write the returned string to a file as ANSI (FileOpen mode 2).
For reading, you should be able to pass the ANSI data you read (FileOpen mode 0) to _String_To_StringUTF8 and use the return value as your Unicode string.

Note: since you aren't including any header or byte-order mark in your save file, this *may* break compatibility with Notepad, but at least your application will work.

If that fails, consider using binary-safe functions like the following to read and write the data.

    Func FileOverwrite($file, $data)
        $fh = FileOpen($file, 2 + 16) ; overwrite, binary mode
        $d = FileWrite($fh, StringToBinary($data))
        FileClose($fh)
        Return $d
    EndFunc   ;==>FileOverwrite

    Func FileReadFull($file)
        $fh = FileOpen($file, 16) ; read, binary mode
        $d = BinaryToString(FileRead($fh))
        FileClose($fh)
        Return $d
    EndFunc   ;==>FileReadFull

EDIT EDIT: feel free to use .txt if it works, it's your app.

Edited May 27, 2008 by crashdemons

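FileOverwrite and FileReadFull call StringToBinary and BinaryToString without a flag, which means ANSI, so they rely on the text already having gone through the conversion helpers. A possible variation, sketched below under that assumption, passes the UTF-8 flag (4) directly so a whole Unicode string can be saved and loaded in one step. The `...UTF8` function names and the file name are hypothetical.

```autoit
; Sketch: binary-safe save and load that encode and decode UTF-8 directly.
; FileOverwriteUTF8 and FileReadFullUTF8 are hypothetical names for this example.
Func FileOverwriteUTF8($file, $data)
    Local $fh = FileOpen($file, 2 + 16) ; overwrite, binary mode
    If $fh = -1 Then Return SetError(1, 0, 0) ; could not open the file
    Local $d = FileWrite($fh, StringToBinary($data, 4)) ; 4 = UTF-8
    FileClose($fh)
    Return $d
EndFunc

Func FileReadFullUTF8($file)
    Local $fh = FileOpen($file, 16) ; read, binary mode
    If $fh = -1 Then Return SetError(1, 0, '')
    Local $d = BinaryToString(FileRead($fh), 4) ; 4 = UTF-8
    FileClose($fh)
    Return $d
EndFunc

; usage: write a whole string at once, then read it back
$sText = 'not equal: ' & ChrW(0x2260)
FileOverwriteUTF8('test_utf8.txt', $sText)
If FileReadFullUTF8('test_utf8.txt') == $sText Then ConsoleWrite('Same text read back' & @CRLF)
```

No byte-order mark is written here, so the same Notepad caveat from the post applies.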
SxyfrG
Posted May 27, 2008

Thanks for your help, crashdemons.