Functions for Ascii, Unicode, and UTF8 encoding


Arilvv

There are four functions:

Asc2Unicode($AscString)

Unicode2Asc($UnicodeString)

Unicode2Utf8($UnicodeString)

Utf82Unicode($Utf8String)

Note: $AscString and $Utf8String are normal strings, while $UnicodeString should be a binary string.

Tested in a Traditional Chinese (Taiwan) environment, where it worked fine. I hope it supports all language systems.

Func Asc2Unicode($AscString)
    Local $BufferSize = StringLen($AscString) * 2
    Local $Buffer = DllStructCreate("byte[" & $BufferSize & "]")
    Local $Return = DllCall("Kernel32.dll", "int", "MultiByteToWideChar", _
        "int", 0, _
        "int", 0, _
        "str", $AscString, _
        "int", StringLen($AscString), _
        "ptr", DllStructGetPtr($Buffer), _
        "int", $BufferSize)
    Local $UnicodeString = StringLeft(DllStructGetData($Buffer, 1), $Return[0] * 2)
    $Buffer = 0
    Return $UnicodeString
EndFunc

Func Unicode2Asc($UniString)
    If Not IsBinaryString($UniString) Then
        SetError(1)
        Return $UniString
    EndIf

    Local $BufferLen = StringLen($UniString)
    Local $Input = DllStructCreate("byte[" & $BufferLen & "]")
    Local $Output = DllStructCreate("char[" & $BufferLen & "]")
    DllStructSetData($Input, 1, $UniString)
    Local $Return = DllCall("kernel32.dll", "int", "WideCharToMultiByte", _
        "int", 0, _
        "int", 0, _
        "ptr", DllStructGetPtr($Input), _
        "int", $BufferLen / 2, _
        "ptr", DllStructGetPtr($Output), _
        "int", $BufferLen, _
        "int", 0, _
        "int", 0)   
    Local $AscString = DllStructGetData($Output, 1)
    $Output = 0
    $Input = 0
    Return $AscString
EndFunc

Func Unicode2Utf8($UniString)
    If Not IsBinaryString($UniString) Then
        SetError(1)
        Return $UniString
    EndIf

    Local $UniStringLen = StringLen($UniString)
    Local $BufferLen = $UniStringLen * 2
    Local $Input = DllStructCreate("byte[" & $BufferLen & "]")
    Local $Output = DllStructCreate("char[" & $BufferLen & "]")
    DllStructSetData($Input, 1, $UniString)
    Local $Return = DllCall("kernel32.dll", "int", "WideCharToMultiByte", _
        "int", 65001, _
        "int", 0, _
        "ptr", DllStructGetPtr($Input), _
        "int", $UniStringLen / 2, _
        "ptr", DllStructGetPtr($Output), _
        "int", $BufferLen, _
        "int", 0, _
        "int", 0)   
    Local $Utf8String = DllStructGetData($Output, 1)
    $Output = 0
    $Input = 0
    Return $Utf8String
EndFunc

Func Utf82Unicode($Utf8String)
    Local $BufferSize = StringLen($Utf8String) * 2
    Local $Buffer = DllStructCreate("byte[" & $BufferSize & "]")
    Local $Return = DllCall("Kernel32.dll", "int", "MultiByteToWideChar", _
        "int", 65001, _
        "int", 0, _
        "str", $Utf8String, _
        "int", StringLen($Utf8String), _
        "ptr", DllStructGetPtr($Buffer), _
        "int", $BufferSize)
    Local $UnicodeString = StringLeft(DllStructGetData($Buffer, 1), $Return[0] * 2)
    $Buffer = 0
    Return $UnicodeString   
EndFunc

  • 1 month later...

Can you post a simple example that reads a line of Unicode from a text file and outputs the text as ASCII?

I have a feeling this won't accomplish what I want, because AU3 doesn't read Unicode.

Agreement is not necessary - thinking for one's self is!


Good work! Almost all of the functions work fine for me except Asc2Unicode: it returns a Unicode stream, but without a BOM, and some programs don't understand such a stream. I changed this function a bit so it can optionally add a BOM to the beginning of the stream.

Func Asc2Unicode($AscString, $addBOM = false)
    Local $BufferSize = StringLen($AscString) * 2
    Local $FullUniStr = DllStructCreate("byte[" & $BufferSize + 2 & "]")
    Local $Buffer = DllStructCreate("byte[" & $BufferSize & "]", DllStructGetPtr($FullUniStr) + 2)
    Local $Return = DllCall("Kernel32.dll", "int", "MultiByteToWideChar", _
        "int", 0, _
        "int", 0, _
        "str", $AscString, _
        "int", StringLen($AscString), _
        "ptr", DllStructGetPtr($Buffer, 1), _
        "int", $BufferSize)
    DllStructSetData($FullUniStr, 1, 0xFF, 1)
    DllStructSetData($FullUniStr, 1, 0xFE, 2)
    If $addBOM then 
        Return DllStructGetData($FullUniStr, 1)
    Else
        Return DllStructGetData($Buffer, 1)
    Endif
EndFunc

I didn't dig into it deeply, so this can probably be done more simply.


I JUST had a problem that this turned out to be the solution to. I also have Eastern language support and occasionally use double-byte characters (some Japanese customers). I'll let you know if I run into any issues after I rework my code.

Nice work.

"I have discovered that all human evil comes from this, man's being unable to sit still in a room. " - Blaise Pascal

Is this right? I get nothing back.

$File1 = FileOpen("C:\Unicode.txt", 0)
$Unicode = FileReadLine($File1, 1)
Unicode2Asc($Unicode)
FileClose ($File1)
MsgBox(0,"",$AscString)

File used is attached.

Unicode.txt


Is this right? I get nothing back.

Yes, that is expected. The file contains Unicode text, which AutoIt can't read natively, so you should read it as binary.

$File1 = FileOpen("C:\Unicode.txt", 4); 4 - raw read mode
$Unicode = FileRead($File1, FileGetSize("C:\Unicode.txt"))
$AscString = Unicode2Asc($Unicode)
FileClose ($File1)
MsgBox(0,"", $AscString)

Cool, that works for me with two small problems... The first character is not part of the text file and I need it to read only one line at a time. It's reading the whole file, neither problem should be too difficult to correct.


Anyone needing to convert UNICODE to ASCII / ANSI THIS WORKS!!!

I don't have a need for the other functions at this time, but I'm sure they work just as well.

Thanks for your efforts, they're greatly appreciated.
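
On the stray first character: it is most likely the byte order mark (bytes FF FE) at the start of the file, which the earlier BOM discussion in this thread also points to. A minimal sketch of dropping it before conversion follows. It assumes an AutoIt release that provides BinaryLen and BinaryMid, and _StripUtf16LeBom is a hypothetical helper name, not part of the original functions.

```autoit
; Hypothetical helper: remove a leading UTF-16LE byte order mark (FF FE), if present.
Func _StripUtf16LeBom($bData)
    ; String() of a binary value yields its hex form, e.g. "0xFFFE".
    If BinaryLen($bData) >= 2 And String(BinaryMid($bData, 1, 2)) = "0xFFFE" Then
        Return BinaryMid($bData, 3) ; everything from the third byte onward
    EndIf
    Return $bData ; no BOM found, return the data unchanged
EndFunc
```

The result can then be passed to Unicode2Asc in place of the raw file contents.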


Well I guess it's harder than I first thought.

Is there a way to read a line at a time when using the binary mode?

Not with a simple read, of course. But it's possible to write a UDF that scans the file for line ends (0D 00 0A 00 in Unicode) and reads only the data before each one.

But if the file isn't too big, it's simpler, in my opinion, to convert it to ANSI before use.
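
The line-end idea above can be sketched as follows. This is not the byte-scanning UDF itself: it swaps in the built-in BinaryToString (flag 2 = UTF-16 little endian), which only exists in later, Unicode-aware AutoIt releases, and then splits on CR LF. _SplitUtf16LeLines is a hypothetical name, and the file is assumed to be UTF-16LE with CRLF line ends.

```autoit
; Hypothetical helper: decode UTF-16LE bytes and split them into lines.
; Returns a StringSplit array: element [0] is the line count, [1]..[n] the lines.
Func _SplitUtf16LeLines($bData)
    Local $sText = BinaryToString($bData, 2) ; 2 = UTF-16 little endian
    ; CR LF is the byte sequence 0D 00 0A 00 in the raw data.
    Return StringSplit($sText, @CRLF, 1) ; 1 = treat @CRLF as one delimiter
EndFunc

; Usage sketch with the file from this thread, read in binary mode (16).
Local $hFile = FileOpen("C:\Unicode.txt", 16)
If $hFile <> -1 Then
    Local $aLines = _SplitUtf16LeLines(FileRead($hFile))
    FileClose($hFile)
    For $i = 1 To $aLines[0]
        ConsoleWrite($aLines[$i] & @CRLF)
    Next
EndIf
```

If the file starts with a byte order mark, it stays attached to the first line unless it is removed beforehand.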

Edited by Lazycat

  • 4 months later...
  • 4 months later...
  • 4 months later...
  • 3 weeks later...

Chinese text gives me '0xB2'.

I'm using the Chinese IME, Chinese (PRC).

Here's the code:

#Compiler_Res_Fileversion = 1.00
#Compiler_Res_LegalCopyright = NOT FOR RELEASE
#Compiler_Res_Comment = NOT FOR RELEASE
#Compiler_Res_Description = NOT FOR RELEASE
#Compiler_Allow_Decompile = n
#include <GUIConstants.au3>
#include <_Encoding.au3>
#include <String.au3>


$Do = GUICreate("What To Do?", 335, 66, 193, 115)
$To = GUICtrlCreateLabel("Now, do:", 8, 10, 55, 17)
$Thing = GUICtrlCreateInput("", 64, 8, 265, 21, 0x800)
GUICtrlSetBkColor($Thing, 0xFFFFFF)
$Another = GUICtrlCreateButton("Another thing to do, please!", 8, 32, 321, 31, 0)
_WhatToDoToday()
GUISetState(@SW_SHOW)


While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Another
            _WhatToDoToday()
    EndSwitch
WEnd
Func _WhatToDoToday()
    $ran = Random(1, 500, 1)
    $1 = Utf82Unicode("ÎÒ²»ÖªµÀ")
    Switch $ran
        Case 1 to 9
            $TTDT = $1
        Case 10 to 49
            $TTDT = $1
        Case 50 to 99
            $TTDT = $1
        Case 100 to 200
            $TTDT = $1
        Case 201 to 300
            $TTDT = "Orbis PQ, foruming, chatting and AutoIt!"
        Case 301 to 400
            $TTDT = "PQ"
        Case 401 to 499
            $TTDT = "Kerning PQ"
        Case Else
            $TTDT = "Whatever you like!"
    EndSwitch
    GUICtrlSetData($Thing, $TTDT)
EndFunc
Edited by AngelSL

  • 1 month later...

Why is this little command-line tool not working?

The UTF-8 file does not display the characters correctly. Any idea what's wrong?

To test it, run the following from the command line:

> U2UTF8.exe 'Path to Source Unicode File' 'Path to Destination UTF-8 File'

You need to compile it first.

if $CmdLine[0] <> 2 then 
    MsgBox(0,0,"Uses: U2UTF8 'Path to Source Unicode File' 'Path to Destination UTF-8 File' ")
    exit
EndIf
    
Dim $UnicodeFile    = $CmdLine[1]
Dim $UTF8FILE       = $CmdLine[2]   

$File1 = FileOpen($UnicodeFile, 16); 16 - binary read mode
$Unicode = FileRead($File1, FileGetSize($UnicodeFile))
$UTF8String = Unicode2Utf8($Unicode)
FileClose ($File1)
;MsgBox(0,"", $UTF8FILE)

;MsgBox(0,"", $UTF8FILE)
$file = FileOpen($UTF8FILE,128+2)
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
  ; Exit
EndIf
FileWrite($file, $UTF8String)
FileClose ($file)

Func Unicode2Utf8($UniString)
    If Not IsBinary($UniString) Then
        SetError(1)
        MsgBox(0,0,"not binary")
        Return $UniString
    EndIf

    Local $UniStringLen = StringLen($UniString)
    Local $BufferLen = $UniStringLen * 2
    Local $Input = DllStructCreate("byte[" & $BufferLen & "]")
    Local $Output = DllStructCreate("char[" & $BufferLen & "]")
    DllStructSetData($Input, 1, $UniString)
    Local $Return = DllCall("kernel32.dll", "int", "WideCharToMultiByte", _
        "int", 65001, _
        "int", 0, _
        "ptr", DllStructGetPtr($Input), _
        "int", $UniStringLen / 2, _
        "ptr", DllStructGetPtr($Output), _
        "int", $BufferLen, _
        "int", 0, _
        "int", 0)   
    Local $Utf8String = DllStructGetData($Output, 1)
    $Output = 0
    $Input = 0
    Return $Utf8String
EndFunc


  • 2 months later...

Sorry for bumping an old topic, but I think something is wrong in the code.

Func Utf82Unicode($Utf8String)
    Local $BufferSize = StringLen($Utf8String) * 2
    Local $Buffer = DllStructCreate("byte[" & $BufferSize & "]")
    Local $Return = DllCall("Kernel32.dll", "int", "MultiByteToWideChar", _
        "int", 65001, _
        "int", 0, _
        "str", $Utf8String, _
        "int", StringLen($Utf8String), _
        "ptr", DllStructGetPtr($Buffer), _
        "int", $BufferSize)
    Local $UnicodeString = StringLeft(DllStructGetData($Buffer, 1), $Return[0] * 2)
    $Buffer = 0
    Return $UnicodeString   
EndFunc

Func Utf82Unicode($Utf8String)
    Local $BufferSize = StringLen($Utf8String) * 2
    Local $Buffer = DllStructCreate("byte[" & $BufferSize & "]")
    Local $Return = DllCall("Kernel32.dll", "int", "MultiByteToWideChar", _
        "int", 65001, _
        "int", 0, _
        "str", $Utf8String, _
        "int", StringLen($Utf8String), _
        "ptr", DllStructGetPtr($Buffer), _
        "int", $BufferSize)
    Local $UnicodeString = DllStructGetData($Buffer, 1)
    $Buffer = 0
    Return $UnicodeString
EndFunc
Edited by Dhilip89


  • 5 years later...

Hi, I'm looking for code to convert a word or a character to Unicode in the \u00xx format that JSON uses.

This topic helped me with that, and I rewrote the function as shown below.

Question: is this the best way to do it?

I don't understand why the original script returns 0x6100 and not 0x0061. Why?

What I want is code that takes any character as input and returns its Unicode escape, such as \u0001, \u0002, \u003f, and so on.

Thanks

; http://www.autoitscript.com/forum/topic/21815-functions-for-ascii-unicode-and-utf8-encoding/#entry174115
Local $character = 'a'
ConsoleWrite(Asc2Unicode($character) & @LF); print \u0061

Func Asc2Unicode($input)
    If StringLen($input) <> 1 Then Return SetError(-1, -1, -1)
    Local $FullUniStr = DllStructCreate('byte[3]')
    Local $Buffer = DllStructCreate('byte[2]', DllStructGetPtr($FullUniStr) + 2)
    Local $Return = DllCall('Kernel32.dll', 'int', 'MultiByteToWideChar', _
            'int', 0, _
            'int', 0, _
            'str', $input, _
            'int', StringLen($input), _
            'ptr', DllStructGetPtr($Buffer, 1), _
            'int', 2)
    DllStructSetData($FullUniStr, 1, 0xFF, 1)
    DllStructSetData($FullUniStr, 1, 0xFE, 2)
    Local $temp = DllStructGetData($Buffer, 1)
    Return '\u' & StringMid($temp, 5, 2) & StringMid($temp, 3, 2)
EndFunc   ;==>Asc2Unicode
Edited by detefon


Necroing a 6-7 year old thread in Examples is not the best way to get help. The Help forum exists for a reason.

Furthermore, you rely on an old version and on code that shouldn't have remained here, as it offers ugly workarounds to non-existent issues.

AutoIt strings use UCS-2 encoding, a subset of UTF-16LE Unicode limited to one 16-bit word per character.

AscW("x") returns the value of the Unicode code point passed. Just concatenate "\u" with the Hex representation of that value and you're done.

Search the help file if needed to complete the homework.
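
To illustrate the point about native Unicode strings, here is a small sketch that assumes a current AutoIt release with the built-in StringToBinary and BinaryToString functions. It also shows where 0x6100 comes from: UTF-16 little endian stores the low byte of each 16-bit word first.

```autoit
Local $sText = "a" ; U+0061

; String() turns a binary value into its hex form so it can be printed.
ConsoleWrite(String(StringToBinary($sText, 2)) & @CRLF) ; flag 2 = UTF-16 LE, prints 0x6100
ConsoleWrite(String(StringToBinary($sText, 3)) & @CRLF) ; flag 3 = UTF-16 BE, prints 0x0061
ConsoleWrite(String(StringToBinary($sText, 4)) & @CRLF) ; flag 4 = UTF-8, prints 0x61

; Round trip: UTF-8 bytes back to a native AutoIt string.
Local $sBack = BinaryToString(StringToBinary($sText, 4), 4)
ConsoleWrite($sBack & @CRLF) ; prints a
```

With these built-ins, the DllCall-based conversions from the first post are usually no longer needed.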


@detefon

The Unicode representation of the letter "a" is U+0061, or \u0061 in JSON notation.

http://www.utf8-chartable.de/

Unicode code point   character   UTF-8 (hex.)   name
U+0061               a           61             LATIN SMALL LETTER A

 

@jchd

Did you mean?

ConsoleWrite("--- \u" & Hex(AscW("a"),4) & @CRLF)

 

Rgds

ptrex
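
Extending the AscW/Hex one-liner to a whole string could look like the sketch below. _ToJsonEscapes is a hypothetical helper name, and the sketch escapes every character, which is more than JSON strictly requires.

```autoit
; Hypothetical helper: escape every character of a string as \uXXXX.
Func _ToJsonEscapes($sText)
    Local $sOut = ""
    For $i = 1 To StringLen($sText)
        ; AscW gives the code point of one character; Hex pads it to 4 digits.
        $sOut &= "\u" & Hex(AscW(StringMid($sText, $i, 1)), 4)
    Next
    Return $sOut
EndFunc

ConsoleWrite(_ToJsonEscapes("a?") & @CRLF) ; prints \u0061\u003F
```

Hex returns upper-case digits; JSON accepts either case in \u escapes.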

 
