czardas

Another Unicode Question

12 posts in this topic

#1 ·  Posted (edited)

For years I have been thinking that one day I'll get good enough at scripting to use my computer the way I would like. I have known for a long time that it should be possible, but I haven't managed it: too dumb, don't get it. So we do not have support for UTF-8. Okay, that's fine, but what's the workaround? There's always a workaround.

So here we have a very common set of characters, used worldwide by as many people as speak the English language (if not more). Code points: 0x1D100 to 0x1D1FF (the Musical Symbols block).

See link : http://www.unicode.org/charts/PDF/U1D100.pdf

So the battle goes on between users and developers. I am a user who is trying to get the computer to say hello world. Sure, I can speak hex; it's the computer that's stubborn, not me. :P

Does anyone know how I can display these symbols?

Edited by czardas


#2 ·  Posted (edited)

Yes, I know.

First you need a font that defines glyphs for those code points. I suggest searching for Symbola or maybe the Euterpe font. Both are TrueType fonts and both are free as far as I know.

Then you have to write your own implementation of the ChrW() function, because AutoIt's built-in one is incredibly dumb - sorry, limited to the BMP (UCS-2), even though AutoIt could fully support UTF-16 internally with two minutes of work. Your function should turn code points above 65535 into a UTF-16 surrogate pair (two 16-bit code units). After that, all you have to do is make a GUI to display the characters or strings of them.

If you need help with any of those steps let me know, it's really not that hard.

Edited by trancexx

♡♡♡

.

eMyvnE


#3 ·  Posted (edited)

WOW! Symbola is awesome. :) trancexx, I will do my best. You are so generous to offer to assist me with this. I'm not sure how to proceed with creating my own ChrW() function, although I think I ought to spend some time researching and absorbing some of the technical details and language associated with Unicode first. Some of these words are new to me. I imagine I will have to use methods that are also new to me. If you can point me in the right direction, that would be a really big help.

Edit: You've been a big help already. I'm doing lots of reading right now and I'm sure to have some questions in a day or two. I must also devote some time to other work in between my time for personal study.

Edited by czardas


Indeed, the limitation to UCS-2 is unfortunate to say the least. Note that AutoIt isn't the only program limited to UCS-2 while claiming Unicode compliance.

The most ironically dumb example I know of is ... TADA ... Microsoft's own charmap.exe, whose purpose is precisely to display each and every (sic!) non-control glyph of a given font.

To wit, select a font with code points outside the BMP, e.g. Symbola, be sure to check "Advanced display" and the Unicode charset, then watch charmap truncate the table at 0xFFFD.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


#5 ·  Posted (edited)

@jchd I didn't get the code points beyond 0xFFFD either. I can access the characters outside the BMP using the key combination Alt+X in MS Word, so it's all in there. Anyway, there are a lot of clues in this thread already.

Edited by czardas


Your ChrW could be something like this:

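Func _ChrW($iCodePoint)
    ; BMP code points fit in one UTF-16 code unit, so the built-in ChrW() handles them.
    If $iCodePoint <= 0xFFFF Then Return ChrW($iCodePoint)
    ; Nothing is defined above U+10FFFF: set @error and return an empty string.
    If $iCodePoint > 0x10FFFF Then Return SetError(1, 0, "")
    ; Two 16-bit words hold the surrogate pair.
    Local $tOut = DllStructCreate("word[2]")
    ; High surrogate: 0xD7C0 is 0xD800 minus 0x40, which folds the 0x10000 offset into the base.
    DllStructSetData($tOut, 1, BitShift($iCodePoint, 10) + 0xD7C0, 1)
    ; Low surrogate: the low 10 bits of the code point added to 0xDC00.
    DllStructSetData($tOut, 1, BitAND($iCodePoint, 0x3FF) + 0xDC00, 2)
    ; Read the same 4 bytes back and decode them as UTF-16 LE text (flag 2).
    Return BinaryToString(DllStructGetData(DllStructCreate("byte[4]", DllStructGetPtr($tOut)), 1), 2)
EndFunc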
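
For anyone who wants to check the constants by hand, here is a rough sketch of the same arithmetic spelled out step by step for U+1D13D. The name _SurrogatePair is made up for illustration; it is not an AutoIt built-in and not part of the function above, and it assumes the input is already in the range 0x10000 to 0x10FFFF.

```autoit
; Hypothetical helper for illustration only: returns the two UTF-16 code units
; of a supplementary code point (0x10000 to 0x10FFFF) as a 2-element array.
Func _SurrogatePair($iCodePoint)
    Local $iOffset = $iCodePoint - 0x10000          ; leaves a 20-bit value
    Local $aUnits[2]
    $aUnits[0] = 0xD800 + BitShift($iOffset, 10)    ; high surrogate: top 10 bits
    $aUnits[1] = 0xDC00 + BitAND($iOffset, 0x3FF)   ; low surrogate: bottom 10 bits
    Return $aUnits
EndFunc

Local $aResult = _SurrogatePair(0x1D13D)
ConsoleWrite(Hex($aResult[0], 4) & " " & Hex($aResult[1], 4) & @CRLF) ; D834 DD3D
```

Since 0x10000 shifted right by 10 is 0x40, and 0xD800 - 0x40 = 0xD7C0, shifting the unadjusted code point and adding 0xD7C0 gives the same high surrogate. The low 10 bits of 0x10000 are all zero, so the BitAND result does not change either. That would explain why the function above never subtracts 0x10000 explicitly.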


#7 ·  Posted (edited)

trancexx, thank you ever so much for spending the time to show me this. I spent some time reading here:

http://msdn.microsoft.com/en-gb/goglobal/bb688113

I haven't finished reading these pages yet, but a lot is becoming clearer to me now. Your function is fantastic. I thought I would have to create a DLL for this, so I was thinking the right way. I probably would never have come up with something as good as that though. It works very well with ClipPut(). I need to play around with it a bit. You've changed my life with this small piece of code. :)

Edit: Second test with a label.

; The Symbola font must be installed for this example to display correctly.

#include <GUIConstantsEx.au3>

_LabelTest()

Func _LabelTest()
    Local $hGUI = GUICreate("Test", 200, 140)
    Local $hLabel = GUICtrlCreateLabel("", 5, 5, 190, 130)
    GUICtrlSetFont($hLabel, 24, 400, 0, "Symbola")
    GUICtrlSetData($hLabel, " " & _ChrW(0x1D13D) & " " & _ChrW(0x1D160))
    GUISetState(@SW_SHOW)

    ; Wait until the window is closed.
    While 1
        If GUIGetMsg() = $GUI_EVENT_CLOSE Then ExitLoop
    WEnd
EndFunc

Func _ChrW($iCodePoint) ; By trancexx
    If $iCodePoint <= 0xFFFF Then Return ChrW($iCodePoint)
    If $iCodePoint > 0x10FFFF Then Return SetError(1, 0, "")
    Local $tOut = DllStructCreate("word[2]")
    DllStructSetData($tOut, 1, BitShift($iCodePoint, 10) + 0xD7C0, 1)
    DllStructSetData($tOut, 1, BitAND($iCodePoint, 0x3FF) + 0xDC00, 2)
    Return BinaryToString(DllStructGetData(DllStructCreate("byte[4]", DllStructGetPtr($tOut)), 1), 2)
EndFunc

I'm very happy with this. :thumbsup:

Edited by czardas


#8 ·  Posted (edited)

I started a post asking a question but it got eaten by a loss of connection. Thanks to her mind-reading ability, trancexx answered (part of) it!

I was merely wondering whether the internal functions would enforce strict UTF-16 or treat the stream of 16-bit code units verbatim. I have the answer for BinaryToString.

Can you tell us which string-related functions would spoil surrogates?

EDIT: corrections!

Edited by jchd


#9 ·  Posted (edited)

Would spoil or be spoilt? In case of the latter - any that depend on calculating the number of characters as opposed to the binary length.

edit: my English is not that good.

Edited by trancexx


#10 ·  Posted (edited)

Would spoil or be spoilt? In case of the latter - any that depend on calculating the number of characters as opposed to the binary length.

I noticed that StringLen() returns multiples of two with these surrogate pairs. No big surprise really. Things like that are a small price to pay for having access to all these special symbols. It's brilliant!

Edited by czardas
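
If a count of actual characters is ever needed, one possible workaround is to skip the second half of each surrogate pair. This is only a minimal sketch: it assumes AscW() returns the raw 16-bit code unit when given a lone surrogate, and that the string is well-formed UTF-16. The name _CodePointLen is hypothetical, not an AutoIt built-in.

```autoit
; Hypothetical helper: counts code points instead of 16-bit code units.
Func _CodePointLen($sText)
    Local $iCount = 0, $iUnit
    For $i = 1 To StringLen($sText)
        $iUnit = AscW(StringMid($sText, $i, 1))
        ; Low surrogates (0xDC00 to 0xDFFF) are the second half of a pair: do not count them.
        If $iUnit < 0xDC00 Or $iUnit > 0xDFFF Then $iCount += 1
    Next
    Return $iCount
EndFunc

; U+1D160 as UTF-16 LE bytes (34 D8 60 DD), followed by "A" (41 00).
Local $sTest = BinaryToString(Binary("0x34D860DD4100"), 2)
ConsoleWrite(StringLen($sTest) & " code units, " & _CodePointLen($sTest) & " code points" & @CRLF)
```

The same caution would apply to anything that slices by position, such as StringMid() or StringLeft(): a cut between the two halves of a pair leaves a lone surrogate behind.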


Just to let you know, some older systems don't support supplementary character display by default. I think XP is there too and if I remember correctly I had it enabled on my XP, but forgot how I did it.

...That means on those systems your label will print 4 characters.



Just to let you know, some older systems don't support supplementary character display by default. I think XP is there too and if I remember correctly I had it enabled on my XP, but forgot how I did it.

It was something I had considered, although it worked fine for me on XP. There's some information about it here: http://msdn.microsoft.com/en-gb/library/windows/desktop/dd374069%28v=vs.85%29.aspx

Windows XP and later enable supplementary characters by default.
