tobject

Unicode and IE chinese parsing


tobject

I'm parsing a Chinese site, but it looks like the native characters are not coming through into AutoIt.

Do I need to enable Unicode somewhere or somehow?

jchd

I assume you used a MsgBox to display that. It seems to me that the issue is that Windows is using a non-Big5 font for its display. Is this a Big5 Windows version, or a Western version where you happen to display a Chinese app or text?

Current AutoIt versions only cope with Unicode plane 0 (that is, plain 16-bit code points, without regard to surrogates and the upper Unicode planes), which is essentially the UCS-2 charset, for most of their functions.
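
As a small illustration of what "plane 0" means in practice, here is a minimal sketch (it is not part of the test script attached later in this thread). It assumes only the built-in ChrW, AscW, Hex and StringLen functions, and builds the character from its code point so the result does not depend on how the source file is encoded. Common Chinese characters such as U+4E2D live in plane 0, so each one fits in a single 16-bit unit and AutoIt strings should hold them without loss.

```autoit
; Sketch: a common Chinese character sits in Unicode plane 0 (the BMP),
; so it occupies a single 16-bit code unit inside an AutoIt string.
Local $sChar = ChrW(0x4E2D) ; U+4E2D, built from its code point

; Print the code point back in hexadecimal, and the string length.
ConsoleWrite("Code point: 0x" & Hex(AscW($sChar), 4) & @CRLF)
ConsoleWrite("Length: " & StringLen($sChar) & @CRLF)
```

If the code point round-trips here, the string data itself is intact and any remaining problem is on the display (font) side.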

But MsgBox maps (I suppose) one-to-one to the Windows function, and if your Windows version doesn't have Big5 support (fonts) loaded, you won't be able to display Vietnamese, Japanese, Korean or Chinese correctly (with this function). So make sure you have Asian language support loaded.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

tobject

Yes, MsgBox.

If the font is displayed in IE, does this mean I have it installed?

I save it to a text file as well, and I get a bunch of "??????" characters instead of Unicode.

Regional settings have Big5 selected.

Edited by tobject

jchd

Don't forget that you absolutely need to use UTF-8 with BOM encoding for your source (and text) files to be dealt with correctly by AutoIt. Please also double check that you use the latest available version and the full SciTE4AutoIt3 (latest build as well). AutoIt is a Unicode program and is unable to cope with non-Unicode multibyte encodings (which are still very common in Asia).
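
For the text file part, a minimal sketch of writing a string out as UTF-8 with BOM could look like the following. The file name and the sample string are only placeholders for illustration; the sample is built with ChrW so it does not depend on the source file encoding. FileOpen takes the encoding as part of its mode flags, and $FO_UTF8 from FileConstants.au3 requests UTF-8 with BOM. If the file ends up written in an ANSI code page instead, characters that the code page cannot represent are typically replaced by "?".

```autoit
#include <FileConstants.au3>

; Placeholder sample text (U+4E2D U+6587) and placeholder output path.
Local $sText = ChrW(0x4E2D) & ChrW(0x6587)
Local $sPath = @TempDir & "\unicode_test.txt"

; Open for overwrite and ask explicitly for UTF-8 with BOM.
Local $hFile = FileOpen($sPath, $FO_OVERWRITE + $FO_UTF8)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sPath & @CRLF)
    Exit 1
EndIf

FileWrite($hFile, $sText & @CRLF)
FileClose($hFile)
```

Open the resulting file in an editor that shows the encoding to confirm that it really is UTF-8 with BOM.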

Now download the following source (I saved it with UTF-8 + BOM encoding) and run it. You should see a (possibly incorrect) MsgBox (similar to what you have), a (possibly incorrect) ArrayDisplay with strings in various scripts, and a correct ArrayDisplay of the same. If the latter still doesn't display correctly, try using another font in the _MyArrayDisplay function (the font change is between the comment lines of $$).

That will help us determine what's going wrong.

TryFont.zip

Edit:

To answer your question about the browser: even if you don't have a large number of non-native fonts installed, browsers try hard to find glyphs for every character, including ones you wouldn't even know how to type. So no, it isn't a conclusive test.

Here's a cut & paste of the array in the code:

Local $str[10] = [ _
"Sant Julià de Lòria", _
"Skrýchov u Opařan", _
"Žíšov", _
"БОЛЬШОЕ ГРИДИНО", _
"МЫТИЩИ-ДТИ", _
"歴史的仮名遣 楷書|楷书 Cao Wei 曹魏", _
"変体仮名 中共中央紀律檢查委員會|中共中央纪律检查委员会 ", _
" فرنسيّ عربيّ", _
"सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है। उन्हें बुद्धि और अन्तरात्मा की देन है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिये।", _
"เขาจะได้ไปเที่ยวเมืองลาว" _
]

You, I, and most people around the globe will be able to see that, even if they don't believe they have Chinese, Devanāgarī, Arabic, Cyrillic, ... fonts in their stock English (French for me) Windows version.

Edited by jchd

tobject

Yep, the 3rd one works.

But on the 1st one, everything except Chinese is correct.

Does this mean I'm missing Chinese system fonts?

Do you have a simple function for string conversion?

Edited by tobject

jchd

Glad to see that at least something works for you.

The problem is not string conversion, but how to make Windows use the right default font to display Unicode text.

Double check you don't have a Windows theme which changes the default fonts.

Now, the issue with MsgBox is that Windows doesn't use the right font. Since this is a built-in function, I assume we can't do much about that. I have no idea what actually happens, and I can't do much, not having an Asian Windows to test.
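
One possible workaround, sketched here under assumptions, is to show the text in your own small GUI where you control the font, instead of relying on MsgBox. The font name "Arial Unicode MS" is only an example and may not be installed on your machine; any installed font that actually contains Chinese glyphs would do, and if no such font is installed this will not help. The sample text is again built with ChrW so it does not depend on the source file encoding.

```autoit
#include <GUIConstantsEx.au3>

; Placeholder sample text (U+4E2D U+6587).
Local $sText = ChrW(0x4E2D) & ChrW(0x6587)

Local $hGUI = GUICreate("Unicode test", 400, 120)
Local $idLabel = GUICtrlCreateLabel($sText, 10, 10, 380, 60)

; Assumption: "Arial Unicode MS" is installed. Replace it with any
; installed font that covers the characters you need to display.
GUICtrlSetFont($idLabel, 14, 400, 0, "Arial Unicode MS")

GUISetState(@SW_SHOW, $hGUI)

; Wait until the window is closed.
While GUIGetMsg() <> $GUI_EVENT_CLOSE
WEnd
GUIDelete($hGUI)
```

This is the same idea as the font change in _MyArrayDisplay: the string is unchanged, only the font used to draw it differs.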

Still, there is something else that you can check easily before resorting to a workaround: what is your input language? It should be Chinese (or your Asian language). I guess it also depends on which Windows version you run. You can switch between input/display languages, and that impacts the Windows default display fonts as well. For instance, I assume you switch to English for composing programs and forum posts, but you have to switch to Chinese to enter Chinese literals.

How do native Unicode Chinese programs behave on your system?


tobject

I have a standard US Windows XP SP3 with default regional settings.

I don't have anything in Chinese, I'm just parsing a Chinese site. I don't even know a single Chinese word! :idea: lol

Maybe the info is lost when I write to a text file?

I sent you PM

Edited by tobject

jchd

US, that's why!

You need to install the East Asian language support.

Control Panel >> Regional and Language Options >> Languages tab >> Install files for East Asian languages

That should work much better then!

P.S. no need to PM code.


tobject

ok got it.

wow 230M jeez

I'll try it tomorrow :idea:
