Regexp detecting korean letters


karlkar

Hello.

I want to check whether my string contains any Korean letters. According to http://jrgraphix.net/research/unicode_blocks.php the Unicode characters for the Korean language fall into two ranges: U+3130 to U+318F and U+AC00 to U+D7AF.

I prepared a regexp on http://regexpal.com/ that works there. After moving it to AutoIt it no longer works. What is wrong? Can anybody help?

The regexp I use is:

[\u3130-\u318F\uAC00-\uD7AF]

The test string is:

※ 발생경로

Regexpal detects the letters as it should, but AutoIt does not. Any ideas?

PhoenixXL

If you check the AutoIt help file for the StringRegExp function, you will find that \x{...} is used for hex character codes rather than \u.

; 8251, 48156, 49373, 44221, 47196 are the Unicode code points of the characters in your post.
; I got them using MsgBox(0, "", AscW(ClipGet())), since SciTE didn't support these characters.

$String = ChrW(8251) & ChrW(48156) & ChrW(49373) & ChrW(44221) & ChrW(47196)
If StringRegExp($String, "[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]") Then MsgBox(64, "Info", "Korean Alphabets Present")

$String = " Some English Text..."
If StringRegExp($String, "[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]") Then
    MsgBox(64, "Info", "Korean Alphabets Present")
Else
    MsgBox(16, "Err", "Korean Alphabet not found")
EndIf

sahsanu

Check whether this works:

[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]

Edit: PhoenixXL was faster and gave a better answer (explanation included) ;)

PhoenixXL

This might come in handy for checking single characters:

; 8251, 48156, 49373, 44221, 47196 are the Unicode code points of the characters in your post.
; I got them using MsgBox(0, "", AscW(ClipGet())), since SciTE didn't support these characters.


Func CharIsKorean($s_Char)

    Switch AscW($s_Char)
        Case 0x3130 To 0x318F, 0xAC00 To 0xD7AF
            Return 1
    EndSwitch

    Return 0

EndFunc   ;==>CharIsKorean

MsgBox(0, "Is Korean", "A? " & CharIsKorean("A"))
MsgBox(0, "Is Korean", ChrW(48156) & "? " & CharIsKorean(ChrW(48156)))  ; On my laptop MsgBox shows a box glyph instead of the Korean character
MsgBox(0, "Is Korean", ChrW(44221) & "? " & CharIsKorean(ChrW(44221)))

NOTE: AutoIt uses PCRE, so when creating your regexes make sure they are written for the PCRE flavor.
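
CharIsKorean only looks at one character, because AscW returns the code of the first character of its argument. A minimal sketch of a whole-string check built on the same two ranges might look like this. StringHasKorean is a hypothetical helper name, not something from the posts above:

```autoit
; Sketch only: StringHasKorean is a hypothetical helper, not part of AutoIt.
; Returns 1 if any character of $s_Text lies in U+3130..U+318F or
; U+AC00..U+D7AF (the two ranges from the question), otherwise 0.
Func StringHasKorean($s_Text)
    For $i = 1 To StringLen($s_Text)
        ; Inspect one character at a time
        Switch AscW(StringMid($s_Text, $i, 1))
            Case 0x3130 To 0x318F, 0xAC00 To 0xD7AF
                Return 1
        EndSwitch
    Next
    Return 0
EndFunc   ;==>StringHasKorean

ConsoleWrite(StringHasKorean("Some English text") & @CRLF)                  ; writes 0
ConsoleWrite(StringHasKorean("abc " & ChrW(48156) & ChrW(49373)) & @CRLF)   ; writes 1
```

The loop stops at the first hit, so it does the same job as the StringRegExp test with the \x{...} ranges, just without a regular expression.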


jchd

The beta version supports UCP, so you can already try script properties such as \p{Hangul} (or wait for the upcoming release):

Local $String = "Sample english text 한국어 텍스트의 예 טקסט עברית ירושלים"
; Flag 1 returns an array with the captured group of the first match
Local $res = StringRegExp($String, "(*UCP)\b(\p{Hangul}[\p{Hangul}\s]*)\b", 1)
If Not @error Then MsgBox(64, "Korean text found", $res[0])
$res = StringRegExp($String, "(*UCP)\b(\p{Latin}[\p{Latin}\s]*)\b", 1)
If Not @error Then MsgBox(64, "Latin text found", $res[0])
$res = StringRegExp($String, "(*UCP)\b(\p{Hebrew}[\p{Hebrew}\s]*)\b", 1)
If Not @error Then MsgBox(64, "Hebrew text found", $res[0])

Since UCP support is not active by default, you need to use the "(*UCP)" prefix. Note that, for instance, Hangul doesn't include digits, spaces or punctuation, so you may need to adapt the pattern to your use case.
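
As a rough illustration of that kind of adaptation, the character class can be widened with the general Unicode categories \p{N} (numbers) and \p{P} (punctuation). This is only a sketch: it assumes an AutoIt build whose PCRE accepts (*UCP) and \p{...}, and the sample text and variable names are made up for the example:

```autoit
; Sketch only: assumes a build with UCP / \p{...} support, as described above.
; The sample text and variable names are made up for illustration.
Local $sText = "Order 42: 한국어 텍스트 3개, 완료! Done."

; Start on a Hangul character, then also accept digits (\p{N}),
; punctuation (\p{P}) and whitespace inside the same run.
Local $aRun = StringRegExp($sText, "(*UCP)(\p{Hangul}[\p{Hangul}\p{N}\p{P}\s]*)", 1)
If Not @error Then MsgBox(64, "Korean run found", $aRun[0])
```

The run still has to begin with a Hangul character, so leading digits or punctuation are not picked up. Whether that is what you want depends on your data.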

