Uten Posted October 31, 2006 (edited)

I need to pick out words containing characters from the Norwegian alphabet (most European languages have some special characters). So I was wondering, do any of you do StringRegExp searches with some of those characters? I know AutoIt does not use Unicode strings, but StringInStr does find those letters (æ, ø, å), so I thought StringRegExp would (should) too. The reason I'm doing this is that I'm creating something like a T9 (the keyboard on cell phones) touch screen keyboard. Obviously I might have goofed in my regexp pattern, so if you have suggestions please let me know.

Some test code:

    testEuropeanLetters()
    Exit

    Func testEuropeanLetters()
        ;PURPOSE: See if StringRegExp handles European special chars
        ; Norwegian [æÆøØåÅ]
        Local $data, $regexp, $expect, $msg, $res
        Local $flag=3 ; 3 = return array of global matches
        ConsoleWrite('Asc("æ"):=' & Asc('æ') & @LF)
        ConsoleWrite('Asc("ø"):=' & Asc('ø') & @LF)
        ConsoleWrite('Asc("å"):=' & Asc('å') & @LF)
        If StringInStr("æ ø å", "æ") Then ConsoleWrite("StringInStr works with æ" & @LF)
        If StringInStr("æ ø å", "ø") Then ConsoleWrite("StringInStr works with ø" & @LF)
        If StringInStr("æ ø å", "å") Then ConsoleWrite("StringInStr works with å" & @LF)
        #region - data
        $data = 'ægiden' & @CRLF & 'ær' & @CRLF & 'æra' & @CRLF & 'æraen' & @CRLF & _
            'ærvær' & @CRLF & 'æser' & @CRLF & 'æsj' & @CRLF & 'æte' & @CRLF & 'ætling' & @CRLF & _
            'ætlingen' & @CRLF & 'ætt' & @CRLF & 'ætta' & @CRLF & 'ættbåren' & @CRLF & 'æva' & @CRLF & _
            'æve' & @CRLF & 'ævelig' & @CRLF & 'æven' & @CRLF & 'æver' & @CRLF & 'øde' & @CRLF & _
            'ødegård' & @CRLF & 'ødela' & @CRLF & 'ødelagt' & @CRLF & 'ødelagte' & @CRLF & 'ødeland' & @CRLF & _
            'øglene' & @CRLF & 'øgler' & @CRLF & 'øk' & @CRLF & 'øke' & @CRLF & 'økede' & @CRLF & _
            'økenavn' & @CRLF & 'økenavnet' & @CRLF & 'øl' & @CRLF & 'ølebrød' & @CRLF & 'ølebrødet' & @CRLF & _
            'ølen' & @CRLF & 'øm' & @CRLF & 'ømfintlig' & @CRLF & 'ømt' & @CRLF & 'ømtålelige' & @CRLF & _
            'ør' & @CRLF & 'øra' & @CRLF & 'øre' & @CRLF & 'ørebetennelse' & @CRLF & 'ørebro' & @CRLF & _
            'østsiden' & @CRLF & 'øv' & @CRLF & 'øvd' & @CRLF & 'øye' & @CRLF & 'å' & @CRLF & _
            'åa' & @CRLF & 'åbit' & @CRLF & 'åbor' & @CRLF & 'åk' & @CRLF & 'åker' & @CRLF & 'åkrer' & @CRLF & _
            'ål' & @CRLF & 'åla' & @CRLF & 'ålborg'
        #endregion - data
        ; Patterns tried one at a time; each assignment overwrites the previous
        ; one, so only the last pattern is in effect when the script runs.
        $regexp = "\b[æøå]\b" ;Returns æ,å,å,ø,ø,å -> Expected å
        $regexp = "\b[æøå]{1}\b" ;Returns æ,å,å,ø,ø,å -> Expected å
        $regexp = "\b[\O230\O248\O229]\b" ;Nothing -> Expected å
        $regexp = "\b\O229\b" ;Nothing -> Expected å
        $res = StringRegExp($data, $regexp, $flag)
        Local $i
        If IsArray($res) Then
            For $i = 0 To UBound($res) - 1
                ConsoleWrite("+>$res[" & $i & "]:=" & $res[$i] & @LF)
            Next
        Else
            ConsoleWrite("! Nothing returned to array" & @LF)
        EndIf
    EndFunc

EDIT: Typo in the code.

Edited October 31, 2006 by Uten

Please keep your sig. small! Use the help file. Search the forum. Then ask unresolved questions :) Script plugin demo, Simple Trace udf, TrayMenuEx udf, IOChatter demo, freebasic multithreaded dll sample, PostMessage, Aspell, Code profiling
MHz Posted October 31, 2006

Quote: "Obviously I might have goofed in my regexp pattern, so if you have suggestions please let me know."

Don't think you goofed, but rather Unicode support seems nonexistent in AutoIt's PCRE.

Some known references that seem to fail:

\X
    Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a "character".
    \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc.

\uFFFF where FFFF are 4 hexadecimal digits
    Matches a specific Unicode code point. Can be used inside character classes.
    \u00E0 matches à encoded as U+00E0 only. \u00A9 matches ©

\p{L} or \p{Letter}
    Matches a single Unicode code point that has the property "letter". See Unicode Character Properties in the tutorial for a complete list of properties. Each Unicode code point has exactly one property. Can be used inside character classes.
    \p{L} matches à encoded as U+00E0; \p{S} matches ©

\P{L} or \P{Letter}
    Matches a single Unicode code point that does not have the property "letter". Can be used inside character classes.
    \P{L} matches ©

Unfortunately, Unicode brings its own requirements and pitfalls when it comes to regular expressions. Of the regex flavors discussed in this tutorial, Java and the .NET framework use Unicode-based regex engines. Perl supports Unicode starting with version 5.6.

PCRE?
Uten Posted October 31, 2006

I think the Unicode part of PCRE was excluded at compile time to shave off some kilobytes, a choice justified by the missing Unicode support in AutoIt (due to compatibility with Win95).

I'll take another look at the switches you have provided, @MHz, to see if I can get any of them to work.

Thanks
Uten Posted October 31, 2006

I rewrote my test to work with thomasl's Perl regexp UDFs, and the regexp patterns I have used did not work well there either.

Looks like using \b is a no-no! Or it does not work as I expect (think it should).

But the pattern below returns the expected letter å as the only match:

    $regexp = "[^\w]([æøå]{1})[^\w]"

I have to play a bit more with this to see if I can get words containing the letters.

Still open for suggestions, though.
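A possible explanation for the \b results, offered as a hedged note and not a confirmed diagnosis: \b marks a position between a \w character and a non-\w character. If the regex build treats only ASCII letters, digits and the underscore as \w, then æ, ø and å count as non-word characters. In that case \b matches between one of them and a neighbouring ASCII letter (as in 'ærvær') and fails around a standalone 'å', which fits the output reported in the first post. The sketch below avoids \b and \w by spelling out the letter set in lookarounds. The function names findNorwegianWords and showMatches, the $letters helper and the shortened word list are illustrative assumptions, and the script file is assumed to be saved in an encoding that your AutoIt version reads correctly.

```autoit
findNorwegianWords()
Exit

Func findNorwegianWords()
    ; Assumption: these are the only letters that can appear in a word.
    Local $letters = "a-zA-ZæøåÆØÅ"
    ; Shortened sample taken from the word list in the first post.
    Local $data = 'ær' & @CRLF & 'ærvær' & @CRLF & 'øye' & @CRLF & 'å' & @CRLF & _
            'åa' & @CRLF & 'ødegård'

    ; A lone æ, ø or å: no listed letter directly before or after it.
    Local $single = "(?<![" & $letters & "])[æøå](?![" & $letters & "])"
    ; A whole word that contains at least one æ, ø or å.
    Local $words = "[" & $letters & "]*[æøå][" & $letters & "]*"

    ; Flag 3 returns an array of all global matches.
    showMatches("single", StringRegExp($data, $single, 3))
    showMatches("word", StringRegExp($data, $words, 3))
EndFunc

; Hypothetical helper: prints every match, or a notice if there were none.
Func showMatches($label, $res)
    If Not IsArray($res) Then
        ConsoleWrite("! " & $label & ": nothing returned" & @LF)
        Return
    EndIf
    For $i = 0 To UBound($res) - 1
        ConsoleWrite("+> " & $label & "[" & $i & "]:=" & $res[$i] & @LF)
    Next
EndFunc
```

With this sample list, the first pattern should report only the standalone å, and the second should report every word, since all six contain one of the three letters. Listing the letters explicitly keeps the result independent of how the engine defines \w, at the cost of having to extend $letters for any other alphabet.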