StringRegExp with \b and \b (Unicode support): How to switch it on?

16 posts in this topic

Posted (edited)

Hi,

A little example shows what I mean. :)

$text = "I try ü ä ö working with 'Text' Wordtext"

If _checkWord($text, "'Text'") Then MsgBox(0, "", "'Text' found")
If _checkWord($text, "Text") Then MsgBox(0, "", "Text found")
If _checkWord($text, 'ü') Then MsgBox(0, "", "ü found")
If _checkWord($text, 'ä') Then MsgBox(0, "", "ä found")
If _checkWord($text, 'ö') Then MsgBox(0, "", "ö found")

Func _checkWord($string, $word)
    If StringRegExp($string, '\b' & $word & '\b', 0) Then Return 1
    Return 0
EndFunc   ;==>_checkWord

In RegexBuddy it works that way!

So long,

Mega

Edited by Xenobiologist


Posted

What way? I don't know what you expect and what you don't expect. Are all those conditions supposed to return true? Are they all supposed to return false?


Posted

Hi,

:) Of course not. Only the second call of _checkWord returns true. I expected the third, fourth and fifth calls to return true as well (as they do in RegexBuddy).

I also wondered why it doesn't work with the first one, but that is another question. :P

So long,

Mega
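
On the first call: a short sketch of why \b'Text'\b cannot match, using only the standard meaning of \b (a boundary exists only between a \w character and a non-\w character). The shortened test string and the variable name are made up for illustration.

```autoit
; Illustrative only: $sText is a shortened copy of the string from the first post.
Local $sText = "working with 'Text' Wordtext"

; The apostrophe is not a \w character, and neither is the space before it,
; so there is no word boundary in front of 'Text' and the pattern fails.
ConsoleWrite(StringRegExp($sText, "\b'Text'\b") & @CRLF) ; prints 0

; Between the apostrophe and the T there is a boundary, so this one matches.
ConsoleWrite(StringRegExp($sText, "\bText\b") & @CRLF) ; prints 1
```

This agrees with what is reported above: only the second call returns true, and that part is independent of the Unicode question.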


Posted

The full doc says:

In UTF-8 mode, characters with values greater than 128 never match \d, \s, or \w, and always match \D, \S, and \W. This is true even when Unicode character property support is available. These sequences retain their original meanings from before UTF-8 support was available, mainly for efficiency reasons.

So I assume that is the reason for all the non-matches.
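
Since \b is defined in terms of \w, the quoted rule means a letter such as ü is treated as a non-word character, so there is no boundary between it and the surrounding spaces. One possible way around that, sketched here and not taken from the thread, is to spell the boundary out with lookarounds and an explicit character class. The function name _checkWordEx and the list of letters are assumptions, and the script file is assumed to be saved in an encoding AutoIt reads correctly.

```autoit
; Hypothetical helper, not from the thread: emulates \b with lookarounds so
; that the listed non-ASCII letters also count as word characters.
Func _checkWordEx($sString, $sWord)
    ; Characters treated as "word" characters; extend the list as needed.
    Local Const $sWordChars = "A-Za-z0-9_äöüÄÖÜß"
    ; \Q...\E makes the search word literal (assumes $sWord contains no \E).
    Local $sPattern = "(?<![" & $sWordChars & "])\Q" & $sWord & "\E(?![" & $sWordChars & "])"
    ; With the default flag, StringRegExp returns 1 on a match and 0 otherwise.
    Return StringRegExp($sString, $sPattern)
EndFunc   ;==>_checkWordEx
```

Note that this is not identical to \b: it also accepts 'Text' including the apostrophes, because the space in front of the apostrophe is not in the listed class.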


Posted

Hi,

okay that seems to be the reason, but what is the solution?

So long,

Mega


Posted

No idea. StringRegExp is supposed to support Unicode, so (?i) "case insensitive" must do something.


Posted (edited)

Hi,

A little example shows what I mean. :)

$text = "I try ü ä ö working with 'Text' Wordtext"

If _checkWord($text, "'Text'") Then MsgBox(0, "", "'Text' found")
If _checkWord($text, "Text") Then MsgBox(0, "", "Text found")
If _checkWord($text, 'ü') Then MsgBox(0, "", "ü found")
If _checkWord($text, 'ä') Then MsgBox(0, "", "ä found")
If _checkWord($text, 'ö') Then MsgBox(0, "", "ö found")

Func _checkWord($string, $word)
    If StringRegExp($string, '\b' & $word & '\b', 0) Then Return 1
    Return 0
EndFunc   ;==>_checkWord
Edited by Bowmore


Posted

Hi,

okay that works a bit better.

My reference is RegexBuddy and trying \b'Text'\b as a pattern doesn't match. In your func it matches.

So, there is still a difference.

Let's see whether somebody knows how to switch on Unicode support.

So long,

Mega


Posted

I've just done a bit more testing, and my script does not work; it only appeared to work because of my mistake. (?!\\p{L}) should be (?!\p{L}), and with the corrected pattern it does not work.
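
For reference, a minimal sketch of what the corrected lookaround idea looks like when written out in full. Only the (?!\p{L}) part is mentioned in this post; the matching lookbehind and the function name _checkWordUni are assumptions. It needs a PCRE build with Unicode property support, and as reported here it did not work in the AutoIt build tested at the time.

```autoit
; Hypothetical reconstruction: the word must be neither preceded nor followed
; by a Unicode letter. \p{L} requires PCRE Unicode property support.
Func _checkWordUni($sString, $sWord)
    ; A single backslash is correct here: AutoIt strings have no escape sequences.
    Local $sPattern = "(?<!\p{L})" & $sWord & "(?!\p{L})"
    ; Returns 1 on a match, 0 otherwise (and sets @error for a bad pattern).
    Return StringRegExp($sString, $sPattern)
EndFunc   ;==>_checkWordUni
```

Writing \\p{L} instead makes the pattern look for a literal backslash followed by p{L}, so the lookahead almost always succeeds, which would explain why the mistaken version appeared to work.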


Posted

I checked with Jon, who answered that chars > 128 and Unicode cannot be handled case-insensitively.

NO FIX, as the detailed doc also says that.


Posted

Hi,

thanks, but sorry I don't understand what you are saying there. Jon told you that chars > 128 cannot be used for \b and so on? :)

So long,

Mega


Posted

It says, as the doc says, that \b will not match chars > 128. I was adding that caseless matching will not work either. (?i) will not solve anything.


Posted

Hi,

so then this is no bug. It is a feature request :-)

So long,

Mega


Posted

It looks like a restriction of the pcre.lib we are using (freeware).

I don't know if we can work around it. I'll leave it to Jon to answer directly.


Posted

Hi,

okay, Jon, can we work around it? :)

So long,

Mega


Posted

It's a bug in PCRE - I don't know how to work around it.
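
For readers on a later build: PCRE 8.10 and newer accept a (*UCP) prefix that makes \b, \w, \d and \s use Unicode properties. Whether a given AutoIt release ships such a PCRE is an assumption to check against its StringRegExp help page, and the function name below is hypothetical. A minimal sketch under that assumption:

```autoit
; Assumes an AutoIt build whose PCRE accepts the (*UCP) prefix (PCRE 8.10+).
; On older builds the prefix is not recognized and the call will not help.
Func _checkWordUcp($sString, $sWord)
    ; (*UCP) has to be placed at the very start of the pattern.
    Return StringRegExp($sString, "(*UCP)\b" & $sWord & "\b")
EndFunc   ;==>_checkWordUcp
```

This keeps the original \b semantics, so a search word wrapped in apostrophes, such as 'Text', would still not match after a space.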
