Algorithm needed: counting the non-Latin characters in a string

chenxu · May 2, 2008

How can I get the count of non-Latin characters in a string that may contain both Latin and non-Latin characters? I need an O(1) or O(log n) solution; I already have an O(n) one.

ProgAndy · May 2, 2008

You could make a string containing all Latin characters and then check each character of the input against it:

$latin = "ABCDE....."
$String = "A String %&32"
$Split = StringSplit($String,"") ; Every single letter
$NonLatin = 0
For $i = 1 To $Split[0]
If Not StringInStr($latin,$Split[$i]) Then $NonLatin += 1
Next
MsgBox(0,"",$NonLatin)

weaponx · May 2, 2008

This will probably count non-alphanumeric characters along with the non-Latin characters but it should provide some ideas.

$MixedString = "ABCDEYíäýñíèé"
$NonLatinCharArray = StringRegExp($MixedString, "[^[:alnum:]]", 3)

If IsArray($NonLatinCharArray) Then
    $count = Ubound($NonLatinCharArray)
    
    For $X = 0 to $count -1
        ConsoleWrite($NonLatinCharArray[$X] & @CRLF)
    Next
    
    MsgBox(0,"","Found " & $count & " Non-Latin characters ")
EndIf

Edited May 2, 2008 by weaponx

chenxu · May 2, 2008

This will probably count non-alphanumeric characters along with the non-Latin characters but it should provide some ideas.

$MixedString = "ABCDEYíäýñíèé"
$NonLatinCharArray = StringRegExp($MixedString, "[^[:alnum:]]", 3)

If IsArray($NonLatinCharArray) Then
    $count = Ubound($NonLatinCharArray)
    
    For $X = 0 to $count -1
        ConsoleWrite($NonLatinCharArray[$X] & @CRLF)
    Next
    
    MsgBox(0,"","Found " & $count & " Non-Latin characters ")
EndIf

$MixedString = "..." ; test string is garbled in the source and could not be recovered
MsgBox(0, "", CountChineseChar($MixedString))

Func CountChineseChar($str)
    Local $NonLatinCharArray = StringRegExp($str, "[^[:alnum:]]", 3)
    If Not IsArray($NonLatinCharArray) Then Return 0
    Return UBound($NonLatinCharArray)
EndFunc

Edited May 2, 2008 by chenxu

Siao · May 2, 2008

The code failed!

That's because you failed twice:

1) to specify exactly what you want and stick with it (first you just said "none latin", and now you expect "chinese only")

2) to comprehend what weaponx said right above his code example.

Anyway, my version:

$s = ClipGet()
$a = StringSplit($s, "")
$iNonLatin = 0
For $i = 1 To $a[0]
    If AscW($a[$i]) >= 0x250 Then  $iNonLatin += 1
Next
ConsoleWrite($iNonLatin & @CRLF)

Short explanation of the above:

Range 0-0x24F includes Basic Latin, Latin-1, Latin Extended-A and Latin Extended-B; any char that doesn't fall within it will be counted. This doesn't really guarantee that only alpha chars will be counted. For example, the Spacing Modifier Letters subset (0x2B0-0x2FF) would be counted too, so go to unicode.org or wherever to get range charts, and tweak the code as you need.
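
As one example of that kind of tweak, here is a minimal sketch of the same loop that also skips the Spacing Modifier Letters range mentioned above. The function name _CountNonLatin is hypothetical, and the choice of which ranges to skip is only an illustration; check the range charts against your own data.

```autoit
; Sketch only: counts chars at or above 0x250, except Spacing Modifier Letters.
; _CountNonLatin is a made-up helper name, not part of AutoIt or of this thread.
Func _CountNonLatin($sText)
    Local $aChars = StringSplit($sText, "") ; $aChars[0] holds the number of elements
    Local $iCount = 0, $iCode
    For $i = 1 To $aChars[0]
        $iCode = AscW($aChars[$i])
        ; skip 0x2B0-0x2FF (assumed unwanted here), count everything else >= 0x250
        If $iCode >= 0x250 And Not ($iCode >= 0x2B0 And $iCode <= 0x2FF) Then $iCount += 1
    Next
    Return $iCount
EndFunc

ConsoleWrite(_CountNonLatin(ClipGet()) & @CRLF)
```

More excluded ranges can be added to the same condition in the same way.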

Edited May 2, 2008 by Siao

chenxu · May 2, 2008

$s = ClipGet()
$a = StringSplit($s, "")
$iNonLatin = 0
For $i = 1 To $a[0]
    If AscW($a[$i]) >= 0x250 Then  $iNonLatin += 1
Next
ConsoleWrite($iNonLatin & @CRLF)

This code surely does what I want, but it takes O(n) time. I need to call the utility a lot in my script, so I need an O(1) or O(log n) utility.

Anyway, thank you very much.

weaponx · May 2, 2008

This code surely does what I want, but it takes O(n) time. I need to call the utility a lot in my script, so I need an O(1) or O(log n) utility.
Anyway, thank you very much.

This isn't math. The longer your string is, the longer it will take. There should be a linear increase in time with the length, just like the relationship between my stress level and the length of this thread.
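
To see the linear growth for yourself, a rough timing sketch like the one below should do. It runs the StringSplit/AscW loop from earlier in the thread on inputs that grow 4x each round. The test data (repeated "abc" plus one CJK character) and the number of rounds are made up for illustration, and the timings will differ from machine to machine.

```autoit
; Build a 4096-character test string by doubling a 4-character unit 10 times.
Local $sText = "abc" & ChrW(0x4E2D) ; one char in four is >= 0x250
For $i = 1 To 10
    $sText &= $sText
Next

For $iRound = 1 To 5
    Local $hTimer = TimerInit()
    Local $aChars = StringSplit($sText, "")
    Local $iNonLatin = 0
    For $i = 1 To $aChars[0]
        If AscW($aChars[$i]) >= 0x250 Then $iNonLatin += 1
    Next
    ConsoleWrite(StringLen($sText) & " chars, " & $iNonLatin & " non-Latin, " & Round(TimerDiff($hTimer), 2) & " ms" & @CRLF)
    ; grow the input 4x for the next round
    $sText &= $sText
    $sText &= $sText
Next
```

Each round should take roughly four times as long as the previous one, since every character has to be looked at once.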

Siao · May 2, 2008

Exactly. I would really like to know how O(1) can be expected trying to count characters in a string.

@chenxu:

These would be faster than the StringSplit approach (and it has nothing to do with your dubious understanding of big O, just the fact that compiled code is much faster than script code):

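StringRegExpReplace($s, "[^\x00-\x{24F}]", "") ; result is discarded, only the replacement count matters
$iNonLatin = @extended ; number of replacements = number of chars outside 0-0x24F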

if you expect most characters in a string to be in the specified range

or

$stmp = StringRegExpReplace($s, "[\x00-\x{24F}]", "") ; strip every char inside 0-0x24F
$iNonLatin = StringLen($stmp) ; whatever is left is outside the range

if you expect most characters in a string to be outside the range

Again, I'm suggesting that the 0-0x24F used here likely is not exactly what you need, so you should adjust as necessary.

If you want to count Chinese only, the tricky part is the multitude of possible Chinese charsets.
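
If the CJK Unified Ideographs block (U+4E00-U+9FFF) is close enough to what you mean by "Chinese", the first regex variant could be adapted roughly as below. This is a sketch under that assumption: the block does not cover every Chinese character or related punctuation, and _CountCJK is a hypothetical helper name.

```autoit
; Sketch only: counts chars in U+4E00-U+9FFF (assumed to stand in for "Chinese").
; _CountCJK is a made-up helper name, not part of AutoIt or of this thread.
Func _CountCJK($sText)
    StringRegExpReplace($sText, "[\x{4E00}-\x{9FFF}]", "") ; result is discarded
    Return @extended ; number of replacements = number of matching chars
EndFunc

ConsoleWrite(_CountCJK("abc" & ChrW(0x4E2D) & ChrW(0x6587)) & @CRLF)
```

It is still one pass over the string, so it is O(n) like everything else here, just with the loop running in compiled code.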

Edited May 2, 2008 by Siao
