
Algorithm needed: searching a string that contains non-Latin characters



You could make a string containing all Latin characters and then check each character of your input against it :)

$latin = "ABCDE....."
$String = "A String %&32"
$Split = StringSplit($String, "") ; every single character; $Split[0] holds the count
$NonLatin = 0
For $i = 1 To $Split[0]
    ; Count the character if it does not occur in the list of Latin characters
    If Not StringInStr($latin, $Split[$i]) Then $NonLatin += 1
Next
MsgBox(0, "", $NonLatin)


This will probably count non-alphanumeric characters along with the non-Latin characters but it should provide some ideas.

$MixedString = "ABCDEYíäýñíèé"
$NonLatinCharArray = StringRegExp($MixedString, "[^[:alnum:]]", 3)

If IsArray($NonLatinCharArray) Then
    $count = Ubound($NonLatinCharArray)
    
    For $X = 0 to $count -1
        ConsoleWrite($NonLatinCharArray[$X] & @CRLF)
    Next
    
    MsgBox(0,"","Found " & $count & " Non-Latin characters ")
EndIf
Edited by weaponx
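
To make the caveat above concrete, here is a minimal sketch comparing the two character classes. It assumes the default regex mode, in which `[:alnum:]` covers only ASCII letters and digits. The helper name `_CountMatches` and the sample string are made up for illustration.

```autoit
; Hypothetical helper: counts how many characters of $sText match $sPattern.
; StringRegExpReplace reports the number of replacements in @extended.
Func _CountMatches($sText, $sPattern)
    StringRegExpReplace($sText, $sPattern, "")
    Return @extended
EndFunc

; Sample built with ChrW() so the script does not depend on the file encoding.
Local $sSample = "AB, cd!" & ChrW(0xE9) ; ChrW(0xE9) is the letter e with acute accent

; Counts the comma, the space, the exclamation mark and the accented letter.
ConsoleWrite(_CountMatches($sSample, "[^[:alnum:]]") & @CRLF)

; Counts only characters outside Basic Latin (here: the accented letter).
ConsoleWrite(_CountMatches($sSample, "[^\x00-\x7F]") & @CRLF)
```

If punctuation and spaces should not be counted, a code-point range such as the second pattern is probably closer to what is wanted than `[^[:alnum:]]`.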

$MixedString = "…\utils\tmp"
MsgBox(0, "", CountChineseChar($MixedString))

Func CountChineseChar($str)
    Local $NonLatinCharArray = StringRegExp($str, "[^[:alnum:]]", 3)
    If Not IsArray($NonLatinCharArray) Then Return 0
    Return UBound($NonLatinCharArray)
EndFunc
Edited by chenxu

The code failed!

That's because you failed twice:

1) to specify exactly what you want and stick with it (first you just said "none latin", and now you expect "Chinese only")

2) to comprehend what weaponx said right above his code example.

Anyway, my version:

$s = ClipGet()
$a = StringSplit($s, "")
$iNonLatin = 0
For $i = 1 To $a[0]
    If AscW($a[$i]) >= 0x250 Then  $iNonLatin += 1
Next
ConsoleWrite($iNonLatin & @CRLF)

Short explanation of the above:

The range 0-0x24F covers Basic Latin, Latin-1, Latin Extended-A and Latin Extended-B; any character that falls outside it is counted. This doesn't guarantee that only alphabetic characters are counted: for example, the Spacing Modifier Letters block (0x2B0-0x2FF) would be counted too. So go to unicode.org or wherever to get the range charts, and tweak the code as you need.
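
As an illustration of that kind of tweak, here is a minimal sketch that takes the range as parameters instead of hard-coding 0x250. The function name `_CountInRange` is hypothetical, and the CJK Unified Ideographs block (U+4E00 to U+9FFF) is only one assumed example of a range you might look up.

```autoit
; Hypothetical helper: counts characters whose code point lies in [$iLow, $iHigh].
Func _CountInRange($sText, $iLow, $iHigh)
    Local $aChars = StringSplit($sText, "") ; $aChars[0] holds the character count
    Local $iCount = 0
    For $i = 1 To $aChars[0]
        Local $iCode = AscW($aChars[$i])
        If $iCode >= $iLow And $iCode <= $iHigh Then $iCount += 1
    Next
    Return $iCount
EndFunc

; Example range: the CJK Unified Ideographs block, U+4E00 to U+9FFF.
; ChrW() is used so the script does not depend on the file encoding.
Local $sText = "abc" & ChrW(0x4E2D) & ChrW(0x6587) ; "abc" followed by two ideographs
ConsoleWrite(_CountInRange($sText, 0x4E00, 0x9FFF) & @CRLF)
```

Calling the helper once per block of interest and summing the results would cover several disjoint ranges.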

Edited by Siao

This code surely does what I want, but it takes O(n) time. I need to call the utility a lot in my script, so I need an O(1) or O(log n) utility.

Anyway, thank you very much.

This isn't math. The longer your string is, the longer it will take. There should be a linear increase in time with the length, just like the relationship between my stress level and the length of this thread.


Exactly. I would really like to know how O(1) can be expected trying to count characters in a string.

@chenxu:

These would be faster than the StringSplit approach (and that has nothing to do with your dubious understanding of big O, just the fact that compiled code is much faster than script code):

StringRegExpReplace($s, "[^\x00-\x{24F}]", "")
$iNonLatin = @extended ; @extended holds the number of replacements performed

if you expect most characters in a string to be in the specified range

or

$stmp = StringRegExpReplace($s, "[\x00-\x{24F}]", "")
$iNonLatin = StringLen($stmp)

if you expect most characters in a string to be outside the range

Again, I'm suggesting that the 0-0x24F used here likely is not exactly what you need, so you should adjust as necessary.

If you want to count Chinese only, the tricky part is the multitude of possible Chinese charsets.
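
For the Chinese-only case, a rough sketch along the same lines is shown below. It assumes the text is already a Unicode string in AutoIt, and that counting the CJK Unified Ideographs block (U+4E00 to U+9FFF) is good enough. Other blocks, such as CJK Extension A, are not covered, and the function name `_CountCjkIdeographs` is made up for illustration.

```autoit
; Hypothetical helper: counts characters in the CJK Unified Ideographs block only.
Func _CountCjkIdeographs($sText)
    StringRegExpReplace($sText, "[\x{4E00}-\x{9FFF}]", "")
    Return @extended ; number of replacements, i.e. number of matching characters
EndFunc

; ChrW() is used so the script does not depend on the file encoding.
Local $sText = "AutoIt " & ChrW(0x4E2D) & ChrW(0x6587) & "!"
ConsoleWrite(_CountCjkIdeographs($sText) & @CRLF)
```

Text read from a file in a legacy encoding would first have to be converted to a Unicode string, which is a separate problem from counting.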

Edited by Siao