chenxu Posted May 2, 2008

How can I get the count of non-Latin characters in a string that may contain both Latin and non-Latin characters? I need an O(1) or O(log n) solution; I already have an O(n) one.
ProgAndy Posted May 2, 2008

You could make a string with all Latin chars and then check each char of your input against it:

    $latin = "ABCDE....."
    $String = "A String %&32"
    $Split = StringSplit($String, "") ; every single character
    $NonLatin = 0
    For $i = 1 To $Split[0]
        ; count the character if it does not occur in the Latin list
        If Not StringInStr($latin, $Split[$i]) Then $NonLatin += 1
    Next
    MsgBox(0, "", $NonLatin)
weaponx Posted May 2, 2008 (edited)

This will probably count non-alphanumeric characters along with the non-Latin characters, but it should provide some ideas.

    $MixedString = "ABCDEYíäýñíèé"
    $NonLatinCharArray = StringRegExp($MixedString, "[^[:alnum:]]", 3)
    If IsArray($NonLatinCharArray) Then
        $count = UBound($NonLatinCharArray)
        For $X = 0 To $count - 1
            ConsoleWrite($NonLatinCharArray[$X] & @CRLF)
        Next
        MsgBox(0, "", "Found " & $count & " Non-Latin characters ")
    EndIf
chenxu (Author) Posted May 2, 2008 (edited)

[Quote of weaponx's post above, followed by a code block that is garbled beyond recovery in the source.]
Siao Posted May 2, 2008 (edited)

"The code failed!"

That's because you failed twice:
1) to specify exactly what you want and stick with it (first you just said "non-Latin", and now you expect "Chinese only");
2) to comprehend what weaponx said right above his code example.

Anyway, my version:

    $s = ClipGet()
    $a = StringSplit($s, "")
    $iNonLatin = 0
    For $i = 1 To $a[0]
        If AscW($a[$i]) >= 0x250 Then $iNonLatin += 1
    Next
    ConsoleWrite($iNonLatin & @CRLF)

Short explanation of the above: the range 0-0x24F includes Basic Latin, Latin-1, Latin Extended-A and Latin Extended-B; any char that doesn't fall within it will be counted. This doesn't guarantee that only alpha chars will be counted. For example, the Spacing Modifier Letters subset (0x2B0-0x2FF) would be counted too, so go to unicode.org or wherever to get range charts, and tweak the code as you need.
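As a rough illustration of the "tweak the code as you need" advice, here is a minimal sketch that skips the Spacing Modifier Letters block mentioned above in addition to the 0-0x24F range. The function name `_CountNonLatinTweaked` is hypothetical, and which blocks to exclude is an assumption you would adjust from the Unicode charts.

```autoit
; Hypothetical helper (not from the thread): counts characters at or above 0x250,
; except those in the Spacing Modifier Letters block (0x2B0-0x2FF).
Func _CountNonLatinTweaked($sText)
    Local $iCount = 0, $iCode
    For $i = 1 To StringLen($sText)
        $iCode = AscW(StringMid($sText, $i, 1))
        ; Basic Latin through Latin Extended-B: not counted
        If $iCode < 0x250 Then ContinueLoop
        ; Assumed extra exclusion: Spacing Modifier Letters
        If $iCode >= 0x2B0 And $iCode <= 0x2FF Then ContinueLoop
        $iCount += 1
    Next
    Return $iCount
EndFunc

ConsoleWrite(_CountNonLatinTweaked(ClipGet()) & @CRLF)
```

Further excluded blocks can be added as extra `ContinueLoop` lines in the same style. The loop is still O(n) in the string length.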
chenxu (Author) Posted May 2, 2008

This code certainly does what I want, but it takes O(n) time. I need to call the utility a lot in my script, so I need one that runs in O(1) or O(log n) time. Anyway, thank you very much.
weaponx Posted May 2, 2008

This isn't math. The longer your string is, the longer it will take. There should be a linear increase in time with the length, just like the relationship between my stress level and the length of this thread.
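One way to check the linear growth for yourself is to time the counting loop on strings of doubling length. This is only a sketch: `_CountNonLatin` is a hypothetical wrapper around the StringSplit/AscW loop posted earlier in the thread, the test string is made up, and the timings will vary by machine.

```autoit
; Hypothetical wrapper around the StringSplit/AscW loop from earlier in the thread.
Func _CountNonLatin($sText)
    Local $aChars = StringSplit($sText, "")
    Local $iNonLatin = 0
    For $i = 1 To $aChars[0]
        If AscW($aChars[$i]) >= 0x250 Then $iNonLatin += 1
    Next
    Return $iNonLatin
EndFunc

; Made-up test data: 3 Latin letters plus 2 characters above 0x24F.
Local $s = "abc" & ChrW(0x4E2D) & ChrW(0x6587)

; Double the string 10 times to get a reasonably long starting input.
For $i = 1 To 10
    $s &= $s
Next

; Time the count, then double the length and repeat.
For $iRound = 1 To 4
    Local $hTimer = TimerInit()
    Local $iCount = _CountNonLatin($s)
    ConsoleWrite(StringLen($s) & " chars, " & $iCount & " counted, " & Round(TimerDiff($hTimer), 2) & " ms" & @CRLF)
    $s &= $s
Next
```

If the counting is linear, each round should take roughly twice as long as the one before it.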
Siao Posted May 2, 2008 (edited)

Exactly. I would really like to know how O(1) can be expected when trying to count characters in a string.

@chenxu: These would be faster than the StringSplit approach (and it has nothing to do with your dubious understanding of big O, just the fact that compiled code is much faster than script code):

    StringRegExpReplace($s, "[^\x00-\x{24F}]", "")
    $iNonLatin = @extended

if you expect most characters in a string to be in the specified range, or

    $stmp = StringRegExpReplace($s, "[\x00-\x{24F}]", "")
    $iNonLatin = StringLen($stmp)

if you expect most characters in a string to be outside the range.

Again, I'm suggesting that the 0-0x24F range used here likely is not exactly what you need, so you should adjust as necessary. If you want to count Chinese only, the tricky part is the multitude of possible Chinese charsets.
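For the "Chinese only" case, a minimal sketch along the same lines might look like the following. It assumes that "Chinese" can be approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out the extension blocks and CJK punctuation, and that the `\x{...}` syntax from the snippets above handles these larger code points the same way.

```autoit
; Assumption: "Chinese" is approximated by CJK Unified Ideographs, U+4E00 to U+9FFF.
Local $s = ClipGet()

; Remove every character in the assumed range; the result string is discarded.
StringRegExpReplace($s, "[\x{4E00}-\x{9FFF}]", "")

; @extended holds the number of replacements performed, i.e. the match count.
Local $iChinese = @extended
ConsoleWrite($iChinese & @CRLF)
```

Like the snippets above, this still scans the whole string, so it is faster than the script loop only by a constant factor.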