amin84 Posted July 1, 2012

Hello, I'm seeing some strange behavior when using StringReplace on certain Unicode characters. The example is very simple: I have a string and I want to remove a certain character, i.e. StringReplace it with nothing ("").

Here are some Unicode characters:
"ZERO WIDTH NON-JOINER" or ChrW(8204) or U+200C
"ZERO WIDTH JOINER" or ChrW(8205) or U+200D
"ARABIC TATWEEL" or ChrW(1600) or U+0640

The first two characters are invisible; they only change the behavior of the previous and next characters (in case you didn't know). Now let's say I have a generated string that might contain all of those characters, and I only want to get rid of ChrW(8204) and ChrW(8205). After lots of testing I narrowed it down to:

MsgBox(0,'',StringReplace(ChrW(8205)&ChrW(1600),ChrW(8205),''))
MsgBox(0,'',StringReplace(ChrW(8204)&ChrW(1600),ChrW(8204),''))

Both commands above remove ChrW(1600) too. This might be a bug.
jchd Posted July 1, 2012

I'm not so sure it's a bug in AutoIt. What I suspect is that the function you use (StringReplace) merely wraps a native Windows Unicode function. The behavior of several Unicode codepoints, like the ZWNJ and ZWJ you want to get rid of, isn't as simple as you might think, at least in the hands of an actual Unicode-compliant function. See for example this article.

In short, I believe (but can't say for sure) that the underlying function applies Unicode-defined treatment to the string you supply. Since both codepoints act on neighboring codepoints that can form a ligature, in order to change how they are rendered, the end effect is that the "meaningful" codepoint vanishes in both of your examples (it is no longer part of any ligature).

You may have more luck using StringRegExpReplace, as PCRE doesn't give codepoints in subject and pattern strings their complex Unicode semantics, but simply treats them individually.
trancexx Posted July 1, 2012

No, it's not an AutoIt bug, nor a Windows bug. When you want exact character replacement with no mumbo jumbo, you specify the casesense parameter for StringReplace()... being 1, of course.
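A minimal sketch of what the casesense suggestion above might look like in practice. The sample string and the variable names ($sInput, $sOut) are made up for illustration; the fifth argument of StringReplace is casesense, and the fourth (occurrence) is left at 0 so that every match is replaced.

```autoit
; Hypothetical sample: ZWNJ, TATWEEL and ZWJ between two Latin letters
Local $sInput = "a" & ChrW(8204) & ChrW(1600) & ChrW(8205) & "b"

; occurrence = 0 (replace all), casesense = 1 (exact, case-sensitive match)
Local $sOut = StringReplace($sInput, ChrW(8204), "", 0, 1)
$sOut = StringReplace($sOut, ChrW(8205), "", 0, 1)

; Compare lengths and check that the tatweel is still present
ConsoleWrite("Length before: " & StringLen($sInput) & @CRLF)
ConsoleWrite("Length after:  " & StringLen($sOut) & @CRLF)
ConsoleWrite("Tatweel position: " & StringInStr($sOut, ChrW(1600), 1) & @CRLF)
```

If the exact match behaves as described in this thread, only the two zero-width characters should disappear and the tatweel should still be found in the result.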
amin84 Posted July 1, 2012 (Author)

jchd and trancexx, thanks for the awesome info. Using StringReplace() with casesense works perfectly. Thanks.
amin84 Posted July 2, 2012 (Author)

I found some minor problems while using StringReplace(), so I changed it to the command below and everything works PERFECTLY:

$oChars = StringRegExpReplace($oChars,"["&ChrW(8204)&"]",'')
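The command above only strips ChrW(8204). As a hedged sketch, the same character-class approach could plausibly be extended to cover both characters from the original question by putting ChrW(8205) into the class as well. The sample string and the names $sInput, $sPattern and $sClean are assumptions made for this example, not part of the original post.

```autoit
; Hypothetical sample containing ZWNJ, TATWEEL and ZWJ
Local $sInput = "a" & ChrW(8204) & ChrW(1600) & ChrW(8205) & "b"

; Character class matching either ZWNJ (U+200C) or ZWJ (U+200D)
Local $sPattern = "[" & ChrW(8204) & ChrW(8205) & "]"

; Replace every match with an empty string
Local $sClean = StringRegExpReplace($sInput, $sPattern, "")

; A non-zero position means ChrW(1600) survived (casesense = 1 for an exact search)
ConsoleWrite("Tatweel position: " & StringInStr($sClean, ChrW(1600), 1) & @CRLF)
```

Building the pattern by concatenation mirrors the original command, so each character in the class is matched individually rather than through a locale-aware comparison.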