MFerris Posted November 28, 2006

Hi there - I'm trying to detect non-"Western" characters in a string - specifically, anything outside ASCII codes 0-127. Based on the documentation for StringRegExp, I should be able to use the class [:ascii:] - however, this is not working. Consider the following script:

    #include <Array.au3>

    $string = "This is a regular string."
    $string2 = "This string has code 128 in it and should trigger non-ascii: Ç"

    $testString1 = StringRegExp($string,"[^:ascii:]+")
    $testString2 = StringRegExp($string2,"[^:ascii:]+")

    if $testString1 Then MsgBox(0,"","String 1 has a non-ascii character.")
    if $teststring2 Then MsgBox(0,"","String 2 has a non-ascii character.")

    $testString1Array = StringRegExp($string,"([^:ascii:]+)",3)
    $testString2Array = StringRegExp($string2,"([^:ascii:]+)",3)

    _ArrayDisplay($testString1Array,"all-ascii string")
    _ArrayDisplay($testString2Array,"string w non-ascii character")

Based on that, I should not see the first msgbox, and I should see the second. (I'm searching both strings for 1 or more characters that are NOT ascii codes 0-127.) However, I get the msgbox for the first string.

In the second section, I'm writing all matching parts of the string to an array - this definitely tells me that it is not working. The pattern is actually treating ':', 'a', 's', 'c', and 'i' as literal characters to exclude. The array returned is everything in between those characters.

I'm somewhat new to using RegExp, but I think I have this right. Does [:ascii:] not work as a class, or am I implementing this wrong? Thanks for any help!

SmOke_N (Moderators) Posted November 28, 2006

I'm not sure those groups are allowed... does it say they are in the help file?

Uten Posted November 28, 2006

I don't know about [:ascii:] (you will have to take a look at the PCRE documentation, I think). There is also the missing Unicode support in AutoIt (so that it stays usable on Win9x). But maybe you can rewrite your pattern to something like this?

    #include <Array.au3> ; needed for _ArrayDisplay

    Func testNonASCII()
        $string = "This is a regular string."
        $string2 = "This string has code 128 in it and should trigger non-ascii: Ç"

        ; Whitelist of allowed characters, used in place of "[^:ascii:]+"
        $rp1 = "[^a-zA-Z0-9_\.\:\- ]+"
        $rp2 = "[^a-zA-Z0-9_\.\:\- ]+"

        $testString1 = StringRegExp($string, $rp1)
        $testString2 = StringRegExp($string2, $rp2)

        if $testString1 Then MsgBox(0,"","String 1 has a non-ascii character.")
        if $teststring2 Then MsgBox(0,"","String 2 has a non-ascii character.")

        ; Flag 3 returns every match as an array
        $testString1Array = StringRegExp($string,"(" & $rp1 & ")",3)
        $testString2Array = StringRegExp($string2,"(" & $rp2 & ")",3)

        _ArrayDisplay($testString1Array,"all-ascii string")
        _ArrayDisplay($testString2Array,"string w non-ascii character")
    EndFunc

MFerris (Author) Posted November 28, 2006

Quote (SmOke_N): "I'm not sure those groups are allowed... does it say they are in the help file?"

Yes, the help file for StringRegExp lists a number of new classes, which weren't in the older version of the help file. Other classes are 'alnum', 'alpha', 'digit', 'upper', 'word', etc. Since it doesn't seem to work, I guess the help file needs to be revised.

MFerris (Author) Posted November 28, 2006

Quote (Uten): "Don't know about [:ascii:] (you will have to take a look at the PCRE documentation, I think). There is also the missing Unicode support in AutoIt (so that it stays usable on Win9x). But maybe you can rewrite your pattern to something like this?"

Yes, I think I'll have to go that way if :ascii: doesn't really work. I'm actually parsing HTML, so I'll also have to take into account everything that goes with that. It just seems like :ascii: would be the optimal choice (assuming it worked).

jpm Posted November 28, 2006

Quote (MFerris): "Yes, I think I'll have to go that way if :ascii: doesn't really work. I'm actually parsing HTML, so I'll also have to take into account everything that goes with that. It just seems like :ascii: would be the optimal choice (assuming it worked)."

I assume you did a test with pcretest.exe. If not, you must do it. In fact, I did. Unless Jon did a bad job, pcretest.exe and the AutoIt code react the same way. So I assume that something must be wrong with the pattern you are using. I cannot imagine pcretest.exe being wrong... I am not an expert in regular expressions; pcretest.exe IS.

MFerris (Author) Posted November 28, 2006

Quote (jpm): "I assume you did a test with pcretest.exe. If not, you must do it. In fact, I did. [...] I cannot imagine pcretest.exe being wrong... I am not an expert in regular expressions; pcretest.exe IS."

No, I have not even heard of pcretest. I just googled it, and most results seem to relate to a Perl-based regexp testing program; however, the syntax for that program seems different from how regexps are used in AutoIt. Or maybe I just need more coffee and time to digest the instructions.

... After a little more googling, I did find this page, which shows the character classes in Perl, including :ascii:. The only difference between the AutoIt and Perl syntax is how to negate: "^:ascii:" (AutoIt) versus ":^ascii:" (Perl). I've tried both in AutoIt; neither seems to work.

However, after looking through the help file, I realized that what I need can be accomplished much more easily using StringIsASCII(), which also checks for ASCII codes 0-127. This works perfectly, so I'll use that.

I don't know that my regexp statement could be any simpler ([^:ascii:]), so I'm still not sure the class is working properly. If :ascii: should work, I'd be interested in seeing how it should be properly implemented. Now that I have an alternate means of accomplishing this task, this isn't a problem for me, but it may be for others in the future.

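The StringIsASCII() approach mentioned above can be sketched roughly as follows. This is only an illustration: `_HasNonAscii` is a made-up helper name, and the use of ChrW() assumes a current Unicode build of AutoIt rather than the 2006 release discussed in this thread.

```autoit
; Hypothetical helper (not part of AutoIt): reports whether a string
; contains any character outside the ASCII range 0-127.
Func _HasNonAscii($sText)
    ; StringIsASCII() returns 1 only when every character is in 0x00-0x7F
    Return Not StringIsASCII($sText)
EndFunc

Local $sPlain = "This is a regular string."
Local $sAccent = "Non-ascii: " & ChrW(199) ; U+00C7, capital C with cedilla

ConsoleWrite("Plain string has non-ASCII: " & _HasNonAscii($sPlain) & @CRLF)
ConsoleWrite("Accented string has non-ASCII: " & _HasNonAscii($sAccent) & @CRLF)
```

This only answers yes or no for the whole string; it does not say where the offending characters are, which is the case where a regular expression is still useful.
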
MHz Posted November 28, 2006

Quote (MFerris): "No, I have not even heard of pcretest."

pcretest can be found here.

jpm Posted November 28, 2006

Quote (MFerris): "I don't know that my regexp statement could be any simpler ([^:ascii:]), so I'm still not sure the class is working properly. If :ascii: should work, I'd be interested in seeing how it should be properly implemented."

I hope a PCRE expert can help you, because pcretest and AutoIt give the same result, so it looks like your expression is not quite right. Jon implemented the PCRE port in AutoIt; if both return the same, he will say NOBUG.

Jon (Administrators) Posted November 28, 2006

In the PCRE documentation I found the line about :ascii:

    ascii    character codes 0 - 127

jpm Posted November 28, 2006

Quote (Jon): "In the PCRE documentation I found the line about :ascii:"

I saw the same. That's the reason for all my PMs.

MFerris (Author) Posted November 28, 2006

Quote (jpm): "I saw the same. That's the reason for all my PMs."

I saw the same as well. I'm not sure what the point of that statement is, though.

As I can use the StringIsASCII() function, I no longer have a problem. I just wanted to point out that the :ascii: class for RegExp in AutoIt may not be working properly, based on my test. I don't know how to use pcretest, so I can't confirm whether it works in that environment or not. Someone familiar with pcretest and/or with a better understanding of RegExp syntax may want to take a closer look. My concern was for future users who may need to use the :ascii: class and can't rely on StringIsASCII().

Thanks for all your help.

jpm Posted November 28, 2006

Quote (MFerris): "I saw the same as well. I'm not sure what the point of that statement is, though. [...] My concern was for future users who may need to use the :ascii: class and can't rely on StringIsASCII()."

I understand your concern. I was answering Jon, not you.

jpm Posted June 17, 2007

In fact, I don't know why the PCRE implementation is the way it is: ^ negates the class, but only if it is the first character. See the detailed documentation.

SiteMaze Posted February 12, 2008

To replace non-alphanumeric characters (notice the double square brackets):

    1. $temp = StringRegExpReplace($filename, '[^[:alnum:]+]', "")
    2. $temp = StringRegExpReplace($filename, '[[:alnum:]+]', "")
    3. $temp = StringRegExpReplace($filename, '[[:alnum:]]', "")

(I decided to post it here so that others can search for it.)

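Putting the last two posts together: in PCRE, a POSIX class such as [:ascii:] is only recognized inside an enclosing bracket expression, and ^ negates only as the first character after the outer bracket. A minimal sketch of what the original question was after, assuming a current Unicode build of AutoIt (the variable names and the sample file name are illustrative only):

```autoit
Local $sPlain = "This is a regular string."
Local $sAccent = "Non-ascii: " & ChrW(199) ; U+00C7, capital C with cedilla

; [^[:ascii:]] = one character that is NOT in the range 0-127.
; With the default flag 0, StringRegExp returns 1 on a match and 0 otherwise.
ConsoleWrite("Plain:    " & StringRegExp($sPlain, "[^[:ascii:]]") & @CRLF)
ConsoleWrite("Accented: " & StringRegExp($sAccent, "[^[:ascii:]]") & @CRLF)

; The same test written as an explicit range, without the POSIX class
ConsoleWrite("Range:    " & StringRegExp($sAccent, "[^\x00-\x7F]") & @CRLF)

; A quantifier belongs outside the brackets; inside them, + is a literal plus sign
Local $sClean = StringRegExpReplace("my file (v2).txt", "[^[:alnum:]]+", "")
ConsoleWrite("Cleaned:  " & $sClean & @CRLF)
```

Read this way, the single-bracket pattern [^:ascii:] from the first post is an ordinary negated set of the literal characters ':', 'a', 's', 'c' and 'i', which is consistent with the array output described there. In the three patterns above, the + sits inside the brackets, so the first one leaves literal plus signs in place and the second one also removes them.
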