sugi Posted March 20, 2006

Hello,

According to the help file, the expression "<.*>" in a regexp matches everything between the first "<" and the last ">" in the string (e.g. in "<abc><def>" it matches "<abc><def>"). To get the smallest possible match, a ? should be added after the repeating character, which gives "<.*?>". In my example that should match only "<abc>" from the string "<abc><def>", as "<abc>" is the smallest possible match.

Now to my problem. As "*" means 0 or more characters, a regexp of "<.*?>" should match "<>", as that is still the smallest possible match. After all, ".*?" means "find the smallest match from 0 or more characters".

I've tested this with the following code:

    MsgBox(64, 'Match', StringRegExp('<>', '<.*?>', 0))
    MsgBox(64, 'Match', StringRegExp('<a>', '<.*?>', 0))

In my opinion both should return the same result, as the regexp should match both. But the first one does not match, for a reason I don't understand. Any ideas where I have misunderstood something, or where the bug in my code is?
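The greedy versus lazy distinction described above can be checked in any Perl-style engine. A minimal sketch, assuming Python's re module as a stand-in reference (it is not AutoIt's StringRegExp, and says nothing about how the AutoIt build in this thread behaves):

```python
import re

text = "<abc><def>"

# Greedy: .* takes as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.*>", text).group()

# Lazy: .*? takes as little as it can, so the match stops at the first ">".
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <abc><def>
print(lazy)    # <abc>
```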
billmez Posted March 20, 2006

From the help file:

    .    Match any single character
    *    Repeat the previous character, set or group 0 or more times. Equivalent to {0,}

So there needs to be at least one character inside the <> to match, because the * is only a repetition of the first character match, which is any single character.
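For comparison, in Perl-style engines the quantifier applies to the preceding item as a whole, so zero repetitions of "." are allowed and "*" behaves the same as "{0,}". A small sketch under that assumption, again using Python's re module rather than AutoIt; the helper name `matches` is made up for this illustration:

```python
import re

# Hypothetical helper for this illustration: True if the whole string matches.
def matches(pattern, text):
    return re.fullmatch(pattern, text) is not None

star_empty = matches(r"<.*>", "<>")       # "." repeated zero times is fine
braces_empty = matches(r"<.{0,}>", "<>")  # same quantifier, written as {0,}
plus_empty = matches(r"<.+>", "<>")       # "+" needs at least one character

print(star_empty, braces_empty, plus_empty)  # True True False
```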
jefhal Posted March 20, 2006

So, to find just <> you could add a line something like:

    If StringInStr($mystring, "<>") <> 0 Then $myanswer = "<>"
sugi Posted March 20, 2006

Quoting billmez: "So there needs to be at least one character inside the <> to match because the * is only a repetition of the first character match, which is any single character."

Then we have either a bug in the documentation (if AutoIt wants to support the common RegExp syntax) or a bug in the current implementation. Have a look at this. If your interpretation is right, the last string does not match because there's no b in it:

    MsgBox(64, 'Match', StringRegExp('abbc', 'ab*c', 0))
    MsgBox(64, 'Match', StringRegExp('abc', 'ab*c', 0))
    MsgBox(64, 'Match', StringRegExp('ac', 'ab*c', 0))

The common regexp meaning of * is that the last character (in my example the b) may exist 0 or more times. So of course all three strings are matched. You're mixing the * with the +. The + means that the last character may exist 1 or more times, which would only match the first two strings.

Thanks jefhal for the idea, I'm already using that as a workaround. Before I posted I checked my regexp with The RegEx Coach, and it told me it should match. But I posted here to find out whether I overlooked something or AutoIt has a bug, as I want to help get it as bug-free as possible.

EDIT: Just found this thread, so the current behaviour is a bug, not a feature.

Edited March 20, 2006 by sugi
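The difference between * and + on these three strings can be reproduced outside AutoIt. A minimal sketch, assuming Python's re module as a Perl-style reference engine (not the AutoIt implementation under discussion):

```python
import re

samples = ["abbc", "abc", "ac"]

# "b*" allows zero or more b's; "b+" requires at least one.
star = {s: bool(re.fullmatch(r"ab*c", s)) for s in samples}
plus = {s: bool(re.fullmatch(r"ab+c", s)) for s in samples}

print(star)  # {'abbc': True, 'abc': True, 'ac': True}
print(plus)  # {'abbc': True, 'abc': True, 'ac': False}
```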
billmez Posted March 21, 2006

Quoting sugi: "You're mixing the * with the +. The + means that the last character may exist 1 or more times, which would only match the first two strings."

According to the AutoIt documentation, I am not mixing up the * and the +. Whether it is a bug in the docs or in the AutoIt implementation of regex, the docs are pretty much all any of us have to go on. I can tell you that your first example is correct in Perl (whose regex syntax most others are based on); it will also match the empty <>.

Perl RE syntax:

    .     Match any character (except newline)
    *?    Match 0 or more times
    +?    Match 1 or more times

So you would be correct with the statement "find the smallest match from 0 or more characters". This is a bit different from what is documented in the AutoIt docs, so if it is not doing what you want, it may be because the AutoIt implementation is as stated in the docs and not standard RE syntax. I was just trying to point out the difference as I understand it from the AutoIt help file.
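The two lazy quantifiers from the Perl table can be compared directly on the strings from the original question. A rough sketch, assuming Python's re module follows the same Perl-style rules (it is not AutoIt); `first_match` is a helper invented for this example:

```python
import re

# Hypothetical helper: return the first matched text, or None if nothing matches.
def first_match(pattern, text):
    m = re.search(pattern, text)
    return m.group() if m else None

# For each subject string: (result with lazy "*?", result with lazy "+?").
results = {
    s: (first_match(r"<.*?>", s), first_match(r"<.+?>", s))
    for s in ["<>", "<a>"]
}

print(results["<>"])   # ('<>', None)
print(results["<a>"])  # ('<a>', '<a>')
```

With "*?" the empty pair "<>" matches; with "+?" it does not, because at least one character is required between the brackets.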