sugi

Stringregexp Problem

6 posts in this topic

Hello,

According to the help file, the expression "<.*>" in a regexp matches everything between the first "<" and the last ">" in the string (e.g. in "<abc><def>" it matches "<abc><def>"). To get the smallest possible match, a ? should be added after the repetition character, which gives "<.*?>". In my example that should match only "<abc>" from the string "<abc><def>", as "<abc>" is the smallest possible match.

Now to my problem. As the "*" means 0 or more characters, a regexp of "<.*?>" should match "<>" as that is still the smallest possible match. After all ".*?" means "find the smallest match from 0 or more characters".

I've tested this with the following code:

MsgBox(64, 'Match', StringRegExp('<>', '<.*?>', 0))
MsgBox(64, 'Match', StringRegExp('<a>', '<.*?>', 0))

In my opinion both should return the same result as the regexp should match both. But the first one does not match for some reason I don't understand.

Any ideas where I understood something wrong or where the bug in my code is?
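For comparison, here is how an engine with Perl-style semantics treats the same patterns. This is only a minimal sketch that uses Python's `re` module as a stand-in (an assumption made for illustration; it says nothing about how AutoIt's own StringRegExp is implemented), and the variable names are made up.

```python
import re

subject = "<abc><def>"

# Greedy: .* takes as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.*>", subject).group(0)

# Lazy: .*? takes as little as it can, so the match stops at the first ">".
lazy = re.search(r"<.*?>", subject).group(0)

# Zero repetitions are allowed, so the empty pair "<>" is a valid match too.
empty_pair = re.search(r"<.*?>", "<>")
single_char = re.search(r"<.*?>", "<a>")

print(greedy)
print(lazy)
print(empty_pair is not None, single_char is not None)
```

Under these semantics both test strings from the post match, which is the result the question expects.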

From the help file:

. Match any single character

* Repeat the previous character, set or group 0 or more times. Equivalent to {0,}

So there needs to be at least one character inside the <> to match because the * is only a repetition of the first character match, which is any single character.

So, to find just <> you could add a line something like:

If StringInStr($mystring, "<>") <> 0 Then $myanswer = "<>"

#4 ·  Posted (edited)

Then we have either a bug in the documentation (if AutoIt wants to support the common RegExp syntax) or a bug in the current implementation. Have a look at this: if your interpretation is right, the last string should not match because there's no b in it:

MsgBox(64, 'Match', StringRegExp('abbc', 'ab*c', 0))
MsgBox(64, 'Match', StringRegExp('abc', 'ab*c', 0))
MsgBox(64, 'Match', StringRegExp('ac', 'ab*c', 0))

In the common regexp syntax, * means that the preceding character (in my example the b) may occur 0 or more times. So of course all three strings are matched.

You're mixing up * and +. The + means that the preceding character must occur 1 or more times, which would match only the first two strings.
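A quick way to see the * versus + distinction is to run both quantifiers over the three test strings. The sketch below again uses Python's `re` module as a stand-in for the common syntax (an assumption for illustration, not AutoIt itself); `subjects`, `star_hits` and `plus_hits` are made-up names.

```python
import re

subjects = ["abbc", "abc", "ac"]

# fullmatch requires the whole string to fit the pattern.
# b* allows zero or more "b", b+ requires at least one "b".
star_hits = [s for s in subjects if re.fullmatch(r"ab*c", s)]
plus_hits = [s for s in subjects if re.fullmatch(r"ab+c", s)]

print("ab*c matches:", star_hits)
print("ab+c matches:", plus_hits)
```

With *, all three strings match; with +, "ac" drops out because it contains no b.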

Thanks jefhal for the idea, I'm already using that as a workaround. Before I posted, I checked my regexp with The RegEx Coach, and it told me it should match. I posted here to find out whether I overlooked something or AutoIt has a bug, as I want to help get it as bug-free as possible.

EDIT: Just found this thread, so the current behaviour is a bug, not a feature.

Edited by sugi

According to the AutoIt documentation, I am not mixing up the * and the +. Whether it is a bug in the docs or in the AutoIt implementation of regex, the docs are pretty much all any of us have to go on.

I can tell you that your first example is correct in Perl (whose regex syntax most others are based on): it will also match the empty <>.

Perl RE syntax:

. Match any character (except newline)

*? Match 0 or more times, not greedily

+? Match 1 or more times, not greedily

So you would be correct with the statement "find the smallest match from 0 or more characters".

This is a bit different from what is documented in the AutoIt docs. So if it is not doing what you want, maybe that is because the AutoIt implementation works as stated in the docs rather than following standard RE syntax.
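To make the difference between the two lazy quantifiers concrete, here is a small sketch in Python's `re` module, used as a stand-in for Perl-style semantics (an assumption for illustration only; it does not show what AutoIt does internally). The helper `matched_text` is a hypothetical name introduced for this example.

```python
import re

def matched_text(pattern, subject):
    """Return the matched substring, or None when the pattern does not match."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None

subjects = ("<>", "<a>")

# *? allows zero characters between the brackets.
star_lazy = [matched_text(r"<.*?>", s) for s in subjects]

# +? needs at least one character between the brackets.
plus_lazy = [matched_text(r"<.+?>", s) for s in subjects]

print(star_lazy)
print(plus_lazy)
```

Under these semantics, a failure on "<>" combined with a match on "<a>" is what "<.+?>" would produce, not "<.*?>".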

I was just trying to point out the difference as I understand it from the AutoIt help file.
