Problem with regexp: it matches the pattern OK at the beginning but not at the end

Mithrandir · January 20, 2011

I am doing some tests on regular expressions using this string:

&element1&element2&element11&

And when using this pattern (the \x26 is a way to write a character by its hexadecimal ASCII code, in this case 0x26, which is &):

\A\x26[^\x26]+?

It correctly matches '&e'

But when using this pattern:

[^\x26]+?\x26\z

It matches 'element11&' and not '1&' although I told it not to be greedy with the '?' after [^\x26]+

What is happening?
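
For anyone who wants to reproduce the two results outside AutoIt, here is a minimal sketch in Python. It assumes Python's re module behaves like AutoIt's PCRE engine for these two patterns. Python spells the end-of-string anchor \Z where PCRE uses \z, and the variable names are only illustrative.

```python
import re

# The test string from the post.
subject = "&element1&element2&element11&"

# Pattern 1: anchored at the start, lazy run of non-ampersands.
first = re.search(r"\A\x26[^\x26]+?", subject)

# Pattern 2: lazy run of non-ampersands, then '&' at the very end.
# Assumption: Python's \Z stands in for PCRE's \z here.
last = re.search(r"[^\x26]+?\x26\Z", subject)

print(first.group())  # &e
print(last.group())   # element11&
```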

Edited January 21, 2011 by Mithrandir

rudi · January 24, 2011

Hi.

What result are you expecting?

Regards, Rudi.

Edited January 24, 2011 by rudi

Varian · January 24, 2011

"A lazy quantifier will first repeat the token as few times as required, and gradually expand the match as the engine backtracks through the regex to find an overall match." You used an anchor and a negated class. Your pattern basically says: "Read from right to left, matching '&' and then 'not an &', matching as few characters as possible at first and gradually expanding the match."

From Right to Left:

& - a match for \x26 (token #2)

1 - a match for a character other than \x26 (token #1)

1 - a match for a character other than \x26 (token #1)

t - a match for a character other than \x26 (token #1)

n - a match for a character other than \x26 (token #1)

e - a match for a character other than \x26 (token #1)

m - a match for a character other than \x26 (token #1)

e - a match for a character other than \x26 (token #1)

l - a match for a character other than \x26 (token #1)

e - a match for a character other than \x26 (token #1)

& - Not a match for a character other than \x26 (token #1)

Return element11&
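
Strictly speaking, a backtracking engine such as PCRE scans the subject from left to right and reports the first starting position at which the whole pattern can match. The lazy quantifier only decides how far the match grows from that start, which is why the result is the same as in the trace above. A minimal sketch in Python follows. It assumes Python's re module is close enough to AutoIt's PCRE for this pattern, uses \Z where PCRE uses \z, and the helper first_matching_start is hypothetical, written only to imitate the engine's outer loop.

```python
import re

subject = "&element1&element2&element11&"
pattern = re.compile(r"[^\x26]+?\x26\Z")


def first_matching_start(compiled, text):
    """Try every start position from left to right, as the engine does,
    and return the first one where the whole pattern matches."""
    for start in range(len(text) + 1):
        m = compiled.match(text, start)  # match() only tries this exact start
        if m:
            return start, m.group()
    return None


print(first_matching_start(pattern, subject))  # (19, 'element11&')

# A later start would give the shorter match, but the engine stops at the
# first start that works and never gets this far.
print(pattern.match(subject, 27).group())  # 1&
```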

Edited January 24, 2011 by Varian

Mithrandir · January 28, 2011

I was expecting the last regexp to return 1& since I believed I told it to match an ampersand ("&") at the end and then not to be greedy when matching elements that were not an ampersand.

Great explanation! So in order to match the last non-ampersand character and the last ampersand I used this pattern: [^\x26]{1}\x26\z and it worked.
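
As a quick check of that pattern, here is a small sketch in Python (an assumption: Python's re stands in for AutoIt's PCRE, with \Z in place of \z). It also suggests that the {1} is optional, since a character class without a quantifier already matches exactly one character.

```python
import re

subject = "&element1&element2&element11&"

# The pattern from the post, with Python's \Z standing in for PCRE's \z.
with_count = re.search(r"[^\x26]{1}\x26\Z", subject).group()

# Same pattern without the explicit {1}.
without_count = re.search(r"[^\x26]\x26\Z", subject).group()

print(with_count)     # 1&
print(without_count)  # 1&
```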

But does regexp always read from right to left, or only when using \z in the pattern? Because if so, then why, when using the pattern

\A\x26[^\x26]+?

did it match '&e'? If it read from right to left, shouldn't it do this? (I skipped the parsing of '&element2&element11&' because it would end on reaching an '&' that is not at the beginning of the string.)

From Right to Left:

1 - a match for a character other than \x26 (token #2)

t - a match for a character other than \x26 (token #2)

n - a match for a character other than \x26 (token #2)

e - a match for a character other than \x26 (token #2)

m - a match for a character other than \x26 (token #2)

e - a match for a character other than \x26 (token #2)

l - a match for a character other than \x26 (token #2)

e - a match for a character other than \x26 (token #2)

& - a match for an ampersand at the beginning of the string (token #1)

Return &element1

On the other hand, if it reads from left to right, it would match the ampersand at the beginning of the string, then one non-ampersand character, and stop there. So is this 'right to left' reading method always used, or only when using \z? Thanks for your help!
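
The left-to-right reading can be tested by comparing the lazy pattern with its greedy twin. The sketch below is in Python, on the assumption that its re module behaves like AutoIt's PCRE for these patterns. With \A the only allowed start is position 0, so the lazy version stops after a single non-ampersand character, while the greedy one runs on to the next '&'.

```python
import re

subject = "&element1&element2&element11&"

# Lazy: take as few non-ampersands as possible after the leading '&'.
lazy = re.search(r"\A\x26[^\x26]+?", subject).group()

# Greedy: take as many non-ampersands as possible after the leading '&'.
greedy = re.search(r"\A\x26[^\x26]+", subject).group()

print(lazy)    # &e
print(greedy)  # &element1
```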

Edited January 28, 2011 by Mithrandir

Varian · January 28, 2011

Sorry for the late reply, but you are correct. The \z or $ (nearly the same thing; $ can also match just before a final newline) denotes the end of the line (or end of the string), so the match is tested against your other tokens and then against the end of the line (or string). A good example of this is extracting the containing path from an FQ (fully qualified) file or directory path.

If the FQ path is "C:\Windows\System32\regedit.exe", a RegExp of "[^\\]*$" is anchored at the end of the string, so in effect it reads from right to left and matches everything that is not "\". So

StringRegExp("C:\Windows\System32\regedit.exe", "[^\\]*$", 1)

will find a match of regedit.exe

and

StringRegExpReplace("C:\Windows\System32\regedit.exe", "[^\\]*$", "")

will replace that match with an empty string, so it will return C:\Windows\System32\
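
The same two operations can be sketched in Python for comparison. This is an assumption-laden stand-in rather than AutoIt code: re.search and re.sub play the roles of StringRegExp and StringRegExpReplace, and the variable names are illustrative.

```python
import re

# Raw string so the backslashes in the path are taken literally.
path = r"C:\Windows\System32\regedit.exe"

# Everything after the last backslash, up to the end of the string.
file_name = re.search(r"[^\\]*$", path).group()

# Remove that trailing part to keep the containing directory.
directory = re.sub(r"[^\\]*$", "", path)

print(file_name)  # regedit.exe
print(directory)  # C:\Windows\System32\
```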

Hope this helps

Edited January 28, 2011 by Varian
