Mithrandir Posted January 20, 2011 (edited January 21, 2011)

I am doing some tests on regular expressions using this string:

&element1&element2&element11&

When using this pattern (\x26 is a way to write a character by its ASCII code in hexadecimal, in this case 0x26, which is &):

\A\x26[^\x26]+?

it correctly matches '&e'. But when using this pattern:

[^\x26]+?\x26\z

it matches 'element11&' and not '1&', although I told it not to be greedy with the '?' after [^\x26]+.

What is happening?
rudi Posted January 24, 2011

Hi. What result are you expecting?

Regards, Rudi.
Varian Posted January 24, 2011

"A lazy quantifier will first repeat the token as few times as required, and gradually expand the match as the engine backtracks through the regex to find an overall match."

You used an anchor and a negated class. Your pattern basically says: "Read from right to left, matching the pattern '&' and then 'not an &', matching as few as possible at first and gradually expanding the match."

From right to left:

& - a match for \x26 (token #2)
1 - a match for a character other than \x26 (token #1)
1 - a match for a character other than \x26 (token #1)
t - a match for a character other than \x26 (token #1)
n - a match for a character other than \x26 (token #1)
e - a match for a character other than \x26 (token #1)
m - a match for a character other than \x26 (token #1)
e - a match for a character other than \x26 (token #1)
l - a match for a character other than \x26 (token #1)
e - a match for a character other than \x26 (token #1)
& - not a match for a character other than \x26 (token #1)

Return element11&
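The two results reported so far are easy to reproduce outside AutoIt. Here is a minimal sketch in Python, which is an assumption: the thread uses AutoIt's StringRegExp, and Python's re module is only a stand-in with very similar syntax. One difference to keep in mind is that Python spells the absolute end-of-string anchor \Z where the patterns in this thread use \z.

```python
import re

# Test string from the first post.
subject = "&element1&element2&element11&"

# Pattern anchored at the start: '&' followed by a lazy run of non-'&' characters.
start_anchored = re.search(r"\A\x26[^\x26]+?", subject)

# Pattern anchored at the end: a lazy run of non-'&' characters, then '&' at the very end.
# Python's \Z plays the role of \z here (assumption: the anchors behave alike).
end_anchored = re.search(r"[^\x26]+?\x26\Z", subject)

print(start_anchored.group())  # &e
print(end_anchored.group())    # element11&
```

With nothing after the lazy quantifier, the first pattern is free to stop after a single character. In the second pattern the lazy run has to keep growing until '&' followed by the end of the string can match.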
Mithrandir Posted January 28, 2011 (Author)

rudi said: What result are you expecting?

I was expecting the last regexp to return 1&, since I believed I told it to match an ampersand ("&") at the end and then not to be greedy when matching characters that were not an ampersand.

Varian, great explanation! So in order to match the last non-ampersand character and the last ampersand I used this pattern:

[^\x26]{1}\x26\z

and it worked. But does regexp always read from right to left, or is it only when using \z in the pattern? Because if it always does, then why did the pattern \A\x26[^\x26]+? match '&e'?

Shouldn't it, if reading from right to left, do this? (I skipped the parsing of '&element2&element11&' because those attempts would end when they hit an '&' that is not at the beginning of the string.)

From right to left:

1 - a match for a character other than \x26 (token #2)
t - a match for a character other than \x26 (token #2)
n - a match for a character other than \x26 (token #2)
e - a match for a character other than \x26 (token #2)
m - a match for a character other than \x26 (token #2)
e - a match for a character other than \x26 (token #2)
l - a match for a character other than \x26 (token #2)
e - a match for a character other than \x26 (token #2)
& - a match for an ampersand at the beginning of the string (token #1)

Return &element1

On the other hand, if it reads from left to right, it would match the ampersand at the beginning of the string, then one non-ampersand character, and stop there.

So is this 'right to left' reading method always used, or just when using \z?

Thanks for your help!
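One way to explore this question is to make the scan explicit. The sketch below uses Python as a stand-in for AutoIt's engine (an assumption, with \Z in place of \z) and tries a pattern at each start offset from left to right, reporting the first offset where it succeeds. The helper name first_match_left_to_right is hypothetical. Backtracking engines of the PCRE family are documented to search this way, so if the observed results agree with this loop, the anchors decide where a match may sit, and the lazy '?' only affects how far a match extends from a start offset that has already been chosen.

```python
import re

subject = "&element1&element2&element11&"

def first_match_left_to_right(pattern, text):
    """Try the pattern at each start offset, leftmost first.

    Returns (offset, matched_text) for the first offset that works, or None.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # match() only succeeds if the pattern matches beginning exactly at 'start'.
        m = compiled.match(text, start)
        if m:
            return start, m.group()
    return None

# Start-anchored pattern: succeeds at offset 0 and the lazy run stops after one character.
print(first_match_left_to_right(r"\A\x26[^\x26]+?", subject))   # (0, '&e')

# End-anchored lazy pattern: offsets 0 to 18 all fail, offset 19 is the first to succeed.
print(first_match_left_to_right(r"[^\x26]+?\x26\Z", subject))   # (19, 'element11&')

# Exactly one non-'&' character before the final '&'.
print(first_match_left_to_right(r"[^\x26]{1}\x26\Z", subject))  # (27, '1&')
```

Under this model, 'element11&' wins over '1&' because offset 19 is tried before offset 27, and from offset 19 the lazy run has to grow to 'element11' before '&' and the end of the string can match.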
Varian Posted January 28, 2011

Sorry for the late reply, but you are correct. The \z or $ (nearly the same thing; $ also matches just before a trailing newline) denotes the end of line (or end of string), so the match will be tested with your other tokens and then by the end of the line (or string).

A good example of this is extracting the file name or the containing path from a FQ (fully qualified) file or directory. For example, if the FQ path is "C:\Windows\System32\regedit.exe", a RegExp of "[^\\]*$" will read from right to left and match everything that is not "\". So

StringRegExp("C:\Windows\System32\regedit.exe", "[^\\]*$", 1)

will find a match of regedit.exe, and

StringRegExpReplace("C:\Windows\System32\regedit.exe", "[^\\]*$", "")

will replace that match with blank, so it will return C:\Windows\System32\

Hope this helps.
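The path example above can be checked with a short sketch. It is written in Python as an assumption, with re.search and re.sub standing in for AutoIt's StringRegExp and StringRegExpReplace; the pattern itself is unchanged.

```python
import re

# Raw strings keep the backslashes literal.
path = r"C:\Windows\System32\regedit.exe"
pattern = r"[^\\]*$"  # a run of non-backslash characters ending at the end of the string

# Equivalent in spirit to StringRegExp(path, pattern, 1): find the trailing component.
file_name = re.search(pattern, path).group()

# Equivalent in spirit to StringRegExpReplace(path, pattern, ""): strip that component.
parent = re.sub(pattern, "", path)

print(file_name)  # regedit.exe
print(parent)     # C:\Windows\System32\
```

The trailing backslash stays in the result because the negated class [^\\] cannot consume it.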