leuce
Posted May 9, 2020

Hello everyone

I'm having difficulty in understanding lazy vs greedy matching. The help file for StringRegExp tells me regex in AutoIt is always greedy unless you tell it to be lazy, but I seem to be getting the opposite effect: when I add (?U), it matches *more*, not less.

    $mystring = ' extype="myEXTYPE" match-quality="100%" origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE"'
    $one = StringRegExp ($mystring, '(?U)(origin=")(.+?)(")', 1) ; uses (?U), so it's lazy

So, I expect

    $one[0] >> 1
    $one[1] >> origin="myORIGIN"

Instead, I get

    $one[0] >> origin="
    $one[1] >> myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE

    $mystring = ' extype="myEXTYPE" match-quality="100%" origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE"'
    $two = StringRegExp ($mystring, '(origin=")(.+?)(")', 1) ; does NOT use (?U), so it's greedy

So, I expect

    $two[0] >> 1
    $two[1] >> origin="myORIGIN" xtm:id="myXTMID" xtm:project="myXTMPROJECT" xtm:changedby="myXTMCHANGEDBY" xtm:changedate="myXTMCHANGEDDATE"

Instead, I get

    $two[0] >> origin="
    $two[1] >> myORIGIN

Also, the fact that I get:

    $two[0] >> origin="

makes NO sense to me. It's supposed to give me an array of matches, so (depending on what kind of array is created -- the helpfile for StringRegExp doesn't say), [0] must be either "1" or [0] must be the first item in the array, and the way I understand the helpfile for StringRegExp, first item in the array is supposed to be:

    origin="myORIGIN"

Samuel

Edited May 9, 2020 by leuce

mikell
Posted May 9, 2020

1 hour ago, leuce said:

    '(?U)(origin=")(.+?)(")', 1) ; uses (?U), so it's lazy

Hmmm no.
.+ is greedy (it gets all chars up to the last quote in the text), while .+? is lazy (it gets all chars up to the next quote).
(?U) reverses this, so with (?U) in front your .+? becomes greedy again, which is why it matched more. I personally never use (?U) because it's confusing (not needed, really...)

So this

    StringRegExp ($mystring, 'origin="(.+?)"', 1)

will give you an array which contains 1 match only: myORIGIN, because there is one capturing group only. Your pattern has three capturing groups, so your array has three elements, and [0] is the first group: origin="

You might also use

    $myarray = StringRegExp ($mystring, 'origin="([^"]+)', 1)

to get one or more non-quote characters right after the string origin="

Was it clear?
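
For anyone who wants to try this, here is a minimal AutoIt sketch that runs the patterns from this thread side by side. The shortened test string, the variable names and the _Show helper are made up for illustration. The results noted in the comments follow from how (?U) and capturing groups behave in PCRE, not from output posted in this thread.

```autoit
; Shortened test string (an assumption, not the original data from the first post)
Local $sText = ' match-quality="100%" origin="myORIGIN" xtm:id="myXTMID"'

; 1) Three capturing groups, lazy .+? : one array element per group.
Local $aLazy = StringRegExp($sText, '(origin=")(.+?)(")', 1)
_Show('lazy', $aLazy)
; [0] is  origin="   [1] is  myORIGIN   [2] is  "

; 2) Same pattern with (?U): the meaning of ? is inverted, so .+? is greedy.
Local $aInverted = StringRegExp($sText, '(?U)(origin=")(.+?)(")', 1)
_Show('inverted', $aInverted)
; [1] now runs up to the LAST quote:  myORIGIN" xtm:id="myXTMID

; 3) One capturing group only: the array has a single element.
Local $aOneGroup = StringRegExp($sText, 'origin="(.+?)"', 1)
_Show('one group', $aOneGroup)
; [0] is  myORIGIN

; 4) Negated character class: no lazy/greedy question at all.
Local $aClass = StringRegExp($sText, 'origin="([^"]+)', 1)
_Show('char class', $aClass)
; [0] is  myORIGIN

; Hypothetical helper: prints every element of a StringRegExp result array.
Func _Show($sLabel, $aResult)
    If Not IsArray($aResult) Then
        ConsoleWrite($sLabel & ': no match' & @CRLF)
        Return
    EndIf
    For $i = 0 To UBound($aResult) - 1
        ConsoleWrite($sLabel & '[' & $i & '] >> ' & $aResult[$i] & @CRLF)
    Next
EndFunc
```

The character class version in case 4 is usually the easiest to reason about, since it cannot run past the closing quote whether or not (?U) is set.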

leuce
Posted May 9, 2020

Thanks, Mikell, for explaining it. Also thanks for the comment about "one capturing group" -- it solved another mystery for me.