RegEx lazy quantifier


DW1
Go to solution Solved by Factfinder,

I must have a misunderstanding on how lazy quantifiers work.

My expected return from below would be: "Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469"

But I'm picking up almost the full string.  I'm likely either using incorrect syntax or made a typo somewhere that I keep overlooking.  Any help would be great :)

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

  • Moderators

DW1,

This works for me: :)

Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>][<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>]'
Local $sExtract = StringRegExpReplace($sString, '.*href="(.*)">Yarp.*', "$1")
ConsoleWrite($sExtract & " - Extracted" & @CRLF)
ConsoleWrite("Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469 - Required" & @CRLF)
M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


  • Moderators

DW1,

You need a guru for that - and I certainly do not qualify! :D

M23


You sell yourself short, sir!

I would have expected either of these to work:

Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
Local $aTemp = StringRegExp($sString, '(?U)href="(.*)">Yarp', 3)

 

but it seems that I cannot get it to return a lazy result, just the greedy result... I'm super confused about this, and am hoping somebody can teach me to fish here.  I have workarounds, but more than anything, I'd like to clear up my own confusion, as I'm likely doing something wrong.


#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="([^"\r\n]*)">Yarp', 3)
_ArrayDisplay($aTemp)
;or
$aTemp = StringRegExp($sString, 'href="?([^"\r\n\>\<]*)"?>Yarp', 3)
_ArrayDisplay($aTemp)

 

Ciao.

Edited by DXRW4E


Thank you.  More valid workarounds.

I'm still trying to get somebody to teach me to fish here though on why the lazy quantifier isn't working the way I expect it to.  I am open to it being user error, I just want to know what the error is.


That works OK; it is the pattern that is not right, not the RegExp engine.

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

So the match begins at the first href=" (the one in '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="') and does not stop until it reaches ">Yarp.

Ciao.

Edited by DXRW4E


Your original script would work with a small change: instead of (.*?) use ([^>]*?), like this:

$sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
$aTemp = StringRegExp($sString, 'href="([^>]*?)">Yarp', 1)
If IsArray($aTemp) Then MsgBox(0, "", $aTemp[0])

I understand that, and I have workarounds, however this doesn't address my question, as a lazy quantifier should be returning as little as possible while still matching, yet I'm still seeing the same result as a greedy quantifier.  That's what I'm hoping somebody can correct me on.


To clarify for anybody wondering what I'm on about...

I have plenty of workarounds to accomplish my task.  What I am asking is why the lazy quantifier is not working as I thought it did in the following:

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

 

Yes, this should match the entire string. However, I thought that adding the "?" after the quantifier "*" would make the match lazy, so it would grab as little as possible while still matching the expression. My question is: where is my syntax error or my misunderstanding? I understand how all of the workarounds are working. What I don't understand is why the lazy quantifier isn't working the way I thought it did. As I said previously, this is likely just a misunderstanding of mine, or a syntax error, but if somebody could explain how to get the lazy quantifier to work in this scenario, I'd appreciate it.

I would expect a greedy quantifier (as much as possible while still matching) to return as it is in my above script:

Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] [<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] [<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469

 

I would expect a lazy quantifier (as little as possible while still matching) to return the following:

Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469

I understand that, and I have workarounds, however this doesn't address my question, as a lazy quantifier should be returning as little as possible while still matching, yet I'm still seeing the same result as a greedy quantifier.  That's what I'm hoping somebody can correct me on.

Yes, right, but that '">Yarp' is already the first match, so everything is working as it should.

 

Try this, with a second ">Yarp earlier in the string:

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Yarp</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

Or tell RegExp to find the last 'href=' by putting a greedy .* in front:

#include <Array.au3>
Local $sString = '<TD colSpan=4><SPAN style="FONT-VARIANT: small-caps">Sub-Categories: [<A href="Categories.aspx?id=2b249c75-f666-424e-b555-bf9ee8f34152">Nope</A>] '
$sString &= '[<A href="Categories.aspx?id=584378e4-917a-4fce-a6ff-e75fb966e36f">Nay</A>] [<A href="Categories.aspx?id=042394b4-1c0e-42d9-a9ba-bca9ea1394b1">Nada</A>] '
$sString &= '[<A href="Categories.aspx?id=8d72025e-a23a-4c81-8174-d3fc4e1eb469">Yarp</A>] '
Local $aTemp = StringRegExp($sString, '.*href="(.*?)">Yarp', 3)
_ArrayDisplay($aTemp)

Ciao.

Edited by DXRW4E


  • Solution

The script I suggested is not a workaround. It is the correct script.

Your script doesn't work because the ? applies forward only: it affects the text coming after href=" up to the first ">Yarp. So if you had a second ">Yarp in the string, the ? would make it match only up to the first ">Yarp.

As DXRW4E mentioned, your script starts at the first href=" and ends at the first ">Yarp because the ? doesn't work backwards. To skip every href=" in the string except the one directly preceding ">Yarp, you should use the script I suggested.
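
Here is a small sketch of the point above. It uses a shortened, made-up test string (the hrefs "first" and "second" are placeholders, not from the original page) and only the standard StringRegExp call with flag 3, as in the scripts earlier in the thread. The results in the comments are what the leftmost-match rule predicts.

```autoit
; Hypothetical, shortened test string: two links, only the second is followed by ">Yarp
Local $sString = '<A href="first">Nope</A> <A href="second">Yarp</A>'

; Lazy (.*?): the match starts at the leftmost href=" and grows one
; character at a time until ">Yarp can follow. Lazy shortens the end, not the start.
Local $aLazy = StringRegExp($sString, 'href="(.*?)">Yarp', 3)
ConsoleWrite($aLazy[0] & @CRLF)  ; first">Nope</A> <A href="second

; ([^>]*?): the capture cannot cross a ">", so the attempt at the first
; href=" fails and the engine retries from the next href=".
Local $aClass = StringRegExp($sString, 'href="([^>]*?)">Yarp', 3)
ConsoleWrite($aClass[0] & @CRLF)  ; second

; Leading greedy .*: it swallows as much as it can, so the href=" that
; ends up being used is the last one that still allows ">Yarp to follow.
Local $aPrefix = StringRegExp($sString, '.*href="(.*?)">Yarp', 3)
ConsoleWrite($aPrefix[0] & @CRLF)  ; second
```

In other words, the ? only decides where the match ends once a starting point is fixed. The starting point is always the leftmost position in the string from which a match is possible at all.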

Edited by Factfinder

because ? doesn't work backwards

This is what you and DXRW4E have both been pointing out to me, which is now clear.

I wasn't understanding why the capturing group was not lazy, because I wasn't putting together the fact that the "?" doesn't work backwards.  This makes perfect sense to me now, thank you both!

EDIT: Marking Factfinder's post as the solution, however DXRW4E, I understand you were pointing out the same thing to me, I just didn't get it until his post spelled it out for me.  Thank you both!

Edited by DW1
