kylomas

SRE - adding chars to pattern produces invalid results

kylomas

In the following code the first SRE produces what I am looking for but the second does not. It seems like it should produce the same result.

#include <array.au3>

; This works

local $str = '<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>'
$aTemp = stringregexp($str,'>(.*?)<',3)
_arraydisplay($aTemp,'Without /a')

; This returns too much

local $str = '<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>'
$aTemp = stringregexp($str,'>(.*?)</a',3)   ; added /a to pattern
_arraydisplay($aTemp,'With /a')

Can someone explain why, please?

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

AZJIO

1. '>(.*?)<' captures " " and " 2790817 ":

<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>

2. '>(.*?)</a' captures everything between the first ">" and "</a":

<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>

kylomas

AZJIO,

Yes, I see that, but why?

kylomas


AZJIO

Because that is what you asked for.

The engine does exactly what the pattern requests.

$aTemp = StringRegExp($str, '>([\d\s]+?)</a', 3)
$aTemp = StringRegExp($str, '>([^<>]+?)</a', 3)
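
A minimal sketch of why the two patterns above return only the number. The sample string is a shortened, hypothetical stand-in for the one in the first post, and $sSample and $aClass are names made up for illustration. Because [^<>] cannot match "<", the attempt that starts at the first ">" fails as soon as it runs into "<a", so the engine has to try again from the next ">".

```autoit
#include <Array.au3>

; Hypothetical, shortened stand-in for the string in the first post
Local $sSample = '<img> <a href="x"> 42 </a>'

; Attempt at the first ">": [^<>]+? takes the space, then meets "<a", which is
; neither allowed by the class nor the start of "</a", so this attempt fails.
; Attempt at the second ">": " 42 " is followed by "</a", so it succeeds.
Local $aClass = StringRegExp($sSample, '>([^<>]+?)</a', 3)
ConsoleWrite('[' & _ArrayToString($aClass, '][') & ']' & @CRLF) ; expected: [ 42 ]
```

The same reasoning should apply to [\d\s], which also cannot step over "<a".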

kylomas

AZJIO,

Yes, my apologies for such a vague question...these SRE's are frustrating the crap out of me.

The question that I should have asked is: why does the second pattern not match only the " 2790817 " near the end of the string below?

<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>

kylomas


BrewManNH

Because it matches from the first GT symbol, to the </a. It's not going to use the last >, it's going to start matching from the first.
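
A small sketch of this point, assuming a shortened, hypothetical sample string of the same shape as the original ($sSample, $aFirst and $aSecond are names made up for illustration). The "?" only keeps the group as short as possible from wherever the match starts; it does not move the starting ">" further to the right. With "<" as the terminator, the attempt at the first ">" succeeds after a single space, so the scan later gets a fresh start at the second ">". With "</a" as the terminator, the attempt at the first ">" can still succeed by growing, so that is the match that gets reported.

```autoit
#include <Array.au3>

; Hypothetical, shortened stand-in for the string in the first post
Local $sSample = '<img> <a href="x"> 42 </a>'

; Terminator "<": two separate matches, one starting at each ">"
Local $aFirst = StringRegExp($sSample, '>(.*?)<', 3)
ConsoleWrite('[' & _ArrayToString($aFirst, '][') & ']' & @CRLF) ; expected: [ ][ 42 ]

; Terminator "</a": a single match that starts at the first ">" and grows
; until "</a" finally follows
Local $aSecond = StringRegExp($sSample, '>(.*?)</a', 3)
ConsoleWrite('[' & _ArrayToString($aSecond, '][') & ']' & @CRLF) ; expected: [ <a href="x"> 42 ]
```

So "smallest" here means the shortest group for the leftmost starting point that can match at all, not the shortest group anywhere in the string.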


If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

kylomas

Because it matches from the first GT symbol, to the </a. It's not going to use the last >, it's going to start matching from the first.

@BrewmanNH - Then why doesn't it do the same thing with the first pattern?

edit: AZJIO - thanks for the SRE you posted, but I will never understand this if I don't understand why my pattern did NOT work...

kylomas

You guys ever have this dream where everyone gets it or is included, except you?

I'm having that dream now, maybe when I go to sleep the whole shit will make sense!

kylomas


AZJIO

[\d\s] - allows only digits and whitespace

"<a src" does not consist of digits, so the match fails there and the search moves on to the next >.

kylomas

[\d\s] - allows only digits and whitespace

"<a src" does not consist of digits, so the match fails there and the search moves on to the next >.

I get why your patterns work; I do NOT get why this '>(.*?)</a' doesn't.

kylomas


AZJIO

It finds the first > character.

.*? - then takes everything it meets along the way

and stops when it reaches </a

kylomas

OK, but doesn't the '?' make it non-greedy (matching the smallest possible group)?


AZJIO

Yes. Without the "?" it would search up to the last </a.

But there is only one </a in the string.
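
To make the greedy/lazy difference visible, here is a sketch on a hypothetical string that contains two "</a" (the string and the names $sTwo, $aLazy and $aGreedy are not from the original post).

```autoit
#include <Array.au3>

; Hypothetical sample with two closing tags
Local $sTwo = '<a>one</a><a>two</a>'

; Lazy: from each starting ">", stop at the first "</a" that can be reached
Local $aLazy = StringRegExp($sTwo, '>(.*?)</a', 3)
ConsoleWrite('[' & _ArrayToString($aLazy, '][') & ']' & @CRLF) ; expected: [one][<a>two]

; Greedy: from the first ">", run on to the last "</a" in the string
Local $aGreedy = StringRegExp($sTwo, '>(.*)</a', 3)
ConsoleWrite('[' & _ArrayToString($aGreedy, '][') & ']' & @CRLF) ; expected: [one</a><a>two]
```

Note that the second lazy capture still begins at the ">" of the first closing tag, which is the same leftmost-start effect seen with the original string.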

kylomas

For me, both these patterns produce the same results:

'>(.*?)</a'

'>(.*)</a'

The above is incorrect, but the question still remains...

I thought the "?" would make the pattern non-greedy, so that it would find the smallest possible match, e.g. just the " 2790817 " near the end of:

<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>

kylomas

kylomas

AZJIO,

Thank you for trying to help me. I am missing something fundamental to SRE and wasting your time. I'll come back to this later, after I've read more PCRE doc.

Again, thanks,

kylomas


PhoenixXL
QuickTip
  • Use Flag 4 for debugging when your regex doesn't fit your expectations
#include <array.au3>

Local $str = '<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>'
$aTemp = StringRegExp($str, '>(.*?)</a', 4) ; added /a to pattern

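; Let's check what happened: flag 4 returns an array of arrays,
; one inner array per match (full match first, then the captured groups)
For $i = 0 To UBound($aTemp) - 1
    $l = $aTemp[$i]
    _ArrayDisplay($l, 'With /a')
Next

; Hope you understand what happened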
Regards :)

My code:

PredictText: Predict Text of an Edit Control Like Scite. Remote Gmail: Execute your Scripts through Gmail. StringRegExp:Share and learn RegExp.

Run As System: A command line wrapper around PSEXEC.exe to execute your apps scripts as System (LSA). Database: An easier approach for _SQ_LITE beginners.

MathsEx: A UDF for Fractions and LCM, GCF/HCF. FloatingText: An UDF for make your text floating. Clipboard Extendor: A clipboard monitoring tool. 

Custom ScrollBar: Scroll Bar made with GDI+, user can use bitmaps instead. RestrictEdit_SRE: Restrict text in an Edit Control through a Regular Expression.

PhoenixXL

Workaround

#include <array.au3>

Local $str = '<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>'
$aTemp = StringRegExp($str, '.*>(.*?)</a', 3) ; greedy .* added in front of the pattern

; The greedy part will capture everything, then upon backtracking the last match will be found
_ArrayDisplay($aTemp)
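
One caveat worth sketching, on a hypothetical string with two links (not from the original post; $sTwo, $aLast and $aAll are made-up names): because the leading .* is greedy, this workaround appears to return only the last link's text, even with flag 3.

```autoit
#include <Array.au3>

; Hypothetical sample with two links
Local $sTwo = '<a>one</a><a>two</a>'

; The greedy .* runs to the end, then backs off until '>(.*?)</a' fits,
; which happens at the last ">" that still has a "</a" after it
Local $aLast = StringRegExp($sTwo, '.*>(.*?)</a', 3)
ConsoleWrite('[' & _ArrayToString($aLast, '][') & ']' & @CRLF) ; expected: [two]

; A class that cannot cross a tag boundary returns every link's text instead
Local $aAll = StringRegExp($sTwo, '>([^<>]+?)</a', 3)
ConsoleWrite('[' & _ArrayToString($aAll, '][') & ']' & @CRLF) ; expected: [one][two]
```

For a single link, as in the original string, both approaches give the same result.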
Regards :)

Share on other sites
kylomas

; The greedy part will capture everything, then upon backtracking the last match will be found

Exactly what I don't understand... Why doesn't '>(.*?)</a' find the smallest match?

'<img width="20" height="22"> <a src="/icons/text.gif" alt="[TXT]" 2732618_readme.htm href="2790817%202732618_readme.htm"> 2790817 </a>'

