
StringRegExp problem with (.+)



Hello all.

I have a problem with this call:

StringRegExp($sourceHTML, '<a href="karte.php\?d=(.+)">', 3)

If the $sourceHTML is:

<td colspan="11"><a href="spieler.php?uid=4303">matica2</a> iz naselbine <a href="karte.php?d=315256&c=ab">Naselje špranj</a></td>

Then the value returned by StringRegExp is correct: it shows "315256&c=ab".

Now the problem is here:

Value of $sourceHTML:

<td class="role">Branilec</td><td colspan="10"><a href="spieler.php?uid=3077">matt23</a> iz naselbine <a href="karte.php?d=315251&c=eb">NaSeLjE</a></td></tr></thead><tbody class="units"><tr><th> </th><td><img src="img/x.gif" class="unit u11" title="Gorjačar" alt="Gorjačar" />

Now I get this result:

315251&c=eb">NaSeLjE</a></td></tr></thead><tbody class="units"><tr><th> </th><td><img src="img/x.gif" class="unit u11" title="Gorjačar" alt="Gorjačar" /> ... etc about 2500 chars.

So why does this happen?

Please help ...

Edit: I see that (.+) is matching all the way to the "> at the end of the line. What can I do? :) This link is different every time.

Edit2: OK, I solved it with this:

StringRegExp($preberzameta, '<a href="karte.php\?d=(\d+)&(....)">', 3)

But is there any way to do this with (.+)?

Edit3: Thank you for the answers ... I will use this on my next problem :)

Edited by DoctorSLO

You are experiencing "greediness". The (.+) is finding the largest possible match that still ends in ">. Put (?U) at the start of your pattern string to invert greediness and find the smallest match:

StringRegExp($sourceHTML, '(?U)<a href="karte.php\?d=(.+)">', 3)

:)
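For anyone who wants to see the difference side by side, here is a minimal sketch. The sample string is a shortened copy of the HTML from the question (an assumption for illustration, not the full 2500-character page), and the variable names $aGreedy and $aLazy are made up for this example.

```autoit
; Shortened sample of the page source from the question (illustrative only).
Local $sourceHTML = '<a href="spieler.php?uid=3077">matt23</a> iz naselbine ' & _
        '<a href="karte.php?d=315251&c=eb">NaSeLjE</a></td><tbody class="units"><tr>'

; Default (greedy): (.+) runs on to the LAST "> it can find in the string.
Local $aGreedy = StringRegExp($sourceHTML, '<a href="karte.php\?d=(.+)">', 3)
ConsoleWrite($aGreedy[0] & @CRLF)
; prints: 315251&c=eb">NaSeLjE</a></td><tbody class="units

; With (?U): every quantifier is lazy, so (.+) stops at the FIRST ">.
Local $aLazy = StringRegExp($sourceHTML, '(?U)<a href="karte.php\?d=(.+)">', 3)
ConsoleWrite($aLazy[0] & @CRLF)
; prints: 315251&c=eb
```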

Edited by PsaltyDS
Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

Yes, you want (.+) to match the smallest value, so just add a question mark:

<a href="karte.php\?d=(.+?)">

Mat

Or do what PsaltyDS says and invert the greediness. The two have exactly the same effect in this case.

I like yours better. The "(?U)" inverts all greediness quantifiers, while yours modifies only that one. It achieves the same thing in this example, but yours is a more surgical example for future reference.

:)
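A small sketch of why the "surgical" version can matter, assuming a pattern that has more than one quantifier. The string $sLink and the array names are hypothetical and only there for the demonstration.

```autoit
; One link from the question, used as test input (illustrative only).
Local $sLink = '<a href="karte.php?d=315251&c=eb">NaSeLjE</a>'

; (?U) flips EVERY quantifier, so \d+ also becomes lazy and stops after one digit.
Local $aAll = StringRegExp($sLink, '(?U)d=(\d+)', 3)
ConsoleWrite($aAll[0] & @CRLF)
; prints: 3

; A trailing ? changes only the quantifier it follows: \d+ stays greedy, .+? is lazy.
Local $aOne = StringRegExp($sLink, 'd=(\d+)&(.+?)">', 3)
ConsoleWrite($aOne[0] & " | " & $aOne[1] & @CRLF)
; prints: 315251 | c=eb
```

In the original pattern there is only one quantifier, which is why both approaches give the same result there.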


I prefer to use ?'s, because with (?U) I tend to forget that I'm no longer in the default mode and then make mistakes in other parts of the pattern... And that happens pretty often. It's also easier to look at the pattern and see what's doing what. But there are cases where (?U) is very useful. One of my mates got it a bit wrong though: he puts (?U) in front of all his patterns and then adds a lot of ?'s as well, which makes those quantifiers greedy again!
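To show what goes wrong when the two are combined, here is a rough sketch. The input is again a shortened, made-up excerpt of the HTML from the first post, and $aBoth is just a name picked for the example.

```autoit
; Shortened excerpt of the page source (illustrative only).
Local $sHTML = '<a href="karte.php?d=315251&c=eb">NaSeLjE</a></td><tbody class="units">'

; (?U) makes quantifiers lazy by default, and the extra ? then inverts that one
; quantifier again, so (.+?) behaves greedily here.
Local $aBoth = StringRegExp($sHTML, '(?U)<a href="karte.php\?d=(.+?)">', 3)
ConsoleWrite($aBoth[0] & @CRLF)
; prints: 315251&c=eb">NaSeLjE</a></td><tbody class="units
```

So using both at once brings back the original problem. Pick one or the other.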

Just as a better explanation of greediness, as none has been given: when you say you want to match anything up to a character, the computer (being thick) does not think that maybe you don't want to match that character itself along the way. In your example, it did exactly what you asked: it matched everything (.) up to the final "> in the string, and that included the earlier quote marks and > characters in between. A quantifier can match either the biggest selection it can or the smallest. The default is to match the longest (greedy), which is what your original was doing, when you really wanted the shortest (lazy). To invert it, place a "?" after the quantifier (the + or *), or put "(?U)" at the beginning of the pattern.
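The same idea on the smallest input I could think of, as a sketch only. The test string and variable names are invented for this example.

```autoit
; Two quoted words in one string (made-up test input).
Local $sText = '"a" and "b"'

; Greedy: starts at the first quote and runs to the LAST quote,
; swallowing the two quote marks in between.
Local $aLong = StringRegExp($sText, '"(.+)"', 3)
ConsoleWrite($aLong[0] & @CRLF)
; prints: a" and "b

; Lazy: stops at the nearest closing quote, so flag 3 finds two separate matches.
Local $aShort = StringRegExp($sText, '"(.+?)"', 3)
ConsoleWrite($aShort[0] & " | " & $aShort[1] & @CRLF)
; prints: a | b
```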
