  • Administrators
Posted

As buggy as David's implementation was, at least simple patterns I expect to work... do. Is PCRE really just broken or am I missing something completely obvious?

I wish I knew enough expressions to comment. I'm pretty much limited to using the API based on the documentation and then relying on you guys and the test exe to see if it's working OK. But unless I compiled it incorrectly then it should be working as intended.
  • Administrators
Posted

This one as well.

http://www.lumadis.be/regex/test_regex.php?lang=en

Hmm, do we not need the old array[0] value?

re> /(Foo)*/g

data> FooFooFoo

0: FooFooFoo

1: Foo

0:

So the 0 value (which we are now throwing away) does indeed match the entire thing, and then the first captured sub pattern is the single Foo.

?
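For anyone who wants to reproduce the pcretest output above outside AutoIt, here is a minimal sketch. It uses Python's `re` module as a stand-in engine, which is an assumption: it is not PCRE, although it agrees with the pcretest result for this pattern. The helper name `first_match` is made up for illustration. Element 0 is the whole match, and group 1 keeps only the last repetition of `(Foo)`.

```python
import re

def first_match(pattern, text):
    """Return [whole match, group 1, group 2, ...] for the first match (preg_match() style)."""
    m = re.search(pattern, text)
    return [m.group(0), *m.groups()] if m else []

result = first_match(r"(Foo)*", "FooFooFoo")
print(result)  # ['FooFooFoo', 'Foo']
```

So dropping element 0 loses the only place where the full `FooFooFoo` is visible.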

Posted

I found what I think might be an issue. Here is the code:

..

The output you got (1 match) is what i expect ... cos what matched was the entire pattern ( ie: including the "Uniques*" ).

Posted (edited)

A general remark about REs: it is easy to produce meaningless patterns that actually crash an RE engine, a sort of "while 1 ... wend" thing in RE syntax. What happens depends on the actual implementation.

Therefore, such an effect is not necessarily a bug in PCRE or in Jon's implementation of PCRE. It all depends on the pattern.

As to (.*?): this is a pattern that matches 0 or more of whatever (that's .*) but is not greedy (the ?). So it matches any string (say "test") first in position 0 and returns an empty string. Then it matches the "t", then the empty string between "t" and "e", then the empty string between "e" and "s" and so on. It returns 9 matches.

(.+?) returns exactly the four matches "t", "e", "s", "t", as one would expect.
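The nine-versus-four count can be checked with a short sketch. It assumes Python 3.7 or later and uses the `re` module as a stand-in for a Perl-style engine; it is not PCRE, and older Python versions handle empty matches differently.

```python
import re

text = "test"
lazy_star = re.findall(r"(.*?)", text)  # empty matches interleaved with single characters
lazy_plus = re.findall(r"(.+?)", text)  # one character per match, no empty matches

print(len(lazy_star), lazy_star)  # 9 ['', 't', '', 'e', '', 's', '', 't', '']
print(len(lazy_plus), lazy_plus)  # 4 ['t', 'e', 's', 't']
```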

I agree that REs can be hell but then again they are a completely logical hell :lmao:

EDIT:

I wish I knew enough expressions to comment. I'm pretty much limited to using the API based on the documentation and then relying on you guys and the test exe to see if it's working OK. But unless I compiled it incorrectly then it should be working as intended.

PCRE does work as intended. It is used in dozens of high-profile apps.

The fact that patterns don't do what people expect probably reflects more on their understanding of REs (or lack thereof) than actual errors in PCRE. (Note the "probably": this is not to say that PCRE has no bugs; it sure has. But if used correctly it tends to work correctly.)

One of the good things about PCRE (and Perl REs in general) is that they are well-documented, so it shouldn't be too difficult to get the hang of it. Much of what has been written in this thread is a classic case of RTFM.

As to the patterns (and results) Nutster's code accepted (and delivered), I would take these with a pinch of salt. They were definitely not Perl compatible.

Edited by thomasl
Posted (edited)

I have now downloaded the newest build and played a bit with it. My batch of patterns still work (though that's mostly ...Replace() stuff with backreferences etc.).

What doesn't work at all is StringRegExp(), flag=3, ie global match.

$s = "test"
$b = StringRegExp($s, "(.*?)", 3)
For $i = 0 To UBound($b) - 1
    ConsoleWrite("!" & $b[$i] & "!" & @CRLF)
Next

This should return the nine strings as detailed in my other post, above. Simpler patterns like a lone . also don't work.

Edit: code

Edited by thomasl
  • Administrators
Posted

It's working here (9 strings). I'm about to upload a new build in 10 mins so try again with that.
  • Administrators
Posted

Ok, new build: http://www.autoitscript.com/autoit3/files/...utoIt3-pcre.exe

I added options 2 and 4.

Option 2, same as option 1 but it returns the full match as well in array[0] ( like preg_match() )

Option 4, same as option 3 but returns an array of arrays :lmao: Each sub array is like the single return value from option 2. This is like the php / preg_match_all() return value.

Examples:

;Option 2, single return, php/preg_match() style
$array = StringRegExp('<test>a</test> <test>b</test> <test>c</Test>', '<(?i)test>(.*?)</(?i)test>', 2)
for $i = 0 to UBound($array) - 1
    msgbox(0, "Option 2 - " & $i, $array[$i])
Next


;Option 3, global return, old AutoIt style
$array = StringRegExp('test', '(.*?)', 3)

for $i = 0 to UBound($array) - 1
    msgbox(0, "Option 3 - " & $i, $array[$i])
Next


;Option 4, global return, php/preg_match_all() style
$array = StringRegExp('<test>a</test> <test>b</test> <test>c</Test>', '<(?i)test>(.*?)</(?i)test>', 4)

for $i = 0 to UBound($array) - 1

    $match = $array[$i]
    for $j = 0 to UBound($match) - 1
        msgbox(0, "Option 4 - " & $i & ',' & $j, $match[$j])
    Next
Next
Posted (edited)

Thx. (The previous build did work after all..., after I got me flaming paths sorted :lmao: )

Now all is well. Well, almost...

StringRegExp("F1oF2oF3o","(F.o)*?",3) should give seven matches. AU3 gives only three, omitting the four empty matches (the other example -- "(.*?)" -- works):

--

AU3 :F1o

AU3 :F2o

AU3 :F3o

Perl:

Perl:F1o

Perl:

Perl:F2o

Perl:

Perl:F3o

Perl:

--
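The seven-match Perl result can be cross-checked with another engine. This sketch assumes Python 3.7 or later and uses its `re` module as a stand-in; it is not PCRE, but it produces the same sequence as the Perl output above.

```python
import re

# Whole-match text of every global match, including the empty ones.
matches = [m.group(0) for m in re.finditer(r"(F.o)*?", "F1oF2oF3o")]
for text in matches:
    print("match:" + text)

print(len(matches))  # 7
```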

I will continue to throw REs at it.

EDIT:

Mode 4 hangs with StringRegExp("test","(.*?)",4)

Edited by thomasl
Posted (edited)

Fixed

:lmao:

Here's another thing to chew over.

$s="test"&@CRLF&"test"
ConsoleWrite($s&@CRLF)
$s=StringRegExpReplace($s,".","_")
ConsoleWrite($s&@CRLF)

This replaces everything with the exception of the LF (ie it also replaces the CR):

test

test

!!!!!

!!!!

Now this whole CR/LF handling is a thorny problem anyway. Perl REs have an option that switches between \n (which Perl assumes to be "\n" under *x and "\r\n" under Win32) being treated like a string terminator (ie not matched by a .) or as just another character.

Your code seems to work under the assumption that \n is a terminator, not a normal character, which is fine for most matches and replaces (though at some point there should be an option to switch this off). But I am not sure about the semantics in terms of coding for AU3: if LF is not replaced, perhaps CR shouldn't either.
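The "replaces the CR but not the LF" effect and the switch that turns it off can be sketched as follows. Python's `re` module is used as a stand-in (an assumption: it is not PCRE). Its `re.DOTALL` flag plays the role of the Perl/PCRE "dot matches newline" option, which PCRE exposes as PCRE_DOTALL or inline `(?s)`.

```python
import re

s = "test\r\ntest"
default_dot = re.sub(r".", "_", s)              # '.' skips the LF only, so the CR is replaced
dotall = re.sub(r".", "_", s, flags=re.DOTALL)  # '.' matches every character

print(repr(default_dot))  # '_____\n____'
print(repr(dotall))       # '__________'
```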

EDIT:

Here's more. StringRegExp("test"&@CRLF&"test",".",3) works as expected: nine matches (2*4 for the test's and 1 for the CR).

OTOH, StringRegExp("test"&@CRLF&"test","(.*?)",3) simply stops matching after the LF.
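For comparison, here is how another Perl-style engine treats the same two patterns. The sketch assumes Python 3.7 or later with the `re` module as a stand-in, not PCRE. The lone dot gives nine matches, and the lazy pattern keeps producing matches after the LF instead of stopping there.

```python
import re

s = "test\r\ntest"
dots = re.findall(r".", s)
print(len(dots))  # 9: eight letters plus the CR

# Start offsets of the non-empty matches of the lazy pattern.
starts = [m.start() for m in re.finditer(r"(.*?)", s) if m.group(0)]
print(starts)
print(max(starts) > s.index("\n"))  # True: matching continues past the LF
```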

Edited by thomasl
Posted

The output you got (1 match) is what i expect ... cos what matched was the entire pattern ( ie: including the "Uniques*" ).

Please explain to me what's going on then because from what I understand about regular expressions, it should start matching on Unique and once that part of the pattern matches, it moves to the first Foo which also matches the pattern. Then because of the repetition operator, it should move to the next and final Foo in the string which still matches because we are repeatedly capturing Foo's.

What I was trying to do was find a unique position in a string which is then followed by one or more lines of data followed by an empty line. I wanted to capture the lines of data individually. An example of the string:

Unique
Data Line 1
Data Line 2
Note that the example is basically like my AutoIt code above.

Also, the "s*" should be "\s*". I don't know why but the forum stripped the escape sequence.

  • Administrators
Posted

Your code seems to work under the assumption that \n is a terminator, not a normal character, which is fine for most matches and replaces (though at some point there should be an option to switch this off). But I am not sure about the semantics in terms of coding for AU3: if LF is not replaced, perhaps CR shouldn't either.

Can't comment on the other stuff - right on the limit of my knowledge now - but I found an option in the pcrelib, set at compile time, that lets you specify the newline as \n or \r (a single char). It doesn't seem to have any option for \r\n. Our library was compiled with \n specified. It may be that when using CRLF sequences you have to strip them with StringStripCR() first to get expected results. Dunno.
Posted

Jon, I think that option sets what character(s) \n means. It can be either LF (probably the default), CR or CRLF. I'm pretty sure I saw a flag in the documentation that sets it to CRLF, too. IMO, leaving \n to mean LF is fine because we can build a CRLF sequence with \r\n. However, it shouldn't affect \s, which is what I used above, because \s matches all whitespace characters and because of the repetition, it'll catch both CR and LF.
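The point about \s can be illustrated with a small sketch. Python's `re` module is used as a stand-in (an assumption: it is not PCRE, and the sample string is made up). A repeated `\s` swallows both CR and LF regardless of how the newline is configured, and an explicit `\r\n` also matches a CRLF pair.

```python
import re

s = "Unique\r\nFoo\r\n"  # hypothetical sample with CRLF line endings

gap = re.search(r"Unique(\s*)Foo", s).group(1)
print(repr(gap))  # '\r\n'

explicit = re.search(r"Unique\r\nFoo", s) is not None
print(explicit)  # True
```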

Posted

..

What I was trying to do was find a unique position in a string which is then followed by one or more lines of data followed by an empty line. I wanted to capture the lines of data individually. An example of the string:

Unique
Data Line 1
Data Line 2
Note that the example is basically like my AutoIt code above.

..

ok .. then this RE "Unique\s*(?:((Foo)\s*))*"

.. with PCRE calls via Thomasl's wrapper i get two "Foo"s

HTH

:)

Posted (edited)

ok .. then this RE "Unique\s*(?:((Foo)\s*))*"

.. with PCRE calls via Thomasl's wrapper i get two "Foo"s

HTH

:)

Alright, that works. Now explain to me why. All you did was add another capture. How does that magically get it working?

Edit: And I can simplify that to this "Unique\s*((Foo)\s*)*" and it still works, further adding to my confusion. If that simplified form works, why does the non-capturing form not work?
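One possible explanation, not confirmed anywhere in this thread: in Perl-style engines a capture group inside a repetition keeps only its last iteration, so the number of results follows the number of capture groups, not the number of repetitions. Two groups would then give two values even though neither holds the first Foo. The sketch below uses Python's `re` module as a stand-in (not PCRE) and a made-up sample string, since the original AutoIt snippet is not shown.

```python
import re

# Hypothetical sample data; the original test string is elided in the thread.
s = "Unique\nFoo\nFoo\n"

one_group = re.search(r"Unique\s*(?:(Foo)\s*)*", s)
two_groups = re.search(r"Unique\s*((Foo)\s*)*", s)
print(one_group.groups())   # ('Foo',)
print(two_groups.groups())  # ('Foo\n', 'Foo')

# To get every line individually: capture the whole block once, then scan inside it.
block = re.search(r"Unique\s*((?:Foo\s*)*)", s)
lines = re.findall(r"Foo", block.group(1))
print(lines)  # ['Foo', 'Foo']
```

The two-step approach (match the block, then run a global match over it) avoids relying on a repeated group to remember every iteration.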

Edited by Valik
