
RegEx gurus



Please consider the following code:

;RegExp Error ?
$str = "Error"
$ptrn = "(A.*?or)|(E.*?or)"

$a = StringRegExp($str,$ptrn,3) ; flag 3: return an array of the captured groups from all (global) matches
$out = ""
If @error = 0 Then
    For $i = 0 to Ubound($a) - 1
        $out &= $i & " => " & $a[$i] & @crlf
    Next
    Msgbox(0,"Found these matches", $out)
Else
    MsgBox(0,"","Shouldn't get here")
EndIf

My pattern is looking for a word starting with "E" or "A" and ending with "or".

I am expecting just 1 match.

The first part of the OR statement should fail, and the second part should give me a match.

But in fact I get 2 matches:

0 =>
1 => Error

Does anyone have any thoughts?

Steve


Using this pattern appears to work in all cases. Edit: Except lower case.

$ptrn = "((?:A|E).*?or)"

2nd Edit:

Use this next pattern for case insensitive matching.

$ptrn = "(?i)((?:A|E)[^\h]*?or)" ; Case insensitive
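A minimal sketch comparing the original two-group pattern with the single-group patterns above. It assumes nothing beyond StringRegExp with flag 3; the counts in the comments are the results already reported in this thread (two elements, the first empty, for the original pattern).

```autoit
; Compare the original two-group pattern with the single-group versions.
; Flag 3 = one array holding the captured groups of every match.
Local $aOld = StringRegExp("Error", "(A.*?or)|(E.*?or)", 3)
Local $aNew = StringRegExp("Error", "((?:A|E).*?or)", 3)
Local $aCI = StringRegExp("error", "(?i)((?:A|E)[^\h]*?or)", 3)

; Two capture groups: the unused first group still shows up as an empty element.
ConsoleWrite("Two groups: " & UBound($aOld) & " element(s)" & @CRLF)
; One outer capture group with a non-capturing (?:A|E) inside: one element per match.
ConsoleWrite("One group: " & UBound($aNew) & " element(s), [0] = " & $aNew[0] & @CRLF)
; (?i) makes the match case-insensitive, so lower-case "error" is found too.
ConsoleWrite("Case-insensitive: [0] = " & $aCI[0] & @CRLF)
```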
Edited by Malkey

...

My pattern is looking for a word starting with "E" OR "A" and ending with "or"

...

First, your expression is not limited to words: it looks for any "A" or "E" anywhere that is followed by "or" on the same line.

The empty element appears because the first pair of parentheses is still a capture group, even though it did not participate in the match. Groups keep their index in the result, so it is enough for the second group to match for the first one to be reported as well (as an empty string).

Another alternative to Malkey's pattern is:

([AE].*?or)

Or using word boundaries:

\b([AE].*?or)\b

Hope it's clear.

Edit: To be more precise, assume you have this pattern:

(\w)(\d)

If you read about back-references and conditional sub-patterns, you will come to understand patterns like:

(\w)(\d)(?(1)\w)(?(2)\d)

...so if the first group matches, the second captures nothing, which is obvious. If the second matches, though, you want the second conditional sub-pattern to refer to it by its own group number to match another digit, rather than having to guess which index it would get if the first group were not captured. So if you have 9 grouping parentheses, you definitely want all of them to keep their index even if only one of them contains a value, because the back-reference mechanism depends on it.
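A small sketch of the fixed-numbering point, using a hypothetical input and pattern chosen only for illustration (they are not from the posts above): in (a)\1|(b)\2 the back-reference \2 only works because (b) is always group 2, whether or not group 1 took part in the match. The expected elements in the comment assume the behaviour described later in this thread, where groups are reported up to the one that matched.

```autoit
; Hypothetical example: \1 and \2 refer to groups by their fixed position,
; so (b) stays group 2 even when (a) did not participate in the match.
Local $aRes = StringRegExp("aa bb", "(a)\1|(b)\2", 3)
If @error Then Exit 1

; Expected: "a" (from "aa"), then "" and "b" (from "bb": group 1 unset, group 2 = "b")
For $i = 0 To UBound($aRes) - 1
    ConsoleWrite($i & " => " & $aRes[$i] & @CRLF)
Next
```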

Edited by Authenticity

Authenticity, Malkey - thanks for your detailed replies.

I realised later the example I gave did not fully represent the issue I am having .. :)

A better example would be this:

;RegExp Error ?
$str = "Error Success"
$ptrn = "(Suc.*?cess)|(E.*?or)"

$a = StringRegExp($str,$ptrn,3) ; flag 3: return an array of the captured groups from all (global) matches
$out = ""
If @error = 0 Then
    For $i = 0 to Ubound($a) - 1
        $out &= $i & " => " & $a[$i] & @crlf
    Next
    Msgbox(0,"Found these matches", $out)
Else
    MsgBox(0,"","Shouldn't get here")
EndIf

The result here is:

0 =>
1 => Error
2 => Success

I understand (I think) why the regular expression engine needs to know what the previous match is (for back referencing purposes), but back referencing is not used here and the results from the test should not include failed matches.

I have passed this pattern through online PCRE engines and RegexBuddy, and they all give the expected results.

Just to let you know, I have tried to create an example here to show the issue. In practice I am examining the inputs to and outputs from machine tools. The results are not words, but sequences of characters and white space. The indexing is important because I query the tools for a certain number of values and I am looking for a certain number of replies. Having AutoIt return a value for the failed side of the OR is not helpful, because the indexing of test vs. reply matters.

I am really quite keen to get a solution - I have got a few million lines of log files to examine for a particular issue .. :):P

Thanks in advance for any assistance you can give me.


The result you got matches what I see in online PCRE testers:

For example, here:

Matches (Index followed by matched text):

1. Full pattern match: Error at offset 0
   Sub-string #1:
   Sub-string #2: Error

2. Full pattern match: Success at offset 6
   Sub-string #1: Success

Here is what's happening: because you used an "or" expression (the pipe symbol, i.e. alternation), each match reports the results of every capture group up to and including the one that matched.

When it reads "Error", that matches the second of the two alternatives, so it returns the results from both groups: [UNDEFINED] for (Suc.*?cess) and "Error" for (E.*?or). That's why you get the empty entry in [0] of the returned array.

This doesn't happen when it gets to "Success", because it only lists groups up to the one that matched, and this time it matched on the first group.

To see this even more clearly, use:

$ptrn = "(NeverMatch)|(Suc.*?cess)|(E.*?or)"

And then you get:

0 = ""
1 = ""
2 = Error
3 = ""
4 = Success
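The same effect can be looked at one match at a time with flag 4, which returns an array of arrays; in each inner array, element [0] is the whole match and [1] onward are the groups. This is a sketch using the thread's second test string; printing the group count per match lets you check the "up to the one that matched" behaviour yourself, and collecting only element [0] is another (swapped-in) way to get exactly one entry per match.

```autoit
; Flag 4: one inner array per match; [0] = whole match, [1]..[n] = groups.
Local $aAll = StringRegExp("Error Success", "(Suc.*?cess)|(E.*?or)", 4)
If @error Then Exit 1

Local $aFull[UBound($aAll)] ; whole-match text only, one entry per match
Local $aMatch
For $i = 0 To UBound($aAll) - 1
    $aMatch = $aAll[$i] ; copy the inner array so it can be indexed
    $aFull[$i] = $aMatch[0]
    ConsoleWrite("Match " & $i & ": " & $aMatch[0] & " (" & (UBound($aMatch) - 1) & " group element(s))" & @CRLF)
Next
```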

To fix it, make the sub-groups non-capturing and return only the global result of all sub-groups together:

$ptrn = "((?:Suc.*?cess)|(?:E.*?or))"
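For completeness, a sketch of the second test script with this fix applied. It keeps the original logic but writes to the console instead of a message box; the expected result is one element per match, in the order the matches occur in the string.

```autoit
; (?:...) groups are non-capturing, so the single outer group is the only
; value flag 3 returns for each match, whichever alternative matched.
Local $sStr = "Error Success"
Local $sPtrn = "((?:Suc.*?cess)|(?:E.*?or))"

Local $aRes = StringRegExp($sStr, $sPtrn, 3)
If @error Then
    ConsoleWrite("No match" & @CRLF)
    Exit 1
EndIf

Local $sOut = ""
For $i = 0 To UBound($aRes) - 1
    $sOut &= $i & " => " & $aRes[$i] & @CRLF
Next
ConsoleWrite($sOut) ; expected: 0 => Error, 1 => Success
```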

:)

Edited by PsaltyDS

PsaltyDS,

Thank you for this help. The explanation you gave (and the solution) is brilliant.

Thanks again to Authenticity and Malkey for looking at this.

Hopefully the explanations you have given will be as helpful to others as they have been for me.

Steve
