Imbuter2000

Regular expression mysteriously doesn't match (!?)

18 posts in this topic

I'm really going mad over a regular expression that stops working if I add an alternation or just invert its order.

Try this script:

$screen = "" _
& "  AFASN0M         A N A G R A F E  INTERROGAZIONE SINTETICA  -  ABC   22/04/16  " _
& @CRLF & "=============================================================================== " _
& @CRLF & "Ndg: 12345678         Tipo: 10100  P.F.  -  EFF/COMPL.           Sorvegl.       " _
& @CRLF & "GHILARDI ABCDEF                                                                " _
& @CRLF & "VIA IV OTTOBRE 12               c/o                                            " _
& @CRLF & "12345 PONTENUCOLA                      BG       Cittad:  ITALIANA               " _
& @CRLF & "Nato il: 12/12/1912  A: MARISOLE                         Prov.: BG   Sesso: M   " _
& @CRLF & "C.F. ABCDEFGHM14I858O   P.I.               Tel.                                 " _
& @CRLF & "SAE/RAE    200/      Data Visura: 12/12/1912                    Prof.  000007   " _
& @CRLF & "Cod. C.R.    1234567890  Data: 01/2000 Cod. C.R.A.                Data:         " _
& @CRLF & "Segmento cliente:  12  MASS                                                     " _
& @CRLF & "Informazioni da altre societa' del gruppo                                       "


local $regex_one="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
&  "\r\n"

local $regex_two="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"


$regex_test_one = "(?s:"&$regex_one&"|"&$regex_two&")"  ; it DOESN'T MATCH :(
$regex_test_two = "(?s:"&$regex_two&"|"&$regex_one&")" ; it MATCHES :)
$regex_test_simple = $regex_two  ; it MATCHES :)

If StringRegexp($screen,$regex_test_one) = 1 Then
    Msgbox(0,"","match")
Else
    Msgbox(0,"","NOT match")
EndIf

Why on earth doesn't $regex_test_one match, while $regex_test_two and $regex_test_simple do, when all three should match?!




Your code doesn't behave like you seem to believe it does.
A more verbose version (subject and expressions unchanged):

#include <Array.au3>

$screen = "" _
& "  AFASN0M         A N A G R A F E  INTERROGAZIONE SINTETICA  -  ABC   22/04/16  " _
& @CRLF & "=============================================================================== " _
& @CRLF & "Ndg: 12345678         Tipo: 10100  P.F.  -  EFF/COMPL.           Sorvegl.       " _
& @CRLF & "GHILARDI ABCDEF                                                                " _
& @CRLF & "VIA IV OTTOBRE 12               c/o                                            " _
& @CRLF & "12345 PONTENUCOLA                      BG       Cittad:  ITALIANA               " _
& @CRLF & "Nato il: 12/12/1912  A: MARISOLE                         Prov.: BG   Sesso: M   " _
& @CRLF & "C.F. ABCDEFGHM14I858O   P.I.               Tel.                                 " _
& @CRLF & "SAE/RAE    200/      Data Visura: 12/12/1912                    Prof.  000007   " _
& @CRLF & "Cod. C.R.    1234567890  Data: 01/2000 Cod. C.R.A.                Data:         " _
& @CRLF & "Segmento cliente:  12  MASS                                                     " _
& @CRLF & "Informazioni da altre societa' del gruppo                                       "


local $regex_one="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
&  "\r\n"

local $regex_two="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"


$regex_test_one = "(?s:"&$regex_one&"|"&$regex_two&")"  ; it DOESN'T MATCH :(
$regex_test_two = "(?s:"&$regex_two&"|"&$regex_one&")" ; it MATCHES :)
$regex_test_simple = $regex_two  ; it MATCHES :)

Local $a = StringRegexp($screen,$regex_test_one, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a)
EndIf

$a = StringRegexp($screen,$regex_test_two, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a)
EndIf

$a = StringRegexp($screen,$regex_test_simple, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a)
EndIf

 


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


jchd, your code gives the same wrong results as mine: as far as I can see, it doesn't match with $regex_test_one, while it matches with $regex_test_two and $regex_test_simple.

It's like saying that the regex "A|B" doesn't match the text "B" (while the regex "B|A" or just "B" correctly matches).

Do you understand why in my example the regex "A|B" doesn't match the text "B"?
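
For reference, the trivial case can be checked directly. A minimal sketch, assuming nothing beyond the built-in StringRegExp in its default mode (which returns 1 on a match and 0 otherwise); the variable name is made up:

```autoit
; Sanity check of plain alternation: try both orders against the subject "B".
Local $sSubject = "B"
ConsoleWrite("A|B: " & StringRegExp($sSubject, "A|B") & @CRLF)
ConsoleWrite("B|A: " & StringRegExp($sSubject, "B|A") & @CRLF)
ConsoleWrite("B:   " & StringRegExp($sSubject, "B") & @CRLF)
```

If all three lines report the same result, alternation order as such is not the issue, and whatever differs in the script above has to come from the patterns themselves.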


#4 ·  Posted (edited)

$regex_test_one = "(?s:"&$regex_one&")|("&$regex_two&")"  ; it MATCHES :)
$regex_test_two = "(?s:"&$regex_two&")|("&$regex_one&")" ; it MATCHES :)

or maybe

$regex_test_one = "(?s:)("&$regex_one&")|("&$regex_two&")"  ; it MATCHES :)
$regex_test_two = "(?s:)("&$regex_two&")|("&$regex_one&")" ; it MATCHES :)

Don't ask me - what's the difference? :unsure:
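
On the difference between these forms: as far as PCRE syntax goes, in (?s:X)|(Y) the dotall option covers only X, and (?s:) is an empty group, so the option covers nothing that follows it. A small sketch (the two-line subject and the variable name are made up for illustration):

```autoit
; Hypothetical subject: "a" and "b" separated by a line break.
Local $sSubject = "a" & @CRLF & "b"

; The dot may cross the line break only where dotall is in effect.
ConsoleWrite("(?s:a.+b) : " & StringRegExp($sSubject, "(?s:a.+b)") & @CRLF) ; dotall covers the whole pattern
ConsoleWrite("(?s:)a.+b : " & StringRegExp($sSubject, "(?s:)a.+b") & @CRLF) ; empty group, dotall covers nothing
ConsoleWrite("a.+b      : " & StringRegExp($sSubject, "a.+b") & @CRLF)      ; no dotall at all
```

So the variants above are not equivalent to the original (?s:one|two): they change which part of the pattern runs with dotall, which may be why they behave differently.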

Edited by czardas


I think it's a bug in the regular expression engine or its implementation. What do you think?


#6 ·  Posted (edited)

I think regular expressions must have been invented by aliens. I would expect the same result as you, but I don't know if it's inconsistent implementation or my misconception. I guess someone will be able to fathom it out and hopefully cast some light on it. I tend to separate groups as above, which seems to do the trick in certain cases.

Edited by czardas


It's probably because the regex one is wrong

#include <Array.au3>

$screen = "" _
& "  AFASN0M         A N A G R A F E  INTERROGAZIONE SINTETICA  -  ABC   22/04/16  " _
& @CRLF & "=============================================================================== " _
& @CRLF & "Ndg: 12345678         Tipo: 10100  P.F.  -  EFF/COMPL.           Sorvegl.       " _
& @CRLF & "GHILARDI ABCDEF                                                                " _
& @CRLF & "VIA IV OTTOBRE 12               c/o                                            " _
& @CRLF & "12345 PONTENUCOLA                      BG       Cittad:  ITALIANA               " _
& @CRLF & "Nato il: 12/12/1912  A: MARISOLE                         Prov.: BG   Sesso: M   " _
& @CRLF & "C.F. ABCDEFGHM14I858O   P.I.               Tel.                                 " _
& @CRLF & "SAE/RAE    200/      Data Visura: 12/12/1912                    Prof.  000007   " _
& @CRLF & "Cod. C.R.    1234567890  Data: 01/2000 Cod. C.R.A.                Data:         " _
& @CRLF & "Segmento cliente:  12  MASS                                                     " _
& @CRLF & "Informazioni da altre societa' del gruppo                                       "


local $regex_one="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
&  "\r\n"

local $regex_two="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"


$regex_test_one = "(?s:"&$regex_one&"|"&$regex_two&")"  ; it DOESN'T MATCH :(
$regex_test_two = "(?s:"&$regex_two&"|"&$regex_one&")" ; it MATCHES :)
$regex_test_simple1 = $regex_one  ; 
$regex_test_simple2 = $regex_two  ; it MATCHES :)

Local $a = StringRegexp($screen,$regex_test_one, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a, "$regex_test_one")
EndIf

$a = StringRegexp($screen,$regex_test_two, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a, "$regex_test_two")
EndIf

$a = StringRegexp($screen,$regex_test_simple1, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a, "$regex_test_simple1")
EndIf

$a = StringRegexp($screen,$regex_test_simple2, 1)
If @error Then
    Msgbox(0,"Error", @error)
Else
    _ArrayDisplay($a, "$regex_test_simple2")
EndIf

 


Is this a joke, guys? Here I just added a title to the _ArrayDisplay calls:

[Three screenshots of the _ArrayDisplay result windows: 2016-04-22_165409.png, 2016-04-22_165416.png, 2016-04-22_165421.png]



@Imbuter2000,

If you meant that regex_one (note: not regex_test_one) alone doesn't match, this is no surprise since the final \r\n in the expression can never match. Hence this part of the alternation is pointless. Your regexps are terrible and imply a whole lot of useless backtracking.

Besides, it is unclear what you actually intend to capture, how the various fields are formatted and whether they are mandatory or not.

9 hours ago, jchd said:

Your regexps are terrible

I vote this to replace 'resistance is obligatory'


Guys, the regexps that I posted, and that seem terrible to you, were simplified by me to the minimum necessary to show the bug.

The original regexps that I'm using are the following. They're used in alternation because, yes, only one of them matches at a time: on the $screen of the example only $regex_two matches, and further down in my script an "if" does different things depending on which of the two matched.

local $regex_one=" .* A N A G R A F E  INTERROGAZIONE SINTETICA.* " _
&  "\r\n=============================================================================== " _
&  "\r\nNdg: " & $ndg & " +Tipo: \d{5} +.* Sorvegl\. .*" _
&  "\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
&  "\r\n"

local $regex_two=" .* A N A G R A F E  INTERROGAZIONE SINTETICA.* " _
&  "\r\n=============================================================================== " _
&  "\r\nNdg: " & $ndg & " +Tipo: \d{5} +(P\.F\.  -|COINT -) .* Sorvegl\. .*" _
&  "\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +" _
&  "\r\nNato il: (\d\d/\d\d/\d\d\d\d)  A: ([^ ].*?) +Prov\.: (\w\w) +Sesso: (\w) *" _
&  "\r\nC\.F\. +(\w+|) +P\.I\. +Tel\..*" _
&  "\r\n(?s:.*)" _
&  "\r\nCODTX: ____ .*" _
&  "\r\n"

So, is there a valid reason why the order of the alternatives makes the difference between not matching and matching on $screen?

This doesn't match: "(?s:"&$regex_one&"|"&$regex_two&")"
This matches: "(?s:"&$regex_two&"|"&$regex_one&")"
Why???

I was also always confident that there was no need to add parentheses like czardas suggested. I learned not to use parentheses when they're not needed. Should I start using them again around every element in every alternation, to be sure to bypass this bug/weird limitation of regexes in AutoIt?

P.S.: I have been using regular expressions like these for 20 years to read and parse AS/400 terminal screens. If you think that my regexps are terrible, can you please explain what the problem with them is?

 


@Imbuter2000,

You still didn't answer post #8, nor other subsequent questions. Making the thread a moving target doesn't simplify things.

The reason why the following (at this time):

local $regex_one="\r\n(.*?) +" _
&  "\r\n(.*?) +c/o .*" _
&  "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
&  "\r\n"

doesn't match is due to the extra "\r\n" at the tail of the regexp, something that you could have checked by yourself.

Also, option dotall (?s) makes dot (.) match \r\n, \r and \n since the implementation uses (*ANYCRLF) internally. Using the unbounded sequence .* implies a lot of backtracking. Use site http://regex101.com/ to see how things work.
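
One way to check whether backtracking is involved is to run the same alternation without the (?s: wrapper, so that the dot stays within a single line. A sketch, assuming a shortened subject (made up from a few lines of the screen in the first post) and the simplified patterns from that post; the variable names are hypothetical:

```autoit
; Shortened, hypothetical subject built from a few lines of the original screen.
Local $sScreen = "Ndg: 12345678" _
    & @CRLF & "GHILARDI ABCDEF     " _
    & @CRLF & "VIA IV OTTOBRE 12       c/o       " _
    & @CRLF & "12345 PONTENUCOLA       BG       Cittad:  ITALIANA      " _
    & @CRLF & "Nato il: 12/12/1912"

; The two simplified patterns, copied from the first post.
Local $sOne = "\r\n(.*?) +" _
    & "\r\n(.*?) +c/o .*" _
    & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +" _
    & "\r\n"

Local $sTwo = "\r\n(.*?) +" _
    & "\r\n(.*?) +c/o .*" _
    & "\r\n {0,4}(\d{2,5}) ([^ ].*?) +(\w\w) +Cittad: +([^ ].*?) +"

; (?: ... ) groups the alternation without turning dotall on.
ConsoleWrite("one|two, no dotall: " & StringRegExp($sScreen, "(?:" & $sOne & "|" & $sTwo & ")") & @CRLF)
ConsoleWrite("two|one, no dotall: " & StringRegExp($sScreen, "(?:" & $sTwo & "|" & $sOne & ")") & @CRLF)
```

If both orders agree here but disagree once the (?s: wrapper is put back, that would point at how much work the first alternative does before failing, rather than at alternation itself.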

From post #8+ the question is still pending: which fields do you want to capture and are some fields optional? Since you now tell us that the subjects come from screen captures, I believe the subjects' format is fixed and extracting fields with StringMid() would probably be much simpler and more reliable.
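
A minimal sketch of the StringMid() approach, assuming the screen is split into lines on @CRLF and that the Ndg value sits in columns 6 to 13 of the third line, as in the sample screen (the column numbers and variable names are assumptions read off that sample, not a documented layout):

```autoit
; Hypothetical excerpt: the first three lines of a captured screen.
Local $sScreen = "  AFASN0M         A N A G R A F E" _
    & @CRLF & "=================================" _
    & @CRLF & "Ndg: 12345678         Tipo: 10100  P.F.  -  EFF/COMPL."

; Flag 1 makes StringSplit treat @CRLF as one delimiter; element [0] holds the line count.
Local $aLines = StringSplit($sScreen, @CRLF, 1)

If $aLines[0] >= 3 And StringLeft($aLines[3], 5) = "Ndg: " Then
    ; Assumed layout: label in columns 1-5, value in columns 6-13.
    Local $sNdg = StringMid($aLines[3], 6, 8)
    ConsoleWrite("Ndg: " & $sNdg & @CRLF)
Else
    ConsoleWrite("Unexpected screen layout" & @CRLF)
EndIf
```

The label check is only a crude form of validation, and each additional field needs its own offset and length.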



jchd, I didn't answer post #8 because the second image is broken (not visible) and I didn't understand your points.

If you read my first message again, the question is not why $regex_one doesn't match. That is correct: it doesn't have to match in this example.
The question, as you can read in my first post, is why $regex_test_one, which is an alternation of both the non-matching and the matching versions, doesn't match!

P.S.: Validating a text screen with StringMid is impossible, and parsing multiple elements in a whole text screen would require multiple StringMid calls and minutes spent counting the position and length of every field. With a single regex, built in less time, I validate and parse the multiple elements.
Validation is necessary because many pages could show unexpected fields and rows. The two regexes in alternation are there for this reason, to work well in both situations, where StringMid would blindly fail.
 


AFAICT the three window captures in post #8 are visible here with both Firefox and Chrome. I don't know what I can do about that.

Run the script yourself and see that your statement "$regex_test_one doesn't match!" is wrong.



I just copied and pasted the code from my first message into my AutoIt, pressed F5, and got a msgbox with the text "NOT match".
Do you get "match" instead? (!?)


Definitely:

[Screenshot: 2016-04-26_162747.png]



Oh God, look at mine:

screenshot.png


Use a different way to post your screenshot, this one isn't visible.

