
[Solved] StringRegExp zero-length match and no match at all



Good morning :)
I'm playing with StringRegExp (SRE) and trying to extract some information from a test file.
I tested the pattern on regex101, but when I bring it into AutoIt, it doesn't return the same result as on regex101.
I am surely (?:missing some important notes about the PCRE engine|using a pattern that is not correct at all).

Script:

#include <Array.au3>
#include <StringConstants.au3>


Test()

Func Test()

    Local $strFileName = @ScriptDir & "\TestFile.txt", _
          $strFileContent, _
          $arrResult


    $strFileContent = FileRead($strFileName)
    If @error Then Return ConsoleWrite("FileRead ERR: " & @error & @CRLF)

    $arrResult = StringRegExp($strFileContent, '(?sx)User:\h([^\n]+)\n' & _
                                               'Login\-name:\h([^\n]+)\n' & _
                                               '(?:CaseSensitive:\h([^\n]+)\n)?' & _
                                               'NTSecurity:\h([^\n]+)\n' & _
                                               '(?:NO\n)?' & _
                                               '(?:Domain:\h([^\n]+)\n)?' & _
                                               'Timeout:\h([^\n]+)\n' & _
                                               '.*?' & _
                                               'Member:\h([^\n]+)\n', $STR_REGEXPARRAYGLOBALMATCH)

    If IsArray($arrResult) Then _ArrayDisplay($arrResult)

EndFunc

Test file:

User: AMMINISTRATORE
Login-name: ADM
CaseSensitive: YES
NTSecurity: NO
NO
Timeout: 00:05:00
Member: AMMINISTRATORI

User: Test_User
Login-name: Test_User
NTSecurity: YES
Domain: DNEU
Timeout: 00:00:00
Member: OPERATORS
Member: OPERATORS

Any help (even from cats) is highly appreciated.

Cheers ^_^


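I suspect it is because of the line breaks.
You only allowed a line feed (\n), without the carriage return (\r) that precedes it in Windows line endings (\r\n). With such a file, [^\n]+ silently keeps the \r in the capture, but a literal part such as NO\n can never match NO\r\n, so the first record fails to match at all.
If I make the pattern more general, it works for me: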
 

#include <Array.au3>

Local $strFileName = @ScriptDir & "\Test.txt", _
        $strFileContent, _
        $aMatch

; [^\n\r]+ stops before either line-break character; \R matches a whole line break (\r\n, \n or \r)
Local $sPattern = '(?sx)User:\h([^\n\r]+)\R' & _
        'Login\-name:\h([^\n\r]+)\R' & _
        '(?:CaseSensitive:\h([^\n\r]+)\R)?' & _
        'NTSecurity:\h([^\n\r]+)\R' & _
        '(?:NO\R)?' & _
        '(?:Domain:\h([^\n\r]+)\R)?' & _
        'Timeout:\h([^\n\r]+)\R' & _
        '.*?' & _
        'Member:\h([^\n\r]+)\R'

$strFileContent = FileRead($strFileName)
If @error Then Exit ConsoleWrite("FileRead ERR: " & @error & @CRLF)

; flag 4: one array per record, [0] = whole match, [1]...[7] = capture groups
For $aMatch In StringRegExp($strFileContent, $sPattern, 4)
    _ArrayDisplay($aMatch)
Next
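To see the line-ending problem in isolation, here is a minimal sketch, assuming the file uses Windows line endings (@CRLF); the sample string $sText is made up for illustration. It shows that the literal NO\n from the original pattern cannot match "NO" & @CRLF while NO\R can, and that [^\n]+ quietly keeps the trailing carriage return in the capture.

```autoit
; Minimal sketch, assuming Windows line endings (@CRLF); $sText is a made-up sample
Local $sText = "NTSecurity: NO" & @CRLF & "NO" & @CRLF

; \n matches only the line feed, but every "NO" here is followed by @CR first
ConsoleWrite("NO\n matches: " & StringRegExp($sText, 'NO\n') & @CRLF) ; 0
; \R matches a whole line-break sequence, including @CRLF
ConsoleWrite("NO\R matches: " & StringRegExp($sText, 'NO\R') & @CRLF) ; 1

; [^\n]+ stops only at the line feed, so the capture keeps the trailing @CR
Local $aOld = StringRegExp($sText, 'NTSecurity:\h([^\n]+)\n', 1)
ConsoleWrite("Length with [^\n]+:   " & StringLen($aOld[0]) & @CRLF) ; 3 ("NO" & @CR)

; Excluding \r as well gives the clean value
Local $aNew = StringRegExp($sText, 'NTSecurity:\h([^\n\r]+)\R', 1)
ConsoleWrite("Length with [^\n\r]+: " & StringLen($aNew[0]) & @CRLF) ; 2 ("NO")
```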

 


Capture groups are simply numbered from left to right in the pattern.
If a group does not participate in the match, its slot is still present in the result and contains an empty string.

The internal function pcre_exec accepts an option "PCRE_NOTEMPTY", but note that it only rejects an empty overall match; it does not remove unset capture groups from the result. Unfortunately, this parameter cannot be set via the StringRegExp function.

In your case this behavior is actually helpful: each value always ends up at the same index, whether or not the optional attributes appear in between.
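To make the fixed-index point concrete, here is a small sketch with two made-up records, one with and one without the optional CaseSensitive line (LF line endings are assumed so that \n is enough). The optional group still occupies index 1 when it does not match, so Timeout stays at index 2 in both results.

```autoit
; Two made-up records with LF line endings; only the first has the optional CaseSensitive line
Local $sWith    = "User: A" & @LF & "CaseSensitive: YES" & @LF & "Timeout: 1" & @LF
Local $sWithout = "User: B" & @LF & "Timeout: 2" & @LF

; Group 1 = User, group 2 = CaseSensitive (optional), group 3 = Timeout
Local $sPattern = 'User:\h([^\n]+)\n(?:CaseSensitive:\h([^\n]+)\n)?Timeout:\h([^\n]+)\n'

; Flag 1 returns the capture groups of the first match
Local $aWith    = StringRegExp($sWith, $sPattern, 1)
Local $aWithout = StringRegExp($sWithout, $sPattern, 1)

; Both arrays have 3 elements; the unmatched group is an empty string, so Timeout is always [2]
ConsoleWrite("With:    [" & $aWith[0] & "] [" & $aWith[1] & "] [" & $aWith[2] & "]" & @CRLF)          ; [A] [YES] [1]
ConsoleWrite("Without: [" & $aWithout[0] & "] [" & $aWithout[1] & "] [" & $aWithout[2] & "]" & @CRLF) ; [B] [] [2]
```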

You can also go through the array yourself afterwards and delete the empty-string elements.
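If you do want to drop the empty elements, a minimal sketch could look like this; _RemoveEmpty is a hypothetical helper, not part of Array.au3. Keep in mind that this gives up the fixed indices described above, and that a value which is legitimately empty would be dropped as well (with [^\n]+ that cannot happen, because every captured value has at least one character).

```autoit
; Hypothetical helper (not part of Array.au3): copy of a 1D array without empty-string elements.
; Sets @error = 1 and returns 0 if nothing is left.
Func _RemoveEmpty($aIn)
    Local $aOut[UBound($aIn)], $iCount = 0
    For $i = 0 To UBound($aIn) - 1
        If StringLen($aIn[$i]) > 0 Then
            $aOut[$iCount] = $aIn[$i]
            $iCount += 1
        EndIf
    Next
    If $iCount = 0 Then Return SetError(1, 0, 0)
    ReDim $aOut[$iCount]
    Return $aOut
EndFunc

; Usage: this made-up record has no CaseSensitive line, so group 2 comes back as ""
Local $aRes = StringRegExp("User: B" & @LF & "Timeout: 2" & @LF, _
        'User:\h([^\n]+)\n(?:CaseSensitive:\h([^\n]+)\n)?Timeout:\h([^\n]+)\n', 1)
Local $aClean = _RemoveEmpty($aRes)
ConsoleWrite(UBound($aRes) & " -> " & UBound($aClean) & @CRLF) ; 3 -> 2
```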

Here, however, I would rather write a small two-stage parser:

  1. First, separate the individual objects from each other (apparently they are separated by empty lines).
  2. Then extract the attribute-name: attribute-value pairs from each object and put them into one dictionary per object.

This way you have meaningful names afterwards, and it is easier to work with them.

Edit: Anyway, here's an example of what I mean:

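#include <Array.au3>

Local $strFileName = @ScriptDir & "\Test.txt"
Local $strFileContent = FileRead($strFileName)
If @error Then Exit ConsoleWrite("FileRead ERR: " & @error & @CRLF)

#Region parsing
; Stage 1: one string per object, from "User:" up to a blank line or the end of the file
Local $aGroups = StringRegExp($strFileContent, '(?s)(User:.+?(?>\R\R|\Z))', 3)
If @error Then Exit ConsoleWrite("No objects found" & @CRLF)
Local $aObjects[UBound($aGroups)], $i = 0

; Stage 2: one dictionary per object, filled with its "name: value" lines
For $sObject In $aGroups
    Local $oTmp = ObjCreate("Scripting.Dictionary")
    For $aAttribute In StringRegExp($sObject, '(?m)^(.+?):\h*(.+)', 4)
        ; flag 4: [0] = whole line, [1] = attribute name, [2] = attribute value
        ; (a repeated name such as Member keeps only its last value)
        $oTmp.Item($aAttribute[1]) = $aAttribute[2]
    Next
    $aObjects[$i] = $oTmp
    $i += 1
Next
#EndRegion parsing

; get second object:
Local $oObj2 = $aObjects[1]

; ask if attribute CaseSensitive exists:
If $oObj2.Exists("CaseSensitive") Then MsgBox(0, "", "CaseSensitive exists")

; ask for attribute value of Domain:
MsgBox(0, "Domain", $oObj2.Item("Domain"))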

 

Edited by AspirinJunkie

9 hours ago, AspirinJunkie said:

Unfortunately, this parameter cannot be set via the StringRegExp function

There is a workaround using a non-capturing group with reset (?|...)
A rough example:

#include <Array.au3>

$strFileContent = FileRead(@ScriptDir & "\Test.txt")
$s = "User|Login-name|CaseSensitive|NTSecurity|NO|Domain|Timeout|Member"
; flag 3 returns one flat array: name, value, name, value, ...
$arrResult = StringRegExp($strFileContent, '((?|' & $s & ')):\h(.*)', 3)
; fold the flat array into two columns (name | value)
Local $n = UBound($arrResult), $k = 2, $res2D[Ceiling($n/$k)][$k]
For $i = 0 To $n - 1
    $res2D[Int($i / $k)][Mod($i, $k)] = $arrResult[$i]
Next
_ArrayDisplay($res2D)

 


Yes, with the workaround, empty matches no longer appear in the result.
But it sidesteps the underlying problem.
The underlying problem is that groups which are defined but do not participate in the match still remain in the result set as empty matches.
This matters especially when the order of the matches is important.

Your workaround ignores the order of the elements (as does my dictionary example) and is therefore not a general workaround for the problem of zero-length matches. Whether it is a solution for Francesco depends on whether the order of the elements is essential to him. His pattern does define an order.

10 hours ago, mikell said:

There is a workaround using a non-capturing group with reset (?|...)

Why a branch reset? Here it is equivalent to:

StringRegExp($strFileContent, '(' & $s & '):\h(.*)', 3)
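The equivalence can be checked with a small sketch: it runs both patterns on a made-up sample (LF line endings assumed) and compares the results element by element. Since none of the alternatives contains a capturing group of its own, the branch reset has nothing to renumber, so both calls should return the same array.

```autoit
; Made-up sample with LF line endings; $s is the alternation list from the example above
Local $sText = "User: A" & @LF & "NTSecurity: NO" & @LF & "NO" & @LF & "Timeout: 00:05:00" & @LF
Local $s = "User|Login-name|CaseSensitive|NTSecurity|NO|Domain|Timeout|Member"

Local $aReset = StringRegExp($sText, '((?|' & $s & ')):\h(.*)', 3) ; with branch reset
Local $aPlain = StringRegExp($sText, '(' & $s & '):\h(.*)', 3)      ; plain capturing group

; Compare sizes first, then every element (== is a case-sensitive string comparison)
Local $bSame = (UBound($aReset) = UBound($aPlain))
If $bSame Then
    For $i = 0 To UBound($aReset) - 1
        If Not ($aReset[$i] == $aPlain[$i]) Then $bSame = False
    Next
EndIf
ConsoleWrite("Identical results: " & $bSame & @CRLF) ; True
```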


 

Edited by AspirinJunkie

  • FrancescoDiMuro changed the title to [Solved] StringRegExp zero-length match and no match at all
