FrancescoDiMuro

[Solved] StringRegExp zero-length match and no match at all


Good morning :)
I'm playing with SRE (StringRegExp) and trying to extract some information from a test file.
I was testing the pattern on regex101, but when I bring it to AutoIt, it doesn't return the same result as on regex101.
I am surely (?:missing some important notes about PCRE engine|the pattern is not correct at all).

Script:

#include <Array.au3>
#include <StringConstants.au3>


Test()

Func Test()

    Local $strFileName = @ScriptDir & "\TestFile.txt", _
          $strFileContent, _
          $arrResult


    $strFileContent = FileRead($strFileName)
    If @error Then Return ConsoleWrite("FileRead ERR: " & @error & @CRLF)

    $arrResult = StringRegExp($strFileContent, '(?sx)User:\h([^\n]+)\n' & _
                                               'Login\-name:\h([^\n]+)\n' & _
                                               '(?:CaseSensitive:\h([^\n]+)\n)?' & _
                                               'NTSecurity:\h([^\n]+)\n' & _
                                               '(?:NO\n)?' & _
                                               '(?:Domain:\h([^\n]+)\n)?' & _
                                               'Timeout:\h([^\n]+)\n' & _
                                               '.*?' & _
                                               'Member:\h([^\n]+)\n', $STR_REGEXPARRAYGLOBALMATCH)

    If IsArray($arrResult) Then _ArrayDisplay($arrResult)

EndFunc

Test file:

User: AMMINISTRATORE
Login-name: ADM
CaseSensitive: YES
NTSecurity: NO
NO
Timeout: 00:05:00
Member: AMMINISTRATORI

User: Test_User
Login-name: Test_User
NTSecurity: YES
Domain: DNEU
Timeout: 00:00:00
Member: OPERATORS
Member: OPERATORS

Any help (even from cats) is highly appreciated.

Cheers ^_^



I suspect it is because of the line breaks.
Your pattern only allows a line feed (\n) as the line break, without a preceding carriage return (\r).
If I make it more general by using \R, it works for me:
 

#include <Array.au3>

Local $strFileName = @ScriptDir & "\Test.txt", _
        $strFileContent, _
        $aMatch

Local $sPattern = '(?sx)User:\h([^\n\r]+)\R' & _
        'Login\-name:\h([^\n\r]+)\R' & _
        '(?:CaseSensitive:\h([^\n\r]+)\R)?' & _
        'NTSecurity:\h([^\n\r]+)\R' & _
        '(?:NO\R)?' & _
        '(?:Domain:\h([^\n\r]+)\R)?' & _
        'Timeout:\h([^\n\r]+)\R' & _
        '.*?' & _
        'Member:\h([^\n\r]+)\R'

$strFileContent = FileRead($strFileName)
If @error Then Exit ConsoleWrite("FileRead ERR: " & @error & @CRLF) ; stop here if the file could not be read

For $aMatch In StringRegExp($strFileContent, $sPattern, 4)
    _ArrayDisplay($aMatch)
Next
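
To make the line-break point concrete, here is a minimal sketch. The two sample strings are made up for illustration; they mimic the "NO" line of the test file, once with Windows line endings and once with Unix ones.

```autoit
; Assumed samples: the same two lines with CRLF and with LF line endings.
Local $sCRLF = "NO" & @CRLF & "Timeout: 00:05:00" & @CRLF
Local $sLF = "NO" & @LF & "Timeout: 00:05:00" & @LF

; Flag 0 only tests for a match and returns 1 (match) or 0 (no match).
; "\n" alone cannot consume the "\r" that follows "NO" in a CRLF file:
ConsoleWrite("\n on CRLF: " & StringRegExp($sCRLF, 'NO\nTimeout', 0) & @CRLF) ; 0
; "\R" matches @CRLF, @CR or @LF, so it works for both variants:
ConsoleWrite("\R on CRLF: " & StringRegExp($sCRLF, 'NO\RTimeout', 0) & @CRLF) ; 1
ConsoleWrite("\R on LF:   " & StringRegExp($sLF, 'NO\RTimeout', 0) & @CRLF)   ; 1
```

regex101 normally works on \n-only text, which would explain why the original pattern behaves differently there than on a file saved with Windows line endings.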

 

Posted (edited)

Capture groups are simply numbered in the pattern from left to right.
If a group does not participate in the match, it remains empty.

The internal function pcre_exec has an option, PCRE_NOTEMPTY, that is related to this. As far as I know, though, it only rejects an empty string as the overall match; it does not remove unset groups from the result. Unfortunately, this parameter cannot be set via the StringRegExp function.

In your case the behavior is actually helpful, because each value is always at the same index, regardless of whether the optional attributes appear in between.
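
A small sketch of that fixed-index behavior. The one-line inputs and the shortened pattern are assumptions made up for illustration, not taken from the test file.

```autoit
#include <Array.au3>

; Hypothetical pattern: an optional Domain attribute followed by Timeout.
Local $sPattern = '(?:Domain:\h(\w+);)?Timeout:\h(\S+)'

; Domain present: group 1 and group 2 are both filled.
Local $aWith = StringRegExp("Domain: DNEU;Timeout: 00:00:00", $sPattern, 1)
ConsoleWrite(_ArrayToString($aWith, "|") & @CRLF)

; Domain absent: group 1 should come back as an empty string,
; so the Timeout value stays at index 1 in both results.
Local $aWithout = StringRegExp("Timeout: 00:05:00", $sPattern, 1)
ConsoleWrite(_ArrayToString($aWithout, "|") & @CRLF)
```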

You can also go through the array yourself afterwards and delete the empty-string elements.
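
Such a cleanup pass could look roughly like this. _DropEmptyStrings is a hypothetical helper written for this post, not a function from the standard UDFs, and the sample array is invented.

```autoit
#include <Array.au3>

; Hypothetical helper: returns a copy of a 1D array without its
; empty-string elements. Sets @error if nothing usable is left.
Func _DropEmptyStrings(Const ByRef $aIn)
    Local $iCount = UBound($aIn), $n = 0
    If $iCount = 0 Then Return SetError(1, 0, 0) ; not an array, or empty
    Local $aOut[$iCount]
    For $i = 0 To $iCount - 1
        If $aIn[$i] == "" Then ContinueLoop ; skip empty matches
        $aOut[$n] = $aIn[$i]
        $n += 1
    Next
    If $n = 0 Then Return SetError(2, 0, 0) ; every element was empty
    ReDim $aOut[$n] ; shrink to the number of kept elements
    Return $aOut
EndFunc

; Invented sample that looks like a result with two empty groups.
Local $aRaw[5] = ["ADM", "", "NO", "", "00:05:00"]
Local $aClean = _DropEmptyStrings($aRaw)
ConsoleWrite(_ArrayToString($aClean, "|") & @CRLF) ; ADM|NO|00:05:00
```

Note that after such a pass the index no longer tells you which attribute a value belongs to.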

Here, however, I would rather write a small two-stage parser:

  1. First separate the individual objects from each other (separated by empty lines?).
  2. Then extract the attribute-name:attribute-value pairs of each object and put them into a dictionary, one per object.

This way you have descriptive names afterwards, and the values are easier to work with.

Edit: Anyway - here's an example of what I mean:
 

#include <Array.au3>

Local $strFileName = @ScriptDir & "\Test.txt"
Local $strFileContent = FileRead($strFileName)

#Region parsing
Local $aGroups = StringRegExp($strFileContent, '(?s)(User:.+?(?>\R\R|\Z))', 3)
Local $aObjects[UBound($aGroups)], $i = 0

For $aObject In $aGroups
    Local $oTmp = ObjCreate("Scripting.Dictionary")
    For $aAttribute in StringRegExp($aObject, '(?m)^(.+?):\h*(.+)', 4)
        $oTmp($aAttribute[1]) = $aAttribute[2]
    Next
    $aObjects[$i] = $oTmp
    $i += 1
Next
#EndRegion parsing

; get second object:
$oObj2 = $aObjects[1]

; ask if Attribute CaseSensitive exists:
If $oObj2.Exists("CaseSensitive") Then MsgBox(0,"", "CaseSensitive exists")

; ask for attribute value of Domain:
MsgBox(0,"Domain", $oObj2("Domain"))

 

Edited by AspirinJunkie

9 hours ago, AspirinJunkie said:

Unfortunately, this parameter cannot be set via the StringRegExp function

There is a workaround using non-capturing group with reset   (?|...)
Raw example

#include <Array.au3>

$strFileContent = FileRead(@ScriptDir & "\Test.txt")
$s = "User|Login-name|CaseSensitive|NTSecurity|NO|Domain|Timeout|Member"
$arrResult = StringRegExp($strFileContent, '((?|' & $s & ')):\h(.*)', 3)
Local $n = UBound($arrResult), $k = 2, $res2D[Ceiling($n/$k)][$k]
For $i = 0 To $n - 1
    $res2D[Int($i / $k)][Mod($i, $k)] = $arrResult[$i]
Next
_ArrayDisplay($res2D)

 

Posted (edited)

Yes, with the workaround no empty matches appear in the result any more.
But it misses the underlying problem.
The underlying problem is that groups which are defined but do not participate in the match remain in the result set as empty matches.
This is especially significant when the order of the matches is important.

Your workaround ignores the order of the elements (as does my example with the dictionary) and is therefore not a general workaround for the problem of zero-length matches. Whether it is a solution for Francesco depends on whether the order of the elements is essential for him or not. In his pattern he defined an order.
 

10 hours ago, mikell said:

There is a workaround using non-capturing group with reset   (?|...)

Why branch reset? There are no capture groups inside the alternation here, so it is equivalent to:

StringRegExp($strFileContent, '(' & $s & '):\h(.*)', 3)
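
For comparison, a minimal sketch of a case where branch reset does make a difference, namely when the alternatives themselves contain capture groups. The input string and both patterns are made up for illustration.

```autoit
#include <Array.au3>

Local $sInput = "B:x" ; hypothetical input that matches the second branch

; Plain alternation: the groups are numbered 1 and 2, so a match on the
; second branch should leave group 1 behind as an empty element.
Local $aPlain = StringRegExp($sInput, '(?:A:(\w+)|B:(\w+))', 1)
ConsoleWrite(UBound($aPlain) & " element(s): " & _ArrayToString($aPlain, "|") & @CRLF)

; Branch reset: both branches share group number 1, so only the
; captured value should be returned.
Local $aReset = StringRegExp($sInput, '(?|A:(\w+)|B:(\w+))', 1)
ConsoleWrite(UBound($aReset) & " element(s): " & _ArrayToString($aReset, "|") & @CRLF)
```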


 

Edited by AspirinJunkie

