Regex: get only nth match not all


jugador

Sample code:

#include <Array.au3>
Local $sRegex = '(?<={"data":)[^\]]+'
Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'

Local $aArray = StringRegExp($sString, $sRegex, 3)
_ArrayDisplay($aArray)

Now, to get the 1st, 2nd, or 3rd match I can write $aArray[0] / $aArray[1] / $aArray[2],
but how do I do it using regex alone?

For example:
> stop the regular expression after the 1st match, or
> capture only the last match


I fully agree with @AspirinJunkie: if you're going to be working with JSON data, use a JSON parser, such as AspirinJunkie's own JSON UDF.

However, if you insist on not using one, I would take a two-step approach to this example, using _StringBetween() first (assuming you only care about the data in the nested arrays):

#include <Array.au3>
#include <String.au3>
#include <StringConstants.au3>

Local $sRegex = '"([^"]*)":"([^"]*)'
Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'

Local $aStringSplit = _StringBetween($sString, '[{', '}]')
_ArrayDisplay($aStringSplit)

Local $sArrayName = ''

For $iMatch = 0 To UBound($aStringSplit) - 1
    Local $aArray = StringRegExp($aStringSplit[$iMatch], $sRegex, $STR_REGEXPARRAYMATCH)
    ; Skip this chunk if nothing matched or the key/value pair is incomplete
    If @error Or UBound($aArray) < 2 Then ContinueLoop
    
    $sArrayName = _StringBetween($aStringSplit[$iMatch], '"', '":')
    If Not @error Then $sArrayName = $sArrayName[0]
    
    ConsoleWrite( _
            'Array: ' & $sArrayName & _
            ', Key: ' & $aArray[0] & _
            ', Value: ' & $aArray[1] & @CRLF)
    _ArrayDisplay($aArray, $sArrayName)
Next

Using this approach you can also get the named indexes (records, filtered, duplicate), as you can see on the _ArrayDisplay titles and with _StringBetween($aStringSplit[$iMatch], '"', '":')

As for your question about "stop Regular expression after 1st match": I don't know if that's possible in AutoIt, and I'm not sure why you'd want that. Performance? If so, you could probably get away with a simpler check using StringInStr().
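For reference, StringInStr() also takes an occurrence argument, so it can locate the nth marker directly without a regex. The following is only a minimal sketch, assuming the values never contain an escaped quote; `_NthDataValue` is a hypothetical helper name, not part of any UDF:

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns the value that follows the $iN-th '"data":"' marker.
; Sets @error = 1 if there is no such marker, @error = 2 if the closing quote is missing.
Func _NthDataValue($sText, $iN)
    Local Const $sMarker = '"data":"'
    Local $iPos = StringInStr($sText, $sMarker, $STR_CASESENSE, $iN)
    If $iPos = 0 Then Return SetError(1, 0, "")
    ; Keep everything after the marker, then cut at the closing quote
    Local $sRest = StringMid($sText, $iPos + StringLen($sMarker))
    Local $iEnd = StringInStr($sRest, '"')
    If $iEnd = 0 Then Return SetError(2, 0, "")
    Return StringLeft($sRest, $iEnd - 1)
EndFunc   ;==>_NthDataValue

Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'
ConsoleWrite(_NthDataValue($sString, 2) & @CRLF) ; unknown variable
```

A negative occurrence makes StringInStr() search from the right, so passing -1 as $iN should return the last value.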

For "capture only last match", just use the StringRegExp and in the resulting array do:

$aArray[UBound($aArray) - 1]

 


@jugador it's doable with RegEx, if you use Mode 1 (and not 3)

#include <Array.au3>

Local $aArray, $iOffset = 1, $iMatch = 3 ; <=== change $iMatch value here
Local $sRegex = '(?<={"data":)[^\]]+'
Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'

For $i = 1 To 999999999
    $aArray = StringRegExp($sString, $sRegex, 1, $iOffset) ; Mode 1 = $STR_REGEXPARRAYMATCH
    If @error Then Exit MsgBox(0, "Match " & $iMatch, "Nothing found")
    $iOffset = @extended ; save the resume offset before any other function call can overwrite @extended
    If $i = $iMatch Then Exit _ArrayDisplay($aArray, "Match " & $iMatch)
Next
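The same offset technique can be wrapped in a reusable function. This is a sketch under stated assumptions, not a standard UDF: `_RegExpNthMatch` is a hypothetical name, and the pattern must never match an empty string (otherwise the offset would not advance). It uses the simpler '"data":"([^"]+)"' pattern so that only the value is captured:

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns the mode-1 array for the $iN-th match of $sPattern.
; @error = 1 if fewer than $iN matches exist, @error = 2 if $iN is not positive.
Func _RegExpNthMatch($sSubject, $sPattern, $iN)
    If $iN < 1 Then Return SetError(2, 0, 0)
    Local $aMatch, $iOffset = 1
    For $i = 1 To $iN
        $aMatch = StringRegExp($sSubject, $sPattern, $STR_REGEXPARRAYMATCH, $iOffset)
        If @error Then Return SetError(1, $i - 1, 0) ; @extended = number of matches found before failing
        $iOffset = @extended ; position at which the next search resumes
    Next
    Return $aMatch
EndFunc   ;==>_RegExpNthMatch

Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'
Local $aThird = _RegExpNthMatch($sString, '"data":"([^"]+)"', 3)
If @error Then
    ConsoleWrite("Fewer than 3 matches" & @CRLF)
Else
    ConsoleWrite($aThird[0] & @CRLF) ; not constant
EndIf
```

Because the search stops as soon as the nth match is found, the rest of the subject string is never scanned, which is the main difference from mode 3.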

 

Edited by pixelsearch

Since regex-based solutions have been posted here, just for completeness: of course, it also works purely at the regex-pattern level:
 

Local $aMatch
Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'
$sString &= $sString & $sString

; first match:
$aMatch = StringRegExp($sString, '^.*?"data":\h*"\K[^"]+', 1)   ; should also work without "^.*?" because $STR_REGEXPARRAYMATCH is used
ConsoleWrite("1st:  " & $aMatch[0] & @CRLF)

; nth matches (not including first):
For $i = 1 To 20 ; if no xth match then don't output
    $aMatch = StringRegExp($sString, '(?:"data":.*?){' & $i & '}"data":\h*"\K[^"]+', 1)
    If Not @error Then ConsoleWrite("match #" & ($i + 1) & ": " & $aMatch[0] & @CRLF)
Next

; last match:
$aMatch = StringRegExp($sString, '"data":\h*"\K[^"]+(?!.+"data")', 1)
ConsoleWrite("last: " & $aMatch[0] & @CRLF)


However, I think it would be more elegant to return all matches and then select them by index. Everything else only makes the patterns more complex.
Or, as I said, treat it as JSON.


16 hours ago, jugador said:

Now to get 1st or 2nd or 3rd match (...) how I do it using regex?

Many ways to do that. My 2 cents ...

Local $n = 1  ; 2  ; 3
Local $sRegex = '(?:.*?"data":"([^"]+)"){' & $n & '}'

Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'

Local $res = StringRegExp($sString, $sRegex, 1)[0]
Msgbox(0,"", $res)

Raw, should include error checking

Edit : simpler

Local $n = 2  ; 1  ; 3
Local $sRegex = '"data":"([^"]+)"'

Local $sString = '[[{"records":{"data":"random data"}}],[{"filtered":{"data":"unknown variable"}}],[{"duplicate":{"data":"not constant"}}],{"subject":"Test"}]'

Local $res = StringRegExp($sString, $sRegex, 3)
Msgbox(0, "", ($n > UBound($res) OR $n < 1) ? "error" : $res[$n-1])

 

Edited by mikell

@mistersquirrle thanks, but I am well aware of the JSON UDF & _StringBetween. I opened this thread to find out how to get the nth match using regex.

@pixelsearch using $aArray[0] or $aArray[1] is the more convenient method, I guess :D

 

Now, to get the 1st match:

@AspirinJunkie's code works:

^.*?"data":\K[^\]]+

Turning off the global flag also works:

Local $sRegex = '(?<={"data":)[^\]]+'
StringRegExp($sString, $sRegex, 2)

 

Now, to get the nth match (tested using https://regex101.com/ ):

@AspirinJunkie's code works on the sample data above, but runs into catastrophic backtracking on big data:

(?:"data":.*?){2}"data":\K[^\]]+

@mikell's code works on big data:

(?:.*?"data":([^\]]+)){2}\K[\]]

 

Edited by jugador

1 hour ago, jugador said:

@pixelsearch using $aArray[0] or $aArray[1] more convenient method I guess :D

My goal was to display the whole $aArray in case several capture groups are found in a pattern, in mode 1, for example:

#include <Array.au3>
#include <StringConstants.au3> ; defines $STR_REGEXPARRAYMATCH

Local $Subject = "Sunday Monday Tuesday Wednesday Thursday Friday Saturday"
Local $Pattern = "\b(\w+)(day)\b" ; word ending with 'day' and 2 capture groups
Local $Array = StringRegExp($Subject, $Pattern, $STR_REGEXPARRAYMATCH) ; mode 1
_ArrayDisplay($Array, "Mode 1 - two captures")

[Image: screenshot, "Mode 1 - two captures"]

One needs to remember that even in Mode 1, when the 1st match is found, several elements of the array can be populated, depending on the number of capture groups found in the pattern.
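The same point matters when picking the nth match out of a mode 3 result: the returned array is flat, with the capture groups of every match stored one after another. A minimal sketch, assuming every group participates in every match (as it does with this pattern); the variable names are illustrative only:

```autoit
#include <StringConstants.au3>

Local $sSubject = "Sunday Monday Tuesday Wednesday Thursday Friday Saturday"
Local $sPattern = "\b(\w+)(day)\b"
Local $iGroups = 2 ; number of capture groups in $sPattern
Local $iN = 3      ; which match to read (1-based)

Local $aAll = StringRegExp($sSubject, $sPattern, $STR_REGEXPARRAYGLOBALMATCH) ; mode 3
Local $iFirst = ($iN - 1) * $iGroups ; index of group 1 of the nth match

If @error Or $iFirst + $iGroups > UBound($aAll) Then
    ConsoleWrite("No match number " & $iN & @CRLF)
Else
    For $i = 0 To $iGroups - 1
        ConsoleWrite("Match " & $iN & ", group " & ($i + 1) & ": " & $aAll[$iFirst + $i] & @CRLF)
    Next
EndIf
```

With the values above this prints "Tues" and "day" for match 3. With a single capture group the stride is 1 and this reduces to the plain $aArray[$n - 1] lookup.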

This example is based on the examples found in the French help file, topic StringRegExp, where the examples and explanations are interesting too (@jchd, did you prepare this?)

Edited by pixelsearch
typo

I never worked on the French help file, but looking at it, it seems a fair translation of my English version.
