Reg Exp help: .+(?>\R)


Hi all.   I'm revisiting poorly commented code of mine from over a year ago.  In one line I search a string for a regular expression and cannot figure out exactly what the code is searching for.  The expression is:

.+(?>\R)

I've tried to piece it together from the StringRegExp() page.  My educated guess is that it searches for newline characters in some capacity.  Here is what I have so far:

  • (?>\R):  The (?>...) indicates an atomic non-capturing group, meaning in part that string matches are not recorded for later reference.  I'm not sure what 'atomic' means though.  The description says that this 'locks,' which I'm also unclear on.  The \R matches any (Unicode) newline character.  So is this component somehow searching for new lines?
  • .+:  I'm not sure how these modify the above, nor exactly how they work together.  The . matches any single character except newline characters, unless (?s) is active.  How can I check if (?s) is active?  Would it be a parameter set in one of my options files?  And the + seems to match 1 or more.  Does this make the preceding . redundant?

The complete line of code is:

$aArray = StringRegExp(_IEBodyReadText($oIE2), '.+(?>\R)', 3)

I'm reading body text from an HTML page.  I can successfully print out each element of this array.  Each element contains one line of text, followed by a blank line.  Hopefully that helps to confirm things.

Thanks in advance.


PCRE is the regexp engine AutoIt uses. It's compiled with the PCRE_BSR_ANYCRLF option, meaning that \R means (?>\r\n|\n|\r) by default. You can change the meaning of \R by prefixing your pattern with one of the other (*BSR_...) options.
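To see the effect of the \R setting, here is a minimal sketch. The sample string is made up, and it assumes that (*BSR_UNICODE) is the option you'd switch to and that it counts a form feed, Chr(12), as a line break while the default does not:

```autoit
; Sketch only: $sText is an invented sample, not text from the real page.
; Chr(12) is a form feed. It is not CR, LF or CRLF.
Local $sText = "page one" & Chr(12) & "page two"

; With no flag argument, StringRegExp returns 1 for a match and 0 for no match.
ConsoleWrite("default \R        : " & StringRegExp($sText, '\R') & @CRLF)
ConsoleWrite("(*BSR_UNICODE)\R  : " & StringRegExp($sText, '(*BSR_UNICODE)\R') & @CRLF)
```

If the default really is ANYCRLF as described above, the first call should report no match and the second should report a match.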

(?>...) is indeed atomic grouping: a non-capturing group built so that, once it has matched, the engine can't backtrack into the middle of it. It's an all-or-nothing construct.
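A small sketch of what "can't backtrack into it" means in practice. The subject string "aaab" and both patterns are illustrative only and have nothing to do with your script:

```autoit
; Sketch only: "aaab" is an invented subject string.
; With no flag argument, StringRegExp returns 1 for a match and 0 for no match.

; Plain group: a+ first takes all three "a", then gives one back so "ab" can match.
ConsoleWrite("a+ab     : " & StringRegExp("aaab", 'a+ab') & @CRLF)

; Atomic group: (?>a+) keeps every "a" it took, so no "a" is left for "ab".
ConsoleWrite("(?>a+)ab : " & StringRegExp("aaab", '(?>a+)ab') & @CRLF)
```

The first line should print 1 and the second 0: the atomic version fails because the group refuses to release any of the characters it has already matched.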

Since \R already involves an atomic group, it's pointless to enclose it in another atomic group. Hence your pattern boils down to .+\R, which means one or more non-linebreak characters followed by a newline sequence, i.e. a non-empty line followed by a newline sequence.
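If you want to check the equivalence yourself, something along these lines should do. It is only a sketch: $sSample is a made-up stand-in for whatever _IEBodyReadText() returns, and the variable names are mine:

```autoit
; Sketch only: $sSample is an invented stand-in for the real page text.
Local $sSample = "first line" & @CRLF & @CRLF & "second line" & @LF & "last line, no newline"

; Flag 3 returns an array of all global matches.
Local $aAtomic = StringRegExp($sSample, '.+(?>\R)', 3)
Local $aPlain = StringRegExp($sSample, '.+\R', 3)

ConsoleWrite("Matches with atomic group: " & UBound($aAtomic) & ", without: " & UBound($aPlain) & @CRLF)

For $i = 0 To UBound($aPlain) - 1
    ; Brackets make the trailing newline sequence visible in the console.
    ConsoleWrite("[" & $aPlain[$i] & "]" & @CRLF)
    ; == is a case-sensitive string comparison in AutoIt.
    If Not ($aPlain[$i] == $aAtomic[$i]) Then ConsoleWrite("  differs from the atomic version" & @CRLF)
Next
```

Each match includes its trailing newline sequence, which is why every element prints as a line of text followed by what looks like a blank line. The final line of the sample has no newline after it, so neither pattern should pick it up.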

I recommend http://regex101.com/ for testing, debugging and explaining PCRE regexps.



Thanks for that.

>>  Since \R already involves an atomic group, it's pointless to enclose it in another atomic group.

That's what I was thinking.

>>  Hence your pattern boils down to...a non empty line followed by a newline sequence.

And the results--each element of the returned array contains one line of text, followed by a blank line--are consistent with this.  Thanks.

I have no idea where I got this expression.  The code is from over a year ago, but none of it looks familiar.  Atomic groups and newline sequences are things I've never used or learned before.  So I assumed I asked this forum for help some time ago and was given this black-box regular expression which I just plugged in.  But I (very hastily) looked through my old posts here and couldn't find anything.  Oh well.

On a side note, you posted your reply ~13 hours ago but I didn't receive an email notification.  My settings are such that I should have received one.  I also checked my spam filter, but nothing.  Should I make a new post about this, either in this forum or another?

edit:

1.  I'll try replacing the string with simply .+\R to verify that they are equivalent.

2.  Thanks for regex101.com--I'll check it out.

Edited by cag8f
