Jump to content
cag8f

Reg Exp help: .+(?>\R)

Recommended Posts

cag8f

Hi all.   I'm revisiting poorly commented code of mine from over a year ago.  In one line I search a string for a regular expression and cannot figure out exactly what the code is searching for.  The expression is:

.+(?>\R)

I've tried to piece it together from the StringRegExp() page.  My educated guess is that it searches for newline characters in some capacity.  Here is what I have so far:

  • (?>\R):  The (?>...) indicates an atomic non-capturing group, meaning in-part that string matches are not recorded for later reference.  I'm not sure what the 'atomic' means though.  The description says that this 'locks,' which I'm also unclear on.  The \R matches any (Unicode) newline character.  So is this component somehow searching for new lines?
  • .+:  I'm not sure how these are modifying the above, nor exactly how they work together.  The . matches any single character except newline characters, unless (/S) is active.  How can I check if /S is active?  Would it be a parameter set in one of my options files?  And the + seems to match 1 or more.  This seems to make the preceding . redundant?

The complete line of code is:

$aArray = StringRegExp(_IEBodyReadText($oIE2), '.+(?>\R)', 3)

I'm reading body text from an HTML page.  I can successfully print out each element of this array.  Each element contains one line of text, followed by a blank line.  Hopefully that helps to confirm things.

Thanks in advance.

Share this post


Link to post
Share on other sites
jchd

PCRE is the regexp engine AutoIt uses. It's compiled with the PCRE_BSR_ANYCRLF option, meaning that \R means (?>\r\n|\n|\r) by default. You can change the meaning of \R by prefixing your pattern with one of the other (*BSR_....) option.

(?>...) is indeed atomic grouping, a non-capturing group made such that once matched, backtracking can't go in its middle. It's a match whole or nothing construct.

Since \R already involves an atomic group, it's pointless to enclose it in another atomic group. Hence your pattern boils down to .+\R which means one or more non-linebreak character followed by a newline sequence, aka a non empty line followed by a newline sequence.

I recommend http://regex101.com/ for testing, debugging and explaining PCRE regexps.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
cag8f

Thanks for that.

>>  Since \R already involves an atomic group, it's pointless to enclose it in another atomic group.

That's what I was thinking.

>>  Hence your pattern boils down to...a non empty line followed by a newline sequence.

And the results--each element of the returned array contains one line of text, followed by a blank line--is consistent with this.  Thanks.things.

I have no idea where I got this expression.  The code is from over a year ago, but none of it looks familiar.  Atomic groups and newline sequences are things I've never used or learned before.  So I assumed I asked this forum for help some time ago and was given this black-box regular expression which I just plugged in.  But I (very hastily) looked through my old posts here and couldn't find anything.  Oh well.

On a side note, you posted your reply ~13 hours ago but I didn't receive an email notification.  My settings are such that I should have received one.  I also checked my spam filter, but nothing.  Should I make a new post about this, either in this forum or another?

edit:

1.  I'll try replacing the string with simply .+\R to verify that they are equivalent.

2.  Thanks for regex101.com--I'll check it out.

Edited by cag8f

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • therks
      By therks
      I'm looking for a regex genius, cus I'm stumped when it comes to assertions.
      So what I have now, is this regular expression: ([^|=]+)=([^|]+)
      It takes a string (user input) of keys=values separated by pipes (ie: "param=value|param=value") and splits them into an array.
      Example:
      $vParamData = 'example=value|fruit=apple|phrase=Hello world' $aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3) ; Result ; [0] => example ; [1] => value ; [2] => fruit ; [3] => apple ; [4] => phrase ; [5] => Hello world So that's working fine, but I'm wondering if there's also a way I could have this capture escaped pipes instead of splitting by them.
      ie:
      $vParamData = 'pipe test=this \| is a pipe|example=value' $aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3) ; I'm getting this: ; [0] => pipe test ; [1] => this \ ; [2] => example ; [3] => value ; But I'd like a result like this: ; [0] => pipe test ; [1] => this \| is a pipe ; [2] => example ; [3] => value Is there some pattern that would accomplish this, or am I better off parsing it some other way?
    • Chimp
      By Chimp
      regex and iso escape sequences
      Hi, I would like to extract all ISO escape squences embedded in a string and separate them from the rest of the string, still keeping the information about their position, so that, for exemple, a string like this one (or even more complex):
      (the string could start with normal text or iso sequences)
       
      '\u001B[4mUnicorn\u001B[0m' should be 'transformed' in an array like this
      $a[0] = '\u001B[4m' ; first iso escape sequence $a[1] = 'Unicorn' ; normal text $a[2] = '\u001B[4m' ; second iso escape sequence ... and so on (note: the above escape sequence has 'control codes' marked as "\u001B' for the asc "esc" char for exemple and a similar notation is also used for other control chars, but in the real string to be parsed those control chars  are embedded  as a single byte with a value from 01 to 31). at this link (http://artscene.textfiles.com/ansi/) there are many example of real ANSI text files .
      searching on the web I've found some possible solutions that make use of regexp to achieve similar purpose, and above some others, the regexp pattern posted in the following link by kfir (https://stackoverflow.com/questions/14693701/how-can-i-remove-the-ansi-escape-sequences-from-a-string-in-python) seems to be able to catch a wider range of ISO escape sequences (not only color sequences), but my lack of skills on regexp, prevents me from evaluating and testing such patterns
      I would be very grateful if some regexp guru could come to my rescue...
      thanks everybody  for reading...
    • ur
      By ur
      I am trying to identify the window based on the window title and text.
      The title will be the "erwin DM - filename"

      It is working till date, but some operating systems our application is displaying window as "erwin DM - [filename]"
       
      I tried  "erwin DM - *filename*" But this regular expression is not working.
      Any suggestion?
       
      $sModelFile = "C:\Users\Administrator\Documents\My Models\eMovies.erwin" $wdModel = _WinWaitActivate1("erwin DM - "&FileNameOnly($sModelFile),"") Func _WinWaitActivate1($title,$text,$timeout=0);Will Return the window Handler Logging("Waiting for "&$title&":"&$text) $dHandle = WinWait($title,$text,$timeout) if not ($dHandle = 0) then If Not WinActive($title,$text) Then WinActivate($title,$text) return WinWaitActive($title,$text,$timeout) Else Logging("Timeout occured while waiting for the window...") Exit EndIf EndFunc Func FileNameOnly($sFilePath) Local $sDrive = "", $sDir = "", $sFileName = "", $sExtension = "" Local $aPathSplit = _PathSplit($sFilePath, $sDrive, $sDir, $sFileName, $sExtension) ;_ArrayDisplay($aPathSplit, "_PathSplit of " & @ScriptFullPath) return $sFileName EndFunc  
    • nend
      By nend
      This is a program that I made to help my self learn better regular expressions.
      There are a lot of other programs/website with the similar functions.
      But the main advantage of this program is that you don't have to click a button after every changes.
      The program detected changes and react on it.
      Function:
      Match Match of arrays Match and replace Load source data from website Load source data from a website with GET/POST Load text data from file Clear fields Export and Import settings (you can finish the expression a other time, just export/import it) Cheat sheet Generate AutoIt code The source code is not difficult and I think most user will understand it.
      In the zip file there are 2 export files (POST and a reg back example), you can drag and drop these files on the gui to import them.
      Download Regex Toolkit here Regex toolkit.zip
      EDIT: Updated to version V1.2.0
      Changes are:
      Expand and collapse of the cheat sheet (Thanks to Melba23 for the Guiextender UDF) Usefull regular expressions websites links included in the prgram Text data update time EDIT: Updated to version V1.3.0
      Changes are:
       Automatic generate AutoIt code  Icons on the tab  Few minor bug fixes EDIT: Updated to version V1.4.0
      Changes are:
      Link to AutoIt regex helpfile If the regular expression has a error than the text becomes red Option Offset with Match and array of Matches Option Count with Match and replace Some small minor bug fixed
    • nikink
      By nikink
      Hi all, it's been a while since I last used regular expressions and I find myself out of time to experiment with this particular issue, so I throw myself upon your mercy and expertise.
      I am looking to create a function that will say whether or not a supplied string is a valid UUID or not.
      Local $sTestF = '4C4C4544-004A-4C10-8054-B7C04F46343' Local $sTestT = '4C4C4544-004A-4C10-8054-B7C04F463432' ConsoleWrite('False = ' & _IsValidUUID($sTestF) & @CRLF) ConsoleWrite('True = ' & _IsValidUUID($sTestT) & @CRLF) Func _IsValidUUID($sUUID) ;[\p{XDigit}]{8}-[\p{XDigit}]{4}-[34][\p{XDigit}]{3}-[89ab][\p{XDigit}]{3}-[\p{XDigit}]{12} ; Test UUID = '4C4C4544-004A-4C10-8054-B7C04F463432' Local $sRegExp = '([:xdigit:]){8}\-([:xdigit:]){4}\-([34])([:xdigit:]){3}\-([89ab])([:xdigit:]){3}\-([:xdigit:]){12}' ConsoleWrite(StringRegExp($sUUID, $sRegExp) & @CRLF) Local $Result = StringRegExp($sUUID, $sRegExp) ConsoleWrite($Result & @CRLF) If @error Then ConsoleWrite('Error: [' & @error & ']' & @CRLF) Return 'False' Else ConsoleWrite('Error2: [' & @error & ']' & @CRLF) Return 'True' EndIf EndFunc In the line under the Function call, you'll see the regex I found to do this from a google search. That was my starting point, and I'm trying to get it to work in Au3 and failing miserably.
      $sTestF is a known invalid String
      $sTestT is a known valid String
      Everything I've tried so far has produced the same results for both.
      Any help you could provide me is greatly appreciated. Thanks for your time!
×