Hi all. I'm revisiting poorly commented code of mine from over a year ago. In one line I search a string for a regular expression and cannot figure out exactly what the code is searching for. The expression is:

.+(?>\R)

I've tried to piece it together from the StringRegExp() page.  My educated guess is that it searches for newline characters in some capacity.  Here is what I have so far:

  • (?>\R):  The (?>...) indicates an atomic non-capturing group, meaning in part that string matches are not recorded for later reference.  I'm not sure what 'atomic' means though.  The description says that this 'locks,' which I'm also unclear on.  The \R matches any (Unicode) newline character.  So is this component somehow searching for new lines?
  • .+:  I'm not sure how these are modifying the above, nor exactly how they work together.  The . matches any single character except newline characters, unless (?s) is active.  How can I check if (?s) is active?  Would it be a parameter set in one of my options files?  And the + seems to match 1 or more.  This seems to make the preceding . redundant?

The complete line of code is:

$aArray = StringRegExp(_IEBodyReadText($oIE2), '.+(?>\R)', 3)

I'm reading body text from an HTML page.  I can successfully print out each element of this array.  Each element contains one line of text, followed by a blank line.  Hopefully that helps to confirm things.

Thanks in advance.

PCRE is the regexp engine AutoIt uses. It's compiled with the PCRE_BSR_ANYCRLF option, meaning that \R means (?>\r\n|\n|\r) by default. You can change the meaning of \R by prefixing your pattern with one of the other (*BSR_...) options.
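
A minimal sketch of switching the meaning of \R, assuming the (*BSR_UNICODE) prefix (the standard PCRE name for the "any Unicode newline" setting) is the alternative you want. The test string is made up for illustration: Chr(11) is a vertical tab, which counts as a Unicode line break but is not CR, LF or CRLF, so the two calls should disagree.

```autoit
; Hypothetical test string: "abc" followed by a vertical tab (Chr(11)).
Local $sText = "abc" & Chr(11)

; Default setting: \R only covers CRLF, LF and CR.
ConsoleWrite("Default \R:     " & StringRegExp($sText, 'abc\R') & @CRLF)

; Prefixed pattern: \R is widened to any Unicode newline sequence.
ConsoleWrite("(*BSR_UNICODE): " & StringRegExp($sText, '(*BSR_UNICODE)abc\R') & @CRLF)
```

With the default flag, StringRegExp() returns 1 for a match and 0 for no match, so comparing the two printed values shows whether the prefix changed what \R accepts.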

(?>...) is indeed atomic grouping: a non-capturing group built so that once it has matched, backtracking cannot re-enter it. It's a match-whole-or-nothing construct.
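
To make the "locking" concrete, here is a small sketch using a made-up pattern and subject (not the one from your script). Both patterns are identical except for the group type.

```autoit
; Ordinary non-capturing group: the group first takes "bc", the trailing "c"
; then has nothing left to match, so the engine backtracks INTO the group,
; tries the "b" alternative instead, and the overall match succeeds.
ConsoleWrite(StringRegExp("abc", "a(?:bc|b)c") & @CRLF) ; prints 1 (match)

; Atomic group: once the group has matched "bc" it is locked. The trailing "c"
; fails, the engine may not go back inside the group to try "b", so the
; overall match fails.
ConsoleWrite(StringRegExp("abc", "a(?>bc|b)c") & @CRLF) ; prints 0 (no match)
```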

Since \R already involves an atomic group, it's pointless to enclose it in another atomic group. Hence your pattern boils down to .+\R, which means one or more non-linebreak characters followed by a newline sequence, aka a non-empty line followed by a newline sequence.
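
If you want to check the equivalence yourself, a sketch along these lines should do. The sample text is invented for the test and stands in for whatever _IEBodyReadText() returns; flag 3 from your code is $STR_REGEXPARRAYGLOBALMATCH.

```autoit
#include <StringConstants.au3>

; Hypothetical sample: mixed line endings, one empty line, and a last line without a newline.
Local $sText = "first line" & @CRLF & @CRLF & "second line" & @LF & "last line, no newline"

Local $aAtomic = StringRegExp($sText, '.+(?>\R)', $STR_REGEXPARRAYGLOBALMATCH)
Local $aPlain = StringRegExp($sText, '.+\R', $STR_REGEXPARRAYGLOBALMATCH)

; UBound() returns 0 when StringRegExp() found nothing and returned a non-array.
ConsoleWrite("Match counts: " & UBound($aAtomic) & " vs " & UBound($aPlain) & @CRLF)

If UBound($aAtomic) = UBound($aPlain) Then
    For $i = 0 To UBound($aPlain) - 1
        ; == is a case-sensitive string comparison in AutoIt.
        ConsoleWrite("Element " & $i & " identical: " & ($aAtomic[$i] == $aPlain[$i]) & @CRLF)
    Next
EndIf
```

Each returned element includes the trailing newline sequence, which is why printing an element shows a line of text followed by a blank line.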

#3 ·  Posted

Thanks for that.

>>  Since \R already involves an atomic group, it's pointless to enclose it in another atomic group.

That's what I was thinking.

>>  Hence your pattern boils down to...a non empty line followed by a newline sequence.

And the results--each element of the returned array contains one line of text, followed by a blank line--are consistent with this.  Thanks.

I have no idea where I got this expression.  The code is from over a year ago, but none of it looks familiar.  Atomic groups and newline sequences are things I've never used or learned before.  So I assumed I asked this forum for help some time ago and was given this black-box regular expression which I just plugged in.  But I (very hastily) looked through my old posts here and couldn't find anything.  Oh well.

On a side note, you posted your reply ~13 hours ago but I didn't receive an email notification.  My settings are such that I should have received one.  I also checked my spam filter, but nothing.  Should I make a new post about this, either in this forum or another?


1.  I'll try replacing the string with simply .+\R to verify that they are equivalent.

2.  Thanks, I'll check it out.

Edited by cag8f

    • nend
      By nend
      This is a program that I made to help myself learn regular expressions better.
      There are a lot of other programs and websites with similar functions.
      But the main advantage of this program is that you don't have to click a button after every change.
      The program detects changes and reacts to them.

      - Match
      - Match of arrays
      - Match and replace
      - Load source data from website
      - Load source data from a website with POST
      - Load text data from file
      - Clear fields
      - Export and Import settings (you can finish the expression another time, just export/import it)
      - Cheat sheet

      The source code is not difficult and I think most users will understand it.
      This program needs the WinHTTP UDF (it's also included in the zip file).
      In the zip file there is also an export file (POST example); this is an example of website source code with POST.
      You can download it here Regex
      EDIT: Updated to version V1.2.0
      Changes are:
      - expand and collapse of the cheat sheet (thanks to Melba23 for the GUIExtender UDF)
      - useful regular expression website links included in the program
      - text data update time
    • nikink
      By nikink
      Hi all, it's been a while since I last used regular expressions and I find myself out of time to experiment with this particular issue, so I throw myself upon your mercy and expertise.
      I am looking to create a function that will say whether or not a supplied string is a valid UUID or not.
      Local $sTestF = '4C4C4544-004A-4C10-8054-B7C04F46343'
      Local $sTestT = '4C4C4544-004A-4C10-8054-B7C04F463432'
      ConsoleWrite('False = ' & _IsValidUUID($sTestF) & @CRLF)
      ConsoleWrite('True = ' & _IsValidUUID($sTestT) & @CRLF)

      Func _IsValidUUID($sUUID)
          ;[\p{XDigit}]{8}-[\p{XDigit}]{4}-[34][\p{XDigit}]{3}-[89ab][\p{XDigit}]{3}-[\p{XDigit}]{12}
          ; Test UUID = '4C4C4544-004A-4C10-8054-B7C04F463432'
          Local $sRegExp = '([:xdigit:]){8}\-([:xdigit:]){4}\-([34])([:xdigit:]){3}\-([89ab])([:xdigit:]){3}\-([:xdigit:]){12}'
          ConsoleWrite(StringRegExp($sUUID, $sRegExp) & @CRLF)
          Local $Result = StringRegExp($sUUID, $sRegExp)
          ConsoleWrite($Result & @CRLF)
          If @error Then
              ConsoleWrite('Error: [' & @error & ']' & @CRLF)
              Return 'False'
          Else
              ConsoleWrite('Error2: [' & @error & ']' & @CRLF)
              Return 'True'
          EndIf
      EndFunc

      In the line under the Function call, you'll see the regex I found to do this from a google search. That was my starting point, and I'm trying to get it to work in Au3 and failing miserably.
      $sTestF is a known invalid String
      $sTestT is a known valid String
      Everything I've tried so far has produced the same results for both.
      Any help you could provide me is greatly appreciated. Thanks for your time!
    • sjaikumar
      By sjaikumar
      I am looking to write an automation script for converting the following SQL procedure code into VB code as shown below

      ALTER PROCEDURE [dbo].[firstprocedure]
      (
             @var1 varchar(10),
             @var2 varchar(7),
             @var3 float
      )

      CONVERSION

      Public Function firstprocedure(ByVal var1 As String, ByVal var2 As String, ByVal var3 As Integer) As DataSet
              Dim ds As New DataSet()
              '**************query  with stored procedure**********
              Dim CMD As New SqlCommand("GetCountOfTempGramWtsGwByFoodCodeProgressAndNewSequence")
              CMD.Connection = GetConnection()
              CMD.Parameters.Add("@var1", SqlDbType.VarChar).Value = var1
              CMD.Parameters.Add("@var2", SqlDbType.VarChar).Value = var2
              CMD.Parameters.Add("@var3", SqlDbType.Float).Value = var3
              CMD.CommandType = CommandType.StoredProcedure
              Dim adapter As New SqlDataAdapter(CMD)

      I will be reading the procedural code from the first file that has to be read and create the VB code by writing onto a new file.
      My approach is that I need the following information captured in Variables which I can insert later onto the new file as and where applicable.
      In order to do that I need to extract the following bit of information from the file to be READ
      Name of procedure : firstprocedure
      List of Variables : @var1, @var2, @var3  
      Data Types: varchar, varchar & float
      What I need help with is extracting the list of variables and data types in separate variables.
      I am looking to build a Regular Expression which I can use to achieve the same. 
      I tried making use of StringSplit function delimited on spaces(" ") but that did not work when reading the file from notepad. I reckon it does not detect spaces in the file.
      Please help me with the RegExp.
      Any other suggestions on how best to go about doing this conversion are also welcome.
      Thank You
    • VIP
      By VIP
      ;Global $sCharAllowed = ";!#$%&'()+-,.0123456789=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{ }~" Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖D๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊAịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷOŸǾǿ₪₫€℅l№™Ωe⅛∑-•v8∫˜≠==□ּׂאַאָאּבּVגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿAﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇTﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐRﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤOﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶNﻷﻸﻹﻺﻻﻼْ%-㍱G-煱-둻-睤-㌹-" Local $lStringOUT = StringRegExpReplace($uStringIN, '[[:^print:]]', "_") ConsoleWrite($lStringOUT&@CRLF) jguinch
      Global $sCharAllowed = "\w;!#$%&'\(\)+,-.=@\[\]^`{ }~" Local $uStringIN = "頹-衙-浳-浤-搰-橱-煪๏๐๑๒๖D๚๛ẀẁẂẴẵẶặẸẹẺẻỈỉỊAịỌÆ¢™®¯ÌÍÎÏÐÑ×ØÙòóôŘřŢţŷOŸǾǿ₪₫€℅l№™Ωe⅛∑-•v8∫˜≠==□ּׂאַאָאּבּVגּדּהּוּזטּיּךּכשּתּוֹבֿכֿפֿAﭏﭽﮊﮋﮏﮐﮑﮒﮓﮔﺅﺆﺇTﺈﺉﺊﺋﺛﺜﺝﺮﺯﺰﺱﻐRﻑﻒﻓﻔﻕﻟﻠﻡﻢﻣﻤOﻥﻦﻧﻨﻰﻱﻲﻳﻴﻵﻶNﻷﻸﻹﻺﻻﻼْ%-㍱G-煱-둻-睤-㌹-" $replace = StringRegExpReplace($uStringIN, "[^" & $sCharAllowed& "]", "_") MsgBox(0, "", @CRLF & $replace & @CRLF)  
    • GPinzone
      By GPinzone
      I wrote a program to convert single and double quotes to curly single and double quotes. The program uses regular expression search and replaces to swap the straight quotes to their curly counterparts and to ignore any HTML tags, too.
      #include <ButtonConstants.au3>
      #include <EditConstants.au3>
      #include <GUIConstantsEx.au3>
      #include <WindowsConstants.au3>
      #include <FontConstants.au3>

      Func Curlify($sInput)
          Local $sOutput = StringRegExpReplace($sInput, "'(?!([^<]+)?>)", "’")
          $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)’", "${1}‘")
          $sOutput = StringRegExpReplace($sOutput, '"(?!([^<]+)?>)', "”")
          $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)”", "${1}“")
          ; Fix nested quotes
          While StringRegExp($sOutput, "(‘|“)(<[^>]*>)*(’|”)")
              $sOutput = StringRegExpReplace($sOutput, "((‘|“)(<[^>]*>)*)’", "${1}‘")
              $sOutput = StringRegExpReplace($sOutput, "((‘|“)(<[^>]*>)*)”", "${1}“")
          WEnd
          Return $sOutput
      EndFunc   ;==>Curlify
      $FormCurly = GUICreate("Form_Curly", 708, 439, 192, 124)
      $Original = GUICtrlCreateEdit("", 0, 0, 305, 433)
      GUICtrlSetFont(-1, 9, $FW_NORMAL, 0, "Courier New") ; Set the font of the previous control.
      GUICtrlSetLimit(-1, 1500000)
      $Modified = GUICtrlCreateEdit("", 408, 0, 297, 433)
      GUICtrlSetFont(-1, 9, $FW_NORMAL, 0, "Courier New") ; Set the font of the previous control.
      GUICtrlSetLimit(-1, 1500000)
      $Curly = GUICtrlCreateButton("Curly", 320, 64, 75, 25)
      $Reset = GUICtrlCreateButton("Reset", 320, 164, 75, 25)
      GUISetState(@SW_SHOW) ; Show the window before entering the message loop.

      While 1
          $nMsg = GUIGetMsg()
          Switch $nMsg
              Case $Curly
                  GUICtrlSetData($Modified, Curlify(GUICtrlRead($Original)))
              Case $Reset
                  GUICtrlSetData($Original, "")
                  GUICtrlSetData($Modified, "")
              Case $GUI_EVENT_CLOSE
                  Exit
          EndSwitch
      WEnd

      So why am I posting a question about a program that works?  This regex:
          $sOutput = StringRegExpReplace($sOutput, "((^|\s+)(<[^>]*>)*)’", "${1}‘")
      should be able to be rewritten without the "+":
          $sOutput = StringRegExpReplace($sOutput, "((^|\s)(<[^>]*>)*)’", "${1}‘")
      Same goes for the one to do double quotes, obviously.
      However, if I remove the "+" from the "\s" in the regex, the program will fail in some cases. Something like <b>’<i>Don’t</i>’</b> at the beginning of a line won't get fixed. I suspect it's got something to do with the fact that the previous character (aside from the HTML tags) is a newline. The "\s" should work just fine with newlines. I'm stumped because the regex without the "+" modifier works fine when I test it.