
SRER Invalid Pattern


kylomas

The following fails when the quantifier is greater than 9360.

#include <array.au3>

;---------------------------------------------------------------------------------
; setup test file
local $sfn = @scriptdir & '\test.txt', $str
local $hfl = fileopen($sfn,2)
if $hfl = -1 then exit(msgbox(0,'ERROR','File open failed for file = ' & $sfn))
for $1 = 1 to 100000
    $str &= 'Line # ' & $1 & @CRLF
Next
filewrite($hfl, $str)
fileclose($hfl)
;---------------------------------------------------------------------------------

_FileDeleteLine($sfn,9360,true)

func _FileDeleteLine($sFile, $iLine, $bBlank)

    local $sTmp = fileread($sFile)
    local $hfl2 = fileopen($sFile,2)
    if $hfl2 = -1 then exit(msgbox(0,'ERROR','File open failed for file = ' & $sFile))
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:.*?\R){'& $iLine-1 & '}).*$','\1')
    ConsoleWrite(@error & ' - ' & @extended & @CRLF)
    $sEnd = stringregexpreplace($sTmp,'(?ms)^(?:.*?\R){' & $iLine & '}(.*)$','\1')
    ConsoleWrite(@error & ' - ' & @extended & @CRLF)
    filewrite($hfl2, ($bBlank) ? $sBeg & @CRLF & $sEnd : $sBeg & $sEnd)
    fileclose($hfl2)

endfunc

The idea was to split a file in two based on input from the user ($iLine) and reconstruct the file with either a blank line or the line deleted. I wanted to see if I could figure out how to do this using an SRER pattern, but it seems that I've run into some kind of limit.

Any suggestions for a better pattern are appreciated.
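For comparison, here is a minimal non-regex sketch of the same delete-or-blank operation. `_DeleteLineSplit` is a hypothetical helper, not code from this thread; it works on a string rather than a file, assumes CRLF or LF line ends, and always writes CRLF. Because it splits on line breaks instead of counting them with a {n} quantifier, the line number is not subject to any pattern-size limit.

```autoit
; Hypothetical helper, not from the thread: delete line $iLine of $sText, or
; replace it with an empty line when $bBlank is True. It splits on line breaks
; instead of using a {n} quantifier, so the line number has no regex limit.
Func _DeleteLineSplit($sText, $iLine, $bBlank)
    ; Normalise CRLF to LF, then split; $aLines[0] holds the element count.
    Local $aLines = StringSplit(StringReplace($sText, @CRLF, @LF), @LF)
    Local $sOut = ""
    For $i = 1 To $aLines[0]
        If $i = $iLine Then
            ; Drop the line; keep only its line break when blanking.
            If $bBlank Then $sOut &= @CRLF
        Else
            $sOut &= $aLines[$i]
            ; The last element is whatever follows the final line break.
            If $i < $aLines[0] Then $sOut &= @CRLF
        EndIf
    Next
    Return $sOut
EndFunc   ;==>_DeleteLineSplit

; Usage: remove or blank line 2 of a three-line text.
Local $sDemo = "one" & @CRLF & "two" & @CRLF & "three" & @CRLF
ConsoleWrite(_DeleteLineSplit($sDemo, 2, False)) ; one, three
ConsoleWrite(_DeleteLineSplit($sDemo, 2, True))  ; one, empty line, three
```

Writing the result back to the file is then a single FileOpen (mode 2) / FileWrite / FileClose, as in the script above.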

kylomas

 

 

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


Your script runs on my PC without any errors.

>"C:\Program Files (x86)\AutoIt3\SciTE\AutoIt3Wrapper\AutoIt3Wrapper.exe" /run /prod /ErrorStdOut /in "F:\AppDev\_TEST TOOL DEVELOPMENT CODE_\AutoIt3Data\_SCRIPTS\Training\Temp1.au3" /UserParams    
+>10:27:14 Starting AutoIt3Wrapper v.2.2.0.3 SciTE v.3.4.1.0   Keyboard:00000809  OS:WIN_7/Service Pack 1  CPU:X64 OS:X64    Environment(Language:0409)
+>         SciTEDir => C:\Program Files (x86)\AutoIt3\SciTE   UserDir => C:\Users\Multiple Monitors\AppData\Local\AutoIt v3\SciTE\AutoIt3Wrapper   SCITE_USERHOME => C:\Users\Multiple Monitors\AppData\Local\AutoIt v3\SciTE 
>Running AU3Check (3.3.12.0)  from:C:\Program Files (x86)\AutoIt3  input:F:\AppDev\_TEST TOOL DEVELOPMENT CODE_\AutoIt3Data\_SCRIPTS\Training\Temp1.au3
+>10:27:14 AU3Check ended.rc:0
>Running:(3.3.12.0):C:\Program Files (x86)\AutoIt3\autoit3.exe "F:\AppDev\_TEST TOOL DEVELOPMENT CODE_\AutoIt3Data\_SCRIPTS\Training\Temp1.au3"    
--> Press Ctrl+Alt+F5 to Restart or Ctrl+Break to Stop
0 - 1
0 - 1
+>10:27:15 AutoIt3.exe ended.rc:0
+>10:27:15 AutoIt3Wrapper Finished.
>Exit code: 0    Time: 0.4641

$sBeg and $sEnd each contain the expected parts of the file

On checking the output file, line 9360 has been replaced by a CRLF, which I don't think is what you want. To just insert an extra CRLF between lines 9360 and 9361, change this line:

    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:.*?\R){'& $iLine - 1 & '}).*$','\1')
to

    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:.*?\R){'& $iLine & '}).*$','\1')
 

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


Bowmore,

Thanks for running this mess.  It runs with 9360 on my PC also.  Try changing it to 9500.

The script is not supposed to insert a line.  It takes whatever line is specified in $iLine and either deletes it or changes it to blank depending on the value of $bBlank.

kylomas


kylomas,

I see what your issue is now. I've tested your expressions in RegexBuddy configured for PCRE 8.34 and there seem to be no problems with them. I tried values for $iLine between 100 and 99999, and the groups representing the beginning and the end of the file were captured correctly.
 

It would seem that there is a limit or bug in the AutoIt implementation of PCRE that limits the number of repetitions of a group, (...){n}. The number of repetitions allowed is related to the size of the group being repeated. The simple test below, based on a sample file containing only @LFs, gives some idea of the relationship between the size of the group and the number of repetitions.

#include <array.au3>

;---------------------------------------------------------------------------------
; setup test file
local $sfn = @scriptdir & '\test.txt', $str
local $hfl = fileopen($sfn,2)
if $hfl = -1 then exit(msgbox(0,'ERROR','File open failed for file = ' & $sfn))
for $1 = 1 to 100000
    $str &= @LF
Next
filewrite($hfl, $str)
fileclose($hfl)
;---------------------------------------------------------------------------------

_RegExTest($sfn)

func _RegExTest($sFile)

    local $sTmp = fileread($sFile)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n){10920}).*$','\1')
    ConsoleWrite('1 {10920} PASS ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n){10921}).*$','\1')
    ConsoleWrite('1 {10921} FAIL ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n){8190}).*$','\1')
    ConsoleWrite('2 {8190} PASS ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n){8191}).*$','\1')
    ConsoleWrite('2 {8191} FAIL ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n\n){6552}).*$','\1')
    ConsoleWrite('3 {6552} PASS ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n\n){6553}).*$','\1')
    ConsoleWrite('3 {6553} FAIL ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n\n\n){5460}).*$','\1')
    ConsoleWrite('4 {5460} PASS ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n\n\n){5461}).*$','\1')
    ConsoleWrite('4 {5461} FAIL ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n\n\n\n){4680}).*$','\1')
    ConsoleWrite('5 {4680} PASS ' & @error & ' - ' & @extended & @CRLF)
    $sBeg = stringregexpreplace($sTmp,'(?ms)^((?:\n\n\n\n\n){4681}).*$','\1')
    ConsoleWrite('5 {4681} FAIL ' & @error & ' - ' & @extended & @CRLF)

endfunc

The output I get is:

1 {10920} PASS 0 - 1
1 {10921} FAIL 2 - 25
2 {8190} PASS 0 - 1
2 {8191} FAIL 2 - 26
3 {6552} PASS 0 - 1
3 {6553} FAIL 2 - 28
4 {5460} PASS 0 - 1
4 {5461} FAIL 2 - 30
5 {4680} PASS 0 - 1
5 {4681} FAIL 2 - 32
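The five rows fit a simple pattern. Writing n for the number of \n characters in the group and k_max for the last passing repeat count, every row gives the same product. This is only an empirical fit to these five data points, not a documented PCRE formula:

```latex
k_{\max}(n)\,(2n+4) = 65520 = 2^{16} - 16 \qquad (n = 1,\dots,5)

10920 \cdot 6 = 8190 \cdot 8 = 6552 \cdot 10 = 5460 \cdot 12 = 4680 \cdot 14 = 65520
```

The first failing counts give 65526, 65528, 65530, 65532 and 65534, so in this fit the cut-off sits just above 65520, close to 2^16. That is consistent with a 16-bit size limit on the compiled pattern, though the fit by itself does not prove it.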


 


There is a recursion limit set in place within AutoIt.


I started an answer some time ago but postponed it until now.

First I must say that there is a strong argument against using the stack for PCRE recursion. Indeed, in a semi-private conversation, trancexx advocated switching to the heap (a PCRE compile-time option) to avoid such a barrier. I must stress that here I'm talking about a subpattern recursion limit. On the other hand, I said that even a hard crash has some benefit: it points out bad patterns that are likely to blow up in production applications when they meet exceptionally large input and untold heavy backtracking takes place.

Also, the absolute limit inherent to the current PCRE is that a repetition count must lie in the range [0..65535].
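Using that 65535 ceiling as an upper bound, a small probe can find, by binary search, the largest repeat count a given group compiles with in the AutoIt build at hand. This is a sketch: `_MaxRepeat` is a hypothetical helper, and it assumes compilation succeeds for every count up to some value and fails for every count above it. It relies on StringRegExp setting @error = 2 for a bad pattern. Because the surrounding pattern is shorter than in Bowmore's test, the number it reports may differ slightly from his table.

```autoit
; _MaxRepeat is a hypothetical helper (not from the thread). It binary-searches the
; largest $k for which "(?:" & $sGroup & "){k}" still compiles, assuming that every
; count up to that value compiles and every count above it is rejected.
Func _MaxRepeat($sGroup)
    Local $iLo = 0, $iHi = 65535, $iMid ; PCRE 8.x rejects any {n} above 65535
    While $iLo < $iHi
        ; Round up so the loop still makes progress when $iHi = $iLo + 1.
        $iMid = Int(($iLo + $iHi + 1) / 2)
        ; Empty subject: we only care whether the pattern compiles.
        StringRegExp("", "(?:" & $sGroup & "){" & $iMid & "}")
        If @error = 2 Then
            $iHi = $iMid - 1 ; @error = 2: bad pattern, rejected at compile time
        Else
            $iLo = $iMid
        EndIf
    WEnd
    Return $iLo
EndFunc   ;==>_MaxRepeat

; The quantifier ceiling itself: a single literal character is cheap to compile,
; so a{65536} should be rejected only because the count is out of range.
StringRegExp("", "a{65536}")
Local $iErr = @error
ConsoleWrite("a{65536} -> @error = " & $iErr & @CRLF)

; Largest repeat of the one-character group used in Bowmore's table.
ConsoleWrite("(?:\n) compiles up to {" & _MaxRepeat("\n") & "}" & @CRLF)
```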

Now, the patterns shown so far have a few issues. The options (?ms) are not only useless with constructs like (?:.*\R) with some repetition factor, they are also counter-productive. Making the dot match line breaks (the very characters \R is meant to consume) only invites terrible backtracking. Removing these useless options is both clearer and simpler.
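Applied to the first post's two patterns, an option-free version could look like the minimal sketch below. It works on an in-memory string with CRLF line ends, and $iLine = 3 is just an illustrative value. It uses \V* (any character except a line break) instead of .*, so the result does not depend on which characters the dot treats as newlines, and it switches (?s) on only for the trailing .* that must swallow the rest of the text.

```autoit
; Option-free versions of the two patterns from the first post, on an in-memory
; string with CRLF line ends. $iLine = 3 is just an illustrative value.
Local $sText = "Line 1" & @CRLF & "Line 2" & @CRLF & "Line 3" & @CRLF & "Line 4" & @CRLF
Local $iLine = 3

; The first $iLine - 1 lines: \V* stops at the line break, \R consumes it.
Local $sBeg = StringRegExpReplace($sText, '^((?:\V*\R){' & ($iLine - 1) & '})(?s).*', '\1')
; Everything after line $iLine; (?s) is switched on only for the final .*
Local $sEnd = StringRegExpReplace($sText, '^(?:\V*\R){' & $iLine & '}(?s)(.*)', '\1')

ConsoleWrite($sBeg & $sEnd) ; Line 1, Line 2, Line 4 (line 3 removed)
```

These patterns are cleaner, but they still rely on a {n} quantifier, so a large $iLine still runs into the same limit.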

I thought that this would do:

;---------------------------------------------------------------------------------
; setup test file
Local $sfn = @ScriptDir & '\test.txt', $str
Local $hfl = FileOpen($sfn, 2)
If $hfl = -1 Then Exit (MsgBox(0, 'ERROR', 'File open failed for file = ' & $sfn))
For $1 = 1 To 10^4
    $str &= 'Line # ' & $1 & @CRLF
Next
FileWrite($hfl, $str)
FileClose($hfl)
;---------------------------------------------------------------------------------

_FileDeleteLine($sfn, 9999, True)

Func _FileDeleteLine($sFile, $iLine, $bBlank)

    Local $sTmp = FileRead($sFile)
    Local $hfl2 = FileOpen($sFile & 'out', 2)
    If $hfl2 = -1 Then Exit (MsgBox(0, 'ERROR', 'File open failed for file = ' & $sFile))
    Local $pat = '(?:.*\R){' & $iLine - 1 & '}\K.*\R(?s)(.*)'
ConsoleWrite($pat & @LF & '1...5...10...15...20...25...30...35...40...45...50...55' & @LF)
    $sTmp = StringRegExpReplace($sTmp, $pat, (($bBlank) ? @CRLF : '') & '\1')
    ConsoleWrite(@error & ' - ' & @extended & @CRLF)
    FileWrite($hfl2, $sTmp)
    FileClose($hfl2)

EndFunc   ;==>_FileDeleteLine

Yet PCRE overflows its stack beyond some value of a bounded repetition of (?:.*\R). This is surprising and most likely an issue within PCRE. Unless I'm reading the docs wrong, stack (or heap) recursion is only used with unbounded repetition factors (*, + and the like), where the engine must keep track of each successive repetition individually so that it can let the rest of the pattern match and, if that fails, backtrack from the point reached so far.

I've asked the question on the PCRE mailing list and am awaiting Philip's answer.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


@jchd
Thanks for your input,
This is turning into a very interesting topic. I've been using and learning RegEx for many years in several different environments, all with slight differences in the syntax implemented. I'm beginning to see that even different implementations of the same library, such as PCRE, can produce different outcomes for the same expression. The more I learn, the more I discover how little I know. I think I will need to restore some of the error checking in my code, as I have tended to remove the error checking on RegExs once I was happy that the pattern was returning the result I expected.

PCRE is a very strong piece of code and, as you know, one of the most powerful RE engines around. It's written in pure C for portability to just about any platform you can think of, and it includes two matching algorithms (DFA, and the NFA one that AutoIt uses) and two execution modes: the regular bytecode interpreter and the JIT (Just-In-Time) compiler, which is the fastest.

PCRE also has a number of useful pattern optimizations to produce efficient execution by bytecode or JIT but it can't spend the same time and carry the complex code that high-level C/C++/whatever compilers can afford, because it's an embeddable library which must be kept relatively small compared to the engines included in products like Perl, PHP, JS, .NET, ..., where code size doesn't really matter. Yet compatibility with its model (Perl) is excellent.

Also there are a large number of compile-time options and user-definable limits which can impact how the engine(s) behave(s) in corner cases.

PCRE tries hard to simplify "viral" backtracking when possible in many simple cases but it doesn't use overly complex code to do so, as we all expect the pattern compilation time to be minimal.


Some news about the original question.

When compiling a fixed repetition of a subpattern with more than one character, PCRE replicates the subpattern. For example, (?:abc-){5} is compiled into bytecode equivalent to (?:abc-abc-abc-abc-abc-). For a large repetition factor this overflows the available bytecode space, unless the library is built with a larger internal link size (link-size=3 or 4).

Using the JIT engine doesn't help here, since the pattern is first compiled into bytecode and only then reprocessed by JIT (which doesn't use the same methods). So, JIT or not, we hit the bytecode size barrier with the link size currently in force for AutoIt (2, the default). Larger values slow the engine down somewhat, which is probably why AutoIt uses 2.
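A compile-only check makes this distinction visible from AutoIt. With an empty subject there is practically nothing to match or backtrack over, and @error = 2 from StringRegExp means the pattern itself was rejected. This is a minimal sketch and the counts are arbitrary illustrations. On a build with link size 2 one would expect only the largest count to fail, since kylomas' similar group already failed just above 9360 repetitions, but that is an expectation rather than a verified result.

```autoit
; Compile-only check (illustrative counts). The subject is empty, so there is
; practically nothing to match or backtrack over; @error = 2 from StringRegExp
; means the pattern itself was rejected when PCRE tried to compile it.
Local $aCounts[3] = [100, 5000, 20000], $iErr
For $i = 0 To UBound($aCounts) - 1
    StringRegExp("", "(?:\V*\R){" & $aCounts[$i] & "}")
    $iErr = @error ; save it before any other call can reset it
    ConsoleWrite("{" & $aCounts[$i] & "}: @error = " & $iErr & @CRLF)
Next
```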

Would it be possible for PCRE to use a simple loop mechanism to avoid this large expansion? No, but maybe yes!

First, in simplest examples like (?:abc-){5} it is indeed possible to enclose the whole thing in a bytecode loop and treat it as a whole: no backtracking possible here, hence either the baby matches or does not.

But this is only part of the picture. With vicious pattern/subject pairs, such as matching (?:aa|a){6} against aaaaaa, the engine may need to backtrack at various points inside the iteration, which involves digging out information buried deep in the stack to keep track of the iteration count. Pathological patterns would make this a real nightmare, so the rough idea isn't practical, at least as is.
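The (?:aa|a){6} example can be run directly to see why each iteration needs its own backtracking state. In this small sketch the subjects are built in a loop so their lengths are exact.

```autoit
; Subjects of 6, 9 and 13 "a" characters matched against ^(?:aa|a){6}$.
; For 6 characters every iteration must end up taking the single "a" branch,
; although "aa" is tried first, so the engine has to backtrack into earlier
; iterations. 13 characters can never fit, since 6 iterations take at most 12.
Local $aLens[3] = [6, 9, 13], $sSubject
For $i = 0 To 2
    $sSubject = ""
    For $j = 1 To $aLens[$i]
        $sSubject &= "a"
    Next
    ConsoleWrite($aLens[$i] & " x a: " & StringRegExp($sSubject, "^(?:aa|a){6}$") & @CRLF)
Next
; Expected: 1, 1 and 0 (9 = three "aa" + three "a").
```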

Yet Philip Hazel (PCRE author and team leader) identifies here a possible evolution of the code in PCRE2, the brand-new version of PCRE, which offers an improved interface and which, I'm sure, AutoIt will switch to at some point.

For those wanting to read the discussion in full, point your browser to this thread.


@jchd
Well, it looks like you got the ball rolling thanks to kylomas - good job.  :thumbsup:

So, until AutoIt starts using PCRE2, this example works for me.  ;)

#include <array.au3>

;---------------------------------------------------------------------------------
; setup test file
Local $sfn = @ScriptDir & '\test.txt', $str
Local $hfl = FileOpen($sfn, 2)
If $hfl = -1 Then Exit (MsgBox(0, 'ERROR', 'File open failed for file = ' & $sfn))
For $1 = 1 To 100000
    $str &= 'Line # ' & $1 & @CRLF
Next
FileWrite($hfl, $str)
FileClose($hfl)
;---------------------------------------------------------------------------------
ConsoleWrite(Hex(9359) & @LF)
_FileReplaceLine($sfn, 9360, "######" & @CRLF) ;
Sleep(1000)
ShellExecute($sfn)
Sleep(1000)
FileDelete($sfn)


Func _FileReplaceLine($sFile, $iLine, $sReplacement = @CRLF) ; To delete line completely, have $sReplacement = ""
    Local $sTmp = FileRead($sFile)
    Local $hFile = FileOpen($sFile, 2)
    If $hFile = -1 Then Exit (MsgBox(0, 'ERROR', 'File open failed for file = ' & $sFile))
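    Local $sNewText
    If $iLine > 9359 Then
        ; Large line numbers: splice around the ($iLine - 1)th and $iLine-th @LF found by
        ; StringInStr, which sidesteps the {n} quantifier limit altogether.
        $sNewText = StringLeft($sTmp, StringInStr($sTmp, @LF, 0, $iLine - 1)) & $sReplacement & StringTrimLeft($sTmp, StringInStr($sTmp, @LF, 0, $iLine))
    Else
        ; Smaller line numbers: skip $iLine - 1 whole lines (\V*\R) and replace the next one.
        $sNewText = StringRegExpReplace($sTmp, "((?:\V*\R){" & ($iLine - 1) & "})(\V*\R)(?s)(.*$)", "${1}" & $sReplacement & "${3}")
    EndIf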
    FileWrite($hFile, $sNewText)
    FileClose($hFile)
EndFunc   ;==>_FileReplaceLine

This one works too  :D

For the fun only, because it's veeery slow

The concept is interesting though

Based on a regex by jguinch, thanks to him for this idea of a "loop inside a SRER"

;---------------------------------------------------------------------------------
; setup test file
Local $sfn = @ScriptDir & '\test.txt', $str
Local $hfl = FileOpen($sfn, 2)
If $hfl = -1 Then Exit (MsgBox(0, 'ERROR', 'File open failed for file = ' & $sfn))
For $1 = 1 To 100000
    $str &= 'Line # ' & $1 & @CRLF
Next
FileWrite($hfl, $str)
FileClose($hfl)
;---------------------------------------------------------------------------------

_FileReplaceLine($sfn, 99998, "######" & @CRLF) 

Sleep(1000)
ShellExecute($sfn)
Sleep(1000)
FileDelete($sfn)


Func _FileReplaceLine($sFile, $iLine, $sReplacement = @CRLF)
    Local $sTmp = "'" & FileRead($sFile) & "'", $iReplace 
    Local $hFile = FileOpen($sFile, 2)
    If $hFile = -1 Then Exit (MsgBox(0, 'ERROR', 'File open failed for file = ' & $sFile))
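    ; Turn every line into an expression fragment that bumps the $iReplace counter and
    ; yields either $sReplacement (on line $iLine) or the original line, then Execute() it all.
    ; Caveat: a single quote inside the file would end the '...' literals and break this.
    Local $sNewText = Execute( StringRegExpReplace($sTmp, "(?<!^)(\V*\R)", _ 
        "' & (((Assign(""iReplace"", Eval(""iReplace"")+1) * Eval(""iReplace"")) = $iLine) ? $sReplacement : '$1') & '") )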
    FileWrite($hFile, $sNewText)
    FileClose($hFile)
EndFunc   ;==>_FileReplaceLine