Baraoic

StringRegExp crash on subject string length > 20154

11 posts in this topic

I couldn't find any mention of this being a limitation, so I'm asking to see if this is a bug. I have a script I use for remuxing videos and noticed AutoIt3.exe was crashing on some of them. I tracked it down to StringRegExp crashing when the subject string is 20155 or more characters long. The boundary is a bit fuzzy: it doesn't crash 100% of the time when the length is exactly 20155, and doesn't work 100% of the time at 20154, but it always crashes at, say, 20160 and always works at 20150.

Anyway, I attached a test text file, and here is the code I was testing with, as it might actually be the regex itself that's causing this.

FYI, the code isn't actually supposed to find anything; later in the script I handle the missing audio array.

$sInfo = StringLeft(FileRead(@ScriptDir & "\out.txt"), 20160)

$aAudio = StringRegExp($sInfo, '(?s)mkvmerge & mkvextract: (\d+)(?:.(?!A track))+Track type: audio(?:.(?!A track|Chapters))+Language: (\w+)', 3)

out.txt


This regex currently doesn't work. What results is it supposed to return?


#3 ·  Posted (edited)

Correct, the regex isn't supposed to find anything. I handle that case later in the script, so that isn't the problem. The problem is that it crashes when $sInfo is 20155 characters or longer. Is it not crashing for you?

Edited by Baraoic


I have never heard of such a limitation, so I would think that the problem comes from the regex.

This example works fine with the whole file:

#Include <Array.au3>
$sInfo = FileRead(@ScriptDir & "\out.txt")
$aUIDs = StringRegExp($sInfo, '(?s)(?:Track|Chapter)UID: (\d+)', 3)
_ArrayDisplay($aUIDs)

As I couldn't test your regex and know nothing about the expected results, I can't say more.


A regex should never cause AutoIt to crash, regardless of whether it finds something or not. My example is causing it to crash.

 

Since you seem to really want the regex to return something, here is a better example.

This works:

#Include <Array.au3>
$sInfo = StringLeft(FileRead(@ScriptDir & "\out.txt"), 20150)
$aAudio = StringRegExp($sInfo, '(?s)mkvmerge & mkvextract: (\d+)(?:.(?!A track))+Track type: audio(?!(?:.(?!A track|Chapters))+Language: \w+)', 3)
_ArrayDisplay($aAudio)

This crashes and the only thing I changed is the string length:

#Include <Array.au3>
$sInfo = StringLeft(FileRead(@ScriptDir & "\out.txt"), 20160)
$aAudio = StringRegExp($sInfo, '(?s)mkvmerge & mkvextract: (\d+)(?:.(?!A track))+Track type: audio(?!(?:.(?!A track|Chapters))+Language: \w+)', 3)
_ArrayDisplay($aAudio)


A regex may cause AutoIt to crash.

The help file warns:

Caution: bad regular expressions can produce a quasi-infinite loop hogging the CPU, and can even cause a crash.

I could indeed reproduce the crash.

All I can suggest is to write the pattern differently.
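One way to read "write the pattern differently" is to stop asking a single regex to walk the whole file. The sketch below is untested against out.txt and is only an illustration: it assumes every track section begins with the literal "A track" (as the original pattern implies), and `_ListAudioTracks` is a made-up helper name. It splits the text into track blocks first and then runs two small, cheap patterns on each block.

```autoit
#include <StringConstants.au3>

; Hypothetical helper, not from the thread: scan the text one track block at a time.
; Assumption: each track section starts with the literal "A track".
Func _ListAudioTracks($sInfo)
    Local $aBlocks = StringSplit($sInfo, "A track", $STR_ENTIRESPLIT)
    Local $sResult = "", $aId, $aLang
    ; Element 1 is whatever precedes the first "A track", so start at 2.
    For $i = 2 To $aBlocks[0]
        ; Skip blocks that are not audio tracks.
        If Not StringInStr($aBlocks[$i], "Track type: audio") Then ContinueLoop
        $aId = StringRegExp($aBlocks[$i], 'mkvmerge & mkvextract: (\d+)', 1)
        If @error Then ContinueLoop
        $aLang = StringRegExp($aBlocks[$i], 'Language: (\w+)', 1)
        If @error Then ContinueLoop
        ; One "id|language" line per audio track.
        $sResult &= $aId[0] & "|" & $aLang[0] & @LF
    Next
    Return $sResult
EndFunc

ConsoleWrite(_ListAudioTracks(FileRead(@ScriptDir & "\out.txt")))
```

Unlike the original pattern, this does not enforce the order of the fields inside a block and does not cut the last block off at "Chapters", so treat it as a starting point rather than a drop-in replacement.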


The reason is obvious: your original pattern backtracks like hell, hence it overflows the PCRE stack at some point.

An easy change is to make the whole pattern ungreedy:

#Include <Array.au3>

$sInfo = FileRead(@ScriptDir & "\out.txt")
$aAudio = StringRegExp($sInfo, '(?sU)mkvmerge & mkvextract: (\d+)(?:.(?!A track))+Track type: audio(?!(?:.(?!A track|Chapters))+Language: \w+)', 3)
_ArrayDisplay($aAudio)

An even better tuning is to make the repetitions possessive.
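A toy illustration of what "possessive" means in practice may help here. The strings below are made up and have nothing to do with out.txt. A possessive quantifier (`++`, `*+`) never gives back what it has consumed, so it only helps when the repeated part cannot swallow the text that has to follow it.

```autoit
; Toy strings, not taken from out.txt. With the default flag, StringRegExp returns 1 (match) or 0 (no match).

; Greedy: a+ first takes "aaa", then gives one "a" back so the literal "ab" can match.
ConsoleWrite(StringRegExp("aaab", "a+ab") & @CRLF)  ; 1

; Possessive: a++ keeps all three "a", so the literal "a" that follows can never match.
ConsoleWrite(StringRegExp("aaab", "a++ab") & @CRLF) ; 0

; Possessive is fine when the repeated part cannot eat what must come next.
ConsoleWrite(StringRegExp("aaab", "a++b") & @CRLF)  ; 1
```

Applied to the pattern in this thread, that means simply appending `+` to `(?:.(?!A track))+` is not enough: the group would run straight past "Track type: audio" and never hand it back, so nothing would match. The group has to be rewritten so that it stops before that text by itself. Whether a possessive rewrite alone keeps PCRE within its stack on this input is something I have not verified.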


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


#8 ·  Posted (edited)

There has been an ongoing debate about what AutoIt should do with such bad patterns applied to large inputs, which may very well work at the test stage but explode in the face of users once deployed in the field. Making bad patterns succeed on much larger input data would require changing a PCRE compile-time option (use the heap instead of the limited-size stack [Windows doesn't offer any provision to specify the stack size at launch time, contrary to Unix-like OSes]).

But --and in my view this is a big but [pun intended]-- that also implies that bad patterns can go undetected for a very long time. I'd rather see AutoIt catch the stack exception gracefully and return a standard @error code than crash. Yet I'm not that much pro-heap, since that route would tend to make bad patterns much more common and possibly slow down a large number of applications using them silently (well, until a disaster happens).
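For context, this is roughly what a script can test for today, going by the @error values listed on the StringRegExp help page (1 for no match, 2 for a bad pattern with the offset in @extended). The input string and pattern are illustrative only. A stack overflow is not among these outcomes, which is why the script gets no chance to react.

```autoit
; Illustrative input and pattern, not taken from out.txt.
Local $sInfo = "Track type: audio" & @LF & "Language: eng"
Local $aLang = StringRegExp($sInfo, 'Language: (\w+)', 3)
; Save both values at once, before any other function call resets them.
Local $iErr = @error, $iExt = @extended

Switch $iErr
    Case 0
        ConsoleWrite("First match: " & $aLang[0] & @CRLF)
    Case 1
        ConsoleWrite("No match" & @CRLF)
    Case 2
        ConsoleWrite("Bad pattern, error at offset " & $iExt & @CRLF)
EndSwitch
```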

Edited by jchd



You're right, the help file does carry that warning about bad regular expressions causing a crash. I shouldn't have said never. I was just trying to figure out why it was crashing with my example; thanks for verifying that it crashes for you as well.

 


I didn't know backtracking could cause a crash. I will edit the pattern to make the repetitions possessive. Thank you.

 


Well, it's not like I have any say in the matter, but it sounds like the debate is about changing how PCRE is built so that an inefficient regex crashes less often? Does using the heap instead of the limited-size stack cause any negative side effects? If it does, then I don't think it should be done, but if not, then I'd say why not? Even if you get people (like me, apparently, ha) who use bad regexes, it just means the script slows down until things get so bad that it crashes. But it would work more often, and isn't that a good thing in general, even if it isn't the best way?


My issue with using the heap is that it silently encourages beginners (and more experienced users alike) to write just about any flawed pattern without thinking much about what it actually means and does in practice. This is very bad for mastering regexps and their use. It also encourages flawed patterns to proliferate: I used that pattern in this AutoIt application and it works, so I'll use the same pattern in this new JavaScript piece of cake and in my new .NET app, almost forgot: I'll publish it on some regexp example repository to help others...

If none of these implementations crash, then wrong patterns disseminate and routinely use lots of time, memory and resources without any benefit at all. Crashing less often is no better than crashing on day one (on the contrary!). I favor correct programs in all cases. Don't forget that patterns are nothing less than source code for a specific target language.

Granted, I'd prefer AutoIt catching the stack overflow exception gracefully, but this is beyond my realm.



Well it's not like you can stop bad programmers from doing that anyways, but that is a good point.
