
Regular expression challenge for stripping single comments.


guinness

Solved by DXRW4E


@OffTopic

Malkey, I did not understand the point. However, I am Albanian and my second language is Italian; I write English using Google Translate. As for "etc. etc." (my "ect ect" comes from Italian), I think such expressions are fairly international, like "ciao, salut, adios, aloha". For this reason I do not pay much attention to them, because I believe that almost 99.9% of users already understand them.

Ciao.


DXRW4E, thanks for these useful various examples

This thread looks more and more like a tutorial  :)

Just a little bit.


I was bored and came across this topic, and it got me back to thinking about regular expressions again, which is always an interesting challenge. Funny thing is, through multiple revisions my near-final PCRE expression wound up looking a lot like what was in the original post, yet I hadn't used it as a reference point (and I actually failed to get the right result)...

One of the initial expressions I tried, which I was sure should have worked, was this:

^([^'";]*)([^'";]*(['"])[^\g-1]*?\g-1)*;.*$

Unfortunately, for some reason I can't understand, the [^\g-1]*? part of the expression becomes a little greedy when there is a * (star) outside of the surrounding parentheses.

In other words, a single or double quote would be matched, then the [^\g-1]*? part would capture anything except the same quote type caught (\g-1 here simply means the last capture group) under normal circumstances, but when there is that * outside of the parentheses (after \g-1) ), it suddenly becomes greedy even though it's specified with the ? to be non-greedy. Very confusing!

I even tried using the 'DEFINE' feature like such:

(?(DEFINE)(?<qstr>(['"])[^\g-1]*?\g-1))^([^'";]*)([^'";]*(?&qstr))*;.*$

Here I was hoping this would fool the PCRE engine into accepting the non-greedy qualifier correctly.  But NOPE!  Wrong again.

I also tried splitting up the quote-skip into 2 parts as well:

(?:'[^']*?'|"[^"]*?"))*

Still it was greedy mother..

Finally, I came back here and took a look at how it was done in the original post and realized the second capture group was stepping through the character set [^'";] one character at a time and for each non-match it looked at the other parts of the expression. So I finally just went that route and plopped in some of my original code to get this:

^([^'";]*)(?:[^'";]|(['"])[^\g-1]*\g-1)*(;.*)$

So basically, nothing much has changed, except now it's possibly slower since it captures a group instead of splitting it up with |'s.

Oh well, you live, you learn.  I'd still love it if someone could explain why the non-greedy qualifier was cancelled out when inside a group with a * outside.. (I think it works the same with ? there as well)

I'm also a bit surprised to see \K and (*SKIP).. new expressions to mess around with!
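
For anyone else meeting \K for the first time: it resets the start of the reported match, so whatever was matched before it stays out of the replacement. A minimal sketch, assuming a made-up sample line with no quoted strings in it:

```autoit
; \K drops everything matched so far from the reported match,
; so only the text that follows it gets replaced.
Local $sLine = 'Local $x = 1 ; trailing comment'

; [^;]* walks over the code part, \K forgets it, ;.*$ is the comment.
Local $sOut = StringRegExpReplace($sLine, '^[^;]*\K;.*$', '')

ConsoleWrite('[' & $sOut & ']' & @CRLF) ; [Local $x = 1 ]
```

This is only an illustration of \K itself; it does nothing about semicolons inside strings, which is what the rest of the thread deals with.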

Edited by Ascend4nt

What makes you say that the lazy part turns greedy? Let's use a simplified example:

StringRegExp("abbba acccaaddda", "(?x) ([^a]* (?: a ) [^a]*? a)*", 3)


@Ascend4nt

It does not work that way: you need references and recursion, because you never know how many quoted strings there will be. Look at the branch reset group (?|), which resets capturing group numbers. As always, you need to adapt to the situation. Here is an example; (DEFINE) is not needed in this case, just use the subpattern reference (?n). The pattern was extracted from an INF function where everything is more complicated.
ConsoleWrite(_GetStringCompactFormatEx('"aaa"bb"ccc"ddd"eeee";ffff') & @LF)
ConsoleWrite(_GetStringCompactFormatEx('"aaa""bb"ccc"ddd""""eeee"   ;ffff') & @LF)

ConsoleWrite(_GetSzFieldEx('     "a,aa"b\\,   b"ccc"ddd"eeee";ffff') & @LF)

Func _GetStringCompactFormatEx($sContext)
    ;;Local Static $sRefSubPattern = '(?(DEFINE)(?<qstr>' & '(?>[^\h\f\xb\x0";\r\n]*)' & '))'
    ;;Local Static $sStringCompactFormatPattern = $sRefSubPattern & '(?|"((?>[^"\r\n]|"")*)"?((?&qstr))|((?&qstr)))(?>;[^\n]*(?=[\r\n]|$))*'
    ;;Return StringReplace(StringRegExpReplace($sContext, $sStringCompactFormatPattern, "$1$2$3"), '""', '"', 0, 1)

    Local Static $sStringCompactFormatPattern = '(?|"((?>[^"\r\n]|"")*)"?((?>[^\h\f\xb\x0";\r\n]*))|((?2)))(?>;[^\n]*(?=[\r\n]|$))*'
    ;Local Static $sStringCompactFormatPattern = '[\h\f\xb\x0]*(?|"((?>[^"\r\n]|"")*)"?((?>[^\h\f\xb\x0";\r\n]+|(?>[\h\f\xb\x0]+)(?!;))*)|((?2)))(?>[\h\f\xb\x0]*;[^\n]*(?=[\r\n]|$))*'
    Return StringReplace(StringRegExpReplace($sContext, $sStringCompactFormatPattern, "$1$2"), '""', '"', 0, 1)
EndFunc   ;==>_GetStringCompactFormatEx


Func _GetSzFieldEx($sContext)
    ;;Local Static $sRefSubPattern = '(?(DEFINE)(?<qstr>' & '(?>[^\h\f\xb\x0",\\;\r\n]+|(?>[\h\f\xb\x0\\]+)(?![,;]))*' & '))'
    ;;Local Static $sStringFieldPattern = $sRefSubPattern & '[\h\f\xb\x0]*(?|"((?>[^"\r\n]|"")*)"?((?&qstr))|((?&qstr)))(?>[\h\f\xb\x0]*[,;][^\n]*)(?=[\r\n]|$))*'
    ;;Return StringReplace(StringRegExpReplace($sContext, $sStringFieldPattern, "$1$2$3"), '""', '"', 0, 1)

    Local Static $sStringFieldPattern = '[\h\f\xb\x0]*(?|"((?>[^"\r\n]|"")*)"?((?>[^\h\f\xb\x0",\\;\r\n]+|(?>[\h\f\xb\x0\\]+)(?![,;]))*)|((?2)))(?>[\h\f\xb\x0\\]*[,;][^\n]*(?=[\r\n]|$))*'
    Return StringReplace(StringRegExpReplace($sContext, $sStringFieldPattern, "$1$2"), '""', '"', 0, 1)
EndFunc   ;==>_GetSzFieldEx

;~ ConsoleWrite -------------------
;~ aaabbcccdddeeee
;~ aaa"bbcccddd""eeee
;~
;~ a,aab
;~ --------------------------------

Ciao.

Edited by DXRW4E


Hmm.. let me see if I can reproduce this with a smaller sample.  I was using the source code as part of the source material, but it was this line that kept failing for me:

    Local $sText = "; This is a commment'' as a string and"" shouldn't be removed." ; This is a comment to explain the; string

jchd, basically the [^\g-1]*? from my post above would eat one of the quotes if there were two of the same type of quote in a row.  I verified this by using StringRegExp and groups, but of course I already deleted my tests.  Let me create a smaller reproducer and then try again.

DXRW4E, I tried to tailor my PCRE in a way so that it would look only for the same quote it came across, which is why it would use a group and backreference - so that it knew to only look for a ' (single) or a " (double quote), whichever was found.  It should have theoretically then moved on to search for the next of the group [;'"]..  but in my tests the expression was eating one more quote than necessary.

I will however try to read through your code and get back.
 


See, these are the questions I like to see being asked around here once in a while. It gets the community's juices flowing.


Ok, so I've gone back and tried to figure out where my regular expression was capturing extra quotes and I can't seem to reproduce the problem.  Very odd.  I did however get a chance to try my hand at it again and figure out where the regular expression I had originally created was failing. And of course it was simple - the second capture group was bumping into the ; semicolon that starts the comments and then breaking out of the group and looking at the next part of the data.  Since the rest of the expression was ;.*, it was failing to account for spaces between any previous quotes matched.  So the new expression looks like this:

(?m)^([^'";]*)([^'";]*(['"])[^\3]*?\3)*\h*;.*$

with a replacement of \1\2
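
A reduced sketch of what the added \h* changes, assuming a made-up one-line input and using .*? in place of the [^\3]*? part (a swap that comes up later in the thread):

```autoit
Local $sLine = '$s = "a" ; c'

; Without \h*, the pattern expects the ; directly after the closing quote.
Local $sOld = '^([^''";]*)([^''";]*([''"]).*?\3)*;.*$'
; With \h*, the blanks between the string and the comment are consumed too.
Local $sNew = '^([^''";]*)([^''";]*([''"]).*?\3)*\h*;.*$'

; No match, so the line comes back unchanged: [$s = "a" ; c]
ConsoleWrite('[' & StringRegExpReplace($sLine, $sOld, '$1$2') & ']' & @CRLF)
; Match, comment stripped: [$s = "a"]
ConsoleWrite('[' & StringRegExpReplace($sLine, $sNew, '$1$2') & ']' & @CRLF)
```

The input here has a single string, so group 2 repeats only once and the last-iteration effect described next does not show up.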

But now there's something different here I'm seeing - capture group #2 only remembers the last iteration!  Is this normal?

For example, given this input to the expression:

#include <Constants.au3> ; Should be MsgBoxConstants for all those rocking v3.3.10.0.

#Region ;aaaaaaa
;bbbbbbbbbbbbbbbb
#EndRegion ;ccccccccccc
 ;' "
; Example()

; New line comment.
Func Example() ; This is some comment after a function.
    Local $sText = "; This is a commme'nt as'' a string"" and shouldn't be removed." ; This is a comment to explain the; string. Woah! inception.
   Local $sTest = ";;'""'" & ';' ; END
EndFunc   ;==>Example

I wind up with this:

#include <Constants.au3>

#Region

#EndRegion
 



Func Example()
    Local $sText = " and shouldn't be removed."
   Local $sTest =  & ';'
EndFunc 

See what's missing?  Both "; This is a commme'nt as'' a string" and ";;'""'" are gone!  I thought it would accumulate those strings and combine them together, but such is not the case.  Is it because I use a backreference inside the 2nd capture group?  I'd be interested in knowing if that's the case.

Ah and btw, RegexBuddy is really a great tool.. the Debugger on there helped me see just how the regular expression engine was analyzing each step in the string.  I think I'll probably drop some money on that product, although I'm a bit concerned about the PCRE 8.xx support - it only has 'Match' capability in the demo.. (had to use Perl 5.18 to analyze the above)

P.S. Anyone interested in the demo for that product, check out the first RegexBuddy link on this page: http://www.rexegg.com/regex-tools.html#rb.  For some reason, the main site doesn't appear to have a link to it.

Edited by Ascend4nt

 

capture group #2 only remembers the last iteration!  Is this normal?

Maybe using StringRegExp with option 4 will shed the light on what the function returns.
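
A minimal sketch of what option 4 returns, assuming a made-up subject and pattern: an array of arrays, one inner array per match, with the full match at index 0 followed by each group.

```autoit
; Flag 4: global matches, each returned as its own array
; ([0] = full match, [1..n] = capturing groups).
Local $aMatches = StringRegExp("ab12 cd34", "([a-z]+)(\d+)", 4)

For $i = 0 To UBound($aMatches) - 1
    Local $aMatch = $aMatches[$i]
    ConsoleWrite($aMatch[0] & " | " & $aMatch[1] & " | " & $aMatch[2] & @CRLF)
Next
; ab12 | ab | 12
; cd34 | cd | 34
```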


Maybe using StringRegExp with option 4 will shed the light on what the function returns.

Hmm, nope, I'm still left wondering where parts of the data go.  I'm gonna need to ask others to take a look at it.  Do you have any suggestions for a good place to ask these questions? My first thought is StackOverflow, but there might be some PCRE community out there..


The short answer is that you only get the last capture when the pattern looks like (whatever)*

There is only one capture slot allocated for the return, and it is overwritten successively by multiple sub-matches.

The problem is with escaped quotes inside a string. A string without escaped (doubled) quotes (let's call that a partial string) is defined by

( ['"] ) [^\g-1]*? \g-1

but if you enclose this pattern by ()* you only retain the last capture.

Instead a generic AutoIt string is

( (?: ( ['"] ) [^\g-1]*? \g-1 )* )

Note that we don't fear "abc" being contiguous to 'def' as string quotes need to be consistent.
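
The difference between repeating a capturing group and wrapping the whole repetition in one can be sketched as follows. The subject is made up, and a fixed double quote is used instead of \g-1 so the sketch does not depend on how a given AutoIt version treats relative backreferences:

```autoit
Local $sSubject = '"ab""cd" ; comment'

; A capturing group that is itself repeated: each iteration overwrites
; the previous one, so group 1 ends up holding only the last iteration.
Local $aLast = StringRegExp($sSubject, '^("[^"]*")*', 1)

; A non-capturing group repeated inside a single capturing group:
; group 1 now spans every iteration.
Local $aAll = StringRegExp($sSubject, '^((?:"[^"]*")*)', 1)

ConsoleWrite($aLast[0] & @CRLF) ; "cd"
ConsoleWrite($aAll[0] & @CRLF)  ; "ab""cd"
```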

So the actual pattern could be:

(?mx)  ^  ( [^'";]* )   ( [^'";]*  ( (?: ( ['"] ) [^\g-1]*? \g-1 )* ) )  \s* ; .*  $


 

The short answer is that you only get the last capture when the pattern looks like (whatever)*

There is only one capture slot allocated for the return, and it is overwritten successively by multiple sub-matches.

The problem is with escaped quotes inside a string. A string without escaped (doubled) quotes (let's call that a partial string) is defined by

( ['"] ) [^\g-1]*? \g-1

but if you enclose this pattern by ()* you only retain the last capture.

Instead a generic AutoIt string is

( (?: ( ['"] ) [^\g-1]*? \g-1 )* )

Note that we don't fear "abc" being contiguous to 'def' as string quotes need to be consistent.

So the actual pattern could be:

(?mx)  ^  ( [^'";]* )   ( [^'";]*  ( (?: ( ['"] ) [^\g-1]*? \g-1 )* ) )  \s* ; .*  $

 

So many parentheses.. my brainn >_<

Okay, so I think I understand what you're doing with the non-capture group.. I'm not entirely sure why it empties the capture group each iteration, but I'll try to wrap my brain around it.  In the meantime I'll accept that I should use a non-capture group in certain situations..

I had worked out the idea that using the whole pattern I would still be able to catch adjacent quotes since once the matching quote is reached, it would restart the pattern looking for the next non-quote and would immediately skip to the next matching quote pattern.  I suppose that would look like this for my original code (thx for the ?x tip for spacing these out):

(?mx)^ ([^'";]*) ((?: ([^'";]*  (['"]) [^\g-1]*? \g-1 )* )) \h* ;.* $

That does in fact work, but it does do extra work in checking for adjacent quotes.  Using your contiguous-quotes check would be a better optimization, so that seems more appropriate.  By the way, this seems to work as well (one less pair of parentheses):

(?mx)^ ([^'";]*) ([^'";]* (?: (['"]) [^\g-1]*? \g-1 )* ) \h* ;.* $

That's a very nice solution and I applaud you jchd for knowing all these quirks in and out!

Oh - one more thing - it seems the latest AutoIt version (3.3.10.2) doesn't like relative backreferences like "\g-1", and reports an error. Version 3.3.8.1 works fine with that code though!

Thanks again!  (oh and kudos on the StringRegExp documentation - job very well done)


To see why only the final group is captured, run this much simpler example:

#include <Array.au3>

Local $s = "aaaA"
Local $a = StringRegExp($s, "(?i)(a)*", 1)
_ArrayDisplay($a)

PCRE sees one capturing group, hence it reserves room for one output string. The AutoIt wrapper merely puts that in an array. More complex examples only bring mud: that is the actual reason.

You must have another problem in some pattern, as v3.3.10.2 correctly handles \g-n

Thanks for your appreciation. It was difficult to organize things in the most logical manner and even more "fun" to find the correct level of detail for explanations. Time will tell where people have difficulties.


To see why only the final group is captured, run this much simpler example:

#include <Array.au3>

Local $s = "aaaA"
Local $a = StringRegExp($s, "(?i)(a)*", 1)
_ArrayDisplay($a)

PCRE sees one capturing group, hence it reserves room for one output string. The AutoIt wrapper merely puts that in an array. More complex examples only bring mud: that is the actual reason.

Thanks, that's pretty succinct and to the point.

You must have another problem in some pattern, as v3.3.10.2 correctly handles \g-n

Is it only me?  It's odd because I've now tested this on 3.3.10.0 as well and I'm getting the same error, in both 32 and 64-bit modes (running on Win7 x64).  Here's the code:

$sTestStr = '#include <Constants.au3> ; Should be MsgBoxConstants for all those rocking v3.3.10.0.' & @CRLF & _
    @CRLF & _
    @CRLF & _
    '#Region ;aaaaaaa' & @CRLF & _
    ';bbbbbbbbbbbbbbbb' & @CRLF & _
    '#EndRegion ;ccccccccccc' & @CRLF & _
    ' ;'' "' & @CRLF & _
    '; Example()' & @CRLF & _
    @CRLF & _
    '; New line comment.' & @CRLF & _
    'Func Example() ; This is some comment after a function.' & @CRLF & _
    'Local $sText = "; This is a commme''nt as'' a string"" and shouldn''t be removed." ; This is a comment to explain the; string. Woah! inception.' & @CRLF & _
    '  Local $sTest = ";;''""''" & '';'' ; END' & @CRLF & _
    'EndFunc   ;==>Example'

;~ $sPattern = "(?mx)^ ([^'"";]*) ([^'"";]* (?: (['""]) [^\g-1]*? \g-1 )* ) \h* ;.* $"

; Corrected pattern (see post #40):
$sPattern = "(?mx)^ ([^'"";]*) ( (?:( (['""]) .*? \g-1 ) [^'"";]* )* ) ;.* $"

$sResult = StringRegExpReplace($sTestStr, $sPattern, "\1\2")

If @error Then
    Local $iErr = @error, $iExt = @extended
    ConsoleWrite("@error = " & $iErr & ", @extended = " & $iExt & @CRLF)
    MsgBox(0, "StringRegExpReplace error", "@error = " & $iErr & ", @extended = " & $iExt & @CRLF & "Pattern: " & $sPattern)
Else
    MsgBox(0, "Results",  $sResult)
EndIf

*edit: $sPattern was faulty. fixed in code (see post #40 for explanation)

Edited by Ascend4nt

@Ascend4nt

I do not understand what you are doing. There is nothing you cannot do with regexp; I say this because it seems you are making things more complicated than they are. In the end it is all quite easy, and the working example, the right one to use, is already here:

Local $sTestStr = '#include <Constants.au3> ; Should be MsgBoxConstants for all those rocking v3.3.10.0.' & @CRLF & _
    @CRLF & _
    @CRLF & _
    '#Region ;aaaaaaa' & @CRLF & _
    ';bbbbbbbbbbbbbbbb' & @CRLF & _
    '#EndRegion ;ccccccccccc' & @CRLF & _
    ' ;'' "' & @CRLF & _
    '; Example()' & @CRLF & _
    @CRLF & _
    '; New line comment.' & @CRLF & _
    'Func Example() ; This is some comment after a function.' & @CRLF & _
    'Local $sText = "; This is a commme''nt as'' a string"" and shouldn''t be removed." ; This is a comment to explain the; string. Woah! inception.' & @CRLF & _
    '  Local $sTest = ";;''""''" & '';'' ; END' & @CRLF & _
    'EndFunc   ;==>Example'

ConsoleWrite(_StripStringComments($sTestStr) & @LF)
ConsoleWrite(_StripStringCommentsEx($sTestStr) & @LF)

Func _StripStringComments($sContext)
    Local Static $sStripStringCommentsPattern = '(?|("(?>[^"\r\n]|"")*(?>"?))((?>[^\h\f\xb\x0"'';\r\n]*))|(''(?>[^''\r\n]|'''')*(?>''?))((?2))|((?2)))(?>;[^\n]*(?=[\r\n]|$))*'
    Return StringRegExpReplace($sContext, $sStripStringCommentsPattern, "$1$2")
EndFunc   ;==>_StripStringComments

;; or that in the first post (that you should use), the solution chosen by guinness
Func _StripStringCommentsEx($sContext)
    Local Static $sStripStringCommentsPattern = '\n[^;"''\r\n]*(?:[^;"''\r\n]|''[^''\r\n]*''|"[^"\r\n]*")*\K;[^\r\n]*'
    Return StringTrimLeft(StringRegExpReplace(@LF & $sContext, $sStripStringCommentsPattern, ""), 1)
EndFunc   ;==>_StripStringCommentsEx

;$sTestStr Before
#cs    
 #include <Constants.au3> ; Should be MsgBoxConstants for all those rocking v3.3.10.0.


#Region ;aaaaaaa
;bbbbbbbbbbbbbbbb
#EndRegion ;ccccccccccc
 ;' "
; Example()

; New line comment.
Func Example() ; This is some comment after a function.
Local $sText = "; This is a commme'nt as' a string"" and shouldn't be removed." ; This is a comment to explain the; string. Woah! inception.
  Local $sTest = ";;'""'" & ';' ; END
EndFunc   ;==>Example
#CE


;Return _StripStringComments($sTestStr)
#CS
 >Running:(3.3.10.2):C:\Program Files (x86)\AutoIt3\autoit3.exe "C:\Users\DXRW4E\Desktop\StripStringComments.au3"
--> Press Ctrl+Alt+F5 to Restart or Ctrl+Break to Stop
#include <Constants.au3>


#Region

#EndRegion




Func Example()
Local $sText = "; This is a commme'nt as' a string"" and shouldn't be removed."
  Local $sTest = ";;'""'" & ';'
EndFunc
+>08:16:12 AutoIt3.exe ended.rc:0
>Exit code: 0    Time: 0.320
#CE


;Return _StripStringCommentsEx($sTestStr)
#CS
>Running:(3.3.10.2):C:\Program Files (x86)\AutoIt3\autoit3.exe "C:\Users\DXRW4E\Desktop\test.au3"    
--> Press Ctrl+Alt+F5 to Restart or Ctrl+Break to Stop
#include <Constants.au3>


#Region

#EndRegion
 



Func Example()
Local $sText = "; This is a commme'nt as' a string"" and shouldn't be removed."
  Local $sTest = ";;'""'" & ';'
EndFunc   
+>08:33:51 AutoIt3.exe ended.rc:0
>Exit code: 0    Time: 0.320
#CE

So what is wrong with them? Is that not the result you want?

Ciao.

Edited by DXRW4E


Ascend4nt,

You're correct in that \g-n is now flagged as an error by PCRE 8.33, although it worked in earlier versions.

DXRW4E,

Ascend4nt merely wishes to understand specific details.


Hi jchd, yes, I understand that (all posts in this topic are meant to give an example, an idea, and not exact solutions). But you cannot force the same approach onto every situation; you need to adapt to what you want to do. All the features of regex are useful, and \g-n is really nice, but not for every situation. I mean there is no need to force its use, etc. The beauty of regex is precisely the possibility to choose.

Ciao.

Edited by DXRW4E


Ascend4nt,

Also note that

[^\g-1]*?

can be simply replaced by .*? which works with all versions! Also the first group isn't necessary:

#include <Array.au3>

$sTestStr = '#include <Constants.au3> ; Should be MsgBoxConstants for all those rocking v3.3.10.0.' & @CRLF & _
    @CRLF & _
    @CRLF & _
    '#Region ;aaaaaaa' & @CRLF & _
    ';bbbbbbbbbbbbbbbb' & @CRLF & _
    '#EndRegion ;ccccccccccc' & @CRLF & _
    ' ;'' "' & @CRLF & _
    '; Example()' & @CRLF & _
    @CRLF & _
    '; New line comment.' & @CRLF & _
    'Func Example() ; This is some comment after a function.' & @CRLF & _
    'Local $sText = "; This is a commme''nt as'' a string"" and shouldn''t be removed." ; This is a comment to explain the; string. Woah! inception.' & @CRLF & _
    '  Local $sTest = ";;''""''" & '';'' ; END' & @CRLF & _
    'EndFunc   ;==>Example'

$sPattern = "(?mx)^ ([^'"";]* (?: (['""]) .*? \g-1 )* ) \h* ;.* $"
$sResult = StringRegExpReplace($sTestStr, $sPattern, "\1") ; group 2 is only the quote character, so group 1 alone is restored
MsgBox(0, "Results",  $sResult)


@Ascend4nt

I do not understand what you are doing. There is nothing you cannot do with regexp; I say this because it seems you are making things more complicated than they are. In the end it is all quite easy, and the working example, the right one to use, is already here..

DXRW4E, I don't understand how what I'm doing is so complicated? I was trying to understand why my solution didn't work as I had expected.  And btw, speaking of complicated, I'd propose that the alternative you posted fits that description:

Func _StripStringComments($sContext)
    Local Static $sStripStringCommentsPattern = '(?|("(?>[^"\r\n]|"")*(?>"?))((?>[^\h\f\xb\x0"'';\r\n]*))|(''(?>[^''\r\n]|'''')*(?>''?))((?2))|((?2)))(?>;[^\n]*(?=[\r\n]|$))*'
    Return StringRegExpReplace($sContext, $sStripStringCommentsPattern, "$1$2")
EndFunc   ;==>_StripStringComments

That's very difficult for me to read and requires a bit of thinking to figure out its logic.  I'm not saying it's bad or wrong, just a bit complicated-looking.

..the \g-n syntax is really nice, but not for every situation. I mean there is no need to force it in, etc.; the beauty of regex is precisely that you get to choose

This isn't so much about forcing a particular style to fit something it doesn't.  In fact, I believe it fits nicely - capturing and acting on a specific quote by using a group and backreference avoids spelling out each possible match using separators (|) which leads to a lengthier PCRE.  But as you said - the beauty of PCRE, and even programming in general, is that you can approach a problem from many different angles.
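
To make the trade-off concrete, here is a minimal sketch that finds quoted strings both ways: once with a captured quote plus a relative backreference, once with each quote type spelled out as its own alternative. The sample line and the variable names ($sLine, $sBackRef, $sAlternation) are made up for illustration and are not part of the thread's test string; \g-1 assumes an AutoIt build whose PCRE supports relative backreferences.

```autoit
; Hypothetical sample line: one double-quoted and one single-quoted string.
Local $sLine = 'Local $a = "it''s", $b = ''say "hi"'''

; One branch for both quote types: group 1 remembers the opening quote,
; \g-1 requires the same character as the closing quote.
Local $sBackRef = "(['""]).*?\g-1"

; The same idea with each quote type written out as a separate alternative.
Local $sAlternation = '"[^"]*"|''[^'']*'''

; Wrap every match in <...> so the matched strings are easy to spot.
ConsoleWrite(StringRegExpReplace($sLine, $sBackRef, "<$0>") & @CRLF)
ConsoleWrite(StringRegExpReplace($sLine, $sAlternation, "<$0>") & @CRLF)
```

Both calls should print Local $a = <"it's">, $b = <'say "hi"'>. The backreference form stays the same length however many quote characters are allowed, while the alternation form grows by one branch per quote type.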

Ascend4nt,

Also note that

[^\g-1]*?

can be simply replaced by .*? which works with all versions!

Ah, I hadn't considered using .*? there.  Good catch!  Although I still hope future versions of PCRE add the ability to match anything that is not a back-referenced capture group..
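
Some background on why the substitution is needed: a character class holds literal characters, so it cannot refer back to a captured group. The sketch below compares the lazy .*? form with a commonly used PCRE workaround for "anything that is not the captured quote", a negative lookahead checked before each consumed character. The lookahead variant is not something proposed in this thread, and the sample line and variable names are hypothetical.

```autoit
; Hypothetical sample line for illustration.
Local $sLine = 'MsgBox(0, "it''s", ''say "hi"'')'

; Lazy dot: stop at the first character equal to the captured quote.
Local $sLazy = "(['""]).*?\g-1"

; Lookahead variant: consume a character only if it is not the captured quote.
Local $sTempered = "(['""])(?:(?!\g-1).)*\g-1"

; Wrap every match in <...> to compare what each pattern picks up.
ConsoleWrite(StringRegExpReplace($sLine, $sLazy, "<$0>") & @CRLF)
ConsoleWrite(StringRegExpReplace($sLine, $sTempered, "<$0>") & @CRLF)
```

Both calls should print MsgBox(0, <"it's">, <'say "hi"'>). For this job the lazy form is shorter and does the same thing; the lookahead form only pays off when the closing delimiter is longer than one character.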

Also the first group isn't necessary:

Actually, I just realized earlier that your contiguous-quotes optimization was missing what my original PCRE was trying to accomplish.  Basically, we need to look at 3 cases:

  1. Lines with strings containing doubled-up quotes as in:

    Local $sStr = "Text with ""quoted string"" inside" ; comment

  2. Lines with multiple strings which may or may not contain doubled-up quotes (this is where your PCRE fails):

    Local $sStr1 = "String1", $sStr2 = "String2";Another comment

  3. Lines with data after the quotes but before the comments:

    Local $sStr = "string", $fVal = 4.1;More comments

I just realized now that I should have put the second group of [^'";]* after the quotes capture so that I can match case 3. above (where I was assuming there would just be whitespace after the second capture group). Which leads to this PCRE which seems to work correctly now:

(?mx)^ ([^'";]*) ( (?:( (['"]) .*? \g-1 ) [^'";]*)* ) (;.*) $
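
Read piece by piece, the pattern above can be laid out as in the following sketch, which then runs it against the three sample lines from cases 1 to 3. The array name $aLines and the piecewise assembly are choices made for this illustration; the assembled string is the same pattern as above, and \g-1 again assumes a PCRE version with relative backreferences.

```autoit
; Assemble the pattern in commented pieces; (?x) makes PCRE ignore the spaces.
Local $sPattern = "(?mx)^"                  ; m: ^ and $ match at every line; x: free-spacing
$sPattern &= " ([^'"";]*)"                  ; \1: code before the first quote or semicolon
$sPattern &= " ("                           ; \2 opens: zero or more repetitions of ...
$sPattern &= " (?:( (['""]) .*? \g-1 )"     ;   a string closed by the same quote that opened it
$sPattern &= " [^'"";]*)*"                  ;   plus any code after it, up to the next quote or semicolon
$sPattern &= " )"                           ; \2 closes
$sPattern &= " (;.*) $"                     ; the comment, left out of the replacement

; The three cases listed above, one line each.
Local $aLines[3] = [ _
        'Local $sStr = "Text with ""quoted string"" inside" ; comment', _
        'Local $sStr1 = "String1", $sStr2 = "String2";Another comment', _
        'Local $sStr = "string", $fVal = 4.1;More comments']

; Keep groups 1 and 2 (the code) and drop the comment.
For $i = 0 To UBound($aLines) - 1
    ConsoleWrite(StringRegExpReplace($aLines[$i], $sPattern, "\1\2") & @CRLF)
Next
```

Each line should come back without its comment. Whitespace in front of a comment, as in case 1, is kept because it is matched as part of group 2. A doubled-up quote needs no special handling here: "Text with ""quoted string"" inside" is simply consumed as three back-to-back strings by the repetition in group 2.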

So to give a full example of this:

$sTestStr = '#include <Constants.au3> ; Should be MsgBoxConstants for all those rocking v3.3.10.0.' & @CRLF & _
    @CRLF & _
    @CRLF & _
    '#Region ;aaaaaaa' & @CRLF & _
    ';bbbbbbbbbbbbbbbb' & @CRLF & _
    '#EndRegion ;ccccccccccc' & @CRLF & _
    ' ;'' "' & @CRLF & _
    '; Example()' & @CRLF & _
    @CRLF & _
    '; New line comment.' & @CRLF & _
    'Func Example() ; This is some comment after a function.' & @CRLF & _
    'Local $sText = "; This is a commme''nt as'' a string"" and shouldn''t be removed." ; This is a comment to explain the; string. Woah! inception.' & @CRLF & _
    '  Local $sTest = ";;''""''" & '';'' ; END' & @CRLF & _
    ' Local $sVar1 = ''a"b"'', $sVar2 = "c''d''e" ; Comment' & @CRLF & _
    ' Local $sTwoLine = "abc" & "def" & ''ghi'' & _ ; Comment' & @CRLF & _
    '  "jkl"' & @CRLF & _
    'EndFunc   ;==>Example'

$sPattern = "(?mx)^ ([^'"";]*) ( (?:( (['""]) .*? \g-1 ) [^'"";]* )* ) ;.* $"

$sResult = StringRegExpReplace($sTestStr, $sPattern, "\1\2")

If @error Then
    Local $iErr = @error, $iExt = @extended
    ConsoleWrite("@error = " & $iErr & ", @extended = " & $iExt & @CRLF)
    MsgBox(0, "StringRegExpReplace error", "@error = " & $iErr & ", @extended = " & $iExt & @CRLF & "Pattern: " & $sPattern)
Else
    MsgBox(0, "Results",  $sResult)
EndIf

There. I think that's enough work on that pattern! :P

Edited by Ascend4nt