water

Regular Expression to escape characters

14 posts in this topic

#1 ·  Posted (edited)

Hi RegExp gurus!

I've never used regular expressions before, but now I need what is, in my opinion, a rather complex regexp.

For my Active Directory UDF I have to escape the characters "/", "#" and "," with a backslash.

But it should not escape already escaped characters.

So: "/" should become "\/", and "\/" should stay "\/" and not become "\\/".

Is this possible with StringRegExpReplace?

Thanks in advance!

Edited by water

My UDFs and Tutorials:


UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 




You can use \Q and \E to disable and re-enable pattern metacharacters.


#3 ·  Posted (edited)

It's really hard, even in Perl. I suspect that you might have to iterate through the string one character at a time replacing each occurrence as you find it. I'm still discussing the problem on a Perl web site so I might come back with a different answer soon.
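The character-by-character approach suggested here can be sketched as follows. This is Python rather than AutoIt, chosen only because it is easy to run, and the function name `escape_special` and its `special` parameter are illustrative, not part of any UDF. It escapes a character only when the preceding input character is not a backslash.

```python
def escape_special(text: str, special: str = "/#,") -> str:
    """Prefix each character found in `special` with a backslash,
    unless the preceding input character is already a backslash."""
    out = []
    prev = ""
    for ch in text:
        if ch in special and prev != "\\":
            out.append("\\")
        out.append(ch)
        prev = ch  # compare against the original input, not the output
    return "".join(out)


print(escape_special("a/b"))     # a\/b
print(escape_special(r"a\/b"))   # a\/b  (already escaped, left alone)
print(escape_special("a//b"))    # a\/\/b
```

Because it looks at the previous input character on every step, two special characters in a row are both escaped.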

In Perl, you would do this with a look-ahead assertion (?=[\/#,]) instead of a plain old capture of a set of characters ([\/#,]), but AutoIt appears not to support look-ahead assertions, so I think you're out of luck with the regular expression route unless someone knows better.

*Edit:* See below for working suggestions!

Edited by PhilHibbs


You can use a lookbehind assertion like this:

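; (?<!\\) : no match if the preceding character is already a backslash (negative look-behind)
; ([/#,])  : capture the character to escape into $1
; "\\$1"   : replacement = a literal backslash followed by $1
Local $s = "/slash/, esc-slash\/, text#inside, esc\#untouched and esc\, is fine"
Local $t = StringRegExpReplace($s, "(?<!\\)([/#,])", "\\$1")
ConsoleWrite($s & @LF & $t & @LF)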
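The same negative look-behind can be tried outside AutoIt. Below is a Python port using the standard `re` module; Python's engine is not PCRE, but it accepts the same fixed-width look-behind syntax. The sample string is the one from the post, and in the replacement `\\` is a literal backslash while `\1` is the captured character.

```python
import re

# (?<!\\) : do not match if the previous character is a backslash
# ([/#,]) : capture the character that needs escaping
LOOKBEHIND = re.compile(r"(?<!\\)([/#,])")


def escape_lookbehind(text: str) -> str:
    # Replacement: a literal backslash followed by group 1
    return LOOKBEHIND.sub(r"\\\1", text)


s = r"/slash/, esc-slash\/, text#inside, esc\#untouched and esc\, is fine"
print(s)
print(escape_lookbehind(s))
# \/slash\/\, esc-slash\/\, text\#inside\, esc\#untouched and esc\, is fine
```

Note that the comma right after the already escaped "\/" is still escaped, because its own predecessor is "/", not a backslash.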

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your up-to-date copy of pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation: the latest available release, currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


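This is a Perl regular expression that does what you want: it just tests whether the preceding character is not a backslash.

$str =~ s{([^\\]|^)(#|/|,)}{$1\\$2}g;

I tried converting it to AutoIt3. It should look like this (in AutoIt, $0 is the whole match, so Perl's $1 and $2 stay $1 and $2):

; $1 = preceding non-backslash character (empty at start of string), $2 = character to escape
$str = StringRegExpReplace($str, "([^\\]|\A)(#|/|,)", "$1\\$2", 0)

I hope it works, but I have not tested it.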
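The backreference numbers matter when porting from Perl: group 0 is the whole match, so Perl's `$1\\$2` maps to groups 1 and 2, not 0 and 1. This Python sketch (where `\g<0>` is the whole match) compares both replacements on a short made-up input.

```python
import re

PATTERN = re.compile(r"([^\\]|^)([#/,])")

# Groups 1 and 2, as in the Perl substitution: keep the preceding
# character, then insert a backslash before the special character.
correct = PATTERN.sub(r"\1\\\2", "a/b")

# Whole match (group 0), then a backslash and group 1: this
# re-inserts the preceding character after the special one.
wrong = PATTERN.sub(r"\g<0>\\\1", "a/b")

print(correct)  # a\/b
print(wrong)    # a/\ab
```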


#6 ·  Posted (edited)

whatever

Edited by MvGulik

"Straight_and_Crooked_Thinking" : A "classic guide to ferreting out untruths, half-truths, and other distortions of facts in political and social discussions."
"The Secrets of Quantum Physics" : New and excellent 2 part documentary on Quantum Physics by Jim Al-Khalili. (Dec 2014)

"Believing what you know ain't so" ...

Knock Knock ...
 


Mmm, am I the only one who thinks that having strings that can contain both escaped and unescaped special characters is actually a bad thing? :(

Good point


This is a Perl regular expression that does what you want: it just tests whether the preceding character is not a backslash.

$str =~ s{([^\\]|^)(#|/|,)}{$1\\$2}g;

That's almost identical to my first attempt! It doesn't handle two special characters in a row, which is a problem I could not solve without a look-ahead assertion, though a look-behind might work. If the OP doesn't need to handle multiple special characters in a row, then your suggestion might be acceptable. Maybe I'm just being fussy about cases that might never occur in the real world. When is good enough good enough?
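The "two special characters in a row" problem is easy to reproduce. In this Python illustration (not AutoIt code), the capture-based pattern consumes the character before each special character, so when two special characters are adjacent the second one has no unconsumed predecessor left and is skipped. The look-behind consumes nothing extra and escapes both.

```python
import re


def escape_capture(text: str) -> str:
    # Perl-style version: the preceding character is part of the match
    return re.sub(r"([^\\]|^)([#/,])", r"\1\\\2", text)


def escape_lookbehind(text: str) -> str:
    # Zero-width check on the preceding character; nothing extra is consumed
    return re.sub(r"(?<!\\)([/#,])", r"\\\1", text)


for sample in ["a//b", "x/,y"]:
    print(escape_capture(sample), escape_lookbehind(sample))
# a\//b a\/\/b
# x\/,y x\/\,y
```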


#9 ·  Posted (edited)

You can use a lookbehind assertion like this:

$s = "/slash/, esc-slash\/, text#inside, esc\#untouched and esc\, is fine"
Local $t = StringRegExpReplace($s, "(?<!\\)([/#,])", "\\$1")
ConsoleWrite($s & @LF & $t & @LF)

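Wow, that works. So does the look-ahead assertion; it looks like AutoIt supports more than the help file lets on. This is the look-ahead version:

; (^|[^\\])  : capture the start of the string or any non-backslash character into $1
; (?=[\/#,]) : ...but only if a /, # or , follows (checked, not consumed)
Local $s = "/slash/, esc-slash\/, text#inside, esc\#untouched and esc\, is fine"
Local $t = StringRegExpReplace($s, "(^|[^\\])(?=[\/#,])", "$1\\")
ConsoleWrite($s & @LF & $t & @LF)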

Both of these work. Take your pick. I think the look-behind one is better, even though it isn't mine. :(
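For comparison outside AutoIt, here is the look-ahead variant ported to Python's `re`, assuming Python's engine is close enough to PCRE for this pattern (it does support `(?=...)` and a `^` alternative). Instead of placing a backslash before a captured special character, it appends one after the captured preceding character.

```python
import re

# (^|[^\\]) : start of string, or any character that is not a backslash
# (?=[/#,]) : ...followed by a special character (checked, not consumed)
LOOKAHEAD = re.compile(r"(^|[^\\])(?=[/#,])")


def escape_lookahead(text: str) -> str:
    # Put back the captured character, then append a backslash; the special
    # character itself was never consumed, so it follows unchanged.
    return LOOKAHEAD.sub(r"\1\\", text)


s = r"/slash/, esc-slash\/, text#inside, esc\#untouched and esc\, is fine"
print(escape_lookahead(s))
# \/slash\/\, esc-slash\/\, text\#inside\, esc\#untouched and esc\, is fine
```

On this input it gives the same result as the look-behind version, including adjacent special characters such as "/,".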

Edited by PhilHibbs


Of course look-behind or -ahead work and they are documented in full in the link given in the StringRegExp help page.

There is a significant difference between the two look-* versions: the look-behind version first scans for a match to [/#,] and captures it, and only when one is found does it check whether the look-behind condition holds.

In contrast, the look-ahead version first matches and captures (^|[^\\]), which succeeds at the start of the string and at every non-backslash character, and only then checks whether the next character is in [/#,]. This approach obviously does much more work: since backslashes are rare in typical input, the first part succeeds at almost every position. The difference is negligible for small chunks of data, but I bet that if you routinely process large texts with a non-trivial pattern, it will become noticeable.
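One way to check the performance argument on your own data is a quick timing comparison. The sketch below uses Python's `re`, which is a different engine from AutoIt's PCRE, so its numbers say nothing definitive about AutoIt; the test text is made up (mostly ordinary characters, rare backslashes) and no particular timings are claimed.

```python
import re
import timeit

LOOKBEHIND = re.compile(r"(?<!\\)([/#,])")
LOOKAHEAD = re.compile(r"(^|[^\\])(?=[/#,])")

# Made-up input: long text, mostly ordinary characters, very few backslashes
text = ("plain words with no special characters at all, " * 2000) + r"esc\/ and /"


def run_lookbehind() -> str:
    return LOOKBEHIND.sub(r"\\\1", text)


def run_lookahead() -> str:
    return LOOKAHEAD.sub(r"\1\\", text)


# Both patterns should produce the same output on this input
assert run_lookbehind() == run_lookahead()

for name, fn in [("look-behind", run_lookbehind), ("look-ahead", run_lookahead)]:
    seconds = timeit.timeit(fn, number=20)
    print(f"{name}: {seconds:.3f} s for 20 runs")
```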



Of course look-behind or -ahead work and they are documented in full in the link given in the StringRegExp help page.

Ah, I didn't spot "Complete description can be found here". I don't think "Of course X works" is ever an appropriate thought in any context involving computers, though; I always start from the philosophical standpoint that by default nothing works unless someone has gone to some effort to make it work. There are incomplete regex implementations out there: SciTE, for example, doesn't appear to support look-ahead or look-behind assertions in its search box.

There is a significant difference between the two look-* versions: the look-behind version first scans for a match to [/#,] and captures it, and only when one is found does it check whether the look-behind condition holds.

In contrast, the look-ahead version first matches and captures (^|[^\\]), which succeeds at the start of the string and at every non-backslash character, and only then checks whether the next character is in [/#,]. This approach obviously does much more work: since backslashes are rare in typical input, the first part succeeds at almost every position. The difference is negligible for small chunks of data, but I bet that if you routinely process large texts with a non-trivial pattern, it will become noticeable.

That makes sense. I had intuitively come to the conclusion that look-behind was the "right" thing to do, but I hadn't thought it through. I got the look-ahead solution from the PerlMonks chat; I'd never used look-around assertions in regexen before.


I believe that direct PCRE implementations, like the one in AutoIt, have a fairly good chance of conforming to the PCRE library. AFAIK only PCRE callouts are left unimplemented in AutoIt, but according to recent discussions it's quite possible that a similar feature could someday emerge in AutoIt. This would leverage the power of our PCRE regexps and at the same time raise a number of interesting help questions :(

I don't know which flavor SciTE offers; I don't remember using it.

I always start from the philosophical standpoint that by default nothing works unless someone has gone to some effort to make it work.

The bold part sounds very pessimistic to me :)


The bold part sounds very pessimistic to me :(

I like pleasant surprises, and my philosophy frequently provides them whilst simultaneously guarding against complacency.


A big "Thank you!" for the fast replies!

That solves my problem and improves the quality of my UDF!

