thomasl

Regular expressions -- another approach

6 posts in this topic

#1 ·  Posted (edited)

Regularly rattling off regular expressions? So do I. But I finally grew tired of the way AutoIt3 does them. This is potentially a great feature, but in its current incarnation it's rather slow and way too buggy (especially backreferences: more often than not they simply don't work).

So, after discovering yet another bug, I realised that using the Perl regular expression engine would have significant advantages. Perl REs are:

  • fully documented
  • more powerful than AU3's
  • faster for large strings[1] (I am not talking about a factor of two or four: often it's more in the 20 to 40 region; for very large strings ("big-ass strings") I've even seen factors above 200)
  • reliable (in fact, saying that Perl regular expressions are merely "reliable" is an understatement: the Perl interpreter core is among the most thoroughly tested pieces of software in the world)

The killer disadvantage (and, sadly, a show-stopper for many people) is that this engine requires the core Perl DLL. If you have that fine piece of engineering lying around (or are willing to copy it into your AutoIt3 directory or somewhere in your path), then this could be just the ticket, all the more so as you need no knowledge of Perl at all: you just call the supplied AU3 functions, and they and Perl do all the hard work.

<EDIT>

The following three files are included in download 1:

  • Perlrx.au3: include file with all the necessary AU3 functions and wrappers
  • rxMatch.au3: example file for matching stuff
  • rxSubst.au3: example file for replacing stuff

Download 2 also includes the DLL needed to embed Perl, plus its source code:

  • callperl.au3
  • Perl.au3
  • perlembed.c
  • perlembed.def
  • perlembed.dll
  • readme

(If you also need the Perl DLL itself, see http://www.autoitscript.com/forum/index.php?showtopic=32592 for the full package.)

</EDIT>

As I state in Perlrx.au3, this is a preliminary public release. I realise that it's a pretty specialist, technical thing: I did this mainly for my own purposes. Then again, it might help others who are struggling with the AU3 RE functions. The code also doubles as a practical example of how easy it is to interface AU3 (strong on GUI) with Perl (strong on text processing). If there is demand, I'd be willing to put some more work into this, though the current version seems to be quite stable and does all I need.

The Perl RE syntax is a widely recognised standard (many well-known apps, for instance Apache or PHP, use the same syntax); luckily it's almost identical to the AU3 style. The biggest difference has to do with referencing groups in replace strings: AU3 uses the \1 syntax, whereas Perl uses $1. (To avoid misunderstandings: in the search expression both AU3 and Perl use the \1 syntax to backreference groups.)
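To make the replace-string difference concrete, here is a minimal sketch in Python (used only because it is easy to run; it is not part of the Perlrx package). `perl_to_backslash` is a hypothetical helper that rewrites Perl-style `$1` references in a replace string into the `\1` style AU3 uses. It assumes plain numbered groups only and does not handle `${1}`, named groups or escaped dollar signs. Python's own `re.sub` happens to accept the `\1` style, so it is used to check the converted string; the last line shows that in the search pattern the backreference is written `\1` in both styles.

```python
import re

def perl_to_backslash(repl: str) -> str:
    """Hypothetical helper: rewrite Perl-style $1, $2, ... in a replace
    string as \\1, \\2, ... (the style AU3 uses in replace strings).
    Assumes plain numbered groups only; ${1}, named groups and escaped
    dollars are not handled."""
    # In the replacement template, r"\\" emits one literal backslash and
    # r"\1" re-inserts the captured group number.
    return re.sub(r"\$(\d+)", r"\\\1", repl)

perl_style = "$2, $1"
backslash_style = perl_to_backslash(perl_style)
print(backslash_style)  # \2, \1

# Python's re.sub accepts the \1 style, so it can check the converted string.
print(re.sub(r"(\w+) (\w+)", backslash_style, "Larry Wall"))  # Wall, Larry

# In the *search* pattern both styles write a backreference as \1:
# (\w)\1 finds the first doubled character.
print(re.search(r"(\w)\1", "bookkeeper").group(0))  # oo
```

As noted above, the two styles carry the same information, so a mechanical conversion like this is all it takes to port simple replace strings between them.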

Here is a web page with the technical details for those not conversant with regular expressions, Perl style:

http://perldoc.perl.org/perlre.html

And two more pages, a quick introduction and a tutorial:

http://perldoc.perl.org/perlrequick.html

http://perldoc.perl.org/perlretut.html

I may at some later point write a short guide with a more complete list of differences between the two.

--

[1] Here is another amazing feature of AU3 REs: in trivial cases you would think that the time required for a match or replace is roughly proportional to the length of the string to be searched (this holds for every RE implementation I knew before AU3).

Well, not for AU3. In some of my (rather unscientific) tests the time required grew quadratically; in one (pathological?) case it grew somewhere between erratically and exponentially:

File size (KB)   AU3 time (msec)                                   Perl time (msec)
~20              186                                               18
~40              2134                                              32
~80              12676                                             70
~160             59988                                             157
~320             process killed after ~15 min (perhaps crashed?)   321

However, for short strings, the AU3 engine is definitely faster: the overhead of calling the Perl DLL seems to be in the region of a couple of msec on my machine.
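One way to read the timing table is to look at how much each time grows when the file size doubles: about 2x per doubling (a log2 exponent near 1) means linear behaviour, about 4x (exponent near 2) means quadratic. The short Python sketch below does that arithmetic on the numbers in the table. It assumes the sizes really double from row to row (they are only approximate) and simply leaves out the ~320 KB AU3 run that never finished.

```python
import math

# Timings in msec, copied from the table (file sizes ~20, 40, 80, 160, 320 KB).
au3_ms = [186, 2134, 12676, 59988]   # the ~320 KB AU3 run was killed, so it is omitted
perl_ms = [18, 32, 70, 157, 321]

def doubling_exponents(times):
    """For each doubling of input size, return log2(t_next / t_prev).
    About 1 suggests linear growth, about 2 quadratic growth."""
    return [math.log2(later / earlier) for earlier, later in zip(times, times[1:])]

for name, times in (("AU3", au3_ms), ("Perl", perl_ms)):
    print(name, [round(e, 2) for e in doubling_exponents(times)])

# Speed-up of Perl over AU3 at each size where both engines finished.
print([round(a / p) for a, p in zip(au3_ms, perl_ms)])
```

On these figures every AU3 exponent comes out above 2 (quadratic or worse), while Perl's stay close to 1, and the Perl speed-up grows from about 10x at ~20 KB to several hundred at ~160 KB.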

Edited by thomasl




This is very interesting - thanks thomasl! I'll give you some feedback when I get a chance to look at it properly.

Meantime, I've been (slowly) pursuing yet another approach: FreeBASIC code can be compiled to an EXE or a DLL, and FreeBASIC comes with some examples that use the supplied Perl RE libraries.

In time, I'm hoping to come up with a FreeBASIC DLL designed to be called from AU3, or else a command-line utility like minitrue (which worked splendidly up to WinNT, but not so well after that).

But your Perl-engine option is very exciting too!

Thanks

:)


HI,

I like the idea and will definitely test it.

Thanks for sharing!

So long,

Mega




Hi @thomasl,

Good work, keep it coming. :)

I have noticed that you are active in the discussion about implementing PCRE in AutoIt too.

Only one note. Could you include your perlembed.dll (and source?) in the perlrx.zip file? It would make it so much easier to get all the bits and pieces. I guess anyone figuring they need Perl REs would also have Perl (perl58.dll) lying around?


And a side note while I'm at it.

In rxSubst.au3 you claim:

; ditto, though Perl (and most other RE engines) use $x instead of \x

; (which is used in matching)

Although many people use Perl these days, I think you will find that vim (vi, emacs?) and sed users are accustomed to the "\x" style of backreferencing. (OK, I'm a vim user, so I'm biased :) )


Quote: Could you include your perlembed.dll (and source?) in the perlrx.zip file?

Yes, I can do that (though probably not before tomorrow). However, I won't include the Perl DLL as such; for this, people would still have to download the full package.

Quote: Although many people use Perl these days, I think you will find that vim (vi, emacs?) and sed users are accustomed to the "\x" style of backreferencing. (OK, I'm a vim user, so I'm biased :) )

Well, I am so used to the Perl $1 style (also used in Apache's mod_rewrite, for instance) that I tend to think of it as universal. Of course, the style, whatever it is, is no more than syntactic sugar.

