Regular Expression Testing


Nutster

Without grouping, wouldn't that find "janeb" and "jafeb" .. ?  :)

<{POST_SNAPBACK}>

From what I have read on the regexp page that was posted earlier, | tries to match the largest groups, not just single characters. I will try to do it without the brackets, and if I cannot, then the brackets will be needed to enforce the locality.

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...


Aaaah!  :) .. so to find "jafeb" and "janeb", it would be something like "ja(n|f)eb" .. ?

Hmm - that makes sense too. Anyway, as you say, later for that  :)

<{POST_SNAPBACK}>

I would use "ja[fn]eb", which is working! ;)
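
To make the comparison concrete, here is a minimal sketch using Python's `re` module as a stand-in (an assumption: the AutoIt RegExp() build discussed in this thread may behave differently). It checks that a one-character alternation in a group and a character class accept the same strings; the sample words are made up for illustration.

```python
import re

# Python's re module stands in for the RegExp() function under discussion.
grouped = re.compile(r"ja(n|f)eb")     # alternation limited by a group
char_class = re.compile(r"ja[fn]eb")   # the same choice as a character class

# Hypothetical sample inputs, chosen only for illustration.
samples = ["janeb", "jafeb", "jaxeb", "jan", "feb"]

group_hits = [s for s in samples if grouped.search(s)]
class_hits = [s for s in samples if char_class.search(s)]

# Both patterns accept exactly the same samples.
print(group_hits)
print(class_hits)
```

For a single-character choice the class is the simpler spelling; the group form is only needed once the alternatives are longer than one character.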

David Nuttall
Nuttall Computer Consulting


It's still different from "jan|feb".

"jan|feb" will match:

jan

feb

xjanuary

xfebruary

but "ja[nf]eb" will not match the above examples.

As an alternative method to implementing the OR that way, you could always invert the code you use for exclusion sets ("[^a-zA-Z]") to make an INclusion set. Just a thought. I know I have seen it implemented like that before somewhere.

Yes, it's sometimes implemented like this, but that's a bug in the expression.

"[jf][ae][nb]" will match:

jan <- that's wanted

feb <- that's wanted

jeb <- that's *NOT* wanted

jab <- that's *NOT* wanted

fan <- that's *NOT* wanted

and there's not really a way to exclude the last three matches without using the pipe.
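
The difference between the pipe and per-position sets can be checked with a short sketch. It uses Python's `re` module as a stand-in, so treat it as an illustration under that assumption rather than a statement about the AutoIt implementation. The word list is the one from the post above.

```python
import re

alternation = re.compile(r"jan|feb")          # whole-word alternatives
per_position = re.compile(r"[jf][ae][nb]")    # one inclusion set per character

words = ["jan", "feb", "xjanuary", "xfebruary", "jeb", "jab", "fan"]

# search() looks for the pattern anywhere in the string, like an unanchored match.
alt_hits = [w for w in words if alternation.search(w)]
pos_hits = [w for w in words if per_position.search(w)]

print(alt_hits)   # only strings that contain "jan" or "feb"
print(pos_hits)   # also accepts the unwanted mixes such as "jeb"
```

The per-position pattern accepts every combination of its three sets, which is why the unwanted words slip through.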

Edited by sugi

I have added \x for hexadecimal digits (tested OK), and {x,y} for a specific range of repeats. It actually simplified the code from what I had. * is defined as {0,}, + as {1,} and ? as {0,1}.

Testing continues tomorrow after work.
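
The quantifier equivalences described above can be sketched as follows. This uses Python's `re` module, which is an assumption made for illustration: Python has no `\x` shorthand for a hex digit, so the class `[0-9A-Fa-f]` stands in for it, and the helper name `spans` is hypothetical.

```python
import re

# Stand-in for the thread's \x (hex digit); Python has no such shorthand.
HEX = "[0-9A-Fa-f]"

# Each shorthand quantifier paired with its explicit {min,max} form.
pairs = [
    (HEX + "*", HEX + "{0,}"),
    (HEX + "+", HEX + "{1,}"),
    (HEX + "?", HEX + "{0,1}"),
]
samples = ["", "f", "ff", "1A2b", "xyz", "ffg"]

def spans(pattern, text):
    """Return the (start, end) of a match at the start of text, or None."""
    m = re.match(pattern, text)
    return m.span() if m else None

# The shorthand and explicit forms should consume exactly the same text.
equivalent = all(
    spans(short, s) == spans(explicit, s)
    for short, explicit in pairs
    for s in samples
)

# A bounded range: between two and four hex digits, nothing else.
bounded = re.fullmatch(HEX + "{2,4}", "1A2b") is not None
too_long = re.fullmatch(HEX + "{2,4}", "1A2b3") is not None
print(equivalent, bounded, too_long)
```

Treating `*`, `+` and `?` as special cases of `{x,y}` means an engine needs only one repeat routine, which fits the remark that the change simplified the code.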

David Nuttall
Nuttall Computer Consulting


  • 2 weeks later...
Finally! The testing is complete and I have submitted to Jon.

David Nuttall
Nuttall Computer Consulting


I haven't used regular expressions much in my life, but I will use them if I understand the syntax.

Anyway, in Crimson Editor I used a regexp to replace a string. Crimson Editor has an operator \0, which stands for "everything matched".

I was wondering if you added that.

On second thought... I'm not sure if it makes sense to add it.
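
For readers unfamiliar with a "whole match" reference in a replacement string, here is a small sketch of the idea. It uses Python's `re` module, where the whole match is spelled `\g<0>` rather than `\0`; this is an illustration of the concept only, not of Crimson Editor or of the AutoIt code. The sample text is made up.

```python
import re

text = "order 66 shipped, order 1138 pending"

# \g<0> in the replacement stands for everything the pattern matched,
# so each number is re-inserted wrapped in square brackets.
wrapped = re.sub(r"\d+", r"[\g<0>]", text)
print(wrapped)
```

A whole-match reference only makes sense in a replace operation, which may be why the question is left open for a match-only function.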


You can store groups in an array, and I have also added \#, which stores the current cursor position. I have some plans as to what goes in next, and that can be added to the RegExp TODO list.
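
As a rough illustration of collecting groups into an array and recording where they were found, here is a sketch in Python's `re` module. This is an assumption-laden analogy: the thread does not spell out the exact semantics of \#, so the position reporting below is Python's own mechanism, not a model of the AutoIt feature. The sample string is made up.

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "mail nut@example.com now")

# Captured groups collected into a list, in pattern order.
groups = list(m.groups())

# Offset where the overall match begins, and where each group begins.
match_start = m.start()
group_starts = [m.start(i) for i in range(1, len(groups) + 1)]

print(groups, match_start, group_starts)
```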

David Nuttall
Nuttall Computer Consulting


  • Administrators

After adding the code I must admit I thought it was going to be more like PHP (my only real exposure to it). All the PHP code I've seen seems to rely completely on a couple of functions:

preg_match() (which seems to do what RegExp() does - just match)

and

preg_replace(), which seems to be used in nearly every line of PHP like this:

// Ensure that spacing is preserved
$txt = preg_replace("/\t/", "&nbsp;&nbsp;&nbsp;&nbsp;", $txt);
$txt = preg_replace( "#\s{2}#", " &nbsp;", $txt );

Is this possible? Seems very useful.

I wasn't keen on the Set/Close functions either - is there really enough of a performance hit to require these? If it's not a massive difference then I'll probably remove them.
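
For comparison, the two PHP replacements above can be sketched in Python with `re.sub`. This only illustrates what a replace function does; it says nothing about how a future RegExpReplace() would look, and the function name `preserve_spacing` is hypothetical.

```python
import re

def preserve_spacing(txt):
    """Mirror the two preg_replace calls: keep tabs and double spaces visible in HTML."""
    # Each tab becomes four non-breaking spaces.
    txt = re.sub(r"\t", "&nbsp;" * 4, txt)
    # Each run of exactly two whitespace characters becomes a space plus one &nbsp;.
    txt = re.sub(r"\s{2}", " &nbsp;", txt)
    return txt

print(preserve_spacing("a\tb"))
print(preserve_spacing("a  b"))
```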


RegExpReplace() goes on the RegExp TODO list. As for the set and close functions, I will benchmark them this weekend. I could adjust it to cache the last few regular expressions instead.

David Nuttall
Nuttall Computer Consulting


Thanks for the shiny new toys, Nutster & Jon :)

I downloaded the RegExp version on Sunday, and here's some initial feedback..

  • The help file says that the 3rd parameter in RegExp(), which identifies a variable that will receive hits, will be created in the same scope as DIM if it does not exist already: I couldn't get it to create such a variable if it didn't already exist, and always had to define it up-front.
  • Suggestion: could the array of hits that is returned by RegExp() please have an element with index=0, which indicates the highest index (like StringSplit() does)? Makes it easier to process the array.
  • The RegExp topic in the helpfile has hyperlinks that don't work: to RegExpSet and DIM topics. The only ones that do work are those under the "Related" paragraph.
  • There is no tab expression ("\t")..?
  • The Example for RegExpClose() is wrong.
HTH :)


I have done the benchmarking. The time savings are under 5%. I would say, then, to scrap the RegExpSet and RegExpClose functions and have F_RegExp cache the last few expressions instead.
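
The caching idea can be sketched as a small least-recently-used cache in front of the compile step. This is a Python illustration under stated assumptions, not the F_RegExp implementation: the names `compiled` and `regexp` and the cache size of 8 are hypothetical.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=8)  # keep only the last few compiled patterns (size is arbitrary)
def compiled(pattern):
    """Compile a pattern once; repeated calls with the same text reuse the result."""
    return re.compile(pattern)

def regexp(text, pattern):
    """Match-only helper: no explicit set/close step is needed by the caller."""
    return compiled(pattern).search(text) is not None

first = regexp("january", "jan|feb")    # compiles the pattern (cache miss)
second = regexp("february", "jan|feb")  # reuses the compiled pattern (cache hit)
info = compiled.cache_info()
print(first, second, info.hits, info.misses)
```

With a cache like this the caller gets most of the benefit of a separate set step without having to manage handles. Python's own `re` module keeps an internal cache of compiled patterns for the same reason.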

David Nuttall
Nuttall Computer Consulting



Um, let's see.
  • Auto-creating the result variable: I will check. That's what I get for always turning on "MustDeclareVars". ;)
  • Element 0 holding the highest index: that is what UBound is for. I personally dislike that feature of StringSplit.
  • Broken help file links: RegExpSet is being removed. When I rewrite the docs, I will try to build the correct links to the keywords.
  • Tab expression: didn't think of tab. Should be easy enough to add.
  • RegExpClose example: RegExpClose is being removed.

David Nuttall
Nuttall Computer Consulting


I began testing RegExp yesterday, and I can say this is a very nice thing! But I'm no regexp guru, so I have run into one problem. It should be possible to match a literal "." character. I suppose that should be "\.", but all my attempts fail :) Here is the code (I'm trying to implement a simple file mask equivalent):

$line = "C:\Documents and Settings\User\NTUSER.DAT"

If RegExp($line, '[.]*\.DAT$') Then            ; *.dat
    Msgbox(0, "RegExp", "Pattern found")
Else
    Msgbox(0, "RegExp", "Pattern NOT found")
Endif

But this example works:

$line = "C:\Documents and Settings\User\NTUSER\DAT"

If RegExp($line, '[.]*\\DAT$') Then            ; *.dat
    Msgbox(0, "RegExp", "Pattern found")
Else
    Msgbox(0, "RegExp", "Pattern NOT found")
Endif

So escaping "." is not working?
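
For reference, this is how the same file-mask test behaves in a conventional engine, sketched with Python's `re` module. It is an assumption that the AutoIt RegExp() is meant to behave the same way; the sketch does not explain the failure reported above. Note that inside a set, "[.]" is a literal dot, so "[.]*" matches a run of dots rather than "any characters"; the usual spelling of the `*` in a file mask is ".*".

```python
import re

path = r"C:\Documents and Settings\User\NTUSER.DAT"
no_dot_path = r"C:\Documents and Settings\User\NTUSER\DAT"

# An escaped dot matches only a literal "." before DAT.
escaped_dot = re.search(r"\.DAT$", path) is not None
escaped_rejects = re.search(r"\.DAT$", no_dot_path) is None

# An unescaped dot matches any character, including the backslash.
any_char = re.search(r".DAT$", no_dot_path) is not None

# "[.]*" can only consume literal dots, so here it matches the empty string.
literal_dots = re.match(r"[.]*", "NTUSER.DAT").group()

# A file-mask style "*.DAT": any characters, then a literal dot, then DAT at the end.
mask = re.search(r".*\.DAT$", path)
print(escaped_dot, escaped_rejects, any_char, repr(literal_dots), mask.group())
```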
