
Regular Expression Testing


138 replies to this topic

#21 Matt @ MPCS

Matt @ MPCS

    Just another AutoIt user trying to help out! :)

  • Active Members
  • 700 posts

Posted 03 November 2004 - 04:45 PM

Alright, it was a long shot. Thanks for all your work Nutster!

*** Matt @ MPCS







#22 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 03 November 2004 - 05:52 PM

Quote:
    [..]
    Should not need the brackets.

    $sTarget = "jan|feb"   ; finds "jan" or "feb"

Without grouping, wouldn't that find "janeb" and "jafeb" .. ? :)

#23 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 03 November 2004 - 06:16 PM

Quote (trids):
    Without grouping, wouldn't that find "janeb" and "jafeb" .. ?  :)

From what I have read in the regexp page that was posted earlier, | tries to match the largest groups, not just single characters. I will try to do it without the brackets, and if I cannot, then the brackets will be needed to enforce the locality.

David Nuttall

Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius
AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...
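
A minimal sketch of the precedence question above, using Python's `re` module as a stand-in. This is an assumption for illustration only: the beta AutoIt RegExp() discussed in this thread may behave differently. In most regex engines alternation binds most loosely, so "jan|feb" means the whole of "jan" or the whole of "feb".

```python
import re

# Alternation has the lowest precedence: this is ("jan") OR ("feb"),
# not "ja", then "n" or "f", then "eb".
months = re.compile(r"jan|feb")

print(bool(months.fullmatch("jan")))    # True
print(bool(months.fullmatch("feb")))    # True
print(bool(months.fullmatch("janeb")))  # False
print(bool(months.fullmatch("jafeb")))  # False

# Brackets narrow the alternation down to a single position.
grouped = re.compile(r"ja(n|f)eb")

print(bool(grouped.fullmatch("janeb")))  # True
print(bool(grouped.fullmatch("jafeb")))  # True
print(bool(grouped.fullmatch("jan")))    # False
```

So in this engine the brackets are only needed when the choice should cover part of the pattern rather than all of it.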


#24 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 03 November 2004 - 06:22 PM

Aaaah! :) .. so to find "jafeb" and "janeb", it would be something like "ja(n|f)eb" .. ?

Hmm - that makes sense too. Anyway, as you say, later for that :)

#25 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 03 November 2004 - 07:50 PM

Quote (trids):
    Aaaah!  :) .. so to find "jafeb" and "janeb", it would be something like "ja(n|f)eb" .. ?

    Hmm - that makes sense too. Anyway, as you say, later for that  :)

I would use "ja[fn]eb", which is working! ;)

Edited by Nutster, 03 November 2004 - 07:52 PM.



#26 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 04 November 2004 - 06:08 AM

:)
Brilliant!
:)

#27 sugi

sugi

    Universalist

  • Active Members
  • 441 posts

Posted 04 November 2004 - 10:39 AM

Quote (Nutster):
    I would use "ja[fn]eb", which is working!   :)

It's still different from "jan|feb".
"jan|feb" will match:
jan
feb
xjanuary
xfebruary
but "ja[nf]eb" will not match the above examples.

Quote:
    As an alternative method to implementing the OR that way, you could always invert the code you use for Exclusion sets ("[^a-zA-Z]") to make an INclusion set. Just a thought. I know I have seen it implemented like that before somewhere.

Yes, it's sometimes implemented like this, but that's a bug in the expression.
"[jf][ae][nb]" will match:
jan <- that's wanted
feb <- that's wanted
jeb <- that's *NOT* wanted
jab <- that's *NOT* wanted
fan <- that's *NOT* wanted
and there's not really a way to exclude the last three matches without using the pipe.

Edited by sugi, 04 November 2004 - 10:43 AM.
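
The difference between the three patterns above can be checked mechanically. The sketch below uses Python's `re` module as a stand-in engine (an assumption, not the AutoIt beta under test) and enumerates every three-letter string that the inclusion-set version accepts.

```python
import re
from itertools import product

alternation = re.compile(r"jan|feb")
char_class = re.compile(r"ja[nf]eb")
three_sets = re.compile(r"[jf][ae][nb]")

# The four sample strings from the post above.
samples = ["jan", "feb", "xjanuary", "xfebruary"]
alt_hits = [s for s in samples if alternation.search(s)]
class_hits = [s for s in samples if char_class.search(s)]
print(alt_hits)    # ['jan', 'feb', 'xjanuary', 'xfebruary']
print(class_hits)  # []

# Every string the three inclusion sets can produce.
candidates = ["".join(p) for p in product("jf", "ae", "nb")]
set_hits = [c for c in candidates if three_sets.fullmatch(c)]
unwanted = [c for c in set_hits if not alternation.fullmatch(c)]
print(set_hits)   # all eight combinations match
print(unwanted)   # ['jab', 'jen', 'jeb', 'fan', 'fab', 'fen']
```

The sets accept eight strings in total, so besides the three unwanted matches listed above there are three more ("jen", "fab", "fen"). Only the pipe limits the match to exactly "jan" and "feb".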


#28 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 09 November 2004 - 07:01 PM

I have added \x for hexadecimal digits (tested OK), and {x,y} for a specific range of repeats. It actually simplified the code from what I had. * is defined as {0,}, + is {1,} and ? is {0,1}.

Testing continues tomorrow after work.
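
The quantifier equivalences described above hold in most regex engines. Here is a small sketch that checks them with Python's `re` module, again as a stand-in rather than the AutoIt implementation. The \x escape is left out on purpose: in Python, \x introduces a character code rather than a hexadecimal-digit class, so it would not illustrate the same thing.

```python
import re

# Shorthand quantifier on the left, brace form on the right.
equivalents = [
    (r"ab*c", r"ab{0,}c"),
    (r"ab+c", r"ab{1,}c"),
    (r"ab?c", r"ab{0,1}c"),
]
samples = ["ac", "abc", "abbc", "abbbc"]


def matches(pattern, texts):
    """Return one True/False per text for a whole-string match."""
    return [bool(re.fullmatch(pattern, t)) for t in texts]


for short, braces in equivalents:
    same = matches(short, samples) == matches(braces, samples)
    print(short, "==", braces, "->", same)  # True for all three pairs

# An explicit {x,y} range: two or three "b" characters.
print(matches(r"ab{2,3}c", samples))  # [False, False, True, True]
```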



#29 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 10 November 2004 - 08:56 AM

Sounds good :) .. shout if you want any help testing pre-release versions.

#30 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 24 November 2004 - 03:00 PM

Finally! The testing is complete and I have submitted to Jon.



#31 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 24 November 2004 - 03:15 PM

:) .. Magic! Can't wait! :)

#32 Josbe

Josbe

    Infrequent ghost ☺

  • Active Members
  • 1,585 posts

Posted 24 November 2004 - 04:37 PM

Quote (trids):
    :)  .. Magic! Can't wait!  :)

Me too! ;)

#33 SlimShady

SlimShady

    AutoIt lover

  • Active Members
  • 2,383 posts

Posted 25 November 2004 - 10:38 AM

I haven't used regular expressions much in my life.
But I will use it if I understand the syntax.
Anyway. In Crimson Editor I used reg exp to replace a string.
Crimson Editor has an operator \0.
Which stands for "everything matched".
I was wondering if you added that.
On second thought... I'm not sure if it makes sense to add it.

#34 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 25 November 2004 - 06:36 PM

Quote (SlimShady):
    I haven't used regular expressions much in my life.
    But I will use it if I understand the syntax.
    Anyway. In Crimson Editor I used reg exp to replace a string.
    Crimson Editor has an operator \0.
    Which stands for "everything matched".
    I was wondering if you added that.
    On second thought... I'm not sure if it makes sense to add it.

You can store groups in an array, and I have also added \#, which stores the current cursor position. I have some plans as to what goes in next, and that can be added to the RegExp TODO list.
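
For comparison, here is how the two ideas in this exchange ("everything matched" and groups collected into an array) look in Python's `re` module. This is only a sketch in a different engine: the sample text and pattern are made up for illustration, and the \# cursor feature is not modelled because the thread does not describe it further.

```python
import re

text = "mail nutster@example.com today"
m = re.search(r"(\w+)@(\w+)\.com", text)

whole = m.group(0)        # the entire match, like Crimson Editor's \0
parts = list(m.groups())  # the bracketed groups, collected in order
print(whole)  # nutster@example.com
print(parts)  # ['nutster', 'example']

# In a replacement string, Python spells "everything matched" as \g<0>.
replaced = re.sub(r"\d+", r"[\g<0>]", "build 103 of 3")
print(replaced)  # build [103] of [3]
```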



#35 Jon

Jon

    Noooooooo!

  • Administrators
  • 10,648 posts

Posted 25 November 2004 - 09:51 PM

After adding the code I must admit I thought it was going to be more like php. (my only real exposure to it). All the php code I've seen seems to rely completely on a couple of functions:

preg_match() (Which seems to do what RegExp() does - just match)

and
preg_replace() which seems to be used in nearly every line of php like this:

// Ensure that spacing is preserved
$txt = preg_replace("/\t/", "&nbsp;&nbsp;&nbsp;&nbsp;", $txt);
$txt = preg_replace( "#\s{2}#", " &nbsp;", $txt );

Is this possible? Seems very useful.

I wasn't keen on the Set/Close functions either - is there really enough of a performance hit to require these? If it's not a massive diff then I'll probably remove.
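
To show what a replace function of this kind does, here are the two PHP lines above translated to Python's `re.sub`. This is a sketch in another language, not a proposal for the AutoIt syntax, and the function name `preserve_spacing` is made up for the example.

```python
import re


def preserve_spacing(txt: str) -> str:
    """Mirror the two preg_replace() calls quoted above."""
    # Each tab becomes four non-breaking spaces.
    txt = re.sub(r"\t", "&nbsp;" * 4, txt)
    # Each pair of whitespace characters becomes a space plus one &nbsp;.
    txt = re.sub(r"\s{2}", " &nbsp;", txt)
    return txt


print(preserve_spacing("a\tb"))  # a&nbsp;&nbsp;&nbsp;&nbsp;b
print(preserve_spacing("a  b"))  # a &nbsp;b
```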

#36 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 26 November 2004 - 06:28 PM

Quote (Jon):
    After adding the code I must admit I thought it was going to be more like php.  (my only real exposure to it).  All the php code I've seen seems to rely completely on a couple of functions:

    preg_match() (Which seems to do what RegExp() does - just match)

    and
    preg_replace() which seems to be used in nearly every line of php like this:

    // Ensure that spacing is preserved
    $txt = preg_replace("/\t/", "&nbsp;&nbsp;&nbsp;&nbsp;", $txt);
    $txt = preg_replace( "#\s{2}#", " &nbsp;", $txt );

    Is this possible?  Seems very useful.

    I wasn't keen on the Set/Close functions either - is there really enough of a performance hit to require these?  If it's not a massive diff then I'll probably remove.

RegExpReplace() goes on the RegExp TO DO list. As far as the set and close functions, I will benchmark them this weekend. I could adjust it to cache the last few regular expressions, instead.



#37 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 28 November 2004 - 03:25 PM

Thanks for the shiny new toys, Nutster & Jon :)

I downloaded the RegExp version on Sunday, and here's some initial feedback..
  • The help file says that the 3rd parameter in RegExp(), which identifies a variable that will receive hits, will be created in the same scope as DIM if it does not exist already: I couldn't get it to create such a variable if it didn't already exist, and always had to define it up-front.
  • Suggestion: could the array of hits that is returned by RegExp() please have an element with index=0, which indicates the highest index (like StringSplit() does)? Makes it easier to process the array.
  • The RegExp topic in the helpfile has hyperlinks that don't work: to RegExpSet and DIM topics. The only ones that do work are those under the "Related" paragraph.
  • There is no tab expression ("\t")..?
  • The Example for RegExpClose() is wrong.
HTH :)

#38 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 29 November 2004 - 05:26 PM

Quote (Nutster):
    As far as the set and close functions, I will benchmark them this weekend.  I could adjust it to cache the last few regular expressions, instead.

I have done the benchmarking. The time savings are under 5%. I would say, then, to scrap the RegExpSet and RegExpClose functions and have F_RegExp cache the last few regular expressions instead.
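
The caching idea can be sketched in a few lines. The version below is written in Python and is not the F_RegExp implementation: the helper names `compiled` and `reg_exp` and the cache size of 8 are assumptions for illustration. It keeps the most recently used compiled patterns, so callers need no explicit set/close step.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=8)  # keep only the last few patterns (size is arbitrary)
def compiled(pattern: str) -> re.Pattern:
    """Compile a pattern once and reuse it on later calls."""
    return re.compile(pattern)


def reg_exp(text: str, pattern: str) -> bool:
    """Hypothetical one-call matcher: no Set/Close needed by the caller."""
    return compiled(pattern).search(text) is not None


print(reg_exp("xjanuary", "jan|feb"))   # True, pattern compiled here
print(reg_exp("december", "jan|feb"))   # False, compiled pattern reused
print(compiled.cache_info())            # reports 1 hit and 1 miss
```

Python's own `re` module keeps a similar internal cache of compiled patterns, so the explicit decorator here is only there to make the mechanism visible.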



#39 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 29 November 2004 - 05:31 PM

Quote (trids):
    Thanks for the shiny new toys, Nutster & Jon  :)

    I downloaded the RegExp version on Sunday, and here's some initial feedback..

      • The help file says that the 3rd parameter in RegExp(), which identifies a variable that will receive hits, will be created in the same scope as DIM if it does not exist already: I couldn't get it to create such a variable if it didn't already exist, and always had to define it up-front.
      • Suggestion: could the array of hits that is returned by RegExp() please have an element with index=0, which indicates the highest index (like StringSplit() does)? Makes it easier to process the array.
      • The RegExp topic in the helpfile has hyperlinks that don't work: to RegExpSet and DIM topics. The only ones that do work are those under the "Related" paragraph.
      • There is no tab expression ("\t")..?
      • The Example for RegExpClose() is wrong.
    HTH  :)

Um let's see.
  • I will check. That's what I get for always turning on "MustDeclareVars". ;)
  • That is what UBound is for. I personally dislike that feature of StringSplit.
  • RegExpSet is coming out. When I rewrite the docs, I will try to build the correct links to the Keywords.
  • Didn't think of tab. Should be easy enough to add.
  • RegExpClose is coming out.



#40 Lazycat

Lazycat

    Coding cat

  • MVPs
  • 1,174 posts

Posted 30 November 2004 - 07:07 AM

I began testing RegExp yesterday, and I can say this is a very nice thing! But I'm no regexp guru, so I have one problem. It should be possible to explicitly match the "." character. I suppose that should be "\.", but all my attempts fail :) Here is the code (I'm trying to implement a simple file mask equivalent):

$line = "C:\Documents and Settings\User\NTUSER.DAT"
If RegExp($line, '[.]*\.DAT$') Then            ; *.dat
    Msgbox(0, "RegExp", "Pattern found")
Else
    Msgbox(0, "RegExp", "Pattern NOT found")
Endif


But this example is working:

$line = "C:\Documents and Settings\User\NTUSER\DAT"
If RegExp($line, '[.]*\\DAT$') Then            ; *.dat
    Msgbox(0, "RegExp", "Pattern found")
Else
    Msgbox(0, "RegExp", "Pattern NOT found")
Endif


So escape of "." is not working?
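
As a point of comparison, here is how the same patterns behave in Python's `re` module. This shows only what a different engine does with an escaped dot and says nothing about the cause of the AutoIt result above. Note that "[.]*" means zero or more literal dots, while ".*" means any run of characters, which is probably closer to a "*" file mask.

```python
import fnmatch
import re

path = r"C:\Documents and Settings\User\NTUSER.DAT"

# The pattern from the post: optional literal dots, then ".DAT" at the end.
print(bool(re.search(r"[.]*\.DAT$", path)))  # True

# Closer to the "*.dat" file mask: anything, then a literal ".DAT".
print(bool(re.search(r".*\.DAT$", path)))    # True

# An escaped dot matches only a real dot; an unescaped dot matches anything.
print(bool(re.search(r"\.DAT$", "NTUSERXDAT")))  # False
print(bool(re.search(r".DAT$", "NTUSERXDAT")))   # True

# The standard library can also apply the file mask directly.
print(fnmatch.fnmatchcase("NTUSER.DAT", "*.DAT"))  # True
```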
Koda homepage (http://www.autoitscript.com/fileman/users/lookfar/formdesign.html) (Bug Tracker)
My Autoit script page (http://www.autoitscript.com/fileman/users/Lazycat/)



