Regular Expression Testing


138 replies to this topic

#1 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 28 October 2004 - 02:33 PM

I am now in (I hope) final testing of the regular expression routines. I hope to have it done by oh, Monday, maybe the weekend. I have also done:
  • Binary search on the function list. Approx 20%-25% speed increase.
  • StringJoin
  • StringSplit takes whole string for delimiter
  • Added @CTimer (the number of seconds from Midnight Jan 1, 1970 UTC), @PI, @E (exp(1))
  • Adding way more comments than I do when I am writing for myself. ;) I am putting in more comments than I usually do for my classroom examples, because I will not be there to explain what the F*@< I am doing.
  • Couple more optimizations that I do not remember right now. :)
Just thought I would keep people up to date on what I have been doing, other than working for money :) Need that roof over the head and food on the table (well in the belly is better, but usually I can quickly change one into the other.)
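Editor's sketch: the binary search on the function list mentioned above could look roughly like this. It is written in Python rather than the C++ of the AutoIt source, and the table contents and the name find_function are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

# Hypothetical table of built-in names, kept sorted so it can be searched
# by bisection. Names are upper-cased so the lookup ignores case.
FUNCTIONS = sorted(["FILEOPEN", "REGEXP", "STRINGINSTR", "STRINGJOIN", "STRINGSPLIT"])


def find_function(name):
    """Return the index of name in FUNCTIONS, or -1 if it is not listed."""
    key = name.upper()
    i = bisect_left(FUNCTIONS, key)  # O(log n) instead of a linear scan
    if i < len(FUNCTIONS) and FUNCTIONS[i] == key:
        return i
    return -1
```

A lookup such as find_function("StringJoin") halves the search range on every step, which is one plausible source of a speed-up over a linear scan.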

Edited by Nutster, 28 October 2004 - 02:51 PM.

David Nuttall

Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius
AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...








#2 Matt @ MPCS

Matt @ MPCS

    Just another AutoIt user trying to help out! :)

  • Active Members
  • 700 posts

Posted 28 October 2004 - 05:08 PM

What commands are going to be used to implement RegEx? I know it is talked about on the forum, but I have seen at least 2-3 methods of doing this. Could you give an example of the commands used?

*** Matt @ MPCS

#3 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 28 October 2004 - 05:50 PM

Ok, for the Regular Expressions as implemented (intended):
$x = RegExp($line, $pattern [,"Array"])

Perform a case-sensitive comparison of $line against the given pattern. Both are to be strings.

The pattern is defined using the following symbols:
  • "abc" - regular characters match themselves, i.e. match abc somewhere in the string.
  • "[abc]" - set: match one character that is a, b or c somewhere in the string
  • "[^abc]" - negated set: match one character other than a, b or c.
  • "b*" - matches 0 or more b's.
  • "b+" - matches 1 or more b's.
  • "b?" - optional: matches b if it is there, but does not have to be.
  • "^abc" - abc must appear at the beginning of the string.
  • "abc$" - abc must appear at the end of the string.
  • "(abc)" - group: treat the pattern inside the brackets as a unit. e.g. "(ab)+" matches "ab", "abab", "ababab", etc. The text that matches a group will be stored in the array if it is named in the RegExp call.
I am also including a whole bunch of class definitions: \s for any one whitespace, \d for any one digit, \a for any one alphabetic character, \A for any one alphanumeric character, \p for any punctuation character, \w for any word character (alphabetic or underline), \u for any upper-case character, \l (lower L) for any lower-case character. I think I have them all.
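Editor's sketch: to try the symbols and class escapes described above, one option is to translate them onto Python's re module. The mapping below is an assumption drawn only from the descriptions in this post, not from the actual implementation, and translate and matches are hypothetical helper names. Escapes inside [...] sets are not handled.

```python
import re
import string

# Assumed meaning of each class escape, taken from the post's wording.
CLASSES = {
    "s": r"\s",                                   # whitespace
    "d": "[0-9]",                                 # digit
    "a": "[A-Za-z]",                              # alphabetic
    "A": "[A-Za-z0-9]",                           # alphanumeric
    "p": "[" + re.escape(string.punctuation) + "]",  # punctuation
    "w": "[A-Za-z_]",                             # word character: alphabetic or underline
    "u": "[A-Z]",                                 # upper case
    "l": "[a-z]",                                 # lower case
}


def translate(pattern):
    """Rewrite the class escapes above; copy every other character unchanged."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Unknown escapes such as \. are passed through as written.
            out.append(CLASSES.get(nxt, ch + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def matches(line, pattern):
    """Return 1 if the pattern is found anywhere in line, else 0."""
    return 1 if re.search(translate(pattern), line) else 0
```

The translation step is needed because some letters mean something else in Python; for example \A there anchors to the start of the string rather than matching an alphanumeric character.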

That's what the docs are for. Man, was writing the docs a pain. Oh well, one of the necessary evils of writing a program is documenting it.

I have also written a function to keep track of regular expressions that you use repeatedly: RegExpSet and RegExpClose. RegExpSet interprets and stores the regular expression and returns a handle, similar to FileOpen. Do not mix up FileOpen handles and RegExpSet handles! They are not the same! You can replace the $pattern in the above function call with a handle given by RegExpSet. RegExpClose releases the memory that was used by the stored regular expression.

Using RegExpSet will speed up calls of RegExp because the pattern does not have to be interpreted each time, but only once. I only made 4 spaces to store regular expressions. Do people think that is enough?
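Editor's sketch: the handle idea behind RegExpSet and RegExpClose can be modelled as a small fixed table of compiled patterns. This is a Python approximation, not the AutoIt code. The function names are hypothetical, and returning -1 when no slot is free or the pattern is bad is an assumption, since the post does not say what happens in those cases.

```python
import re

MAX_SLOTS = 4  # the post mentions four storage spaces

_slots = [None] * MAX_SLOTS


def reg_exp_set(pattern):
    """Compile and store a pattern; return its handle, or -1 on failure (assumed)."""
    for handle, slot in enumerate(_slots):
        if slot is None:
            try:
                _slots[handle] = re.compile(pattern)
            except re.error:
                return -1
            return handle
    return -1  # every slot is in use


def reg_exp_close(handle):
    """Release the slot so another pattern can be stored in it."""
    _slots[handle] = None


def reg_exp(line, pattern_or_handle):
    """Match against a pattern string or a handle from reg_exp_set."""
    if isinstance(pattern_or_handle, int):
        compiled = _slots[pattern_or_handle]
        if compiled is None:
            raise ValueError("handle is not open")
    else:
        compiled = re.compile(pattern_or_handle)  # interpreted on every call
    return 1 if compiled.search(line) else 0
```

With a handle the pattern is interpreted once; with a plain string it is interpreted on each call, which is the cost the stored version avoids.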

Edit: Fix spelling errors.

Edited by Nutster, 29 October 2004 - 04:54 PM.



#4 Matt @ MPCS

Matt @ MPCS

    Just another AutoIt user trying to help out! :)

  • Active Members
  • 700 posts

Posted 28 October 2004 - 06:04 PM

It looks like a good solid implementation. Great job, Nutster! I don't know if it would be possible, but just to throw it out there: VB does RegEx through an operator named "Like", which makes the functionality inline with the rest of the code without having to make any explicit function calls. Would something similar be plausible in this situation?

*** Matt @ MPCS

#5 Lazycat

Lazycat

    Coding cat

  • MVPs
  • 1,174 posts

Posted 28 October 2004 - 06:36 PM

Cool, I was going to pop up your old thread today :)
Implementation looks nice and familiar enough. I really need it in my upcoming project, so I would like to test it.
Koda homepage (http://www.autoitscript.com/fileman/users/lookfar/formdesign.html) (Bug Tracker) | My Autoit script page (http://www.autoitscript.com/fileman/users/Lazycat/)

#6 CyberSlug

CyberSlug

    Overwhelmed with work....

  • MVPs
  • 3,587 posts

Posted 28 October 2004 - 08:19 PM

Looks really really good :)

Do you have an identifier for matching any character?
Use Mozilla | Take a look at My Disorganized AutoIt stuff | Very very old: AutoBuilder 11 Jan 2005 prototype I need to update my sig!

#7 Josbe

Josbe

    Infrequent ghost ☺

  • Active Members
  • 1,585 posts

Posted 28 October 2004 - 11:12 PM

@David: Really, very good.
:) > Now, I understand why you didn't write lately... :)

#8 Jon

Jon

    Up all night to get lucky

  • Administrators
  • 10,294 posts

Posted 29 October 2004 - 10:07 AM

Nutster wrote:
> Just thought I would keep people up to date on what I have been doing, other than working for money :) Need that roof over the head and food on the table (well in the belly is better, but usually I can quickly change one into the other.)

It would be sweet to be able to work on hobby code for a living. :) But then I suppose it would become work and we'd end up moaning anyway!

#9 condoman

condoman

    Polymath

  • Active Members
  • 233 posts

Posted 29 October 2004 - 12:55 PM

Moaning good. RegEx good. Life Good. :)

#10 sugi

sugi

    Universalist

  • Active Members
  • 441 posts

Posted 29 October 2004 - 01:25 PM

I noticed that one of the best features of regex is missing: Backreference

For example you can have a regex: ^([hH][eE][lL][lL][oO]) and \1$
This would match the following lines:
Hello and Hello
hello and hello
hELLo and hELLo
...
but would not match the following lines:
Hello and hello
helLO and HELLO
So basically \1 references back to the match in the first brackets (works usually up to \9 which equals the ninth brackets).
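Editor's sketch: the back-reference example above can be tried with Python's re module, which supports \1 through \9 in the same way. This only illustrates the feature being requested; it says nothing about how the AutoIt routine would implement it, and same_greeting is a hypothetical name.

```python
import re

# \1 must repeat exactly the text captured by the first pair of brackets.
PATTERN = re.compile(r"^([hH][eE][lL][lL][oO]) and \1$")


def same_greeting(line):
    """Return True if both greetings on the line use identical casing."""
    return PATTERN.search(line) is not None
```

The set [hH] lets the first group match any casing, but the back-reference then pins the second greeting to whatever casing the first one actually had.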

This is a very powerful feature if you know how to use it.

There's a complete description of Regex available here (yes, I know it's old. But regex hasn't been changed much since 1992).

Edited by sugi, 29 October 2004 - 01:27 PM.


#11 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 29 October 2004 - 05:25 PM

Ok, here we go.
  • Matt @ MPCS: Not practical at the moment. Adding a new operator is a little bit of a pain because of the way it is implemented.
  • CyberSlug: I knew I forgot something: "." (dot) matches any character. This is what I get for writing this from memory.
  • Jon: Why would I be moaning about writing AutoIt? When my cousins asked me what I was doing at a family reunion I said that I was part of an international programming project to write the Windows automation tool called AutoIt. Oh, and I am writing this big database for a company.
  • sugi: Back-reference. Hmm, now there's an idea. I have looked through the reference you suggested and got a lot of ideas for the next version of RegExp. Let me get what I have working before adding more stuff to it.
Oh, well back to work. I will see what I can do to finish this on the weekend and submit to Jon on Monday.



#12 Klaatu

Klaatu

    Prodigy

  • Active Members
  • 198 posts

Posted 29 October 2004 - 06:17 PM

Nutster wrote:
> Ok, for the Regular Expressions as implemented (intended):
> $x = RegExp($line, $pattern [,"Array"])
>
> Perform a case-sensitive comparison of $line against the given pattern. Both are to be strings.

Nutster: could you be a bit more specific about what the function returns, and about what is returned in the "Array" (assuming anything is)?

For instance, say I had a string "You were charged $49.57.", and I wanted to get at the amount. Now, I'm not very good at REs, but let's say
$line = "You were charged $49.57."
$re = "\d+\.\d\d"
$array = 0
$x = RegExp($line, $re, $array)

What would the value of $x be? And the value of $array? And how could I extract the "49.57" from the string after having used your RegExp function? I am assuming right now that $array[0] would equal 19 (the position within the string where the RE match starts), and $array[1] would equal 23 (the position where the RE match ends), but I am most likely incorrect. So, could you clarify?

TIA
My Projects:DebugIt - Debug your AutoIt scripts with DebugIt!

#13 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 29 October 2004 - 06:59 PM

Klaatu wrote:
> Nutster: could you be a bit more specific about what the function returns, and about what is returned in the "Array" (assuming anything is)?
>
> For instance, say I had a string "You were charged $49.57.", and I wanted to get at the amount. Now, I'm not very good at REs, but let's say
>
> $line = "You were charged $49.57."
> $re = "\d+\.\d\d"
> $array = 0
> $x = RegExp($line, $re, $array)
>
> What would the value of $x be? And the value of $array? And how could I extract the "49.57" from the string after having used your RegExp function? I am assuming right now that $array[0] would equal 19 (the position within the string where the RE match starts), and $array[1] would equal 23 (the position where the RE match ends), but I am most likely incorrect. So, could you clarify?
>
> TIA

The function returns 1 for success and 0 for failure, and sets @Error if the regular expression is bad. The Array gets the contents of each group, in a single-dimensioned array, blowing away its old contents. If the array is given and there is no match then the array is replaced with an empty string. Right now, I am looking at passing the name, not the variable itself. I think Valik did something to compare variables that are passed, but I do not remember how to use it. So change your code to:
$line = "You were charged $49.57."
$re = "(\d+\.\d\d)"
$array = 0
$x = RegExp($line, $re, "array")

Otherwise, everything is correct. "49.57" would be in $array[0]. The position of matches is not stored. You can use StringInStr to find where $array[0] is in $line.
I guess I could add (next version) \# to store the current position in the array.
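Editor's sketch: the return convention described above (1 or 0, group texts in an array, an empty string when nothing matches, no positions stored) can be mimicked in Python as follows. The tuple return and the name reg_exp are assumptions made for the sketch; the AutoIt function fills a named array instead, and the @Error case is not modelled here.

```python
import re


def reg_exp(line, pattern):
    """Return (1, [text of each group]) on a match, or (0, "") otherwise."""
    found = re.search(pattern, line)
    if found is None:
        return 0, ""
    return 1, list(found.groups())


line = "You were charged $49.57."
ok, groups = reg_exp(line, r"(\d+\.\d\d)")
amount = groups[0]

# Positions are not stored, so look the text up afterwards.
# str.find is 0-based; adding 1 gives the 1-based position StringInStr reports.
position = line.find(amount) + 1
```

Note that the brackets in the pattern matter: without a group there would be nothing to store, which is why the pattern in the post was changed to "(\d+\.\d\d)".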

Edited by Nutster, 03 November 2004 - 04:38 PM.



#14 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 01 November 2004 - 03:13 PM

Awesome, David :) .. thanks for all the hard work! Can't wait to get my hands dirty :)

#15 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 01 November 2004 - 03:30 PM

Just wondering if you have included some of the features you mentioned previously:
  • \t - tab character
  • \n - newline
  • \w - a word = any contiguous set of alphanumeric chars including "_", but excluding whitespace chars.
  • \* - an actual *, similarly for other control characters = +.\? ..etc.
  • [^A-Z] - Exclusion set = anything other than the characters specified.
:)

Edited by trids, 01 November 2004 - 03:31 PM.


#16 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 01 November 2004 - 04:29 PM

trids wrote:
> Just wondering if you have included some of the features you mentioned previously:
>
>   • \t - tab character
>   • \n - newline
>   • \w - a word = any contiguous set of alphanumeric chars including "_", but excluding whitespace chars.
>   • \* - an actual *, similarly for other control characters = +.\? ..etc.
>   • [^A-Z] - Exclusion set = anything other than the characters specified.

Let's see. BTW, still testing. It is so easy for little bugs to get in the system and screw critical things up. Ok, I guess I will not be submitting today.
  • Tab. Not yet. Can add tonight.
  • Newline. Implemented. Need to test.
  • \w - Implemented as a single character. Do \w+ to get the whole word.
  • \* - well any special character. Implemented, but still need to test fully.
  • Exclusion set - implemented, but (guess what?!) still needs testing.
Added \# over the weekend, but still need to test along with the other stuff about storing values in the array.
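Editor's sketch: the behaviours listed above (\w as a single character, \w+ for a whole word, escaped special characters, tab, exclusion sets) can be tried in Python's re module. Python's semantics are used here as a stand-in; in particular Python's \w also accepts digits, whereas the post describes \w as alphabetic or underline.

```python
import re

text = "total_cost = 3*4\tdone"

# \w on its own matches exactly one word character.
first_char = re.search(r"\w", text).group()

# \w+ extends the match over the whole run of word characters.
whole_word = re.search(r"\w+", text).group()

# A backslash turns the special character * into a literal asterisk.
product = re.search(r"\d\*\d", text).group()

# \t matches a tab character.
has_tab = re.search(r"\t", text) is not None

# An exclusion set matches the first character outside A-Z.
non_upper = re.search(r"[^A-Z]", "ABCdEF").group()
```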

Oh well, the docs will contain a list of everything I have implemented. Hey docs team, would you be willing to clean up what I submit? I have used a different approach for explaining regular expressions than the stuff I saw in my Perl docs or the link that sugi gave me. I think it is an easier way of understanding, but I will see what you guys think when I get around to submitting it. I am using the source from Oct 27 as the base.

Edited by Nutster, 01 November 2004 - 04:37 PM.



#17 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 03 November 2004 - 08:29 AM

I almost forgot: the OR character is very handy too .. usually a "pipe" (|)

$sTarget = "(jan)|(feb)" ;finds either "jan" or "feb"

#18 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 03 November 2004 - 04:32 PM

trids wrote:
> I almost forgot: the OR character is very handy too .. usually a "pipe" (|)
>
> $sTarget = "(jan)|(feb)"    ;finds either "jan" or "feb"

Next version. It was turning out to be a bit of a pain to implement, so I dropped it for now. Definitely on the TO DO list. Should not need the brackets.
$sTarget = "jan|feb"    ; finds "jan" or "feb"
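Editor's sketch: alternation as it behaves in Python's re module, shown only to illustrate the planned feature. The brackets are optional for a bare "jan|feb", but they become necessary once the alternation is part of a longer pattern, because | otherwise splits the whole pattern in two. The function names are hypothetical.

```python
import re

MONTH = re.compile(r"jan|feb")

# Without the brackets this would mean: "jan" OR "feb-" followed by two digits.
DATED = re.compile(r"(jan|feb)-\d\d")


def find_month(text):
    """Return the first 'jan' or 'feb' found in text, or "" if neither occurs."""
    found = MONTH.search(text)
    return found.group() if found else ""


def find_dated_month(text):
    """Return the month from a 'mon-dd' stamp, or "" if there is none."""
    found = DATED.search(text)
    return found.group(1) if found else ""
```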



#19 Matt @ MPCS

Matt @ MPCS

    Just another AutoIt user trying to help out! :)

  • Active Members
  • 700 posts

Posted 03 November 2004 - 04:36 PM

As an alternative method to implementing the OR that way, you could always invert the code you use for Exclusion sets ("[^a-zA-Z]") to make an INclusion set. Just a thought. I know I have seen it implemented like that before somewhere.

*** Matt @ MPCS

#20 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 03 November 2004 - 04:43 PM

Matt @ MPCS wrote:
> As an alternative method to implementing the OR that way, you could always invert the code you use for Exclusion sets ("[^a-zA-Z]") to make an INclusion set. Just a thought. I know I have seen it implemented like that before somewhere.
>
> *** Matt @ MPCS

All I do is check if the character did match and invert the value.
// C++
if (m_type & at_not)
    bFound = !bFound;
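Editor's sketch: the same idea in Python, where a set and its negated form share one membership test and a flag flips the result. The flag values and the name set_matches are hypothetical; only the name at_not comes from the snippet above.

```python
AT_SET = 0x01  # hypothetical flag: the atom is a character set
AT_NOT = 0x02  # hypothetical flag modelled on at_not: the set is negated


def set_matches(ch, members, atom_type):
    """Test one character against a set, inverting the result for [^...]."""
    found = ch in members
    if atom_type & AT_NOT:
        found = not found
    return found
```

So "[abc]" and "[^abc]" need no separate matching code: the inclusion set is simply the case where the flag is not set.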

Edited by Nutster, 12 November 2004 - 04:58 PM.





