Regular Expression Testing


138 replies to this topic

#41 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 30 November 2004 - 07:49 AM

[..]
So escape of "." is not working?

<{POST_SNAPBACK}>

I think you're experiencing the repeating characters phenomenon. From the helpfile: "Repeating characters (*, +, ?) will try to match the largest set possible. e.g. ba*a will always fail because the trailing a will have already matched the repeating a."

Try this :) ...
   $line = "C:\Documents and Settings\User\NTUSER.DAT"
   If RegExp($line, '\.DAT$') Then   ; .dat at end-of-line
      MsgBox(0, "RegExp", "Pattern found")
   Else
      MsgBox(0, "RegExp", "Pattern NOT found")
   EndIf








#42 Lazycat

Lazycat

    Coding cat

  • MVPs
  • 1,174 posts

Posted 30 November 2004 - 09:25 AM

I think you're experiencing the repeating characters phenomenon. From the helpfile: "Repeating characters (*, +, ?) will try to match the largest set possible. e.g. ba*a will always fail because the trailing a will have already matched the repeating a."

Well, but why does the second code work? :)

Another example, where I just removed the repetition:

$line = "NTUSER.DAT"
If RegExp($line, '\.DAT$') Then ...


Because "." is escaped, this should not match, but it did.

Try this  :)  ...

   $line = "C:\Documents and Settings\User\NTUSER.DAT"
   If RegExp($line, '\.DAT$') Then   ; .dat at end-of-line
      MsgBox(0, "RegExp", "Pattern found")
   Else
      MsgBox(0, "RegExp", "Pattern NOT found")
   EndIf

<{POST_SNAPBACK}>

In this case "NTUSERDAT" will also be matched, so I won't know that "DAT" was an extension.
Koda homepage (http://www.autoitscript.com/fileman/users/lookfar/formdesign.html) (Bug Tracker)My Autoit script page (http://www.autoitscript.com/fileman/users/Lazycat/)

#43 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 30 November 2004 - 09:44 AM

Hmm .. I'm not sure why '[.]*\\DAT$' works on $line = "C:\Documents and Settings\User\NTUSER\DAT" :) .. maybe Nutster can explain?

But .. I think you'll find '\.DAT$' does NOT in fact match "C:\Documents and Settings\User\NTUSERDAT"
.. I tried it, and it correctly fails :) .. cos it's looking for ".DAT" at end of line

Hope this helps

Edit: typo + more -->

[..]
Another example, where I just removed the repetition:

$line = "NTUSER.DAT"
If RegExp($line, '\.DAT$') Then ...


Because "." is escaped, this should not match, but it did.
[..]

<{POST_SNAPBACK}>

.. This SHOULD match (and it does) because:
".DAT" means "any-char-followed-by-DAT"
but "\.DAT" means "dot-followed-by-DAT" <-- the special character "." is escaped back to its literal meaning.

I may be wrong .. but this is how I understand it ;)
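
The distinction between the two patterns can be checked in any engine that uses the conventional syntax. Below is a minimal sketch in Python's `re` module. Python is only a stand-in here (an assumption, since the beta RegExp() function discussed in this thread may behave differently), and the variable names are made up for illustration.

```python
import re

# Unescaped "." matches any single character; "\." matches only a literal dot.
any_char = re.compile(r".DAT$")
literal_dot = re.compile(r"\.DAT$")

names = [
    "NTUSER.DAT",
    "NTUSERDAT",
    r"C:\Documents and Settings\User\NTUSER\DAT",
]

for name in names:
    # Print whether each pattern finds a match at the end of the string.
    print(name, bool(any_char.search(name)), bool(literal_dot.search(name)))
```

Only "NTUSER.DAT" satisfies the escaped pattern, while the unescaped one accepts all three names, because "R" and "\" both count as "any character".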

Edited by trids, 30 November 2004 - 10:22 AM.


#44 Lazycat

Lazycat

    Coding cat

  • MVPs
  • 1,174 posts

Posted 30 November 2004 - 11:06 AM

But .. I think you'll find '\.DAT$' does NOT in fact match "C:\Documents and Settings\User\NTUSERDAT"
.. I tried it, and it correctly fails  :)  .. cos it's looking for ".DAT" at end of line


Oops :) I really did forget the $ at the end... And once again I tried to treat regexps like ordinary file masks. Thanks, I've got it now. I will try to work around this prob...

#45 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 30 November 2004 - 07:01 PM

Hmm .. I'm not sure why '[.]*\\DAT$'  works on $line = "C:\Documents and Settings\User\NTUSER\DAT"  :idiot:  .. maybe Nutster can explain?

<{POST_SNAPBACK}>

'[.]*\\DAT$' means zero or more "real" dots. There are 0 dots, so that works. \\ matches the real backslash and DAT$ matches the last 3 characters of the string. [.]*\. will never match the last dot because it was already read by the repeating set. Both [.] and \. will match a real dot.
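
The "zero or more" point can be sketched with Python's `re` module, used here as a stand-in (an assumption: Python's engine backtracks, whereas the implementation described above does not give back characters already read by a repeating set). The last check therefore behaves differently from what this post describes for the current RegExp().

```python
import re

path = r"C:\Documents and Settings\User\NTUSER\DAT"

# "[.]*" is allowed to match zero dots, so the rest of the pattern decides.
m = re.search(r"[.]*\\DAT$", path)
print(m.group())

# Both "[.]" and "\." match one literal dot.
print(bool(re.fullmatch(r"[.]", ".")), bool(re.fullmatch(r"\.", ".")))

# A backtracking engine hands one dot back, so "[.]*\." still matches here.
print(bool(re.fullmatch(r"[.]*\.", "...")))
```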

Edited by Nutster, 30 November 2004 - 07:03 PM.

David Nuttall

Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius
AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...


#46 Lazycat

Lazycat

    Coding cat

  • MVPs
  • 1,174 posts

Posted 30 November 2004 - 07:48 PM

Thanks, I didn't realize that [.] means a real dot... My brain is overheating and I feel that I (and surely many others) just need a few good examples... But I can't sleep until I know why this pattern doesn't match... :idiot:

$line = "C:\Documents and Settings\User\NTUSER.DAT"
RegExp($line, '^C:\\Documents and Settings[\A\\]*\.DAT$') ; doesn't match
RegExp($line, '^C:\\Documents and Settings[\A\\]*')       ; works up to here


1. Match ^ - start of line
2. Match exactly C:\Documents and Settings
3. Next comes a run of any number of alphanumeric characters or backslashes, or none at all
4. Next should come a real dot and DAT at the end of the line - but this does not match.

Please tell me where I went wrong...

BTW, I read some info about PHP regexps (which are mostly the same as the current implementation) and found that, although by default * and + consume the following char(s), it's possible to put "?" after them, which stops the consuming effect (ab*?b will not consume the last "b"). The current AutoIt implementation of "?" doesn't seem to have the same "magic"...

Edit: accidental smiley conversion :D

Edited by Lazycat, 30 November 2004 - 07:52 PM.


#47 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 01 December 2004 - 04:48 AM

'[.]*\\DAT$' means zero or more "real" dots. There are 0 dots, so that works. \\ matches the real backslash and DAT$ matches the last 3 characters of the string. [.]*\. will never match the last dot because it was already read by the repeating set. Both [.] and \. will match a real dot.

<{POST_SNAPBACK}>

:D .. there we go! Of course, "*" includes the possibility of no occurrence!

With regexp, I sometimes feel like a child playing with daddy's power-tools.
:idiot:

#48 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 01 December 2004 - 05:43 AM

[..]
My brain is overheating and I feel that I (and surely many others) just need a few good examples
[..]

<{POST_SNAPBACK}>

I agree .. perhaps Jon will consider opening another Forum called Regexp Support for examples & questions? :idiot:

Edit:
Lazycat, I experimented with your regexp "^C:\\Documents and Settings[\A\\]*\.DAT$", and it looks like we can't use the special tokens, like \A, in a set. Or at least, it doesn't recognise \A in a set.

Nutster, I notice that regexps in other apps (like TextPad) have special tokens that are specifically for use inside sets. Maybe this is an idea? ..

[Regexp Token]  Description
[:alpha:]       Any letter.
[:lower:]       Any lower case letter.
[:upper:]       Any upper case letter.
[:alnum:]       Any digit or letter.
[:digit:]       Any digit.
[:xdigit:]      Any hexadecimal digit (0-9, a-f or A-F).
[:blank:]       Space or tab.
[:space:]       Space, tab, vertical tab or form feed.
[:cntrl:]       Control characters (Delete and ASCII codes less than space).
[:print:]       Printable characters, including space.
[:graph:]       Printable characters, excluding space.
[:punct:]       Anything that is not a control or alphanumeric character.
[:word:]        Letters, hyphens and apostrophes.


Edited by trids, 01 December 2004 - 06:10 AM.


#49 Lazycat

Lazycat

    Coding cat

  • MVPs
  • 1,174 posts

Posted 01 December 2004 - 08:24 AM

With regexp, I sometimes feel like a child playing with daddy's power-tools.
:D

<{POST_SNAPBACK}>

Actually, I'm starting to feel the same :lol:

I haven't worked with regexps much before, mainly with the built-in variants in programs (EditPlus, Total Commander, Proxomitron), but I've never had so much trouble... most of the solutions I know don't work with AutoIt. :idiot:

Edited by Lazycat, 01 December 2004 - 08:26 AM.


#50 sugi

sugi

    Universalist

  • Active Members
  • 441 posts

Posted 01 December 2004 - 09:25 AM

$line = "C:\Documents and Settings\User\NTUSER.DAT"
RegExp($line, '^C:\\Documents and Settings[\A\\]*\.DAT$') ; doesn't match

egrep matches this (when I substitute \A with [:alpha:], which is the common way to address character classes), so this should be a problem with the AutoIt implementation.


With regexp, I sometimes feel like a child playing with daddy's power-tools.
:idiot:

I can suggest O'Reilly's sed & awk for that. Sed and awk both rely heavily on regex, and there's a good introduction to regex in this book. And once you've started with sed you'll miss it on every Windows system :D
There's also a book called Mastering Regular Expressions from O'Reilly. I don't know this book but it seems to be available online. Here's a quote from its intended audience:

This book will interest anyone who has an opportunity to use regular expressions. If you don't yet understand the power that regular expressions can provide, you should benefit greatly as a whole new world is opened up to you.


Edited by sugi, 01 December 2004 - 09:27 AM.


#51 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 01 December 2004 - 04:49 PM

I agree .. perhaps Jon will consider opening another Forum called Regexp Support for examples & questions?  :D

Edit:
Lazycat, I experimented with your regexp "^C:\\Documents and Settings[\A\\]*\.DAT$", and it looks like we can't use the special tokens, like \A, in a set. Or at least, it doesn't recognise \A in a set.

Nutster, I notice that regexps in other apps (like TextPad) have special tokens that are specifically for use inside sets. Maybe this is an idea? ..

<{POST_SNAPBACK}>

RegExp support forum is just scary! :idiot:

The list of [:token:] sequences is already on the TO DO list. But you can replace most of them with escaped sequences. e.g. [:digit:] is \d.
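
The substitution idea can be sketched as a small lookup table. This is a hypothetical helper in Python (a stand-in language, not the AutoIt implementation), and the ASCII-only ranges are an assumption: the table in the quoted post says "any letter" without naming a character set.

```python
import re

# Hypothetical lookup table: ASCII-only equivalents for a few bracket tokens.
POSIX_EQUIVALENTS = {
    "[:digit:]": r"0-9",
    "[:alpha:]": r"A-Za-z",
    "[:alnum:]": r"A-Za-z0-9",
    "[:upper:]": r"A-Z",
    "[:lower:]": r"a-z",
    "[:xdigit:]": r"0-9A-Fa-f",
    "[:blank:]": r" \t",
}


def expand_tokens(pattern: str) -> str:
    """Replace each known [:token:] with ranges usable inside a set."""
    for token, ranges in POSIX_EQUIVALENTS.items():
        pattern = pattern.replace(token, ranges)
    return pattern


expanded = expand_tokens(r"^[[:alpha:]]+[[:digit:]]*$")
print(expanded)
print(bool(re.match(expanded, "NTUSER42")))
```

Expanding the tokens before compiling keeps the pattern readable for the author while handing the engine only plain ranges.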

Edited by Nutster, 01 December 2004 - 05:04 PM.



#52 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 01 December 2004 - 05:00 PM

Thanks, I didn't realize that [.] means a real dot... My brain is overheating and I feel that I (and surely many others) just need a few good examples... But I can't sleep until I know why this pattern doesn't match...  :idiot:

$line = "C:\Documents and Settings\User\NTUSER.DAT"
RegExp($line, '^C:\\Documents and Settings[\A\\]*\.DAT$') ; doesn't match
RegExp($line, '^C:\\Documents and Settings[\A\\]*')       ; works up to here


1. Match ^ - start of line
2. Match exactly C:\Documents and Settings
3. Next comes a run of any number of alphanumeric characters or backslashes, or none at all
4. Next should come a real dot and DAT at the end of the line - but this does not match.

Please tell me where I went wrong...

BTW, I read some info about PHP regexps (which are mostly the same as the current implementation) and found that, although by default * and + consume the following char(s), it's possible to put "?" after them, which stops the consuming effect (ab*?b will not consume the last "b"). The current AutoIt implementation of "?" doesn't seem to have the same "magic"...

Edit: accidental smiley conversion  :D

<{POST_SNAPBACK}>

Point 3: escaped sequences are not supported in sets. Try [A-Z\a-z]* instead. The [\A\\]* tries to match "A" or "\" (ignoring extra definitions of \) and finding no occurrences, succeeds on zero or more matches.
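
The effect of the set contents can be sketched in Python's `re` module, used as a stand-in (an assumption, since its rules for backslashes inside sets differ from the implementation described here: Python needs `\\` for a literal backslash in a set, so the wider set is written `[A-Za-z\\]` rather than `[A-Z\a-z]`).

```python
import re

line = r"C:\Documents and Settings\User\NTUSER.DAT"

# A set holding only "A" and "\" stops at the first other character,
# so the trailing "\.DAT$" never lines up with the end of the string.
narrow = re.compile(r"^C:\\Documents and Settings[A\\]*\.DAT$")

# Spelling out the letter ranges lets the set run through "\User\NTUSER".
wide = re.compile(r"^C:\\Documents and Settings[A-Za-z\\]*\.DAT$")

print(bool(narrow.search(line)), bool(wide.search(line)))
```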

*?, +?, ?? are already on my to do list. When I find the time, and all the bugs are out, I plan on tackling the items on the to do list. ab*?b will then match the smallest group that lets the next character work. This will still lead to problems, because the b*? would match 0 b's and the next b would match. I do not advise b*b patterns, as they will almost always fail in the current implementation.
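
How the lazy forms are expected to behave can be sketched with Python's `re` module, which already supports them. This is a stand-in only (an assumption): Python backtracks, so the greedy `ab*b` succeeds there, unlike in the current implementation described above.

```python
import re

text = "abbb"

# Greedy: b* first takes every "b", then gives one back for the final "b".
greedy = re.match(r"ab*b", text)

# Lazy: b*? takes no "b" at all, so the final "b" matches the first one.
lazy = re.match(r"ab*?b", text)

print(greedy.group(), lazy.group())
```

The lazy form stops at "ab", which is the "0 b's" outcome mentioned above.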



#53 sugi

sugi

    Universalist

  • Active Members
  • 441 posts

Posted 01 December 2004 - 05:09 PM

*?, +?, ?? are already on my to do list.  When I find the time, and all the bugs are out,  I plan on tackling the items on the to do list.  ab*?b will then match the smallest group that lets the next character work.  This will still lead to problems, because the b*? would match 0 b's and the next b would match.  I do not advise b*b patterns, as they will almost always fail in the current implementation.

Basically you have to try every possibility to see if everything matches, and that's why regex implementations are usually slow compared to normal string functions.
Maybe it's just easier to use the regex functions from the GNU libc from Linux or something like that. They provide a full implementation with pretty much all known bugs squashed.

#54 SlimShady

SlimShady

    AutoIt lover

  • Active Members
  • 2,383 posts

Posted 01 December 2004 - 05:22 PM

Maybe it's just easier to use the regex functions from the GNU libc from Linux or something like that. They provide a full implementation with pretty much all known bugs squashed.

<{POST_SNAPBACK}>

You should've said that before he started 3 months of hard work.

Edited by SlimShady, 01 December 2004 - 05:24 PM.


#55 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 01 December 2004 - 06:44 PM

You should've said that before he started 3 months of hard work.

<{POST_SNAPBACK}>

:idiot: About a month, actually. A good chunk of the time was spent doing "real" work programming, so I could (and still can) put very little time into this programming. I have already received some requests for bug fixes. I hope to be able to tackle them this weekend, maybe some of the simple enhancements as well.



#56 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 07 December 2004 - 01:33 PM

I've been playing around with the new (and exciting :idiot: ) RegExp() .. thanks again, Nutster!

Just wondering: is there any chance of making RegExp() function follow the same principles employed by the likes of PixelSearch(), MouseGetPos(), DriveGetDrive(), etc - so that the return value would be the array of "hits", instead of the current approach where the array is provided as a string parameter to the function?

So it might work as follows:

Return Value
Success: Returns a zero-based array of matching groups found by the regular expression pattern.
@Error:
0 = Pattern matched successfully.
1 = The regular expression given is not valid.
2 = The handle given is not valid.

Not only in the interests of consistency and user-friendliness, but it simplifies the issue of whether or not to declare the array up-front too (which I guess is also in the interests of consistency and user-friendliness :D )

hmm .. whaddaya think?

Edits: minor

Edited by trids, 07 December 2004 - 01:39 PM.


#57 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 07 December 2004 - 07:19 PM

Just wondering: is there any chance of making RegExp() function follow the same principles employed by the likes of PixelSearch(), MouseGetPos(), DriveGetDrive(), etc - so that the return value would be the array of "hits", instead of the current approach where the array is provided as a string parameter to the function?

So it might work as follows:
Not only in the interests of consistency and user-friendliness, but it simplifies the issue of whether or not to declare the array up-front too (which I guess is also in the interests of consistency and user-friendliness  :idiot: )

hmm .. whaddaya think?

Edits: minor

<{POST_SNAPBACK}>

Hmm, so how would this be called?
$Results = RegExp($sLine, $sPattern)
If @Error = 0 Then
   ; Found the pattern
ElseIf @Error = 1 Then
   ; Did not find the pattern
ElseIf @Error = 2 Then
   ; The pattern was not valid
EndIf

I have removed the handle approach (RegExpSet, RegExpClose).



#58 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 07 December 2004 - 08:07 PM

Thanks, I didn't realize that [.] means a real dot... My brain is overheating and I feel that I (and surely many others) just need a few good examples... But I can't sleep until I know why this pattern doesn't match...  :idiot:

<{POST_SNAPBACK}>

I just posted my testing script http://www.autoitscript.com/fileman/users/Nutster/test%20regexp%202.au3 to give you some examples. Try all sorts of patterns yourself. This one works with the version I uploaded to Jon today. I will be posting a better one in a few days to work with the updated version that has some of the TO DO list items implemented.

Edited by Nutster, 07 December 2004 - 08:11 PM.



#59 trids

trids

    Hmmm .. and what have we here?

  • Active Members
  • 1,004 posts

Posted 08 December 2004 - 06:58 AM

Hmm, so how would this be called?

$Results = RegExp($sLine, $sPattern)
If @Error = 0 Then
   ; Found the pattern
ElseIf @Error = 1 Then
   ; Did not find the pattern
ElseIf @Error = 2 Then
   ; The pattern was not valid
EndIf

<{POST_SNAPBACK}>

.. yes ..
$asResults = RegExp($sLine, $sPattern)
If @Error = 0 Then
   ; Found the pattern
   ; .. and the hits are in a zero-based array called $asResults
ElseIf @Error = 1 Then
   ; Did not find the pattern
ElseIf @Error = 2 Then
   ; The pattern was not valid
EndIf

.. or ..
$asResults = RegExp($sLine, $sPattern)
If @Error Then
   ; Something is wrong
Else
   ; Found the pattern
   ; .. and the hits are in a zero-based array called $asResults
EndIf


#60 Nutster

Nutster

    Developer at Large

  • Developers
  • 1,450 posts

Posted 08 December 2004 - 05:51 PM

.. yes ..

$asResults = RegExp($sLine, $sPattern)
If @Error = 0 Then
   ; Found the pattern
   ; .. and the hits are in a zero-based array called $asResults
ElseIf @Error = 1 Then
   ; Did not find the pattern
ElseIf @Error = 2 Then
   ; The pattern was not valid
EndIf

<{POST_SNAPBACK}>

Or
$asResults = RegExp($sLine, $sPattern)
If @Error = 0 Then
   ; Found the pattern
   ; .. and the hits are in a zero-based array called $asResults
ElseIf @Error = 1 Then
   ; Did not find the pattern
   ; $asResults = ""
ElseIf @Error = 2 Then
   ; The pattern was not valid
   ; $asResults = ""
EndIf

This can solve the problems with storing back-references when I implement them, as well as RegExpReplace. Ok. I will go this way. @Error will indicate whether the search worked or not (or buggered up completely because of a screwed pattern). I think the return in that case should indicate where the problem occurred in the pattern.
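
The calling convention agreed on above can be sketched outside AutoIt. The Python function below is a hypothetical illustration, not the RegExp() implementation: the name `reg_exp`, the tuple return, and the choice to put the whole match in element 0 are all assumptions. Only the error codes (0 found, 1 not found, 2 invalid pattern) come from the posts above.

```python
import re
from typing import List, Tuple


def reg_exp(line: str, pattern: str) -> Tuple[List[str], int]:
    """Hypothetical wrapper returning (hits, error) with the thread's codes."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return [], 2  # 2 = the pattern was not valid
    match = compiled.search(line)
    if match is None:
        return [], 1  # 1 = did not find the pattern
    # Assumption: element 0 is the whole match, then the captured groups.
    return [match.group(0), *match.groups("")], 0


hits, error = reg_exp("NTUSER.DAT", r"(\w+)\.(\w+)$")
print(hits, error)
```

Returning the hits and the status together means the caller never has to declare the result array up front, which is the convenience asked for earlier in the thread.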





