trids, posted November 30, 2004

> [..] So escape of "." is not working?

I think you're experiencing the repeating-characters phenomenon. From the helpfile: "Repeating characters (*, +, ?) will try to match the largest set possible. e.g. ba*a will always fail because the trailing a will have already matched the repeating a."

Try this ...

    $line = "C:\Documents and Settings\User\NTUSER.DAT"
    If RegExp($line, '\.DAT$') Then ; .DAT at end-of-line
        MsgBox(0, "RegExp", "Pattern found")
    Else
        MsgBox(0, "RegExp", "Pattern NOT found")
    EndIf

Lazycat, posted November 30, 2004

> I think you're experiencing the repeating-characters phenomenon. From the helpfile: "Repeating characters (*, +, ?) will try to match the largest set possible. e.g. ba*a will always fail because the trailing a will have already matched the repeating a."

Well, but why does the second code work? Another example, where I just removed the repetition:

    $line = "NTUSER.DAT"
    If RegExp($line, '\.DAT$') Then ...

Because "." is escaped this should not match, but it did.

> Try this ...
>
>     $line = "C:\Documents and Settings\User\NTUSER.DAT"
>     If RegExp($line, '\.DAT$') Then ; .DAT at end-of-line
>         MsgBox(0, "RegExp", "Pattern found")
>     Else
>         MsgBox(0, "RegExp", "Pattern NOT found")
>     EndIf

In this case "NTUSERDAT" will also be matched, so I will not know that "DAT" was an extension.

trids, posted November 30, 2004 (edited)

Hmm .. I'm not sure why '[.]*\\DAT$' works on $line = "C:\Documents and Settings\User\NTUSER\DAT" .. maybe Nutster can explain?

But .. I think you'll find '\.DAT$' does NOT in fact match "C:\Documents and Settings\User\NTUSERDAT" .. I tried it, and it correctly fails, because it's looking for ".DAT" at end of line.

Hope this helps.

Edit: typo + more -->

> [..] Another example, I just removed repeating:
>
>     $line = "NTUSER.DAT"
>     If RegExp($line, '\.DAT$') Then ...
>
> Because "." escaped this should not match, but this did. [..]

.. This SHOULD match (and it does) because:
".DAT" means "any character followed by DAT", but
"\.DAT" means "a dot followed by DAT" <-- the special character "." is escaped back to its literal meaning.

I may be wrong .. but this is how I understand it.

Edited November 30, 2004 by trids

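The difference between the unescaped and the escaped dot can be checked directly. This is only a minimal sketch, and it rests on an assumption: it uses StringRegExp(), the PCRE-based function in current AutoIt releases, as a stand-in for the beta RegExp() used in this thread. With its default flag StringRegExp() returns 1 for a match and 0 otherwise. The variable names are illustrative.

```autoit
; Assumption: current AutoIt, with StringRegExp() standing in for the beta RegExp().
; With the default flag it returns 1 on a match and 0 otherwise.
Local $sWithDot = "C:\Documents and Settings\User\NTUSER.DAT"
Local $sNoDot = "C:\Documents and Settings\User\NTUSERDAT"

; Unescaped dot: any single character followed by DAT at the end of the string.
ConsoleWrite(StringRegExp($sWithDot, '.DAT$') & @CRLF) ; 1
ConsoleWrite(StringRegExp($sNoDot, '.DAT$') & @CRLF)   ; 1, the "R" satisfies the dot

; Escaped dot: a literal "." followed by DAT at the end of the string.
ConsoleWrite(StringRegExp($sWithDot, '\.DAT$') & @CRLF) ; 1
ConsoleWrite(StringRegExp($sNoDot, '\.DAT$') & @CRLF)   ; 0, there is no dot before DAT
```

So the escaped form is the one to use when the extension has to be told apart from a name that merely ends in "DAT".
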
Lazycat, posted November 30, 2004

> But .. I think you'll find '\.DAT$' does NOT in fact match "C:\Documents and Settings\User\NTUSERDAT" .. I tried it, and it correctly fails, because it's looking for ".DAT" at end of line.

Oops, I really forgot the $ at the end... And again I tried to treat regexps like the usual file masks. Thanks, I've got it now. I will try to work around this problem...

Nutster (Author), posted November 30, 2004 (edited)

> Hmm .. I'm not sure why '[.]*\\DAT$' works on $line = "C:\Documents and Settings\User\NTUSER\DAT" .. maybe Nutster can explain?

'[.]*\\DAT$' means zero or more "real" dots. There are 0 dots, so that works. \\ matches the real backslash and DAT$ matches the last 3 characters of the string. [.]*\. will never match the last dot because it was already read by the repeating set. Both [.] and \. will match a real dot.

Edited November 30, 2004 by Nutster

David Nuttall
Nuttall Computer Consulting
An Aquarius born during the Age of Aquarius
AutoIt allows me to re-invent the wheel so much faster.
I'm off to write a wizard, a wonderful wizard of odd...

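The points in this answer can be tried out as follows. This is a sketch under an assumption: it runs on a current AutoIt release, where StringRegExp() is PCRE-based and backtracks, so the last line behaves differently from the beta RegExp() engine described above (in which the repeating set keeps the dot it has read).

```autoit
; Assumption: current AutoIt, with StringRegExp() standing in for the beta RegExp().
Local $sPath = "C:\Documents and Settings\User\NTUSER\DAT"

; [.]* is allowed to match zero dots; \\ then matches the backslash
; and DAT$ matches the last three characters.
ConsoleWrite(StringRegExp($sPath, '[.]*\\DAT$') & @CRLF) ; 1

; [.] and \. both mean a literal dot.
ConsoleWrite(StringRegExp("a.b", '^a[.]b$') & @CRLF) ; 1
ConsoleWrite(StringRegExp("a.b", '^a\.b$') & @CRLF)  ; 1
ConsoleWrite(StringRegExp("axb", '^a[.]b$') & @CRLF) ; 0

; A backtracking engine lets [.]* give the dot back so that \. can match it.
; The beta engine discussed in this thread would fail here instead.
ConsoleWrite(StringRegExp("NTUSER.DAT", '[.]*\.DAT$') & @CRLF) ; 1
```
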
Lazycat, posted November 30, 2004 (edited)

Thanks, I didn't realize that [.] means a real dot... My brain is overheating, and I feel that I (and surely many others) just need a few good examples... But I can't sleep until I know why this pattern doesn't match...

    $line = "C:\Documents and Settings\User\NTUSER.DAT"
    RegExp($line, '^C:\\Documents and Settings[\A\\]*\.DAT$') ; doesn't match
    RegExp($line, '^C:\\Documents and Settings[\A\\]*') ; works up to here

1. Match ^ (start of line).
2. Match C:\Documents and Settings exactly.
3. Next comes a run of any number of alphanumeric characters or backslashes, or none at all.
4. Next should be a real dot and DAT at the end of the line, but this does not match.

Please show me where I went wrong...

BTW, I read some info about PHP regexps (which are mainly the same as the current implementation) and found that, although by default * and + consume the following char(s), it's possible to put "?" after them, which stops the consuming effect (ab*?b will not consume the last "b"). The current AutoIt implementation of "?" does not seem to have the same "magic"...

Edit: accidental smiley conversion

Edited November 30, 2004 by Lazycat

trids, posted December 1, 2004

> '[.]*\\DAT$' means zero or more "real" dots. There are 0 dots, so that works. \\ matches the real backslash and DAT$ matches the last 3 characters of the string. [.]*\. will never match the last dot because it was already read by the repeating set. Both [.] and \. will match a real dot.

.. there we go! Of course, "*" includes the possibility of no occurrence!

With regexp, I sometimes feel like a child playing with daddy's power-tools.

trids, posted December 1, 2004 (edited)

> [..] My brain is overheating, and I feel that I (and surely many others) just need a few good examples [..]

I agree .. perhaps Jon will consider opening another forum called Regexp Support for examples & questions?

Edit:

Lazycat, I experimented with your regexp "^C:\\Documents and Settings[\A\\]*\.DAT$", and it looks like we can't use the special tokens, like \A, in a set. Or at least, it doesn't recognise \A in a set.

Nutster, I notice that regexps in other apps (like TextPad) have special tokens that are specifically for use inside sets. Maybe this is an idea? ..

    Regexp token   Description
    [:alpha:]      Any letter.
    [:lower:]      Any lower case letter.
    [:upper:]      Any upper case letter.
    [:alnum:]      Any digit or letter.
    [:digit:]      Any digit.
    [:xdigit:]     Any hexadecimal digit (0-9, a-f or A-F).
    [:blank:]      Space or tab.
    [:space:]      Space, tab, vertical tab or form feed.
    [:cntrl:]      Control characters (Delete and ASCII codes less than space).
    [:print:]      Printable characters, including space.
    [:graph:]      Printable characters, excluding space.
    [:punct:]      Anything that is not a control or alphanumeric character.
    [:word:]       Letters, hyphens and apostrophes.

Edited December 1, 2004 by trids

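For comparison, this is how such set tokens look in an engine that supports them. A minimal sketch, assuming a current AutoIt release: StringRegExp() there is PCRE-based and accepts POSIX classes such as [:alpha:] and [:digit:] inside a set. It says nothing about the beta RegExp() discussed in this thread, which did not support them at the time.

```autoit
; Assumption: current AutoIt, where StringRegExp() accepts POSIX classes inside a set.
Local $sLine = "C:\Documents and Settings\User\NTUSER.DAT"

; Lazycat's pattern with [:alpha:] in place of \A:
; letters or backslashes between the folder name and ".DAT".
ConsoleWrite(StringRegExp($sLine, '^C:\\Documents and Settings[[:alpha:]\\]*\.DAT$') & @CRLF) ; 1

; [[:digit:]] and \d accept the same characters in these strings.
ConsoleWrite(StringRegExp("2004", '^[[:digit:]]+$') & @CRLF) ; 1
ConsoleWrite(StringRegExp("2004", '^\d+$') & @CRLF)          ; 1
ConsoleWrite(StringRegExp("20x4", '^[[:digit:]]+$') & @CRLF) ; 0
```

Note the double brackets: the POSIX class is written inside the set, so [[:alpha:]\\] means "a letter or a backslash".
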
Lazycat, posted December 1, 2004 (edited)

> With regexp, I sometimes feel like a child playing with daddy's power-tools.

Actually, I'm starting to feel the same. I haven't worked with regexps that much before, mainly with the built-in variants in programs (EditPlus, Total Commander, Proxomitron), but I never had so much trouble... most of the solutions I know don't work in AutoIt.

Edited December 1, 2004 by Lazycat

sugi, posted December 1, 2004 (edited)

>     $line = "C:\Documents and Settings\User\NTUSER.DAT"
>     RegExp($line, '^C:\\Documents and Settings[\A\\]*\.DAT$') ; doesn't match

egrep matches this (when I substitute \A with [:alpha:], which is the common way to address character classes), so this should be a problem with the AutoIt implementation.

> With regexp, I sometimes feel like a child playing with daddy's power-tools.

I can suggest O'Reilly's sed & awk for that. Sed and awk both rely heavily on regex, and there's a good introduction to regex in this book. And once you've started with sed you'll miss it on every Windows system.

There's also a book called Mastering Regular Expressions from O'Reilly. I don't know this book but it seems to be available online. Here's a quote from its intended audience:

> This book will interest anyone who has an opportunity to use regular expressions. If you don't yet understand the power that regular expressions can provide, you should benefit greatly as a whole new world is opened up to you.

Edited December 1, 2004 by sugi

Nutster (Author), posted December 1, 2004 (edited)

> I agree .. perhaps Jon will consider opening another forum called Regexp Support for examples & questions?
>
> Edit:
>
> Lazycat, I experimented with your regexp "^C:\\Documents and Settings[\A\\]*\.DAT$", and it looks like we can't use the special tokens, like \A, in a set. Or at least, it doesn't recognise \A in a set.
>
> Nutster, I notice that regexps in other apps (like TextPad) have special tokens that are specifically for use inside sets. Maybe this is an idea? ..

A RegExp support forum is just scary! The list of [:token:] sequences is already on the TO DO list. But you can replace most of them with escaped sequences, e.g. [:digit:] is \d.

Edited December 1, 2004 by Nutster

Nutster (Author), posted December 1, 2004

> Thanks, I didn't realize that [.] means a real dot... My brain is overheating, and I feel that I (and surely many others) just need a few good examples... But I can't sleep until I know why this pattern doesn't match...
>
>     $line = "C:\Documents and Settings\User\NTUSER.DAT"
>     RegExp($line, '^C:\\Documents and Settings[\A\\]*\.DAT$') ; doesn't match
>     RegExp($line, '^C:\\Documents and Settings[\A\\]*') ; works up to here
>
> 1. Match ^ (start of line).
> 2. Match C:\Documents and Settings exactly.
> 3. Next comes a run of any number of alphanumeric characters or backslashes, or none at all.
> 4. Next should be a real dot and DAT at the end of the line, but this does not match.
>
> Please show me where I went wrong...
>
> BTW, I read some info about PHP regexps (which are mainly the same as the current implementation) and found that, although by default * and + consume the following char(s), it's possible to put "?" after them, which stops the consuming effect (ab*?b will not consume the last "b"). The current AutoIt implementation of "?" does not seem to have the same "magic"...
>
> Edit: accidental smiley conversion

Point 3: escaped sequences are not supported in sets. Try [A-Z\a-z]* instead. The [\A\\]* tries to match "A" or "\" (ignoring the extra definitions of \) and, finding no occurrences, succeeds on zero or more matches.

*?, +?, ?? are already on my to do list. When I find the time, and all the bugs are out, I plan on tackling the items on the to do list. ab*?b will then match the smallest group that lets the next character work. This will still lead to problems, because the b*? would match 0 b's and the next b would match.

I do not advise b*b patterns, as they will almost always fail in the current implementation.

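To see what the greedy and the lazy forms do in an engine that already has both, here is a small sketch. It assumes a current AutoIt release and uses StringRegExp() (PCRE-based, with backtracking) rather than the beta RegExp() described above, so it shows the behaviour being asked for, not the behaviour of the implementation in this thread. Flag 1 makes StringRegExp() return an array of the captured groups.

```autoit
; Assumption: current AutoIt, with StringRegExp() standing in for the beta RegExp().

; Greedy: b* first takes all three b's, then gives one back so the final b can match.
Local $aGreedy = StringRegExp("abbb", '(ab*)b', 1)
ConsoleWrite($aGreedy[0] & @CRLF) ; abb

; Lazy: b*? takes as few b's as possible (here none), and the final b matches the first b.
Local $aLazy = StringRegExp("abbb", '(ab*?)b', 1)
ConsoleWrite($aLazy[0] & @CRLF) ; a

; The helpfile example ba*a succeeds once the engine can backtrack.
ConsoleWrite(StringRegExp("baaa", 'ba*a') & @CRLF) ; 1

; Escaped sequences inside a set, as in point 3: \w (letters, digits, underscore) or a backslash.
Local $sLine = "C:\Documents and Settings\User\NTUSER.DAT"
ConsoleWrite(StringRegExp($sLine, '^C:\\Documents and Settings[\w\\]*\.DAT$') & @CRLF) ; 1
```
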
sugi, posted December 1, 2004

> *?, +?, ?? are already on my to do list. When I find the time, and all the bugs are out, I plan on tackling the items on the to do list. ab*?b will then match the smallest group that lets the next character work. This will still lead to problems, because the b*? would match 0 b's and the next b would match. I do not advise b*b patterns, as they will almost always fail in the current implementation.

Basically you have to try every possibility to see if everything matches, and that's why regex implementations are usually slow compared to normal string functions.

Maybe it's just easier to use the regex functions from the GNU libc from Linux or something like that. They provide a full implementation with pretty much all known bugs squashed.

SlimShady, posted December 1, 2004 (edited)

> Maybe it's just easier to use the regex functions from the GNU libc from Linux or something like that. They provide a full implementation with pretty much all known bugs squashed.

You should've said that before he started 3 months of hard work.

Edited December 1, 2004 by SlimShady

Nutster (Author), posted December 1, 2004

> You should've said that before he started 3 months of hard work.

About a month, actually. A good chunk of the time was spent doing "real" work programming, so I could (and still can) put very little time into this programming. I have already received some requests for bug fixes. I hope to be able to tackle them this weekend, and maybe some of the simple enhancements as well.

trids, posted December 7, 2004 (edited)

I've been playing around with the new (and exciting) RegExp() .. thanks again, Nutster!

Just wondering: is there any chance of making the RegExp() function follow the same principles employed by the likes of PixelSearch(), MouseGetPos(), DriveGetDrive(), etc., so that the return value would be the array of "hits", instead of the current approach where the array is provided as a string parameter to the function?

So it might work as follows:

    Return Value
    Success:  Returns a zero-based array of matching groups found by the regular expression pattern.
    @Error:   0 = Pattern matched successfully.
              1 = The regular expression given is not valid.
              2 = The handle given is not valid.

This is not only in the interests of consistency and user-friendliness; it also simplifies the issue of whether or not to declare the array up-front (which I guess is also in the interests of consistency and user-friendliness).

hmm .. whaddaya think?

Edits: minor

Edited December 7, 2004 by trids

Nutster (Author), posted December 7, 2004

> Just wondering: is there any chance of making the RegExp() function follow the same principles employed by the likes of PixelSearch(), MouseGetPos(), DriveGetDrive(), etc., so that the return value would be the array of "hits", instead of the current approach where the array is provided as a string parameter to the function?
>
> So it might work as follows:
>
> This is not only in the interests of consistency and user-friendliness; it also simplifies the issue of whether or not to declare the array up-front (which I guess is also in the interests of consistency and user-friendliness).
>
> hmm .. whaddaya think?
>
> Edits: minor

Hmm, so how would this be called?

    $Results = RegExp($sLine, $sPattern)
    If @Error = 0 Then
        ; Found the pattern
    ElseIf @Error = 1 Then
        ; Did not find the pattern
    ElseIf @Error = 2 Then
        ; The pattern was not valid
    EndIf

I have removed the handle approach (RegExpSet, RegExpClose).

Nutster (Author), posted December 7, 2004 (edited)

> Thanks, I didn't realize that [.] means a real dot... My brain is overheating, and I feel that I (and surely many others) just need a few good examples... But I can't sleep until I know why this pattern doesn't match...

I just posted my testing script, http://www.autoitscript.com/fileman/users/Nutster/test%20regexp%202.au3, to give you some examples. Try all sorts of patterns yourself. This one works with the version I uploaded to Jon today. I will be posting a better one in a few days to work with the updated version that has some of the TO DO list items implemented.

Edited December 7, 2004 by Nutster

trids, posted December 8, 2004

> Hmm, so how would this be called?
>
>     $Results = RegExp($sLine, $sPattern)
>     If @Error = 0 Then
>         ; Found the pattern
>     ElseIf @Error = 1 Then
>         ; Did not find the pattern
>     ElseIf @Error = 2 Then
>         ; The pattern was not valid
>     EndIf

.. yes ..

    $asResults = RegExp($sLine, $sPattern)
    If @Error = 0 Then
        ; Found the pattern
        ; .. and the hits are in a zero-based array called $asResults
    ElseIf @Error = 1 Then
        ; Did not find the pattern
    ElseIf @Error = 2 Then
        ; The pattern was not valid
    EndIf

.. or ..

    $asResults = RegExp($sLine, $sPattern)
    If @Error Then
        ; Something is wrong
    Else
        ; Found the pattern
        ; .. and the hits are in a zero-based array called $asResults
    EndIf

Nutster (Author), posted December 8, 2004

> .. yes ..
>
>     $asResults = RegExp($sLine, $sPattern)
>     If @Error = 0 Then
>         ; Found the pattern
>         ; .. and the hits are in a zero-based array called $asResults
>     ElseIf @Error = 1 Then
>         ; Did not find the pattern
>     ElseIf @Error = 2 Then
>         ; The pattern was not valid
>     EndIf

Or

    $asResults = RegExp($sLine, $sPattern)
    If @Error = 0 Then
        ; Found the pattern
        ; .. and the hits are in a zero-based array called $asResults
    ElseIf @Error = 1 Then
        ; Did not find the pattern
        ; $asResults = ""
    ElseIf @Error = 2 Then
        ; The pattern was not valid
        ; $asResults = ""
    EndIf

This can solve the problems with storing back-references when I implement them, as well as RegExpReplace. OK, I will go this way. @Error will indicate whether the search worked or not (or buggered up completely because of a screwed pattern). I think the return in that case should indicate where the problem occurred in the pattern.

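The calling convention settled on here can be tried in a current AutoIt release. The following is a sketch, not the beta RegExp() from this thread: it assumes StringRegExp() with flag 1, which returns a zero-based array of captured groups and sets @error to 1 when nothing matched and to 2 when the pattern is invalid, with @extended holding the offset of the problem in the pattern. _ShowMatch is a hypothetical helper written only for this illustration.

```autoit
; Assumption: current AutoIt, with StringRegExp() standing in for the beta RegExp().
Local $sLine = "C:\Documents and Settings\User\NTUSER.DAT"

_ShowMatch($sLine, '([^\\]+)\.DAT$') ; Found: NTUSER
_ShowMatch($sLine, '\.TXT$')         ; Did not find the pattern
_ShowMatch($sLine, '([^\\]+\.DAT$')  ; The pattern was not valid (unclosed group)

; Hypothetical helper: runs the pattern and reports the outcome via @error.
Func _ShowMatch($sText, $sPattern)
    Local $asResults = StringRegExp($sText, $sPattern, 1)
    ; Copy @error and @extended right away, before another call overwrites them.
    Local $iError = @error, $iOffset = @extended
    If $iError = 0 Then
        ; Found the pattern; the hits are in the zero-based array $asResults.
        ConsoleWrite("Found: " & $asResults[0] & @CRLF)
    ElseIf $iError = 1 Then
        ConsoleWrite("Did not find the pattern" & @CRLF)
    Else
        ConsoleWrite("The pattern was not valid, problem near offset " & $iOffset & @CRLF)
    EndIf
EndFunc
```

Copying @error into a variable first is a common precaution, since the macro is overwritten by later function calls.
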