
RegExp issue(s)



See my script.

Why does the 1st one match and the 2nd one fail?

I only changed " - " to "."

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forums - Internet.")
If $Matched Then
   MsgBox(64, "Test 1", "Matched!")
Else
   MsgBox(64, "Test 1", "Sorry. No match.")
EndIf

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forums.Internet.")
If $Matched Then
   MsgBox(64, "Test 2", "Matched!")
Else
   MsgBox(64, "Test 2", "Sorry. No match.")
EndIf

Func TestRegExp($Str, $Xpr, $Opts = 0)
   StringRegExp($Str, $Xpr, $Opts)
   If NOT @error AND @Extended Then Return 1
   Return 0
EndFunc

  • Administrators

This is like the blind leading the blind, but doesn't the period just match a SINGLE character?

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forums . Internet.")


  • Administrators

This matched as well, I'm assuming after looking at the docs that ".*" will match any number of any characters...

".*Forums.*Internet.*"
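A quick way to check that guess is sketched below. It assumes a current AutoIt release, where StringRegExp with the default flag 0 returns 1 on a match and 0 otherwise; the unstable build discussed in this thread may report results differently, and the variable name is only illustrative.

```autoit
; Sketch: compare a single "." with ".*" between the two words.
Local $sTitle = "AutoIt Forums - Internet Explorer"

; One dot stands for exactly one character, but " - " is three characters.
ConsoleWrite("Forums.Internet  -> " & StringRegExp($sTitle, "Forums.Internet") & @CRLF)

; ".*" stands for any run of characters, including none at all.
ConsoleWrite("Forums.*Internet -> " & StringRegExp($sTitle, "Forums.*Internet") & @CRLF)
```

The first call should report 0 and the second 1, since only the second pattern lets more than one character sit between "Forums" and "Internet".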


That wasn't clear to me.

What's the difference between ? and .

Edit:

I tried the star character and the match still fails :lmao:

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forums - Internet.")
If $Matched Then
   MsgBox(64, "Test 1", "Matched!")
Else
   MsgBox(64, "Test 1", "Sorry. No match.")
EndIf

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forums.Internet.")
If $Matched Then
   MsgBox(64, "Test 2", "Matched!")
Else
   MsgBox(64, "Test 2", "Sorry. No match.")
EndIf

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forums . Internet.")
If $Matched Then
   MsgBox(64, "Test 3", "Matched!")
Else
   MsgBox(64, "Test 3", "Sorry. No match.")
EndIf

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forum*Internet.")
If $Matched Then
   MsgBox(64, "Test 4", "Matched!")
Else
   MsgBox(64, "Test 4", "Sorry. No match.")
EndIf

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", ".Forum(*)Internet.")
If $Matched Then
   MsgBox(64, "Test 5", "Matched!")
Else
   MsgBox(64, "Test 5", "Sorry. No match.")
EndIf

Func TestRegExp($Str, $Xpr, $Opts = 0)
   StringRegExp($Str, $Xpr, $Opts)
   If NOT @error AND @Extended Then Return 1
   Return 0
EndFunc
Edited by SlimShady

  • Administrators

One thing I do know about regexps is that if you think of them like filename wildcards with the usual use of * and ? then you get really really confused. I'm going to have to put some effort into learning these I think.


A period "." matches any single character.

".Forums...Internet." will match " Forums - Internet "

or " Forums & Internet "

".Forums.+Internet." will match " Forums - Internet "

or " Forums Internet "

or " Forums le expresion Internet "

P.S.

THIS ONLY WORKS ON UNSTABLE VERSIONS

Edited by normeus

  • Administrators

? basically means optional. It's like saying, "It may be there, it may not be there; either way, I don't care, it's still a match."

Oh, so . only matches an existing character, and .? would match any character even if there is no character there?

I look at it like this:

? = 0 or 1

Example: "abc?"

Matches: "ab" and "abc"

* = 0 or more

Example: "abc*"

Matches: "ab", "abc", "abcc", "abccc", and so on.

+ = 1 or more

Example: "abc+"

Matches: "abc", "abcc", "abccc", and so on.
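The three quantifiers can be compared side by side with a small loop. This is only a sketch and assumes a current AutoIt release where StringRegExp returns 1 or 0 with the default flag. The ^ and $ anchors are my addition: without them a pattern like "abc?" would also succeed on any longer string that merely contains "ab".

```autoit
; Sketch: anchor each pattern with ^ and $ so the whole string must match.
Local $aPatterns[3] = ["^abc?$", "^abc*$", "^abc+$"]
Local $aInputs[4] = ["ab", "abc", "abcc", "abccc"]

For $sPattern In $aPatterns
    For $sInput In $aInputs
        ; Flag 0 (the default) returns 1 for a match and 0 for no match.
        ConsoleWrite($sPattern & " vs " & $sInput & " -> " & StringRegExp($sInput, $sPattern) & @CRLF)
    Next
Next
```

With the anchors in place, "^abc?$" should accept only "ab" and "abc", "^abc*$" should accept all four inputs, and "^abc+$" should reject only "ab".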

So to take Shady's regexes and explain them:

".Forums - Internet."

This will match "Forums - Internet" exactly, with ONE wild character (the period) on either end.

ex: "AForums - InternetB", "xForums - Internet;"

".Forums.Internet."

Will match ONE wild character, then "Forums" then another ONE wild character, then "Internet" then one last wild character.

ex: "aForumsxInternetz", "_Forums5Internet>"

".Forums . Internet."

Will match ONE wild character, then "Forums " (note the space) then another ONE wild character, then " Internet" (again, the space) then one last wild character.

ex: "dForums x Internetz", "7Forums P Internet`"

".Forum*Internet."

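Hehe.. This will match one wild character on either end, with "Foru", then 0 or more m's, then "Internet" in between. The * applies only to the character right before it, not to the whole word "Forum".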

ex: "dForuInternet%", "=ForumInternet-", "zForummmmInternet["

".Forum(*)Internet."

I'm not sure what happens when you put the 0 or more quantifier in brackets by itself.. I would assume that it throws an error.
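To see how each of these patterns behaves against the actual window title, something like the following could be run. It is a sketch that assumes a current AutoIt release (result 1 or 0 with the default flag, and @error set to 2 for a malformed pattern, as the StringRegExp documentation describes); the variable names are illustrative.

```autoit
; Sketch: run each pattern from the thread against the window title.
Local $sTitle = "AutoIt Forums - Internet Explorer"
Local $aPatterns[5] = [".Forums - Internet.", ".Forums.Internet.", ".Forums . Internet.", ".Forum*Internet.", ".Forum(*)Internet."]

For $sPattern In $aPatterns
    Local $iResult = StringRegExp($sTitle, $sPattern)
    Local $iError = @error ; read @error right away; later function calls reset it
    ConsoleWrite($sPattern & " -> result=" & $iResult & ", @error=" & $iError & @CRLF)
Next
```

Printing @error next to the result makes it possible to tell "the pattern is valid but did not match" apart from "the pattern itself was rejected", which is the open question for the last entry.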

Hope that helps a bit. Learning RegEx stuff was a bit of a chore for me, but I've got it pretty well figured out nowadays. If it helps any, I learned most of what I know from experimentation, and reading (and re-reading, and re-reading again) the Pattern Syntax doc in PHP, which you can read here. It seems that both PHP and the AutoIt regex stuff are following similar standards. Hopefully if you give that page a bit of a read, it will help make it easier to understand the regex stuff used here.

Edited by Saunders

1. I can't get the OR operator to work using the pipe character |

These both fail:

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "AutoIt|James")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "(AutoIt)|(James)")

2. I'm trying to create an expression that matches any character.

Like the wildcard character * used in FileDelete and FileFind...

I'm going to list all the expressions I tried, because it fails on me all the time.

Edit:

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums[\d?\A?\p?\s?]Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums[\d\A\p\s]?Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums(\d?\A?\p?\s?)Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums(\d\A\p\s)?Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums\d?\A?\p?\s?Internet")
Edited by SlimShady

No problem.

Hmm, I was just reading the StringRegExp docs, and saw this at the end:

"? (after a repeating character) : Find the smallest match instead of the largest."

If this is what I think it is, then it's something I've wished PHP had for the longest time. It's the kind of thing that's very useful for things like BBCode. Say you want to have it match all the bits of text between BBCode bold tags, like this:

[b]This[/b] is a [b]string[/b]

You want an array that returns "This" and "string" so you're thinking that a regex like this would work:

\[b\](.*)\[/b\]
(note the escaping of the [ and ] characters)

But unfortunately, what this will return is:

This[/b] is a [b]string

Now, if you use this regex:

\[b\](.*?)\[/b\]

With this pattern and a flag of 3, it will return an array with "This" and "string" in it.

So you could do something like this:

$s_String = "[b]Stuff[/b] is [b]cool[/b]"
$a_Reg = StringRegExp($s_String, "(\[b\](.*?)\[/b\])", 3)
For $i = 0 to UBound($a_Reg) -1 Step 2
    $s_String = StringReplace($s_String, $a_Reg[$i], '<b>' & $a_Reg[$i+1] & '</b>') ; even index = whole tag, odd index = inner text
Next

Of course, if/when a RegExpReplace function is implemented, it will be even easier. Functions I've used before do exactly what my example did, but work like:

$s_String = "[b]Stuff[/b] is [b]cool[/b]"
$s_String = ereg_replace($s_String, "\[b\](.*?)\[/b\]", "<b>\\1</b>")
MsgBox(0, '', $s_String)

The \\1 refers to the first group in the pattern.
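For anyone reading this on a later build: current AutoIt releases do include a StringRegExpReplace function, so the whole thing can be tried directly. The sketch below assumes such a release (PCRE-based StringRegExp, flag 3 returning all captures, and $1 as a back-reference in the replacement); it is not what the unstable version in this thread offers.

```autoit
; Sketch: greedy vs lazy capture, then replacement with a back-reference.
Local $sText = "[b]Stuff[/b] is [b]cool[/b]"

; Flag 3 returns every capture-group match in one array.
Local $aGreedy = StringRegExp($sText, "\[b\](.*)\[/b\]", 3)
Local $aLazy = StringRegExp($sText, "\[b\](.*?)\[/b\]", 3)
ConsoleWrite("Greedy captures: " & UBound($aGreedy) & ", first = " & $aGreedy[0] & @CRLF)
ConsoleWrite("Lazy captures:   " & UBound($aLazy) & ", first = " & $aLazy[0] & @CRLF)

; $1 in the replacement refers to the first capture group.
ConsoleWrite(StringRegExpReplace($sText, "\[b\](.*?)\[/b\]", "<b>$1</b>") & @CRLF)
```

The greedy pattern should produce a single capture that swallows everything up to the last [/b], while the lazy one should produce two captures, "Stuff" and "cool".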

A good PHP style example is something like this:

$s_String = 'Username: Saunders; Email: krawlie@hotmail.com';
$s_String = ereg_replace('Username: (.*); Email: (.*)', '<a href="mailto:\\2">\\1</a>', $s_String);
echo $s_String;

And this would give you <a href="mailto:krawlie@hotmail.com">Saunders</a>

... Wow I'm really rambling now, sorry guys.

/me can't wait for a regexp replace function now.


  • Administrators

It's the thing I want too, I asked Nutster to do this one next on his todo list :lmao:

Hey sorry Shady, I didn't see this post when I made mine earlier otherwise I would have replied to it then.

1. I can't get the OR operator to work using the pipe character |

These both fail:

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "AutoIt|James")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "(AutoIt)|(James)")
I don't know why. That should work I think. Maybe you should mention this to Nutster.

2. I'm trying to create an expression that matches any character.

Like the wildcard character * used in FileDelete and FileFind...

I'm going to list all the expressions I tried, because it fails on me all the time.

Well, the period (.) matches any character. It is the wildcard in RegExp's. The only difference is that it only matches one character per usage. You have to tell the pattern that you want it to look for more than one wildcard. And you would do so thusly (using your example as a base):

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums.*Internet")

In plain English it's like this, "Okay, I want you to match the word 'Forums' and then I want you to match any character (.), and as many of them as you want (*), could be none, could be one, could be a hundred, it's up to you, but lastly, I want you to match 'Internet'."

In your examples, the most likely to work would be the second one ("Forums[\d\A\p\s]?Internet"), but your fatal flaw is the ?. It tells the pattern to match one or none of the preceding group/character. So if the string had been "AutoIt Forums-Internet Explorer" it should have worked fine, since there would only be one character between your two strings. As you see in my code above, I used the *, that matches none or more. Alternatively, I could have used the +. The + is one or more. That way you can only match if there is definitely something in between the two strings.

Pattern: "Forums.*Internet"

Matches: "ForumsInternet", "Forums - Internet", "Forums hello world how are you today? Internet"

Pattern: "Forums.+Internet"

Matches all of the above except "ForumsInternet", because it says there HAS to be at least one character between the "Forums" and "Internet".
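The difference between the two patterns can be checked with the same three strings. This is a sketch assuming a current AutoIt release where StringRegExp returns 1 or 0 with the default flag.

```autoit
; Sketch: ".*" allows an empty gap between the words, ".+" does not.
Local $aInputs[3] = ["ForumsInternet", "Forums - Internet", "Forums hello world how are you today? Internet"]

For $sInput In $aInputs
    ConsoleWrite($sInput & ": .* -> " & StringRegExp($sInput, "Forums.*Internet") & _
            ", .+ -> " & StringRegExp($sInput, "Forums.+Internet") & @CRLF)
Next
```

Only the first input should differ between the two columns: ".*" accepts it and ".+" rejects it.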

I also noticed you used character sets instead of the period. That's fine, it would have worked, but it wasn't needed. Sets are more for if you want to match limited characters, like just numbers, or just letters. The boards probably use a RegEx to see when people put web addresses (like http://www.google.com) in their posts to make them auto links. Something like this would match a common http address

"http://[^ ]*"

Notice the caret (^)? This is like reversing the set. So what this pattern says is, "Look for http://, then look for as many NOT spaces as you can get"

This makes it autolink anything starting with http:// up until it hits a space. Like, if you'll notice above, the link to Google includes the closing bracket, because I didn't put a space before it.
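Here is a rough sketch of that autolink pattern in action. It assumes a current AutoIt release where flag 3 returns every match in an array and @error is non-zero when nothing matches; the sample sentence is made up for the example.

```autoit
; Sketch: pull anything that starts with "http://" and runs up to the next space.
Local $sPost = "Try a search engine (like http://www.google.com) in your browser."
Local $aLinks = StringRegExp($sPost, "http://[^ ]*", 3)

If @error Then
    ConsoleWrite("No link found" & @CRLF)
Else
    For $sLink In $aLinks
        ; The closing bracket is part of the match because no space precedes it.
        ConsoleWrite("Link: " & $sLink & @CRLF)
    Next
EndIf
```

A stricter set, for example one that also excludes ")", would stop the match before the bracket, at the cost of breaking addresses that legitimately contain one.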

Wow I'm long winded. Sorry for making my post so text heavy! :lmao:


1. I can't get the OR operator to work using the pipe character |

These both fail:

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "AutoIt|James")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "(AutoIt)|(James)")

2. I'm trying to create an expression that matches any character.

Like the wildcard character * used in FileDelete and FileFind...

I'm going to list all the expressions I tried, because it fails on me all the time.

Edit:

$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums[\d?\A?\p?\s?]Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums[\d\A\p\s]?Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums(\d?\A?\p?\s?)Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums(\d\A\p\s)?Internet")
$Matched = TestRegExp("AutoIt Forums - Internet Explorer", "Forums\d?\A?\p?\s?Internet")


1. Not yet implemented. It is on the TO DO list.

2. Use dot (.) to match any character; use .* for 0 or more of any character, or .+ for 1 or more of any character. Inside a set, patterns such as \d are not treated as classes; they are read as \ and the character. Your first line is going to match "Forums", then any one of the characters "\d?Aps" (the repeated \ and ? are ignored), then "Internet".
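For readers on a later build: current AutoIt releases use a PCRE-based engine in which the pipe character does work as an OR operator. The sketch below assumes such a release and is not a statement about the unstable version discussed here; the third pattern is an extra negative case added for contrast.

```autoit
; Sketch: alternation with the pipe character in a PCRE-based AutoIt release.
Local $sTitle = "AutoIt Forums - Internet Explorer"

ConsoleWrite("AutoIt|James     -> " & StringRegExp($sTitle, "AutoIt|James") & @CRLF)
ConsoleWrite("(AutoIt)|(James) -> " & StringRegExp($sTitle, "(AutoIt)|(James)") & @CRLF)
ConsoleWrite("Nutster|James    -> " & StringRegExp($sTitle, "Nutster|James") & @CRLF)
```

The first two should report 1 because "AutoIt" appears in the title, and the last should report 0 because neither alternative does.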

Edited by Nutster

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...


I look at it like this:

".Forum(*)Internet."

I'm not sure what happens when you put the 0 or more quantifier in brackets by itself.. I would assume that it throws an error.


I think it would throw an error and return with @Error set to 2 for bad pattern. I will test. The other case would use * as a literal character (not desired).


That wasn't clear to me.

What's the difference between ? and .


Dot matches any single character, like ? in DOS wildcards.

? means optional: the previous character may or may not appear, e.g. "ab?c" will match "ac" or "abc".


Hmm, I was just reading the StringRegExp docs, and saw this at the end:

"? (after a repeating character) : Find the smallest match instead of the largest."

If this is what I think it is, then it's something I've wished PHP had for the longest time. It's the kind of thing that's very useful for things like BBCode.


That is what it is supposed to do. Have fun with it. At least it was not that hard to implement.
