Here's a smallish guide on unravelling the seeming mysteries of StringRegExp().
StringRegExp( "test", "pattern" [, flag ] )
"test" = The string to search through for matches.
"pattern" = A string consisting of certain key characters that let the function
know PRECISELY what you want to match. No ifs, ands, or buts: it's a match or
it isn't.
flag [optional] = Tells the function if you just want to know if the "pattern"
is found, or if you want it to return the first match, or if you want it to
return all the matches in the "test" string.
Example 1
MsgBox(0, "SRE Example 1 Result", StringRegExp("text", 'test'))
In this example, the message box should read "0", which means the pattern
"test" was not found in the test string "text". I know this seems pretty
simple, but now you know why it wasn't found.
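Example 1 leaves the flag out, so it only reports found or not found. As a rough sketch of how the optional flag changes what comes back, the snippet below runs the same literal pattern with no flag, with flag 1 (first match), and with flag 3 (all matches). The flag values are taken from the AutoIt help file; the test string and variable names are made up for illustration.

```autoit
Local $sTest = "one test, two tests"

; No flag (same as flag 0): returns 1 if the pattern is found, 0 if not.
ConsoleWrite("Flag 0: " & StringRegExp($sTest, 'test') & @CRLF)

; Flag 1: returns an array holding the first match only.
Local $aFirst = StringRegExp($sTest, 'test', 1)
If @error = 0 Then ConsoleWrite("Flag 1: " & UBound($aFirst) & " element(s)" & @CRLF)

; Flag 3: returns an array holding every match in the test string.
Local $aAll = StringRegExp($sTest, 'test', 3)
If @error = 0 Then ConsoleWrite("Flag 3: " & UBound($aAll) & " element(s)" & @CRLF)
```

With the array flags, check @error right after the call, before you touch the array.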
The next way of specifying a pattern is by using a set ("[ ... ]"). You can
equate a set to the logic function "OR". Let's use the previous Example. We
want to find either the string "test" or the string "text". So, the way I
start looking for a pattern is to think like SRE would think: The first
character I want to match is "t", then the letter "e", this is the same for
both strings we want to match. Now we want to match "s" OR "x", so we can use
a set as a substitute: "[sx]" means match an "s" or an "x". Then the last
letter is a "t" again.
Example 2
MsgBox(0, "SRE Example 2 Result", StringRegExp("text", 'te[sx]t'))
MsgBox(0, "SRE Example 2 Result", StringRegExp("test", 'te[sx]t'))
These should both provide the result "1", because the pattern should match
both "test" and "text".
You can also specify how many times to match each character by using "{number
of matches}", or you can specify a range by using "{min,max}" (no space after
the comma). The first example below is redundant, but shows what I mean:
Example 3
MsgBox(0, "SRE Example 3 Result", StringRegExp("text", 't{1}e{1}[sx]{1}t{1}'))
MsgBox(0, "SRE Example 3 Result", StringRegExp("aaaabbbbcccc", 'b{4}'))
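Example 3 only shows exact counts. Here is a small sketch of the range form, reusing the same test string; the patterns are my own illustrations and not part of the original examples.

```autoit
Local $sTest = "aaaabbbbcccc"

; {2,3} = at least 2 and at most 3 of the preceding character.
; An "a" followed by two or three "b"s exists, so this returns 1.
ConsoleWrite(StringRegExp($sTest, 'ab{2,3}') & @CRLF)

; There are only four "b"s in a row, so asking for five returns 0.
ConsoleWrite(StringRegExp($sTest, 'b{5}') & @CRLF)

; {2,} leaves the maximum open: two or more "a"s, then a "b". Returns 1.
ConsoleWrite(StringRegExp($sTest, 'a{2,}b') & @CRLF)
```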
Example 4
; Flag 1 returns an array holding the captured groups of the first match.
$asResult = StringRegExp("This is a test example", '(test)', 1)
If @error = 0 Then
    MsgBox(0, "SRE Example 4 Result", $asResult[0])
EndIf
; Two groups in the pattern give two elements in the array.
$asResult = StringRegExp("This is a test example", '(te)(st)', 1)
If @error = 0 Then
    MsgBox(0, "SRE Example 4 Result", $asResult[0] & "," & $asResult[1])
EndIf
So, first the pattern must match somewhere in the test string. If it does,
then SRE is told to "capture" any groups ("()") and store them in the return
array. You can use multiple captures, as demonstrated by the second piece of
code in Example 4.
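If you don't want to hard-code the array indexes, one possible approach is to loop over whatever came back. This is only a sketch: it reuses the second pattern from Example 4 and writes to the console instead of a message box.

```autoit
Local $asResult = StringRegExp("This is a test example", '(te)(st)', 1)
If @error Then
    ; With an array-returning flag, @error is non-zero when nothing
    ; matched or when the pattern itself is invalid.
    ConsoleWrite("No match" & @CRLF)
Else
    ; One array element per captured group, in pattern order.
    For $i = 0 To UBound($asResult) - 1
        ConsoleWrite("Group " & ($i + 1) & ": " & $asResult[$i] & @CRLF)
    Next
EndIf
```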
Ok, back to the log file. Now that we know how to "capture" text, let's
construct our pattern. Since you know what you're looking for is digits, there
are 3 ways to specify "match any digit": "[[:digit:]]", "[0-9]", and "\d". The
first is probably the easiest to understand. There are a few named classes
(digit, alnum, space, etc.; check the help file for a full list) that you can
use inside a set to specify groups of characters, one of them being digit.
"[0-9]" just specifies a range of all the digits 0 through 9. "\d" is just a
special character that means the same as the first two. For our purposes there
is no difference between the three; as with all SREs, there are usually at
least a couple of ways to construct any pattern.
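A quick way to convince yourself that the three forms behave alike is to run them against the same string. Note that the named class has to sit inside a set, so it is written "[[:digit:]]". The test string and variable names here are made up.

```autoit
Local $sTest = "Room 42"
Local $asPatterns[3] = ['[[:digit:]]', '[0-9]', '\d']

; Each pattern asks the same question: is there a digit anywhere in $sTest?
For $i = 0 To UBound($asPatterns) - 1
    ConsoleWrite($asPatterns[$i] & " -> " & StringRegExp($sTest, $asPatterns[$i]) & @CRLF)
Next
```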
So, first we know we want to capture the digits, so indicate that with the
opening parenthesis "(". Next, we know we want to capture between 1 and 3
characters, all consisting of digits, so our pattern now looks like
"([0-9]{1,3}". And finally close it off with the closing parenthesis to
indicate the end of our group: "([0-9]{1,3})". Let's try it:
Example 5
$asResult = StringRegExp("There were 18 sheets left in the ream of paper.", _
        '([0-9]{1,3})', 1)
If @error = 0 Then
    MsgBox(0, "SRE Example 5 Result", $asResult[0])
EndIf
There you go, the message box correctly displays "18".
Next we need to cover non-capturing groups. The way you indicate these groups
is by opening the group with "(?:" instead of just "(". Let's say your
log file says "You used 36 of 279 pages." If you run Example 5's SRE on this,
you'll come up with "36" instead of "279". What I like to do here is determine
what's different between the numbers. One difference that jumps out at me is
that the second number is always followed by a space and then the word
"pages". We could just modify our previous pattern to be
"([0-9]{1,3} pages)", but what if our script is just looking for the starting
number of pages, without " pages" tacked onto the end of the number? Here's
where you can use a non-capturing group to accomplish this.
Example 6
$asResult = StringRegExp("You used 36 of 279 pages.", _
        '([0-9]{1,3})(?: pages)', 1)
If @error = 0 Then
    MsgBox(0, "SRE Example 6 Result", $asResult[0])
EndIf
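To see what the "?:" actually buys you, it can help to run the pattern both ways and compare the size of the returned array. A rough sketch, with variable names invented for the comparison:

```autoit
Local $sLog = "You used 36 of 279 pages."

; Ordinary group around " pages": it is captured too, so the array
; gets a second element holding " pages".
Local $asCaptured = StringRegExp($sLog, '([0-9]{1,3})( pages)', 1)
If @error = 0 Then ConsoleWrite(UBound($asCaptured) & " element(s)" & @CRLF)

; Non-capturing group: " pages" must still be there for a match,
; but only the number is stored in the array.
Local $asSkipped = StringRegExp($sLog, '([0-9]{1,3})(?: pages)', 1)
If @error = 0 Then ConsoleWrite(UBound($asSkipped) & " element(s): " & $asSkipped[0] & @CRLF)
```

In both cases the match lands on "279", because "36" is not followed by " pages".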
This could get lengthy, but mostly I just wanted to lay out the foundation for
how regular expressions work, and mainly how SRE "thinks". A few things to
keep in mind:
- Remember to think about the pattern one character at a time.
- The StringRegExp() function finds the first character in the pattern, then
it's your job to provide enough evidence to "prove" whether or not it truly is
a match. Example 6 is a good display of this.
- Remember [ ... ] means OR ([xyz] matches an "x", a "y", OR a "z").
If you have any other questions, consult the help file first! It explains in
detail all of the nitty-gritty syntax that comes along with SREs. One thing
to look at in particular is the section on "Repeating Characters". It can make
your pattern more readable by substituting certain characters for ranges. For
example: "*" is equivalent to {0,}, or the range from 0 to any number of
repetitions of the preceding character.
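As a small illustration of the shorthand idea, "+" stands for {1,} in the same way (and "?" stands for {0,1}). The sketch below runs the long and short forms side by side on the sentence from Example 5; both should capture the same number.

```autoit
Local $sTest = "There were 18 sheets left in the ream of paper."
Local $asPatterns[2] = ['([0-9]{1,})', '([0-9]+)']

; Same test string, two spellings of "one or more digits".
For $i = 0 To UBound($asPatterns) - 1
    Local $asResult = StringRegExp($sTest, $asPatterns[$i], 1)
    If @error = 0 Then ConsoleWrite($asPatterns[$i] & " -> " & $asResult[0] & @CRLF)
Next
```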
Good luck! Regular expressions can greatly decrease the length of your code
and make it easier to modify later. Corrections and feedback are welcome!
The 30 Minute Regex Tutorial - by Jim Hollenhorst.
GUI for testing various StringRegExp() patterns -
Thanks steve8tch. Credit: w0uter
Thanks to neogia for this tutorial.