neogia

Stringregexp Guide

29 posts in this topic

#1 ·  Posted (edited)

Here's a smallish guide on unravelling the seeming mysteries of StringRegExp().

StringRegExp( "test", "pattern" [, flag ] )

"test" = The string to search through for matches.

"pattern" = A string consisting of certain key characters that let the function know PRECISELY what you want to match. No ifs, ands, or buts.. it's a match or it isn't.

flag[optional] = Tells the function whether you just want to know if the "pattern" is found (0, the default), want it to return the first match (1), or want it to return all the matches in the "test" string (3).

The Very Basics

------------------

As you may have figured out, the "pattern" string is the only difficult part of calling StringRegExp() (henceforth: SRE). I find it best to think of patterns as telling the function to match a string character by character. There are different ways to specify a character, but start with the simplest case: if you want to match the string "test", you tell SRE to first search the string for a "t". If it finds one, it assumes it has a match, and the rest of the pattern is used to try to prove that what it has found is not a match. So, if the next character is an "e", it could still be a match. Now say the character after that is an "x". SRE knows immediately that this attempt has failed, because the third character you told it to look for is an "s". It then moves on and tries again further along the string, and only reports no match once every starting position has failed.

Example 1

MsgBox(0, "SRE Example 1 Result", StringRegExp("text", 'test'))

In this example, the message box should read "0", which means the pattern "test" was not found in the test string "text". I know this seems pretty simple, but now you know why it wasn't found.
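One consequence of this character-by-character search is that the pattern can match anywhere in the test string, not only at its start. A minimal sketch (the two test strings are made up for illustration, and ConsoleWrite is used instead of a message box so it runs without clicking):

```autoit
; SRE slides the pattern along the test string, so "test" is found
; inside "contest" even though the string does not start with it.
Local $iFound = StringRegExp("contest", 'test')
; In "context" the "x" breaks every attempt, so there is no match.
Local $iMissing = StringRegExp("context", 'test')
ConsoleWrite("contest: " & $iFound & @CRLF)   ; contest: 1
ConsoleWrite("context: " & $iMissing & @CRLF) ; context: 0
```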

The next way of specifying a pattern is by using a set ("[ ... ]"). You can equate a set to the logic function "OR". Let's use the previous Example. We want to find either the string "test" or the string "text". So, the way I start looking for a pattern is to think like SRE would think: The first character I want to match is "t", then the letter "e", this is the same for both strings we want to match. Now we want to match "s" OR "x", so we can use a set as a substitute: "[sx]" means match an "s" or an "x". Then the last letter is a "t" again.

Example 2

MsgBox(0, "SRE Example 2 Result", StringRegExp("text", 'te[sx]t'))
MsgBox(0, "SRE Example 2 Result", StringRegExp("test", 'te[sx]t'))

These should both provide the result "1", because the pattern should match both "test" and "text".
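Keep in mind that a set stands in for exactly one character. A quick sketch (again with made-up test strings) of what the set does and does not match; the range form inside a set, as in "[a-c]", is the same idea the guide uses later for "[0-9]":

```autoit
; [sx] matches ONE character that is either "s" or "x".
Local $iTent = StringRegExp("tent", 'te[sx]t')   ; 0: "n" is not in the set
Local $iBoth = StringRegExp("tesxt", 'te[sx]t')  ; 0: the set uses up only one character
; A set may also hold a range: [a-c] means "a", "b" OR "c".
Local $iRange = StringRegExp("tbt", 't[a-c]t')   ; 1
ConsoleWrite($iTent & " " & $iBoth & " " & $iRange & @CRLF) ; 0 0 1
```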

You can also specify how many times to match each character by using "{number of matches}", or you can specify a range by using "{min,max}" (leave out any space after the comma, or the braces may be read as literal text). The first example below is redundant, but shows what I mean:

Example 3

MsgBox(0, "SRE Example 3 Result", StringRegExp("text", 't{1}e{1}[sx]{1}t{1}'))
MsgBox(0, "SRE Example 3 Result", StringRegExp("aaaabbbbcccc", 'b{4}'))
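The second line of Example 3 matches because "b" appears four times in a row. A short sketch with the same test string shows an exact count failing and a "{min,max}" range succeeding:

```autoit
Local $sTest = "aaaabbbbcccc"
Local $iExact5 = StringRegExp($sTest, 'b{5}')   ; 0: there are only four b's in a row
Local $iRange = StringRegExp($sTest, 'b{2,5}')  ; 1: four falls inside 2..5
Local $iAround = StringRegExp($sTest, 'ab{4}c') ; 1: an "a", exactly four b's, then a "c"
ConsoleWrite($iExact5 & " " & $iRange & " " & $iAround & @CRLF) ; 0 1 1
```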

The Not-So Basics

--------------------

Right now you're probably thinking "Isn't this just a glorified StringInStr() function?". Well, with a "flag" value of 0, most of the time you're right. But SRE is much more powerful than that. As you use SREs more and more, you'll find you often know less and less about the exact text you are looking for, and there are ways to be less and less specific about each character in the pattern. Take, for example, a line from the chat log of a game: "Gnarly Monster hits you for 18 damage." You want to find out how much damage Gnarly Monster hit you for. You can't use StringInStr(), because you aren't looking for "18"; you're looking for an unknown number, where every character could be any digit.

Here's how I would assemble this pattern. Look at what you do and do not know about what you want to find:

1) You know that it will ALWAYS contain nothing but digits.

2) You know that it will SOMETIMES be 2 characters long.

2a) You know from playing the game that the maximum damage a monster can do is 999.

2b) You know that the minimum damage a monster can do is 0.

3) You know that it will ALWAYS be between 1 and 3 characters long.

4) You know that there are no other digits in the test string.

At this point, I'd like to introduce the FLAG value of "1" and the grouping characters "()". The flag value of "1" means that SRE will not only match your pattern, but also return an array, with each element of the array consisting of a captured "group" of characters. So without veering off course too much, take this example:

Example 4

$asResult = StringRegExp("This is a test example", '(test)', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 4 Result", $asResult[0])
EndIf
$asResult = StringRegExp("This is a test example", '(te)(st)', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 4 Result", $asResult[0] & "," & $asResult[1])
EndIf

So, first the pattern must match somewhere in the test string. If it does, then SRE is told to "capture" any groups ("()") and store them in the return array. You can use multiple captures, as demonstrated by the second piece of code in Example 4.

Ok, back to the Gnarly Monster. Now that we know how to "capture" text, let's construct our pattern. Since you know what you're looking for is digits, there are 3 ways to specify "match any digit": "[:digit:]", "[0-9]", and "\d". The first is probably the easiest to understand. There are a few classes (digit, alnum, space, etc.; check the help file for a full list) you can use to specify sets of characters, one of them being digit. (In later, PCRE-based versions of AutoIt, a class like this must itself sit inside a set, so it is written "[[:digit:]]".) "[0-9]" just specifies a range of all the digits 0 through 9. "\d" is a special character that means the same as the first two. There is no difference between the three, and with all SREs there are usually at least a couple of ways to construct any pattern.
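A sketch that runs all three spellings against the Gnarly Monster line. It borrows flag 1 and the "{1,3}" count that the guide builds up next, and writes the class as "[[:digit:]]" on the assumption of a PCRE-based AutoIt release:

```autoit
Local $sLog = "Gnarly Monster hits you for 18 damage."
; Three interchangeable ways to say "one digit", each repeated 1 to 3 times.
Local $aPosix = StringRegExp($sLog, '([[:digit:]]{1,3})', 1)
Local $aRange = StringRegExp($sLog, '([0-9]{1,3})', 1)
Local $aShort = StringRegExp($sLog, '(\d{1,3})', 1)
; Each call returns an array whose first element is the captured group.
ConsoleWrite($aPosix[0] & " " & $aRange[0] & " " & $aShort[0] & @CRLF) ; 18 18 18
```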

So, first we know we want to capture the digits, so indicate that with an opening parenthesis "(". Next, we know we want to capture between 1 and 3 characters, all consisting of digits, so our pattern now looks like "([0-9]{1,3}". Finally, close it off with a closing parenthesis to indicate the end of our group: "([0-9]{1,3})". Let's try it:

Example 5

$asResult = StringRegExp("Gnarly Monster hits you for 18 damage.", '([0-9]{1,3})', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 5 Result", $asResult[0])
EndIf

There you go, the message box correctly displays "18".
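The "return all the matches" option from the top of the guide is flag value 3. A sketch using the log line from the next paragraph shows that it returns every number, which is also why that line needs a more careful pattern (the "If Not @error" check assumes, as in the examples above, that @error is set when nothing matches):

```autoit
Local $sLog = "You deflect 36 of Gnarly Monster's 279 damage."
; Flag 3: return an array holding every match, not just the first one.
Local $aAll = StringRegExp($sLog, '([0-9]{1,3})', 3)
If Not @error Then
    For $i = 0 To UBound($aAll) - 1
        ConsoleWrite("Match " & $i & ": " & $aAll[$i] & @CRLF) ; Match 0: 36, then Match 1: 279
    Next
EndIf
```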

Next we need to cover non-capturing groups. The way you indicate these groups is by opening the group with "(?:" instead of just "(". Let's say your log says "You deflect 36 of Gnarly Monster's 279 damage." Now if you run Example 5's SRE on this, you'll come up with "36" instead of "279". Now what I like to do here is just determine what's different between the numbers. One that jumps out at me is that the second number is always followed by a space and then the word "damage". We could just modify our previous pattern to be "([0-9]{1,3} damage)", but what if our script is just looking for the amount of damage, without " damage" tacked onto the end of the number? Here's where you can use a non-capturing group to accomplish this.

Example 6

$asResult = StringRegExp("You deflect 36 of Gnarly Monster's 279 damage.", '([0-9]{1,3})(?: damage)', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 6 Result", $asResult[0])
EndIf
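To see what "non-capturing" buys you, compare Example 6 with the same pattern using an ordinary group. A sketch; the only change is "(?: damage)" versus "( damage)":

```autoit
Local $sLog = "You deflect 36 of Gnarly Monster's 279 damage."
; Non-capturing group: " damage" must be present, but is not returned.
Local $aNonCap = StringRegExp($sLog, '([0-9]{1,3})(?: damage)', 1)
; Capturing group: " damage" becomes a second array element.
Local $aCap = StringRegExp($sLog, '([0-9]{1,3})( damage)', 1)
ConsoleWrite(UBound($aNonCap) & ": " & $aNonCap[0] & @CRLF)            ; 1: 279
ConsoleWrite(UBound($aCap) & ": " & $aCap[0] & "|" & $aCap[1] & @CRLF) ; 2: 279| damage
```

For this particular log line a plain "([0-9]{1,3}) damage" would return the same single element; a non-capturing group becomes essential when the grouped part also needs a repeat count or a set of alternatives.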

This could get lengthy, but mostly I just wanted to lay out the foundation for how regular expressions work, and mainly how SRE "thinks". A few things to keep in mind:

- Remember to think about the pattern one character at a time

- The StringRegExp() function finds the first character in the pattern, then it's your job to provide enough evidence to "prove" whether or not it truly is a match. Example 6 is a good display of this.

- Remember [ ... ] means OR ([xyz] match an "x", a "y", OR a "z")

If you have any other questions, consult the help file first! It explains in detail all of the nitty-gritty syntax that comes along with SREs. One thing to look at in particular is the section on "Repeating Characters". It can make your pattern more readable by replacing ranges with shorthand characters. For example: "*" is equivalent to {0,}, the range from 0 to any number of characters.
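A sketch of the shorthand repeat characters next to their "{min,max}" equivalents. Besides "*", the usual shorthands are "+" ({1,}) and "?" ({0,1}). The last call shows a common trap with "*": zero digits is a valid match, so it succeeds at the very first character of the line:

```autoit
Local $sLog = "Gnarly Monster hits you for 18 damage."
; "+" is the same as {1,}: one or more digits.
Local $aPlus = StringRegExp($sLog, '([0-9]+)', 1)      ; "18"
Local $aBrace = StringRegExp($sLog, '([0-9]{1,})', 1)  ; "18"
; "?" is the same as {0,1}: the "u" may or may not be there.
Local $iColor = StringRegExp("color", 'colou?r')       ; 1
Local $iColour = StringRegExp("colour", 'colou?r')     ; 1
; "*" is the same as {0,}: it is satisfied by zero digits right at "G",
; so the captured group is an empty string.
Local $aStar = StringRegExp($sLog, '([0-9]*)', 1)      ; ""
ConsoleWrite($aPlus[0] & " " & $aBrace[0] & " " & $iColor & $iColour & " [" & $aStar[0] & "]" & @CRLF)
```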

Good luck, Regular Expressions can greatly decrease the length of your code, and make it easier to modify later. Corrections and feedback are welcome!

Resources

------------

Wikipedia Article - Regular Expressions - Thanks blindwig.

StringRegExpGUI.au3 (GUI for testing various StringRegExp() patterns) - Thanks steve8tch. Credit: w0uter

Edited by neogia

My UDFs: Coroutine Multithreading UDF Library, StringRegExp Guide, Random Encryptor, ArrayToDisplayString
"The Brain, expecting disaster, fails to find the obvious solution." -- neogia




This is GREAT... neogia, thanks for sharing.


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.



HI,

I agree! Thank you very much!

I have to play around with it.

So long,

Mega


Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times



Nice job. Thanks. :)

Valuater's AutoIt 1-2-3, Class... Is now in Session! For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


Just an FYI to anyone who is new to regular expressions - these are not a new concept, nor are they special to AutoIt. Anyone who has experience with Unix/Posix, Perl, or Tcl, for example, would be familiar with regular expressions.

Here's the wikipedia page for regular expressions:

http://en.wikipedia.org/wiki/Regular_expression

The point is, if you learn regular expressions, that knowledge will be useful to you beyond just AutoIt.


Awesome. I think I'll give SRE another try now; I've put them off a few times. Thanks for the effort putting this together.


1100111 00001011101111 00011101101111 00010111100100 00001111110100 00110111110010 00101101111001 0011100
I didn't make up this form of encryption, but I like it. Credit to the lvl 6 challenge on arcanum.co.nz


Do you think this will make it into the help file? :)


I have attached a utility called StrRegExpGUI.au3.

It was originally written by @w0uter and modified.

I use it all the time to makeup patterns and quickly test and retest to see if the pattern is going to do what I want.

A number of people have already downloaded it.

I would urge people who are learning this function to use it.

StrRegExpGUI.au3



Great Job neogia!!!!!!

One thing I would add is a mention of the importance of escaping special characters in the pattern with a backslash, as mentioned in the help file. Unescaped special characters are often the reason a match fails. :)
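A small sketch of what goes wrong without the backslash; the version strings are made up. In a pattern, "." means "any single character", so it has to be escaped as "\." to match a real dot:

```autoit
; Unescaped ".": matches ANY character, so "3x1" slips through.
Local $iLoose = StringRegExp("Version 3x1", 'Version 3.1')
; Escaped "\.": only a literal dot will do.
Local $iStrict = StringRegExp("Version 3x1", 'Version 3\.1')
Local $iReal = StringRegExp("Version 3.1", 'Version 3\.1')
ConsoleWrite($iLoose & " " & $iStrict & " " & $iReal & @CRLF) ; 1 0 1
```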


Hi neogia,

Seems to me you know your way around RegExp's in AutoIt, so I seek your advice.

I have posted this in the forum, but there seem to be no takers. I have used ^ and $ quite a lot in vim, sed and awk but can't get them to work in AutoIt. They definitely work if you pass a single line to the StringRegExp function, but if you pass several lines (@CRLF separated), ^ and $ do not seem to work.

Do you have any comments on this? Spot any obvious mistake in my code?

I'm having trouble matching patterns specified to one line. This sample code is rather specific and I can extract the lines by using some of the other words, but that is not the point. I want to understand how I can write my pattern to match something starting on a line and ending on the same line.

#include <Array.au3>
; Sample found in nutsters RegExp_Test_4.au3
if not StringRegExp("Theee  always was work.", "^The*\s+\w+ was \w+[.]$", 0 ) then 
   msgbox(16, "ERROR:","StringRegExp: Failed")
   Exit
endif
; === My code =====================
$data = "This is a expected test line" & @CRLF & _
   "And This is Not a expected line" & @CRLF & _
   "This is the second expected msg" & @CRLF & _
   "This is not wanted"
$regexp ="(^This.*$)"
$arr = StringRegExp($data, $regexp,3)
ConsoleWrite("@error:=" & @error & ", @extended:=" & @extended & " , Ubound($arr):=" & UBound($arr) & @CRLF)
_ArrayDisplay($arr, "MSG")
ConsoleWrite("EXIT")

According to Nutster, ^ and $ were planned to be documented, but I can't find them in the help file v3.1.1.118 (beta).

And according to this post, ^ and $ should work, but do they when you want to get the matches in an array?


#12 ·  Posted (edited)

Thanks for the compliment, I love creating RegExps. I'm not completely familiar with other languages' regular expression syntax, but I know that in AutoIt "^" means "only match if this is the first character in the test string" and, likewise, "$" means "only match if this is the last character in the test string". I'm afraid it's not supposed to work on a per-line basis. If you do need to do this type of thing, though, I've created a workaround for you, with one discrepancy: you list the last line in the test string as "This is not wanted", but I don't see a difference that your test pattern would pick up. Is this a typo, or am I missing something? My script recognizes that line, along with the first and third, as matches.

#include <Array.au3>
$data = "This is a expected test line" & @CRLF & _
   "And This is Not a expected line" & @CRLF & _
   "This is the second expected msg" & @CRLF & _
   "This is not wanted"
If StringLeft($data, 2) <> @CRLF Then
    $data = @CRLF & $data
EndIf
If StringRight($data, 2) <> @CRLF Then
    $data = $data & @CRLF
EndIf
$results = ""
$match = StringRegExp($data, '\r\n(This.*?)(\#)\r\n', 1)
While @extended == 1
    If Not IsArray($results) Then
        $results = _ArrayCreate($match[0])
    Else
        _ArrayAdd($results, $match[0])
    EndIf
    $data = StringTrimLeft($data, $match[1])
    $match = StringRegExp($data, '\r\n(This.*?)(\#)\r\n', 1)
WEnd
_ArrayDisplay($results, "")
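For readers on a later, PCRE-based AutoIt release: "^" and "$" still anchor to the whole string by default, but the "(?m)" option at the start of the pattern makes them work per line, much like sed and awk. A sketch under that assumption, reusing Uten's test data; "[^\r\n]*" is used instead of ".*" so that no carriage return ends up in the results:

```autoit
Local $sData = "This is a expected test line" & @CRLF & _
        "And This is Not a expected line" & @CRLF & _
        "This is the second expected msg" & @CRLF & _
        "This is not wanted"
; (?m) = multiline mode: ^ matches at the start of every line.
; [^\r\n]* grabs the rest of the line and stops before the line break.
; Flag 3 returns every match.
Local $aLines = StringRegExp($sData, '(?m)^(This[^\r\n]*)', 3)
If Not @error Then
    For $i = 0 To UBound($aLines) - 1
        ConsoleWrite($aLines[$i] & @CRLF)
    Next
EndIf
```

As with the workaround above, the last line ("This is not wanted") is matched too, because nothing in the pattern excludes it.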

Also, if anyone else would like help with a certain pain-in-the-butt RegExp, post it here, and I'll make one for you. I'm in the process of lengthening my guide, and would like more examples to include.

Edited by neogia



Thanks, now I understand it!... I think...


In work:
1. InstallIt
2. New version of SpaceWar


#14 ·  Posted (edited)

neogia wrote: "I'm afraid it's not supposed to work on a per-line basis. If you do need to do this type of thing, though, I've created a workaround for you, with one discrepancy: You list the last line in the test string as "This is not wanted", but I don't see a difference that will be discerned by your test pattern. Is this a typo, or am I missing something? My script recognizes that line, along with the first and third, as matches."

Very well explained, and you're right: the last line in my test string is a typo, as I was thinking of getting all lines starting with a T and ending with a @CRLF (and probably @LF and @CR).

After I posted, I realized that the AutoIt RegExp functions work differently than I expected. My expectations were indeed influenced by my knowledge of sed and awk (which take what's passed and parse it line by line).

neogia wrote: "Also, if anyone else would like help with a certain pain-in-the-butt RegExp, post it here, and I'll make one for you. I'm in the process of lengthening my guide, and would like more examples to include."

I have to run now, but I know that there are a few collections called "sed one-liners" and "awk one-liners" with lots of straightforward, but sometimes complicated, samples. I think AutoIt would benefit from such a collection of samples. If you like, I'll volunteer to help you build such an AutoIt collection as part of your guide.

Thanks for the sample, explanation and your very nice guide.

EDIT: Did not have a spell checker available when posting.

Edited by Uten


Sed one liners

Awk one liners

One line programs

I'm not claiming that all of these one-liners apply to tasks done in AutoIt, but I think most of them are good starting points for getting a grip on RegExp patterns, especially if we translate some of them into AutoIt samples.

Obviously it would be a crime not to mention http://www.autoitscript.com/fileman/users/Nutster/


#16 ·  Posted

I feel like a dunce. I've looked over the online documents, the help file, release notes, FAQ, I've searched on this forum for example code, etc - but I can't seem to find out where StringRegExp is.

I ran the sample code that uses the function, from this thread, and AutoIT says "unknown function name". Yet, there seems to be plenty of sample code up here using the function.

I used the sample code example:

$asResult = StringRegExp("Gnarly Monster hits you for 18 damage.", '([0-9]{1,3})', 1)
If @error == 0 Then
    MsgBox(0, "SRE Example 5 Result", $asResult[0])
EndIf

I'm using v3.1.0

Can someone tell me what I am missing?


#17 ·  Posted


It sounds like you're missing the beta. The StringRegExp function requires the beta. The beta has a lot of other added functionality too, and has improved many times over on the production version 3.1.



#18 ·  Posted


You're missing Beta is all:

AutoIt Beta




#19 ·  Posted (edited)

Hi, love this forum. I use Expresso to learn and train my regexp strings.

But it is a little different from the AutoIt regexp engine.

Could someone explain which other regexp engines' syntax AutoIt is most compatible with?

Edited by yair


#20 ·  Posted (edited)

Here are some more Examples.

I will edit this post and add more as I finish them.

;=====================================================================
; Verify that an e-mail address is properly formatted
; This was modified from a string located on Regular-Expressions.info
;=====================================================================
$EmailAdds = "email adds here"
if StringRegExp($EmailAdds, "\<[A-Za-z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\>") Then
    MsgBox(0, "E-Mail", "The E-Mail address entered is Valid")
Else
    MsgBox(0, "E-Mail", "Please enter a Valid e-mail address")
EndIf

;======================================================================
; Search text and return all valid e-mail addresses that are in there
;======================================================================
$Text = FileRead("email.txt")
$EmailFound = StringRegExp($Text, "([A-Za-z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})", 3)
if @extended = 1 Then
    for $i = 0 to UBound($EmailFound) - 1
        MsgBox(0, "E-Mail", $EmailFound[$i])
    Next
Else
    MsgBox(0, "E-Mail", "No E-Mail addresses found in the supplied text")
EndIf

;edit: Here is my current script to format a last name to insert into a database
#include <String.au3> ; provides _StringProper()
$LastName = "Last'Name-Here"
MsgBox(0, "Formatted for Database insertion", _StringProper(StringRegExpReplace($LastName, "[^-a-zA-Z0-9]|Jr\.*\>|jr\.*\>", "")))

I would like to make a Proper Case string using the StringRegExpReplace but I can't figure out how to replace a lowercase letter with the exact same letter only in uppercase. Any Ideas?

Mike

Edited by MikeOsdx
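The e-mail patterns above rely on the beta engine of the time: "\<" and "\>" as word boundaries, and @extended to signal a match. A sketch of the search example for a later, PCRE-based release, assuming that there the word boundary is written "\b" and a failed match sets @error instead; the sample text is made up:

```autoit
Local $sText = "Contact bob.smith@example.com or sales@example.org for details."
; \b = word boundary in PCRE (the beta's \< and \> are not boundaries there).
Local $aFound = StringRegExp($sText, '\b([A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4})\b', 3)
If @error Then
    ConsoleWrite("No e-mail addresses found in the supplied text" & @CRLF)
Else
    For $i = 0 To UBound($aFound) - 1
        ConsoleWrite($aFound[$i] & @CRLF)
    Next
EndIf
```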

