Never used StringRegExp before... problems


NELyon
Yes, my problems have now shifted from arrays to regular expressions. I'm working on a guestbook for a friend and didn't want to use an INI file to sort the posts, so I am using a text file set up in the strangest way possible. I hope to god someone can help me with this :whistle::lol::)

OK, so here's the Posts.txt file. I know it looks STRANGE, but it's the best way I could think of to sort the data.

...1...
.name.post1
.text.post1 test

...2...
.name.post2
.text.post2 tst

So I'm trying to use StringRegExp to capture the numbers in between the leading ... and the trailing .... I have a feeling I definitely messed up when trying to get what's between the periods.

#Include <array.au3>
$posts = FileRead("Posts.txt")
$format = StringRegExp($Posts, "...[1-9]...", 1)
_ArrayDisplay($format, "title")

I think the problem is that I wrote the RegExp wrong (the ... in the pattern). Does anyone (most likely Smoke_N) know how I can retrieve the numbers in between the dots? Thanks in advance.


I think the problem is that I wrote the RegExp wrong (the ... in the pattern). Does anyone (most likely Smoke_N) know how I can retrieve the numbers in between the dots? Thanks in advance.

$format = StringRegExp($Posts, "\.\.\.([1-9])\.\.\.", 3)

(l)user: Hey admin slave, how can I recover my deleted files?
admin: No problem, there is a nice tool. It's called rm, like recovery method. Make sure to call it with the "recover fast" option like this: rm -rf *


  • Moderators

If you want exactly what's between those dots:

$format = StringRegExp($posts, '^\.{3}(\d+)\.{3}$', 3)

He's reading a file... ^ and $ isn't going to work :whistle:

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Edit:

Biiig ooops. As Smoke pointed out to mrRevoked, and as I didn't notice before I hit Add Reply, you are reading from a file, and the ^ and $ characters will match at the beginning and end of the entire string being passed. This means that it won't work with mrRevoked's example. But, I believe that the switch (?m) is supposed to make this work (but in my test so far I haven't figured it out).

I also added another very important thing that I forgot to mention for each example: whatever is inside the () in a regular expression is what gets returned from the function.

For example the expression "Hello mister (.*), how are you\?" is going to match "Hello mister Saunders, how are you?" and return "Saunders".

So in each of their examples, where they have the number wrapped with () it means that the return from StringRegExp is only going to be the number, not the whole string.
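A minimal sketch of that, using the same greeting. The variable names are made up for illustration; flag 3 asks StringRegExp for an array of the captured groups.

```autoit
; Only the part wrapped in () comes back, not the whole matched sentence.
Local $sText = "Hello mister Saunders, how are you?"
Local $aMatch = StringRegExp($sText, "Hello mister (.*), how are you\?", 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite($aMatch[0] & @CRLF) ; prints the captured name: Saunders
EndIf
```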

The very important thing to remember here (which everyone else is fixing for you, but not necessarily explaining) is that . (period/dot) will match any character (except potentially newline). So if you have: "Hello"

It will be matched by the following regular expressions: ".ello", "..llo", "Hel..", ".....", etc. Get the idea? So if you want to match an actual period, you need to escape it, which means put a backslash in front of it. This is what the guys above have done.
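To make the difference concrete, here is a small sketch (the test strings are invented for illustration). With the default flag 0, StringRegExp just returns 1 for a match and 0 for no match.

```autoit
; An unescaped dot matches any character, so all of these match "Hello".
Local $aPatterns[4] = [".ello", "..llo", "Hel..", "....."]
For $sPattern In $aPatterns
    ConsoleWrite($sPattern & " -> " & StringRegExp("Hello", $sPattern) & @CRLF)
Next

; The original pattern from the first post also matches text with no dots at all...
ConsoleWrite(StringRegExp("abc1def", "...[1-9]...") & @CRLF)       ; 1
; ...while the escaped version insists on literal periods.
ConsoleWrite(StringRegExp("abc1def", "\.\.\.[1-9]\.\.\.") & @CRLF) ; 0
ConsoleWrite(StringRegExp("...1...", "\.\.\.[1-9]\.\.\.") & @CRLF) ; 1
```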

To explain their examples in a little more detail (I'm glad I have 3 examples to work with, thanks guys!):

\.\.\.([1-9])\.\.\.

So this one is the least flexible of the three. It will match three dots followed by a single digit from 1 to 9 (note: no zero), followed by three dots. So it will match the following:

...1...

...2...

...4...

...9...

etc.

But won't match:

...0...

...10...

...01...

It's also important to note, that this one will match the substring within a line of other text. In other words, it will match this line:

"Awesome Possum ...1... Vanilla Frogs?"

\.{3}(\d+)

This one is a little more liberal than Kurt's. It starts by matching three dots (the {3} means match exactly 3 of the preceding item, so in this case 3 matches of \.). Then it matches one or more digit characters (0-9). Here \d represents 0-9, so it's the same as [0-9], which is closer to Kurt's example. The + after it is the "one or more" quantifier. For example, "b+" will match "b", "bb", "ab", "abb", etc., but not "aa". So altogether, this regular expression will match everything that Kurt's matched, plus the following:

...10...

...100...

...9999999...

(and some unexpected ones)

Kaboom...123...Kapow

Hello...10

Hey...999Heythere

An important note with this one is that it doesn't check for three dots after the number, which is why it will match my last two examples.

And finally...

^\.{3}(\d+)\.{3}$

This one is the most specific of the three, and probably what I would go with (sorry Kurt, Smoke). This one matches exactly three dots, at the beginning of the string, one or more digit characters, and exactly three dots at the end of the string.

You may notice this one is partially the same as Smoke's (or vice versa). But the important thing with this expression, as I stressed in the last sentence, is that it matches the three dots at the beginning of the string and at the end. That is what the ^ and the $ mean: ^ means match at the beginning, and $ means match at the end. So a regular expression like "^Hello" will match "Hello world", "Hello dolly", and "Hello you putz!", but not "I said Hello".

And this expression "world$" will match "Hello world", "I will rule the world", and "Goodbye cruel world". But not "The world is mine!"
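A quick sketch of both anchors, using the same sample sentences (flag 0, so the result is 1 for a match and 0 otherwise):

```autoit
; ^ pins the match to the start of the string.
ConsoleWrite(StringRegExp("Hello world", "^Hello") & @CRLF)          ; 1
ConsoleWrite(StringRegExp("I said Hello", "^Hello") & @CRLF)         ; 0

; $ pins the match to the end of the string.
ConsoleWrite(StringRegExp("Goodbye cruel world", "world$") & @CRLF)  ; 1
ConsoleWrite(StringRegExp("The world is mine!", "world$") & @CRLF)   ; 0
```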

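So this regular expression will match any string you can concoct that consists of nothing but three dots, then digits, then three dots. (As noted in the edit at the top, without the (?m) switch the ^ and $ refer to the whole string, not to each line.)

e.g.: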

...1...

...2...

...10...

...99999999999...

And it will not match:

Kaboom...123...Kapow

Hello...10

Hey...999Heythere

Awesome Possum ...1... Vanilla Frogs?
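If you want to check the three expressions side by side, something like the following sketch should do it. Each sample is tested as a string on its own (not as a line inside a file), and the array and variable names are just placeholders. Each row prints the sample followed by 1 or 0 for the three patterns in order.

```autoit
; The three patterns discussed above, in the same order.
Local $aPatterns[3] = ["\.\.\.([1-9])\.\.\.", "\.{3}(\d+)", "^\.{3}(\d+)\.{3}$"]

; A few of the sample strings from the explanations.
Local $aSamples[6] = ["...1...", "...10...", "...0...", "Hello...10", _
        "Kaboom...123...Kapow", "Awesome Possum ...1... Vanilla Frogs?"]

Local $sRow
For $sSample In $aSamples
    $sRow = $sSample & " :"
    For $sPattern In $aPatterns
        ; Flag 0 (the default) returns 1 on a match and 0 otherwise.
        $sRow &= " " & StringRegExp($sSample, $sPattern)
    Next
    ConsoleWrite($sRow & @CRLF)
Next
```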

Whew. Hope that helps a little bit. Sometimes I get a little long winded about regular expressions. :whistle:

Although on one last important note: This is hardly a fully detailed explanation about regular expressions, just on the ones shown here.

Edited by Saunders

Nope. I don't usually have the patience for writing up something like this. :whistle:

Every now and then though I get the urge to explain things.

Edit:

Aha! I figured it out. And I think this is something of a bug...

I expected the expression "(?m)^\.{3}(\d+)\.{3}$" to match "three dots, number, three dots" alone on a line, but it actually gave me nothing at all (I used D-Generation's example Posts.txt). Then it dawned on me that, with the (?m) switch, $ literally matches ONLY at a newline (line feed), so the carriage return in front of it gets in the way. So I tried adding a \r before the $ in the expression and it worked fine.

I'm wondering if I should suggest (via bug report?) that $ match carriage return as well as line feed. Otherwise, the only way I can see to reliably match the end of a line within a multiline string (when you don't know the file's line-ending style) is something like "(?m)^line \d+(\r|\n|\r\n|\z)".

...

And in fact, after testing that it doesn't seem to work entirely properly either. Try this.

#Include <array.au3>
$file = FileOpen('lines.txt', 2)
FileWrite($file, 'line 1' & @CR)
FileWrite($file, 'line 2' & @LF)
FileWrite($file, 'line 3' & @CRLF)
FileWrite($file, 'line 4')
FileClose($file)

$lines = FileRead('lines.txt')

$regex1 = StringRegExp($lines, "(?m)^(line \d+)$", 3)
$regex2 = StringRegExp($lines, "(?m)^(line \d+)\r$", 3)
$regex3 = StringRegExp($lines, "(?m)^(line \d+)(?:\r|\n|\r\n|\z)", 3)

_ArrayDisplay($regex1, 'regex1')
_ArrayDisplay($regex2, 'regex2')
_ArrayDisplay($regex3, 'regex3')
Edited by Saunders

  • Moderators

Nice points Saunders. I don't usually go too deep into them unless I'm actually working on a file where I can see everything I'm dealing with... From your examples, is this what you were referring to?

'(?m)^\.{3}(\d+)\.{3}(?:\r|\n|$)'

?


I don't think putting the $ inside the brackets will work as it's not technically a matching character, more like a specification for the regular expression.

But basically, I thought that $regex1 in my example should match all 4 lines, and that we shouldn't have to dick around with a 4-piece non-capturing group to figure out end-of-line. (Remember, (\r|\n) would match @CR or @LF but not @CRLF, and it also won't match the end of the string if there's no newline.) That's why I used \z: (?:\r|\n|\r\n|\z) should match @CR, @LF, @CRLF, or the end of the string.

I need to look into this a little more, test the expression implementations in some other languages. It's probably default regular expression behaviour and something I'll just have to deal with, but if it's not I'll be posting a bug report.

*Edit: Just realized and wanted to apologize to Degeneration-X for hijacking his thread. Didn't mean to do that. Did you ever get what you were trying to do figured out?

Edited by Saunders

  • Moderators

Remember (\r|\n) would match @CR or @LF but not @CRLF

Are you sure about that? :whistle:

Anyway, you'll find it matches fine... I tested it before I posted it with all your other concerns including the end of the string '$'.


Are you sure about that? :whistle:

Anyway, you'll find it matches fine... I tested it before I posted it with all your other concerns including the end of the string '$'.

Hmm. I tried it with my script above, and it matches lines 1, 3, and 4 just like my longest expression, so it does match @CRLF. Although I'm not sure why. :P

It still doesn't match line 2 though...

#Include <array.au3>
$file = FileOpen('lines.txt', 2)
FileWrite($file, 'line 1' & @CR)
FileWrite($file, 'line 2' & @LF)
FileWrite($file, 'line 3' & @CRLF)
FileWrite($file, 'line 4')
FileClose($file)

$lines = FileRead('lines.txt')

$regex1 = StringRegExp($lines, "(?m)^(line \d+)$", 3)
$regex2 = StringRegExp($lines, "(?m)^(line \d+)\r$", 3)
$regex3 = StringRegExp($lines, "(?m)^(line \d+)(?:\r|\n|\r\n|\z)", 3)
$regex4 = StringRegExp($lines, '(?m)^(line \d+)(?:\r|\n|$)', 3)

_ArrayDisplay($regex1, 'regex1')
_ArrayDisplay($regex2, 'regex2')
_ArrayDisplay($regex3, 'regex3')
_ArrayDisplay($regex4, 'regex4')
(Yours is $regex4)

*Edit:

I can get this regex to match line 2, but then it doesn't get 1 and 3.

(?:\r|\n)(line \d+)(?:\r|\n|$)

If I add \A it will match lines 1, 3, and 4 again but I lose line 2 AGAIN! Argh!!

(?:\A|\r|\n)(line \d+)(?:\r|\n|$)

Can you write a regex that will return all four lines?

Edited by Saunders

  • Moderators

#Include <array.au3>
$file = FileOpen('lines.txt', 2)
FileWrite($file, 'line 1' & @CR)
FileWrite($file, 'line 2' & @LF)
FileWrite($file, 'line 3' & @CRLF)
FileWrite($file, 'line 4')
FileClose($file)

$lines = FileRead('lines.txt')

$regex4 = StringRegExp($lines, '(line \d+)(?:\r|\n|$)', 3)

_ArrayDisplay($regex4, 'regex4')
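Dropping (?m)^ like this gets all four lines, at the cost of no longer requiring the text to sit at the start of a line. The earlier attempts with (?:\r|\n) on both sides most likely lost lines because each match consumed the line break that the next match needed in front of it. It also explains why (?:\r|\n|$) copes with @CRLF: the \r alone satisfies the group, and nothing after it has to match. If the start-of-line check matters, one option is to test the line breaks with lookarounds so they are not consumed. This is only a sketch, not tested against the AutoIt build used in this thread, and the variable names are made up.

```autoit
#include <Array.au3>

; Same four line endings as lines.txt, built in memory instead of on disk.
Local $sLines = 'line 1' & @CR & 'line 2' & @LF & 'line 3' & @CRLF & 'line 4'

; (?:\A|(?<=[\r\n]))  start of string, or just after a CR or LF (not consumed)
; (line \d+)          the part that is captured and returned
; (?=[\r\n]|\z)       followed by a CR, an LF, or end of string (not consumed)
Local $sPattern = '(?:\A|(?<=[\r\n]))(line \d+)(?=[\r\n]|\z)'

Local $aAll = StringRegExp($sLines, $sPattern, 3)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    _ArrayDisplay($aAll, 'all lines')
EndIf
```

Because the lookbehind and lookahead only check the neighbouring character, the same @CR or @LF can end one line and start the next.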
