Regular Expression Testing


Nutster
.. yes ..

$asResults = RegExp($sLine, $sPattern)
If @Error = 0 Then
; Found the pattern
; .. and the hits are in a zero-based array called $asResults
ElseIf @Error = 1 Then
; Did not find the pattern
ElseIf @Error = 2 Then
; The pattern was not valid
Endif

<{POST_SNAPBACK}>

Or

$asResults = RegExp($sLine, $sPattern)
If @Error = 0 Then
; Found the pattern
; .. and the hits are in a zero-based array called $asResults
ElseIf @Error = 1 Then
; Did not find the pattern
; $asResults = ""
ElseIf @Error = 2 Then
; The pattern was not valid
; $asResults = 7
Endif

This can solve the problems with storing back-references when I implement them, as well as RegExpReplace. OK, I will go this way. @Error will indicate whether the search worked or not (or buggered up completely because of a screwed-up pattern). I am thinking that the return value for @Error=2 should indicate where the problem occurred in the pattern. In the example, the character at StringMid($sPattern, 7, 1) is where the pattern stopped being valid.

Edit: Arrg! Mor spelling errors!


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

[..] @Error will indicate whether the search worked or not (or buggered up completely because of a screwed pattern). I think the return in that case should indicate where the problem occurred in the pattern.

<{POST_SNAPBACK}>

.. excellent, I like the sounds of that :D ! And thanks for being so open to suggestions, Nutster!

Not sure quite what you mean though: are you saying that in the event of a failed match, the array would hold whatever the pattern WAS able to find? .. How would it indicate where the problem occurred? .. Maybe a 2-element array where index=0 is the position in the test string where the match failed, and index=1 shows the position in the pattern where it failed?

Sorry :lol: - I guess you kindled my imagination there .. :idiot:

Edit: oops .. I see you posted an update after the one I replied to. Nice to see I wasn't too far off ;)


You are welcome! I mean that the return value would be an integer that indicates the character of the pattern string at which the error occurred. For example,

$Pattern = "1\23]Junk** Pattern["
$Result = RegExp("Testing.", $Pattern)
If @Error = 0 Then ; Yeah right
   MsgBox(0, "Success!", "The pattern matched.")
   DumpArray($Result) ; contains the matches (but there are none in this pattern, so $Result = "")
ElseIf @Error = 1 Then ; Getting closer
   MsgBox(0, "Failure!", "The pattern was not found.")
   ; $Result = ""
ElseIf @Error = 2 Then ; You better believe it!
   MsgBox(0, "Problem!", "An error occurred at character " & $Result & ": " & StringMid($Pattern, $Result, 1))
   ; $Result only contains the position of the character in the pattern that failed!
EndIf

Hope this makes things clearer. Let's see how much time I have available this weekend.
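
For comparison, the same three-way result can be sketched in Python, whose `re` module reports the offending offset of a bad pattern through the `pos` attribute of `re.error`. This is only an illustration of the convention discussed above, not the AutoIt implementation; the helper name `regexp` and the 0/1/2 codes mirroring @Error are assumptions made for the example.

```python
import re

def regexp(text, pattern):
    """Return (error_code, result), mimicking the @Error convention above.

    0 -> match found, result is a list of captured groups
    1 -> no match, result is ""
    2 -> invalid pattern, result is the 1-based position of the problem
    """
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # exc.pos is 0-based and may be None; convert to a 1-based index
        # so it lines up with StringMid-style positions.
        position = (exc.pos or 0) + 1
        return 2, position
    match = compiled.search(text)
    if match is None:
        return 1, ""
    return 0, list(match.groups())

code, result = regexp("Testing.", "abc(")
print(code, result)
```

Returning the position as the value keeps the caller's branching identical to the If/ElseIf chain in the post: check the code first, then interpret the result accordingly.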


David, are there any plans to allow searching a string via regular expressions instead of matching a string against a regular expression pattern? For example, given the following string:

sldkjfksdjfl|data1|sldjfskdjflksdjf|data2|ljsdlkjfsdfsdlfs|data3|slkdjfksd

In Lua's regular expression implementation, I can search that string and extract |data1|, |data2|, and |data3| with the following small bit of code:

str = "sldkjfksdjfl|data1|sldjfskdjflksdjf|data2|ljsdlkjfsdfsdlfs|data3|slkdjfksd"
for w in string.gfind(str, "(|%w*|)") do
    print(w)
end

string.gfind is an iterator function; w is the data that matched the pattern. The for loop terminates when the pattern no longer matches. However, I was thinking more along the lines of AutoIt returning an array, since such iterative constructs aren't inherently possible.

Something like this maybe:

$str = "sldkjfksdjfl|data1|sldjfskdjflksdjf|data2|ljsdlkjfsdfsdlfs|data3|slkdjfksd"
$res = RegExpSearch($str, "(|%w*|)")
If Not @error Then
    For $i = 0 To UBound($res)-1
        MsgBox(0, "", $res[$i])
    Next
EndIf

IMO, being able to search a string for a pattern and extract specific chunks of data without having to manually parse the entire string or know its format beforehand is a far more useful application of regular expressions than just doing comparisons to see if a string looks as expected. As it stands, I can find no way to search a string with RegExp() without having some sort of idea in advance what the string will look like.
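
As a point of comparison, this kind of search-and-extract is what Python's `re.findall` does: it scans the whole string and returns every non-overlapping match as a list. The sketch below is Python, not AutoIt, and it escapes the pipe because `|` means alternation in most regex dialects (in Lua patterns it is a literal character).

```python
import re

text = "sldkjfksdjfl|data1|sldjfskdjflksdjf|data2|ljsdlkjfsdfsdlfs|data3|slkdjfksd"

# \| is a literal pipe; \w* is a run of word characters between two pipes.
matches = re.findall(r"\|\w*\|", text)

for chunk in matches:
    print(chunk)
```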


I am currently rewriting the RegExp function so it will be close to what you want. It will be written this way:

$str = "sldkjfksdjfl|data1|sldjfskdjflksdjf|data2|ljsdlkjfsdfsdlfs|data3|slkdjfksd"
$res = RegExp($str, "(\w*)+")
If Not @error Then
    For $i = 0 To UBound($res)-1
        MsgBox(0, "", $res[$i])
    Next
EndIf

I hope to have this done by the weekend. I am running into a bug with comparing what happens after the group. That I mostly have worked out. I will make sure that your pattern is tested.

Right now, \w includes letters and underscore (_). Should it include more characters? \A is alphanumeric characters. Neither set includes pipe (|).


I think what you have for \w is correct. Glad to hear you're working on making this more of a search instead of a comparison. It's easy to do a comparison with a searching version just by using ^ at the start.


Now that I have it more or less working, :idiot: I can say what that pattern line would be.

$str = "sldkjfksdjfl|data1|sldjfskdjflksdjf|data2|ljsdlkjfsdfsdlfs|data3|slkdjfksd"
$res = RegExp($str, "^(\w*)(|\w*)*$")
If Not @error Then
   For $i = 0 To UBound($res)-1
       MsgBox(0, "", $res[$i])
   Next
EndIf

The results would be:

sldkjfksdjfl

|data1

|sldjfskdjflksdjf

|data2

|ljsdlkjfsdfsdlfs

|data3

|slkdjfksd

The pipes on the beginning can be removed using StringMid.

Why? Just let RegExp do the work :idiot:

RegExp($str, "^(\w*)|(\w*)*$") (untested since I haven't updated to a regexp capable version yet)


That will only match the first two groups. The pipe is only expected once. I guess you could put another group around it, but I am not sure what order things will come out in.

RegExp($str, "^(\w*)(|(\w*))*$")
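
On the question of ordering, one data point from another implementation: Python's `re` numbers groups by the position of their opening parenthesis, left to right, so the outer group comes before the group nested inside it. Note that Python keeps only the last repetition of a repeated group, which differs from the array-of-all-hits behaviour described in this thread. The short input string is made up for the example, and the pipe is escaped because Python treats a bare `|` as alternation.

```python
import re

match = re.match(r"^(\w*)(\|(\w*))*$", "a|b|c")

# Group 1: leading word; group 2: last "|word" repetition; group 3: its word.
groups = match.groups()
print(groups)
```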


I'm fairly used to regular expressions. What I have to do daily is open a large (85,000+ line) file in vim and run:

:%s/^\("[^"]*\)","/\1 /g

Now it is not difficult to make AutoIt open the file in vim, run the command, write it and quit. It would be nice to be able to wrap up an AutoIt script so that I can give it to a couple of people so they can drag and drop the file on their desktop.

Is this possible, other than doing a RegExp, then StringInStr on each line?

I would rather not have to distribute vim to everyone who could run this.


OK, if I run the above code in vim, it takes about 10 seconds to complete. Running the following code on the same file took so long that I ended up aborting it:

MsgBox(1,"Customers","There are " & $lines & " customer files to process",3)
$start = TimerInit()
ProgressOn("Processing", "Processing BIGOLCUS.CSV", "0 percent")
For $i = 1 To $lines
   $line = FileReadLine($rfile,$i)
   $nline = StringReplace($line, '","'," ", 1)
   FileWriteLine($wfile, $nline)

   ProgressSet($i/$lines*100, $i/$lines*100 & " percent")
Next
ProgressSet(100, "Done", "Complete")
Sleep(500)
ProgressOff()
MsgBox(1,"It took",Int(TimerDiff($start)/1000) & "  Seconds To Process " &$lines & " customers")

Also, it slowed down. Progressively. It went through the first ~2000 lines relatively quickly, then just started crawling. Anyone have an idea on speeding this up, or is this just the nature of the beast?


I don't think Nutster has any RegExp replacement functionality ready yet.

But. Search the forums for "minitrue" if you're open to using AutoIt to automate a free, tiny, and powerful commandline utility that can do Regular Expression Replacements.

:idiot:


Nope, but it is on the RegExp to-do list that I plan to work on in the new year. I want to get array initializing done first, though. Maybe a debug version after that.

Sorry if this is a little late but shouldn't RegExp be named StringRegExp in keeping with the rest of the string functions?

<{POST_SNAPBACK}>

My original name was that, but it made some of the related commands (no longer present) have huge names. I changed the name and stuck it in the middle of the registry commands in the function list. I do not have a problem if Jon wants to rename it.

Sure it slows down. I'll translate the part you're using to read the file into plain English:

Open file, read line 1. Save line 1 into variable. Close File.

Replace string in variable.

Write variable to file.

Open file, read line 1, read line 2. Save line 2 into variable. Close file.

Replace string in variable.

Write variable to file.

Open file, read line 1, read line 2, read line 3. Save line 3 into variable. Close file.

Replace string in variable.

Write variable to file.

Open file, read line 1, read line 2, read line 3, read line 4. Save line 4 into variable. Close file.

Replace string in variable.

Write variable to file.

...

You see why your code is terribly slow on large files? You're telling AutoIt to reread the file from the top for every single line, but vim only reads the file once. Use FileOpen, then do a FileReadLine in a loop until you've reached the last line, then FileClose.
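
The single-pass approach described above, sketched in Python rather than AutoIt: each file is opened once, lines are streamed through, and the substitution is a translation of the vim command from earlier in the thread. The function names and the output file name in the usage comment are placeholders.

```python
import re

# Same idea as the vim command :%s/^\("[^"]*\)","/\1 /g
# Join the first two quoted fields of a CSV line with a space.
FIRST_SEPARATOR = re.compile(r'^("[^"]*)","')

def convert_lines(lines):
    """Yield each line with its first '","' separator replaced by a space."""
    for line in lines:
        yield FIRST_SEPARATOR.sub(r"\1 ", line, count=1)

def convert_file(src_path, dst_path):
    """Read src_path once, top to bottom, writing converted lines to dst_path."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for converted in convert_lines(src):
            dst.write(converted)

# Usage (placeholder output name):
# convert_file("BIGOLCUS.CSV", "BIGOLCUS_fixed.CSV")
```

Because the file handle stays open, the cost per line is constant instead of growing with the line number, which is the progressive slowdown seen in the original loop.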


Could you fix my pattern so that the array contains both IPs instead of only the first one?

$str = "How can I extract both the 127.0.0.1 and 10.10.1.2 IP addresses?"
$pattern = "(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"  

$res = RegExp($str,$pattern)
$err = @error

If Not $err Then
  For $i = 0 To UBound($res)-1
      MsgBox(0, "Result", $res[$i])
  Next
EndIf

Didn't really do that. :lol::idiot: Nothing comes to mind that RegExp will do on its own, without making another element in the array.

How about:

$pattern = "((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*)+"

I am thinking that I could put an exclamation (!) after a group to tell RegExp to not store the string from that group. What do you think? I know, I am fishing. :D

Sure, it would be possible. But this will also kill compatibility with other regex implementations.

So what do the others do instead? Besides, I have been creating my own original implementation here, so what enhancements we want, we can add. Yes, I am trying to stay as close to other implementations as I can, but we need to make this our own. :idiot:
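
To the question of what other implementations do: Perl-style engines, including Python's `re`, mark a group as non-capturing with `(?:...)` instead of a trailing `!`. A small Python sketch using the IP address example from earlier in the thread; the variable names are arbitrary.

```python
import re

text = "How can I extract both the 127.0.0.1 and 10.10.1.2 IP addresses?"

# (?:...) groups the repeated ".octet" part without storing it,
# so the only capture is the whole address.
non_capturing = re.findall(r"(\d{1,3}(?:\.\d{1,3}){3})", text)

# With a plain (...) the inner group is stored too, so each hit is a tuple.
capturing = re.findall(r"(\d{1,3}(\.\d{1,3}){3})", text)

print(non_capturing)
print(capturing)
```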
