StringRegExp / StringSplit


Hello there,

I have spent a few days now trying to understand these commands. I have read the help file and the examples, but I still don't get it...

I have this command in my code, which searches a text for lines:

Local $aLines = StringRegExp($sLines, "(?:([^\v]+)(?:\v+|$))", 3) ; Lines to a zero-based array (blank lines not included).
and then I have a loop that looks for identical lines.

My lines look like this:

[22:53:48] "Leandros" Leaving
[22:53:50] "Leandros" Leaving
[22:53:53] "Shertaz" ?
[22:54:01] "Mirtid" any tips my friend?

The first two lines are basically the same; the only thing that changes is the time:

[22:53:48] "Leandros" Leaving
[22:53:50] "Leandros" Leaving

There is a 2-second difference between these lines, and that's why my script won't find them. But they are still the same.

Is it possible to make it find such identical lines even if their times differ?

Edit: Is it also possible to make it skip lines that start like this?

[Open Server] .........
Edited by ileandros

I feel nothing.It feels great.


I'm not very good at this regular expression stuff, but until someone with more know-how posts, you can play with this.

This is probably what I'd do; I always just use the simple grab-everything expression for stuff like this...

#include <Array.au3>
Local $Test[5] = ['[Open Server] .........', '[22:53:48] "Leandros" Leaving', '[22:53:50] "Leandros" Leaving', '[22:53:53] "Shertaz" ?', '[22:54:01] "Mirtid" any tips my friend?']
Local $aLines
For $I = 0 To UBound($Test) - 1
    ; Capture groups (hour, minute, second, name, text) to a zero-based array.
    $aLines = StringRegExp($Test[$I], '\[(.*):(.*):(.*)\] "(.*)" (.*)', 3)
    _ArrayDisplay($aLines)
Next

What kind of text are you looking to capture? It might be easier to help knowing what you want.

Apparently they're IRC logs, they're all usually the same, consisting of time of entry, user name and things they said.


Yes, but in what format does he want the text captured? Does he want the array to be "22:54:01", "Mirtid", "any tips my friend?"

The regex for that would be

\[([^\]]+)\] "([^"]+)" (.*)

Or, for a more precise capture:

\[([0-9]{2}:[0-9]{2}:[0-9]{2})\] "([^"]+)" (.*)
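As a minimal sketch of how the more precise pattern could be used on one line, the snippet below applies it with StringRegExp flag 1 (return the capture groups of the first match). The variable names $sLine and $aParts are made up for this illustration, and the sample line is taken from the first post.

```autoit
; Sample line in the format from the first post.
Local $sLine = '[22:54:01] "Mirtid" any tips my friend?'

; Flag 1 returns a zero-based array holding the capture groups of the first match.
Local $aParts = StringRegExp($sLine, '\[([0-9]{2}:[0-9]{2}:[0-9]{2})\] "([^"]+)" (.*)', 1)

If @error Then
    ; StringRegExp sets @error when the pattern does not match.
    ConsoleWrite("No match: the line is not in the expected format." & @CRLF)
Else
    ConsoleWrite("Time: " & $aParts[0] & @CRLF)
    ConsoleWrite("Name: " & $aParts[1] & @CRLF)
    ConsoleWrite("Text: " & $aParts[2] & @CRLF)
EndIf
```

A line such as "[Open Server] ........." does not fit the time pattern, so it would take the @error branch; that may be one way to skip such lines.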

Edited by GPinzone
Gerard J. Pinzone, gpinzone AT yahoo.com

I am not actually writing a file; that is just for the example.

I am reading text that is written in a window I have, and the text inside it is similar to the one I posted.

I am just presenting it this way so you can understand.

Edit: I just can't seem to understand it...

Here is a better script to understand it

#include <Array.au3>
#include <File.au3>

Local $sFile =  "[22:53:48] Leandros Leaving"&@CR&"[22:53:48] Leandros Leaving"&@CR&"[22:53:53] Shertaz ?"&@CR&"[22:54:01] Mirtid any tips my friend?"
$sFiles = run("notepad.exe")
Sleep(2000)
Send($sFile)
Sleep(1000)
Opt("WinTitleMatchMode", 2) ;1=start, 2=subStr, 3=exact, 4=advanced, -1 to -4=Nocase
Local $sWinTitle = "Notepad" ; "WordPad" ;
If WinExists($sWinTitle) Then
    WinActivate($sWinTitle)
    WinWaitActive($sWinTitle)
    Local $sLines = ControlGetText($sWinTitle, "", "") ; 3rd parameter, "", means Active Control.
    Local $aLines = StringRegExp($sLines, "(?:([^\v]+)(?:\v+|$))", 3) ; Lines to a zero-based array (blank lines not included).
    ;Local $aLines = StringSplit(StringStripCR($sLines), @LF, 2) ; Lines to a zero-based array (blank lines included).
    If IsArray($aLines) Then
        Local $sRes = ""
        ;_ArrayDisplay($aLines)
        For $i = 0 To UBound($aLines) - 1
            For $j = 0 To UBound($aLines) - 1
                If ($i <> $j) And $aLines[$i] == $aLines[$j] Then ; Check if the "$i"'th element exists any where else in the whole array.
                    $sRes &= StringFormat("Line %3d/ is:  ""%s""\n", $i + 1, $aLines[$i])
                    ExitLoop
                EndIf
            Next
        Next
        If $sRes = "" Then
            MsgBox(0, "Lines", "There are no lines.", 2)
        Else
            MsgBox(0, "Lines", $sRes)
        EndIf
    Else
        MsgBox(64, "Attention", $sWinTitle & " has no lines present.", 2)
    EndIf
Else
    MsgBox(64, "Attention", $sWinTitle & " window not running.", 2)
EndIf

This example treats lines as duplicates when they have the same text and a time difference of three (3) minutes or less.

;#include <Array.au3>
;#include <File.au3>
#include <Date.au3>

Opt("WinTitleMatchMode", 2) ;1=start, 2=subStr, 3=exact, 4=advanced, -1 to -4=Nocase

Local $sWinTitle = "Notepad" ; "WordPad" ;
Local $sFile = _
        "[22:53:48] Leandros Leaving" & @CRLF & _
        "[22:53:50] Leandros Leaving" & @CRLF & _
        "[22:53:53] Shertaz ?" & @CRLF & _
        "[22:54:01] Mirtid any tips my friend?"
Local $sTemp, $TimeDiff, $Time_i, $Time_j
Local $sFiles = Run("notepad.exe")
WinWaitActive($sWinTitle)
;Send($sFile)
ControlSetText($sWinTitle, "", "", $sFile)

If WinExists($sWinTitle) Then
    WinActivate($sWinTitle)
    WinWaitActive($sWinTitle)
    Local $sLines = ControlGetText($sWinTitle, "", "") ; 3rd parameter, "", means Active Control.
    Local $aLines = StringRegExp($sLines, "(?:([^\v]+)(?:\v+|$))", 3) ; Lines to a zero-based array (blank lines not included).
    ;Local $aLines = StringSplit(StringStripCR($sLines), @LF, 2) ; Lines to a zero-based array (blank lines included).
    If IsArray($aLines) Then
        Local $sRes = ""
        ;_ArrayDisplay($aLines)
        For $i = 0 To UBound($aLines) - 1
            $sTemp = StringRegExpReplace($aLines[$i], "(\[.*\])(.*)", "$2")
            For $j = 0 To UBound($aLines) - 1

                If ($i <> $j) And (StringRegExpReplace($aLines[$j], "(\[.*\])(.*)", "$2") == $sTemp) Then ; Check if the "$i"'th element exists any where else in the whole array.
                    ; --------- Get time difference -------------
                    $Time_i = StringRegExpReplace($aLines[$i], "\[(.*)\](.*)", "$1")
                    $Time_j = StringRegExpReplace($aLines[$j], "\[(.*)\](.*)", "$1")
                    $TimeDiff = Abs(_DateDiff("s", "2012/01/01 " & $Time_i, "2012/01/01 " & $Time_j))
                    ;Note If the time difference spans the end of the day, for example, "23:59:59" and "00:00:01", a wrong result, 86398, will occur - not 2 secs.
                    ;ConsoleWrite($Time_i & @LF)
                    ;ConsoleWrite($Time_j & @LF)
                    ;ConsoleWrite($TimeDiff & @LF)
                    ; ---------> End of Get time difference -------------

                    If $TimeDiff <= 180 Then ; 180 = 3 minutes
                        $sRes &= StringFormat("Line %3d/ is:  ""%s""\n", $i + 1, $aLines[$i])
                        ExitLoop
                    EndIf
                EndIf
            Next
        Next
        If $sRes = "" Then
            MsgBox(0, "Lines", "There are no lines.", 2)
        Else
            MsgBox(0, "Lines", $sRes)
        EndIf
    Else
        MsgBox(64, "Attention", $sWinTitle & " has no lines present.", 2)
    EndIf
Else
    MsgBox(64, "Attention", $sWinTitle & " window not running.", 2)
EndIf

Well man, I don't have words... I opened this thread because I didn't want to put you to so much trouble. I don't know how to thank you...

And it will take time for me to understand this code ;)

I think I found a bug. I don't know if it's really a bug, but something is going wrong here.

Change the Local $sFile to this and try it:

Local $sFile = _
        "[02:38:25] <Inser>  hello  " & @CRLF & _
        "[03:00:55] <tsakil> enjoy ] <iVespa>" & @CRLF & _
        "[02:43:25] <Inser>  hello  " & @CRLF & _
        "[03:04:54] <tsakil> ] hello  enjoy ] <iVespa>" & @CRLF & _
        "[22:54:01] <Mirtid> any tips my friend?"

It doesn't find the 1st and the 3rd lines because their time difference is more than 180 seconds, but why does it say that the 2nd and 4th lines are duplicates?

You are correct. There is a mistake in the Reg Exp pattern. By default ".*" is greedy, and the question mark makes ".*?" un-greedy. In the RE patterns I had "(\[.*\])". This captures the first open square bracket and everything (including any close square brackets) up to and including the last close square bracket in the test string, because ".*" is greedy.

If I had correctly used "(\[.*?\])", it would have captured the first open square bracket and everything up to and including the first close square bracket encountered from left to right in the test string.

At the time of writing the RE pattern, I expected there would be only one pair of square brackets per line. That would have made the greediness of "(\[.*\])" irrelevant.
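To see the difference side by side, here is a small sketch that runs the greedy and the un-greedy pattern against the second line of the test data. The variable names are only illustrative.

```autoit
Local $sLine = "[03:00:55] <tsakil> enjoy ] <iVespa>"

; Greedy: ".*" runs on to the LAST "]" on the line.
Local $sGreedy = StringRegExpReplace($sLine, "(\[.*\])(.*)", "$1")

; Un-greedy: ".*?" stops at the FIRST "]".
Local $sLazy = StringRegExpReplace($sLine, "(\[.*?\])(.*)", "$1")

ConsoleWrite("Greedy: " & $sGreedy & @CRLF) ; [03:00:55] <tsakil> enjoy ]
ConsoleWrite("Lazy:   " & $sLazy & @CRLF)   ; [03:00:55]
```

With the greedy pattern, the part after the last "]" is " <iVespa>" for both the 2nd and the 4th test line, which is why their text compared as equal.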

To explain the RE pattern, "(?:([^\v]+)(?:\v+|$))", in the StringRegExp($sLines, "(?:([^\v]+)(?:\v+|$))", 3) function:

(?:...) This outer, encompassing non-capturing group is not necessary. It represents one line.

Inside that non-capturing group is "([^\v]+)(?:\v+|$)". This has two groups: a capture group (the text we want), and a non-capturing group containing characters we do not want to capture.

([^\v]+) will capture all characters that are not a vertical white-space character, such as a line-feed (@LF) or carriage return (@CR).

(?:\v+|$) At the end of each line is one or more vertical white-space characters, "\v+", or, "|", the end of the test string, "$".
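A short sketch of the line-splitting pattern on its own, without the unnecessary outer group. The sample text, including its blank line and mixed line endings, is invented for the illustration.

```autoit
; Sample text with a blank line and mixed line endings (made up for this example).
Local $sLines = "first line" & @CRLF & @CRLF & "second line" & @LF & "third line"

; Flag 3 returns every match of the capture group as a zero-based array.
Local $aLines = StringRegExp($sLines, "([^\v]+)(?:\v+|$)", 3)

If IsArray($aLines) Then
    For $i = 0 To UBound($aLines) - 1
        ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
    Next
EndIf
```

Because "\v+" swallows any run of line-ending characters, the blank line should not produce an array element, which matches the "blank lines not included" comment in the scripts.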

Notice this example uses string manipulation functions other than StringRegExpReplace.

Because the first 10 characters on each line will be in the format "[nn:nn:nn]" , StringTrimLeft, and StringMid functions can be used.

;#include <Array.au3>
;#include <File.au3>
#include <Date.au3>

Opt("WinTitleMatchMode", 2) ;1=start, 2=subStr, 3=exact, 4=advanced, -1 to -4=Nocase

Local $sWinTitle = "Notepad" ; "WordPad" ;
#cs
Local $sFile = _
        "[22:53:48] Leandros Leaving" & @CRLF & _
        "[22:53:50] Leandros Leaving" & @CRLF & _
        "[22:53:53] Shertaz ?" & @CRLF & _
        "[22:54:01] Mirtid any tips my friend?"
#ce
Local $sFile = _
        "[02:38:25] <Inser>  hello  " & @CRLF & _
        "[03:00:55] <tsakil> enjoy ] <iVespa>" & @CRLF & _
        "[02:43:25] <Inser>  hello  " & @CRLF & _
        "[03:04:54] <tsakil> ] hello  enjoy ] <iVespa>" & @CRLF & _
        "[22:54:01] <Mirtid> any tips my friend?"
Local $sTemp, $TimeDiff, $Time_i, $Time_j
Local $sFiles = Run("notepad.exe")
WinWaitActive($sWinTitle)
;Send($sFile)
ControlSetText($sWinTitle, "", "", $sFile)

If WinExists($sWinTitle) Then
    WinActivate($sWinTitle)
    WinWaitActive($sWinTitle)
    Local $sLines = ControlGetText($sWinTitle, "", "") ; 3rd parameter, "", means Active Control.
    Local $aLines = StringRegExp($sLines, "(?:([^\v]+)(?:\v+|$))", 3) ; Lines to a zero-based array (blank lines not included).
    ;Local $aLines = StringSplit(StringStripCR($sLines), @LF, 2) ; Lines to a zero-based array (blank lines included).
    If IsArray($aLines) Then
        Local $sRes = ""
        ;_ArrayDisplay($aLines)
        For $i = 0 To UBound($aLines) - 1
            ;$sTemp = StringRegExpReplace($aLines[$i], "(\[.*?\])(.*)", "$2") ;<- Corrected to be un-greedy (allows for extra "]")
            $sTemp = StringTrimLeft($aLines[$i], 10)
            For $j = 0 To UBound($aLines) - 1

                ;If ($i <> $j) And (StringRegExpReplace($aLines[$j], "(\[.*?\])(.*)", "$2") == $sTemp) Then ; Check if the "$i"'th element exists any where else in the whole array.
                If ($i <> $j) And (StringTrimLeft($aLines[$j], 10) == $sTemp) Then ; Check if the "$i"'th element exists any where else in the whole array.
                    ; --------- Get time difference -------------
                    ;$Time_i = StringRegExpReplace($aLines[$i], "\[(.*?)\](.*)", "$1") ;<- Corrected to be un-greedy (allows for extra "]")
                    ;$Time_j = StringRegExpReplace($aLines[$j], "\[(.*?)\](.*)", "$1") ;<- Corrected to be un-greedy (allows for extra "]")
                    $Time_i = StringMid($aLines[$i], 2, 8)
                    $Time_j = StringMid($aLines[$j], 2, 8)
                    $TimeDiff = Abs(_DateDiff("s", "2012/01/01 " & $Time_i, "2012/01/01 " & $Time_j))
                    ;Note If the time difference spans the end of the day, for example, "23:59:59" and "00:00:01", a wrong result, 86398, will occur - not 2 secs.
                    ConsoleWrite($Time_i & @LF)
                    ConsoleWrite($Time_j & @LF)
                    ConsoleWrite($TimeDiff & @LF)
                    ; ---------> End of Get time difference -------------

                    If $TimeDiff <= 180 Then ; 180 = 3 minutes
                        $sRes &= StringFormat("Line %3d/ is:  ""%s""\n", $i + 1, $aLines[$i])
                        ExitLoop
                    EndIf
                EndIf
            Next
        Next
        If $sRes = "" Then
            MsgBox(0, "Lines", "There are no lines.", 2)
        Else
            MsgBox(0, "Lines", $sRes)
        EndIf
    Else
        MsgBox(64, "Attention", $sWinTitle & " has no lines present.", 2)
    EndIf
Else
    MsgBox(64, "Attention", $sWinTitle & " window not running.", 2)
EndIf
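
The note in the script about times that span the end of the day ("23:59:59" and "00:00:01" giving 86398) could be handled along these lines. This is only a sketch: _ClockDiffSeconds is a hypothetical helper, not part of the posted script or of any UDF, and it assumes that two stamps more than 12 hours apart really sit on either side of midnight.

```autoit
; Hypothetical helper: seconds between two "HH:MM:SS" strings,
; taking the shorter way around the 24-hour clock.
Func _ClockDiffSeconds($sTimeA, $sTimeB)
    Local $aA = StringSplit($sTimeA, ":", 2) ; flag 2: zero-based array without a count element
    Local $aB = StringSplit($sTimeB, ":", 2)
    If UBound($aA) <> 3 Or UBound($aB) <> 3 Then Return SetError(1, 0, -1)

    Local $iSecA = Number($aA[0]) * 3600 + Number($aA[1]) * 60 + Number($aA[2])
    Local $iSecB = Number($aB[0]) * 3600 + Number($aB[1]) * 60 + Number($aB[2])
    Local $iDiff = Abs($iSecA - $iSecB)

    ; Assumption: a gap of more than 12 hours (43200 s) means the stamps straddle midnight.
    If $iDiff > 43200 Then $iDiff = 86400 - $iDiff
    Return $iDiff
EndFunc   ;==>_ClockDiffSeconds

ConsoleWrite(_ClockDiffSeconds("23:59:59", "00:00:01") & @CRLF) ; 2
ConsoleWrite(_ClockDiffSeconds("22:53:48", "22:53:50") & @CRLF) ; 2
```

The trade-off is that two genuinely distant stamps on the same day, say 01:00:00 and 23:00:00, would also be treated as two hours apart, so this only makes sense when the log has no dates to go by.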

Thank you very much for the answer. It was very helpful, believe me. So you're using $sTemp = StringTrimLeft($aLines[$i], 10), which allows me to use more " ] ", and you're using $Time_i = StringMid($aLines[$i], 2, 8) to capture the time, right? This is an easier way as I see it, and since the writing style inside stays stable it works like a charm. What I want to work with, which you switched to in your last post, is this:

Local $sFile = _
        "[02:38:25] <Inser>  hello  " & @CRLF & _
        "[03:00:55] <tsakil> enjoy ] <iVespa>" & @CRLF & _
        "[02:43:25] <Inser>  hello  " & @CRLF & _
        "[03:04:54] <tsakil> ] hello  enjoy ] <iVespa>" & @CRLF & _
        "[22:54:01] <Mirtid> any tips my friend?"

As you can see, the first part is the time of the message, the second part is the name, and the third part is the text that was written.

I could easily use StringMid if the names were all the same, but they are not.

Before, you were using $Time_i = StringRegExpReplace($aLines[$i], "\[(.*?)\](.*)", "$1"), which didn't allow me to use more " ] ", as I understand it.

To be honest, I am reading about the StringRegExp function in the help file, but I can't understand much of it.

How can I capture the name? How can I match these "<" ">" so I can find the name? I want to test whether the name appears 3 or more times among the duplicated names, but I don't know how to capture it. I tried "(?:([^\v]+)(?:\v+|$))", which had no success.

.... So ur using $sTemp = StringTrimLeft($aLines[$i], 10) which it allows me to use more " ] "....

No, the StringTrimLeft() function was just an alternative function to use instead of using StringRegExpReplace(). All the commented out command lines in my previously posted script work using an alternate function.

....

Before u were using this $Time_i = StringRegExpReplace($aLines[$i], "\[(.*?)\](.*)", "$1") which didnt allowed me to use more " ] " as i understand.

....

No, before I was using this $Time_i = StringRegExpReplace($aLines[$i], "\[(.*)\](.*)", "$1") which didn't allow the use of more " ] ", because of the missing, all important, question mark.

See AutoIt help file > Under "StringRegExp" function > Scroll down to "Repeating Characters" table > At the bottom of table see "? (after a repeating character)". It is this question mark I have been referring to as making ".*?" un-greedy.

I am not referring to the question mark one row above it in the table, the one that follows a character, set or group.

In fact, the quote immediately above is wrong, because $Time_i = StringRegExpReplace($aLines[$i], "\[(.*?)\](.*)", "$1") does allow the use of more " ] ", since the question mark following a repeating character is present.
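
The same un-greedy idea can be carried over to the name between "<" and ">". The following is a rough sketch rather than a tested solution for the whole script; the pattern and the variable names are assumptions made for this illustration, applied to the 4th line of the test data.

```autoit
Local $sLine = "[03:04:54] <tsakil> ] hello  enjoy ] <iVespa>"

; \[(.*?)\]  time: un-greedy, stops at the first "]"
; <(.*?)>    name: un-greedy, stops at the first ">"
; (.*)       everything that is left on the line
Local $aParts = StringRegExp($sLine, "^\[(.*?)\]\s*<(.*?)>\s*(.*)$", 1)

If @error Then
    ConsoleWrite("Line is not in the expected format." & @CRLF)
Else
    ConsoleWrite("Time: " & $aParts[0] & @CRLF) ; 03:04:54
    ConsoleWrite("Name: " & $aParts[1] & @CRLF) ; tsakil
    ConsoleWrite("Text: " & $aParts[2] & @CRLF) ; ] hello  enjoy ] <iVespa>
EndIf
```

Once the name is in $aParts[1], it can be compared or counted across lines in the same way the scripts above compare the text part.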

May clarity replace confusion with regard to regular expressions.
