
[RESOLVED] RegEx Resource


Casey

Hello,

I'm back to working on my RegEx skills and trying to come up with something to reformat directory and file paths that come from other applications, so that I can feed the resulting list into another script of mine. I have been testing my RegEx at https://regex101.com/ with the PCRE (PHP) flavor. To date it has worked well for me; however, I have hit a stumbling block. In the example below, the RegEx on line 158, when applied to the string "C:\S\*", should pass through the ELSE on line 162 and then get trapped by line 175.

Instead, it flows to 159 and then 188. I'm going cross-eyed and I can't spot the problem in the RegEx. If you have a moment and have experience with this kind of thing, would you please take a quick look and tell me what you think is wrong, besides this probably being the ugliest way to go about a simple task?

The task is simply to end up with either a full file path or a directory without a trailing backslash. Thanks in advance,

Casey

EDIT - [RESOLVED]  Replaced script block with corrections, though I am not at all happy with how sloppy it is and I am going to rewrite after learning quite a bit in the process. I just didn't want to leave a truly broken bit out here on the site.

#include <Array.au3>
#include <Constants.au3>
#include <MsgBoxConstants.au3>



; Where we have a full path to a file including file extension
Local $aArray[1] = ["C:\S\File.txt"]


_ArrayAdd($aArray, "C:\S\Some File.txt")
_ArrayAdd($aArray, "C:\Windows\winsxs\FileMaps\program_files_business_objects_common_3.5_bin_d4f3c306b49748a7.cdf-ms")
_ArrayAdd($aArray, "C:\S\0_Some File.cdf-ms")

; Where the trailing backslash is missing from a directory - This is good, DirGetSize format
_ArrayAdd($aArray, "C:\S")


; Where the trailing backslash exists on a directory - This is bad, DirGetSize won't work
_ArrayAdd($aArray, "C:\S\")

; Where the path ends in an * - This is bad, DirGetSize value less than C:\S
_ArrayAdd($aArray, "C:\S\*")

; Where the path ends in an *.* - This is real bad, DirGetSize won't work and attrib is a laundry list
_ArrayAdd($aArray, "C:\S\*.*")

_ArrayAdd($aArray, "C:\Program Files (x86)\No Problem\*.*")
_ArrayAdd($aArray, "C:\Program Files (x86)\No Problem\*.txt")
_ArrayAdd($aArray, "C:\Program Files (x86)\No Problem\a.txt")
_ArrayAdd($aArray, "C:\Users\A.Problem")
_ArrayAdd($aArray, "C:\Users\A.Problem")


;========================================================================================================================================

; A must as we are working with Winternal's Process Monitor Output and Handles.exe Output
Local $bArray = _ArrayUnique($aArray)

_ArrayColInsert($bArray, 1)
_ArrayColInsert($bArray, 2)

_ArrayDisplay($bArray, "Array @ Beginning")

;========================================================================================================================================

; Index of the last data row (row 0 holds the count returned by _ArrayUnique)
$Cnt = UBound($bArray) - 1

For $i = 1 To $Cnt

    Local $LenRegEx = 0
    Local $LenArrayItem = 0

    ;Test for something that might look like a file extension, greedy, but length of file extension and composition is unknown
    ;Example: C:\Windows\winsxs\FileMaps\program_files_business_objects_common_3.5_bin_d4f3c306b49748a7.cdf-ms

    Global $cArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \( \) \w \\ \-]+\\[\w \\ \.-]+", 2)
    If @error <> 0 Then
        ;_MsgBox("FIRST REGEX FAILED AND SET ERROR" & @CRLF & @CRLF & "$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $cArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \(\)\w\\\-]+\\[^\w \\ -]+", 2)
        If @error <> 0 Then
            Local $LenRegEx = 0
            Local $LenArrayItem = StringLen($bArray[$i][0])
            ;_MsgBox("SECOND REGEX FAILED AND SET ERROR" & @CRLF & @CRLF & "$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
            $bArray[$i][2] = @ScriptLineNumber
        Else
            Local $LenRegEx = StringLen($cArray[0])
            Local $LenArrayItem = StringLen($bArray[$i][0])
            ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
            $bArray[$i][2] = @ScriptLineNumber
        EndIf
    Else
        Local $LenRegEx = StringLen($cArray[0])
        Local $LenArrayItem = StringLen($bArray[$i][0])
        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $bArray[$i][2] = @ScriptLineNumber
    EndIf

    If $LenRegEx == $LenArrayItem Then
        ;_MsgBox("Some are FileGetSize and DirGetSize Compatible " & @CRLF & @CRLF & $bArray[$i][0] & @CRLF & @CRLF & $cArray[0])
        $bArray[$i][2] = @ScriptLineNumber

        $IsThereAnAsterixOrSlash = StringMid($bArray[$i][0], $LenArrayItem, 1)

        ;_MsgBox("$IsThereAnAsterixOrSlash = " & $IsThereAnAsterixOrSlash)
        $bArray[$i][2] = @ScriptLineNumber

        Select
            Case $IsThereAnAsterixOrSlash = "*"

                If StringMid($bArray[$i][0], ($LenArrayItem - 1), 2) = "\*" Then
                    ;_MsgBox("A \* existed " & StringLen($bArray[$i][0]))
                    $bArray[$i][2] = @ScriptLineNumber
                    $bArray[$i][1] = StringTrimRight($bArray[$i][0], 2)
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                Else

                    $IsItAsterixDotAsterix = StringMid($bArray[$i][0], $LenArrayItem - 3, 4)
                    ;_MsgBox("$IsItAsterixDotAsterix:   " & $IsItAsterixDotAsterix)
                    $bArray[$i][2] = @ScriptLineNumber

                    If $IsItAsterixDotAsterix = "\*.*" Then
                        ;_MsgBox("\*.* " & @CRLF & @CRLF & $IsItAsterixDotAsterix)
                        ;_MsgBox("StringLen($bArray[$i][0] " & @CRLF & @CRLF & StringLen($bArray[$i][0]) & @CRLF & @CRLF & "StringLen($bArray[$i][0] - 4 " & @CRLF & @CRLF & StringLen($bArray[$i][0]) - 4)
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], 4) ; StringTrimRight takes the number of characters to remove: drop the trailing "\*.*"
                        ;_MsgBox("A * or \ existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)

                    ElseIf StringTrimRight($IsItAsterixDotAsterix, 2) = "\*" Then
                        ;_MsgBox("\* " & @CRLF & @CRLF & $IsItAsterixDotAsterix)
                        ;_MsgBox("StringLen($bArray[$i][0] " & @CRLF & @CRLF & StringLen($bArray[$i][0]) & @CRLF & @CRLF & "StringLen($bArray[$i][0] - 4 " & @CRLF & @CRLF & StringLen($bArray[$i][0]) - 2)
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], 4) ; drop the four-character wildcard tail that starts with "\*"
                        ;_MsgBox("A * or \ existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)

                    Else
                        ;_MsgBox("We shouldn't be able to see this so what's wrong?: " & @CRLF & @CRLF & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                    EndIf
                EndIf

            Case $IsThereAnAsterixOrSlash = "\"
                ;_MsgBox("A * or \ existed " & $bArray[$i][0])
                $bArray[$i][2] = @ScriptLineNumber
                $bArray[$i][1] = StringTrimRight($bArray[$i][0], 1) ; remove only the trailing backslash
                ;_ArrayDisplay($bArray, @ScriptLineNumber)

            Case Else
                ;_MsgBox("We have a good full file path. " & @CRLF & @CRLF & $bArray[$i][0])
                $bArray[$i][2] = @ScriptLineNumber
                $bArray[$i][1] = $bArray[$i][0]
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
        EndSelect

    Else
        ;===========================================================================================================================
        ;Check if no possible file extension by looking if returned regex length equals array element string length


        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $bArray[$i][2] = @ScriptLineNumber
        ; Could be: C:\S\*

        $dArray = StringRegExp($bArray[$i][0], "^(.+\\\w+)", 2)
        If @error <> 0 Then
            ;_MsgBox("SECOND REGEX FAILED AND SET ERROR")
            $bArray[$i][2] = @ScriptLineNumber
            ContinueLoop ; $dArray is not an array after a failed match, so skip the $dArray[0] access below
        EndIf

        #cs
            Full String Match On:

            C:\S
            C:\S [return matches source, we have a winner and we already like this guy so we will do nothing to him]

        #ce

        $LenRegEx = StringLen($dArray[0])
        $LenArrayItem = StringLen($bArray[$i][0])

        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $bArray[$i][2] = @ScriptLineNumber

        If $LenRegEx == $LenArrayItem Then

            ;_MsgBox("DIR NO SLASH " & $dArray[0])
            $bArray[$i][2] = @ScriptLineNumber
            $bArray[$i][1] = $bArray[$i][0]
            ;_ArrayDisplay($bArray, @ScriptLineNumber)

        Else

            $eArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \( \) \w \\ \-]+\\[^\w \\ \.-]+", 2)
            ;$eArray = StringRegExp($bArray[$i][0], "(?x) ^ \w : \\ [\. \(\)\w\\-]+ \\ [^\w\\ \.-]+", 2) [Same results as above] Thank you for the insight JCHD
            If @error <> 0 Then
                ;_MsgBox("THIRD REGEX FAILED AND SET ERROR")
                $bArray[$i][2] = @ScriptLineNumber
                $eArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \( \) \w \\ \-]+\\[^\w \\ \.-]?", 2) ; C:\S\
                If @error <> 0 Then
                    ;_MsgBox("FOURTH REGEX FAILED AND SET ERROR")
                    $bArray[$i][2] = @ScriptLineNumber
                    $eArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \(\)\w\\\-]+\\[^\w \\ -]+", 2) ; C:\S\*.* -FAILING BC LINE 136 "^(.+\\\w+)" HAS PARTIAL MATCH
                    If @error <> 0 Then
                        ;_MsgBox("FIFTH REGEX FAILED AND SET ERROR")
                        $bArray[$i][2] = @ScriptLineNumber
                        Local $LenRegEx = 0
                        Local $LenArrayItem = StringLen($bArray[$i][0])
                        ;_MsgBox("INSANE ->" & $bArray[$i][0] & "<-")
                        $bArray[$i][2] = @ScriptLineNumber
                    Else
                        Local $LenRegEx = StringLen($eArray[0])
                        Local $LenArrayItem = StringLen($bArray[$i][0])
                        ;_MsgBox($eArray[0] & @CRLF & @CRLF & $LenRegEx)
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    EndIf
                Else
                    Local $LenRegEx = StringLen($eArray[0])
                    Local $LenArrayItem = StringLen($bArray[$i][0])
                    ;_MsgBox($eArray[0] & @CRLF & @CRLF & $LenRegEx)
                    $bArray[$i][2] = @ScriptLineNumber
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                EndIf
            Else
                Local $LenRegEx = StringLen($eArray[0])
                Local $LenArrayItem = StringLen($bArray[$i][0])
                ;_MsgBox($eArray[0] & @CRLF & @CRLF & $LenRegEx)
                $bArray[$i][2] = @ScriptLineNumber
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
                If $LenRegEx = $LenArrayItem Then
                    $bArray[$i][1] = $bArray[$i][0]
                    $bArray[$i][2] = @ScriptLineNumber
                    ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                EndIf
            EndIf

            $fArray = StringRegExp($bArray[$i][0], "^\w\:\\[\. \(\)\w\\\-]+\\[^\w \\ -]+", 2)
            If @error <> 0 Then
                $bArray[$i][2] = @ScriptLineNumber
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
            Else
                Local $LenRegEx = StringLen($fArray[0])
                Local $LenArrayItem = StringLen($bArray[$i][0])
                ;_MsgBox($fArray[0] & @CRLF & @CRLF & $LenRegEx)
                $bArray[$i][2] = @ScriptLineNumber
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
                If $LenRegEx = $LenArrayItem Then
                    $bArray[$i][1] = StringTrimRight($bArray[$i][0], 4)
                    ;$bArray[$i][1] = $bArray[$i][0]
                    $bArray[$i][2] = @ScriptLineNumber
                    ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                    $LenRegEx = 0
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                Else
                    $gArray = StringRegExp($bArray[$i][0], "^\w\:\\[\. \(\)\w\\\-]+\\[^ \\ -]+", 2)
                    If @error <> 0 Then
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    Else
                        Local $LenRegEx = StringLen($gArray[0])
                        Local $LenArrayItem = StringLen($bArray[$i][0])
                        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                        If $LenRegEx = $LenArrayItem Then
                            $bArray[$i][1] = StringTrimRight($bArray[$i][0], 6)
                            ;$bArray[$i][1] = $bArray[$i][0]
                            $bArray[$i][2] = @ScriptLineNumber
                            ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                            ;_ArrayDisplay($bArray, @ScriptLineNumber)
                        EndIf
                    EndIf
                EndIf
            EndIf

            $IsThereAnAsterixOrSlash = StringMid($bArray[$i][0], $LenArrayItem, 1)

            ;_MsgBox($bArray[$i][0] & @CRLF & @CRLF & "->" & $IsThereAnAsterixOrSlash & "<-")
            $bArray[$i][2] = @ScriptLineNumber

            Select
                Case $LenRegEx == $LenArrayItem

                    #cs
                        Full String Match On:

                        C:\S\
                        C:\S\*
                    #ce
                    If $IsThereAnAsterixOrSlash = "*" Then
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], 2)
                        ;_MsgBox("An * existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    EndIf

                    If $IsThereAnAsterixOrSlash = "\" Then
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], 1)
                        ;_MsgBox("A \ existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    Else
                        ;_MsgBox("OUCH Who Knows")
                        $bArray[$i][2] = @ScriptLineNumber
                    EndIf


                Case Else
                    ;_MsgBox("We shouldn't be able to see this so what's wrong?: " & @CRLF & @CRLF & $bArray[$i][0])
                    $bArray[$i][2] = @ScriptLineNumber

            EndSelect

        EndIf
    EndIf
    ;===========================================================================================================================
Next

$bArray[0][0] = "*** Source Format of Text ***"
$bArray[0][1] = "*** Reformatted Text ***"
$bArray[0][2] = "*** Script Line Exited ***"

_ArrayDisplay($bArray, "Array @ End")





; #FUNCTION# ====================================================================================================================
; CUSTOM MSGBOX - ADDS SCRIPT LINE NUMBER AS TITLE FOR DEBUGGING - ADD MSGBOXES TO SOURCE IF NEEDED

; EXAMPLE MSGBOX
;_MsgBox("The value for $stringinput is not as expected == " & $stringinput)

Func _MsgBox($sText, $sTitle = @ScriptLineNumber)
    MsgBox(0, "-" & $sTitle & "-", $sText)
EndFunc   ;==>_MsgBox
; ===============================================================================================================================

 

Edited by Casey

Let's see:

#include <Array.au3>

$eArray = StringRegExp("C:\S\*", "(?x) ^ \w : \\ [\. \(\)\w\\-]+ \\ [^\w\\ \.-]+", 2)
_ArrayDisplay($eArray)

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


JCHD,

Thank you and I am very sorry. Like I said, I was getting cross-eyed: the string that I'm stuck on is "C:\S\" and not "C:\S\*". My RegEx actually works on that one. The problem is that it also works on the other one in the tester. I'll also say that, though it is going to take me a bit to read through your version, it looks a bit more intelligent and I think I'll be able to learn something from it. I spotted an error in trimming a string, so I have updated my code above. I also see more problems down the line, which is fine and is why I am working through this exercise.

Casey


It is the + quantifier right at the end of the pattern that prevents a match on "C:\S\": it requires at least one character after the last backslash.
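
A minimal sketch of that difference, assuming the same character classes as the pattern above with the redundant escapes dropped; the variable names are illustrative only. With its default flag, StringRegExp returns 1 for a match and 0 for no match:

```autoit
; Sketch only: compare "+" and "?" as the final quantifier.
; $sPlus, $sOpt and $aPaths are illustrative names, not part of the script above.
Local $sPlus = "^\w:\\[. ()\w\\-]+\\[^\w\\ .-]+" ; needs at least one char after the last backslash
Local $sOpt = "^\w:\\[. ()\w\\-]+\\[^\w\\ .-]?" ; that final char is optional
Local $aPaths[2] = ["C:\S\*", "C:\S\"]

For $i = 0 To UBound($aPaths) - 1
    ; Default flag 0: StringRegExp returns 1 (match) or 0 (no match)
    ConsoleWrite($aPaths[$i] & "   + : " & StringRegExp($aPaths[$i], $sPlus) & "   ? : " & StringRegExp($aPaths[$i], $sOpt) & @CRLF)
Next
```

On these two inputs the + pattern should report 1 for "C:\S\*" and 0 for "C:\S\", while the ? pattern reports 1 for both. Neither pattern is anchored at the end, so a 1 only says that a prefix matched; comparing the match length with the string length, as the script does, is still needed to tell a full match from a partial one.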


Oh, you're on it! I had to add another bit of logic after it failed out on the third RegEx: I added a fourth, replacing the + with a ?, to catch "C:\S\". I'll update my code in a minute. Now I just need to tweak the logic to get rows 8-10 and I'll be done. Thanks a million for catching that.
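
For the goal stated in the first post (a full file path, or a directory without a trailing backslash), a single replacement might be enough. This is only a sketch and not the script above: _NormalizePath is a hypothetical helper, and it assumes that any final path component containing a * should be dropped together with its backslash.

```autoit
; Hypothetical helper, not part of the original script.
; Removes a trailing "\" or a trailing wildcard component such as "\*", "\*.*" or "\*.txt".
; Paths whose last component has no "*" are returned unchanged.
Func _NormalizePath($sPath)
    ; \\                    the last backslash
    ; (?:[^\\]*\*[^\\]*)?   optionally, a final component that contains at least one "*"
    ; $                     end of the string
    Return StringRegExpReplace($sPath, "\\(?:[^\\]*\*[^\\]*)?$", "")
EndFunc   ;==>_NormalizePath

Local $aSamples[5] = ["C:\S\File.txt", "C:\S", "C:\S\", "C:\S\*", "C:\S\*.*"]
For $i = 0 To UBound($aSamples) - 1
    ConsoleWrite($aSamples[$i] & " -> " & _NormalizePath($aSamples[$i]) & @CRLF)
Next
```

With these samples, "C:\S\File.txt" and "C:\S" should come back unchanged and the other three should all become "C:\S". A bare drive root such as "C:\" would be reduced to "C:", so that case would need separate handling.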

Casey
