Casey Posted September 11, 2015 (edited)

Hello,

I'm back to working on my RegEx skills and trying to come up with something to reformat directory and file paths coming from other applications, so that I can feed the resulting list into another script of mine. I have been testing my RegEx at https://regex101.com/ with the PCRE (PHP) flavor. To date it has worked well for me, but I have hit a stumbling block.

In the example below, the RegEx on line 158, when applied to the string "C:\S\*", should pass through the Else on line 162 and then get trapped by line 175. Instead, it flows to 159 and then 188.

I'm going cross-eyed and I can't spot the problem in the RegEx. If you have a moment and have experience with this kind of thing, would you please take a quick look and tell me what you think is wrong, besides this probably being the ugliest way to go about a simple task? The task is simply to end up with either a full file path or a directory without a trailing backslash.

Thanks in advance,
Casey

EDIT - [RESOLVED] Replaced script block with corrections, though I am not at all happy with how sloppy it is and I am going to rewrite it after learning quite a bit in the process.
I just didn't want to leave a truly broken bit out here on the site.

#include <Array.au3>
#include <Constants.au3>
#include <MsgBoxConstants.au3>

; Where we have a full path to a file including file extension
Local $aArray[1] = ["C:\S\File.txt"]
_ArrayAdd($aArray, "C:\S\Some File.txt")
_ArrayAdd($aArray, "C:\Windows\winsxs\FileMaps\program_files_business_objects_common_3.5_bin_d4f3c306b49748a7.cdf-ms")
_ArrayAdd($aArray, "C:\S\0_Some File.cdf-ms")

; Where the trailing backslash is missing from a directory - This is good, DirGetSize format
_ArrayAdd($aArray, "C:\S")

; Where the trailing backslash exists on a directory - This is bad, DirGetSize won't work
_ArrayAdd($aArray, "C:\S\")

; Where the path ends in an * - This is bad, DirGetSize value less than C:\S
_ArrayAdd($aArray, "C:\S\*")

; Where the path ends in an *.* - This is real bad, DirGetSize won't work and attrib is a laundry list
_ArrayAdd($aArray, "C:\S\*.*")
_ArrayAdd($aArray, "C:\Program Files (x86)\No Problem\*.*")
_ArrayAdd($aArray, "C:\Program Files (x86)\No Problem\*.txt")
_ArrayAdd($aArray, "C:\Program Files (x86)\No Problem\a.txt")
_ArrayAdd($aArray, "C:\Users\A.Problem")
_ArrayAdd($aArray, "C:\Users\A.Problem")

;========================================================================================================================================
; A must as we are working with Winternal's Process Monitor Output and Handles.exe Output
Local $bArray = _ArrayUnique($aArray)
_ArrayColInsert($bArray, 1)
_ArrayColInsert($bArray, 2)
_ArrayDisplay($bArray, "Array @ Beginning")
;========================================================================================================================================

$Cnt = 0
For $i = 0 To UBound($bArray) - 1
    $Cnt = $Cnt + 1
Next
$Cnt = $Cnt - 1

For $i = 1 To $Cnt
    Local $LenRegEx = 0
    Local $LenArrayItem = 0

    ;Test for something that might look like a file extension, greedy, but length of file extension and composition is unknown
    ;Example: C:\Windows\winsxs\FileMaps\program_files_business_objects_common_3.5_bin_d4f3c306b49748a7.cdf-ms
    Global $cArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \( \) \w \\ \-]+\\[\w \\ \.-]+", 2)
    If @error <> 0 Then
        ;_MsgBox("FIRST REGEX FAILED AND SET ERROR" & @CRLF & @CRLF & "$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $cArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \(\)\w\\\-]+\\[^\w \\ -]+", 2)
        If @error <> 0 Then
            Local $LenRegEx = 0
            Local $LenArrayItem = StringLen($bArray[$i][0])
            ;_MsgBox("SECOND REGEX FAILED AND SET ERROR" & @CRLF & @CRLF & "$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
            $bArray[$i][2] = @ScriptLineNumber
        Else
            Local $LenRegEx = StringLen($cArray[0])
            Local $LenArrayItem = StringLen($bArray[$i][0])
            ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
            $bArray[$i][2] = @ScriptLineNumber
        EndIf
    Else
        Local $LenRegEx = StringLen($cArray[0])
        Local $LenArrayItem = StringLen($bArray[$i][0])
        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $bArray[$i][2] = @ScriptLineNumber
    EndIf

    If $LenRegEx == $LenArrayItem Then
        ;_MsgBox("Some are FileGetSize and DirGetSize Compatible " & @CRLF & @CRLF & $bArray[$i][0] & @CRLF & @CRLF & $cArray[0])
        $bArray[$i][2] = @ScriptLineNumber
        $IsThereAnAsterixOrSlash = StringMid($bArray[$i][0], $LenArrayItem, 1)
        ;_MsgBox("$IsThereAnAsterixOrSlash = " & $IsThereAnAsterixOrSlash)
        $bArray[$i][2] = @ScriptLineNumber
        Select
            Case $IsThereAnAsterixOrSlash = "*"
                If StringMid($bArray[$i][0], ($LenArrayItem - 1), 2) = "\*" Then
                    ;_MsgBox("A \* existed " & StringLen($bArray[$i][0]))
                    $bArray[$i][2] = @ScriptLineNumber
                    $bArray[$i][1] = StringTrimRight($bArray[$i][0], 2)
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                Else
                    $IsItAsterixDotAsterix = StringMid($bArray[$i][0], $LenArrayItem - 3, 4)
                    ;_MsgBox("$IsItAsterixDotAsterix: " & $IsItAsterixDotAsterix)
                    $bArray[$i][2] = @ScriptLineNumber
                    If $IsItAsterixDotAsterix = "\*.*" Then
                        ;_MsgBox("\*.* " & @CRLF & @CRLF & $IsItAsterixDotAsterix)
                        ;_MsgBox("StringLen($bArray[$i][0] " & @CRLF & @CRLF & StringLen($bArray[$i][0]) & @CRLF & @CRLF & "StringLen($bArray[$i][0] - 4 " & @CRLF & @CRLF & StringLen($bArray[$i][0]) - 4)
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], StringLen($bArray[$i][0]) - 4)
                        ;_MsgBox("A * or \ existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    ElseIf StringTrimRight($IsItAsterixDotAsterix, 2) = "\*" Then
                        ;_MsgBox("\* " & @CRLF & @CRLF & $IsItAsterixDotAsterix)
                        ;_MsgBox("StringLen($bArray[$i][0] " & @CRLF & @CRLF & StringLen($bArray[$i][0]) & @CRLF & @CRLF & "StringLen($bArray[$i][0] - 4 " & @CRLF & @CRLF & StringLen($bArray[$i][0]) - 2)
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], StringLen($bArray[$i][0]) - 2)
                        ;_MsgBox("A * or \ existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    Else
                        ;_MsgBox("We shouldn't be able to see this so what's wrong?: " & @CRLF & @CRLF & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                    EndIf
                EndIf
            Case $IsThereAnAsterixOrSlash = "\"
                ;_MsgBox("A * or \ existed " & $bArray[$i][0])
                $bArray[$i][2] = @ScriptLineNumber
                $bArray[$i][1] = StringTrimRight($bArray[$i][0], StringLen($bArray[$i][0]) - 1)
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
            Case Else
                ;_MsgBox("We have a good full file path. " & @CRLF & @CRLF & $bArray[$i][0])
                $bArray[$i][2] = @ScriptLineNumber
                $bArray[$i][1] = $bArray[$i][0]
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
        EndSelect
    Else
        ;===========================================================================================================================
        ;Check if no possible file extension by looking if returned regex length equals array element string length
        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $bArray[$i][2] = @ScriptLineNumber
        ; Could be: C:\S\*
        $dArray = StringRegExp($bArray[$i][0], "^(.+\\\w+)", 2)
        If @error <> 0 Then
            ;_MsgBox("SECOND REGEX FAILED AND SET ERROR")
            $bArray[$i][2] = @ScriptLineNumber
        EndIf
        #cs
            Full String Match On:
            C:\S
            C:\S [return matches source, we have a winner and we already like this guy so we will do nothing to him]
        #ce
        $LenRegEx = StringLen($dArray[0])
        $LenArrayItem = StringLen($bArray[$i][0])
        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
        $bArray[$i][2] = @ScriptLineNumber
        If $LenRegEx == $LenArrayItem Then
            ;_MsgBox("DIR NO SLASH " & $dArray[0])
            $bArray[$i][2] = @ScriptLineNumber
            $bArray[$i][1] = $bArray[$i][0]
            ;_ArrayDisplay($bArray, @ScriptLineNumber)
        Else
            $eArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \( \) \w \\ \-]+\\[^\w \\ \.-]+", 2)
            ;$eArray = StringRegExp($bArray[$i][0], "(?x) ^ \w : \\ [\. \(\)\w\\-]+ \\ [^\w\\ \.-]+", 2) [Same results as above] Thank you for the insight JCHD
            If @error <> 0 Then
                ;_MsgBox("THIRD REGEX FAILED AND SET ERROR")
                $bArray[$i][2] = @ScriptLineNumber
                $eArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \( \) \w \\ \-]+\\[^\w \\ \.-]?", 2) ; C:\S\
                If @error <> 0 Then
                    ;_MsgBox("FOURTH REGEX FAILED AND SET ERROR")
                    $bArray[$i][2] = @ScriptLineNumber
                    $eArray = StringRegExp($bArray[$i][0], "^\w:\\[\. \(\)\w\\\-]+\\[^\w \\ -]+", 2) ; C:\S\*.* -FAILING BC LINE 136 "^(.+\\\w+)" HAS PARTIAL MATCH
                    If @error <> 0 Then
                        ;_MsgBox("FIFTH REGEX FAILED AND SET ERROR")
                        $bArray[$i][2] = @ScriptLineNumber
                        Local $LenRegEx = 0
                        Local $LenArrayItem = StringLen($bArray[$i][0])
                        ;_MsgBox("INSANE ->" & $bArray[$i][0] & "<-")
                        $bArray[$i][2] = @ScriptLineNumber
                    Else
                        Local $LenRegEx = StringLen($eArray[0])
                        Local $LenArrayItem = StringLen($bArray[$i][0])
                        ;_MsgBox($eArray[0] & @CRLF & @CRLF & $LenRegEx)
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    EndIf
                Else
                    Local $LenRegEx = StringLen($eArray[0])
                    Local $LenArrayItem = StringLen($bArray[$i][0])
                    ;_MsgBox($eArray[0] & @CRLF & @CRLF & $LenRegEx)
                    $bArray[$i][2] = @ScriptLineNumber
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                EndIf
            Else
                Local $LenRegEx = StringLen($eArray[0])
                Local $LenArrayItem = StringLen($bArray[$i][0])
                ;_MsgBox($eArray[0] & @CRLF & @CRLF & $LenRegEx)
                $bArray[$i][2] = @ScriptLineNumber
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
                If $LenRegEx = $LenArrayItem Then
                    $bArray[$i][1] = $bArray[$i][0]
                    $bArray[$i][2] = @ScriptLineNumber
                    ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                EndIf
            EndIf
            $fArray = StringRegExp($bArray[$i][0], "^\w\:\\[\. \(\)\w\\\-]+\\[^\w \\ -]+", 2)
            If @error <> 0 Then
                $bArray[$i][2] = @ScriptLineNumber
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
            Else
                Local $LenRegEx = StringLen($fArray[0])
                Local $LenArrayItem = StringLen($bArray[$i][0])
                ;_MsgBox($fArray[0] & @CRLF & @CRLF & $LenRegEx)
                $bArray[$i][2] = @ScriptLineNumber
                ;_ArrayDisplay($bArray, @ScriptLineNumber)
                If $LenRegEx = $LenArrayItem Then
                    $bArray[$i][1] = StringTrimRight($bArray[$i][0], 4)
                    ;$bArray[$i][1] = $bArray[$i][0]
                    $bArray[$i][2] = @ScriptLineNumber
                    ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                    $LenRegEx = 0
                    ;_ArrayDisplay($bArray, @ScriptLineNumber)
                Else
                    $gArray = StringRegExp($bArray[$i][0], "^\w\:\\[\. \(\)\w\\\-]+\\[^ \\ -]+", 2)
                    If @error <> 0 Then
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    Else
                        Local $LenRegEx = StringLen($gArray[0])
                        Local $LenArrayItem = StringLen($bArray[$i][0])
                        ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                        If $LenRegEx = $LenArrayItem Then
                            $bArray[$i][1] = StringTrimRight($bArray[$i][0], 6)
                            ;$bArray[$i][1] = $bArray[$i][0]
                            $bArray[$i][2] = @ScriptLineNumber
                            ;_MsgBox("$LenRegEx = " & @CRLF & @CRLF & $LenRegEx & @CRLF & @CRLF & "$LenArrayItem = " & @CRLF & @CRLF & $LenArrayItem)
                            ;_ArrayDisplay($bArray, @ScriptLineNumber)
                        EndIf
                    EndIf
                EndIf
            EndIf
            $IsThereAnAsterixOrSlash = StringMid($bArray[$i][0], $LenArrayItem, 1)
            ;_MsgBox($bArray[$i][0] & @CRLF & @CRLF & "->" & $IsThereAnAsterixOrSlash & "<-")
            $bArray[$i][2] = @ScriptLineNumber
            Select
                Case $LenRegEx == $LenArrayItem
                    #cs
                        Full String Match On:
                        C:\S\
                        C:\S\*
                    #ce
                    If $IsThereAnAsterixOrSlash = "*" Then
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], 2)
                        ;_MsgBox("An * existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    EndIf
                    If $IsThereAnAsterixOrSlash = "\" Then
                        $bArray[$i][1] = StringTrimRight($bArray[$i][0], 1)
                        ;_MsgBox("A \ existed " & $bArray[$i][0])
                        $bArray[$i][2] = @ScriptLineNumber
                        ;_ArrayDisplay($bArray, @ScriptLineNumber)
                    Else
                        ;_MsgBox("OUCH Who Knows")
                        $bArray[$i][2] = @ScriptLineNumber
                    EndIf
                Case Else
                    ;_MsgBox("We shouldn't be able to see this so what's wrong?: " & @CRLF & @CRLF & $bArray[$i][0])
                    $bArray[$i][2] = @ScriptLineNumber
            EndSelect
        EndIf
    EndIf
    ;===========================================================================================================================
Next

$bArray[0][0] = "*** Source Format of Text ***"
$bArray[0][1] = "*** Reformatted Text ***"
$bArray[0][2] = "*** Script Line Exited ***"
_ArrayDisplay($bArray, "Array @ End")

; #FUNCTION# ====================================================================================================================
; CUSTOM MSGBOX - ADDS SCRIPT LINE NUMBER AS TITLE FOR DEBUGGING - ADD MSGBOXES TO SOURCE IF NEEDED
; EXAMPLE MSGBOX
;_MsgBox("The value for $stringinput is not as expected == " & $stringinput)
Func _MsgBox($sText, $sTitle = @ScriptLineNumber)
    MsgBox(0, "-" & $sTitle & "-", $sText)
EndFunc   ;==>_MsgBox
; ===============================================================================================================================

Edited September 15, 2015 by Casey
jchd Posted September 11, 2015

Let's see:

#include <Array.au3>
$eArray = StringRegExp("C:\S\*", "(?x) ^ \w : \\ [\. \(\)\w\\-]+ \\ [^\w\\ \.-]+", 2)
_ArrayDisplay($eArray)

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Casey (Author) Posted September 11, 2015

JCHD,

Thank you, and I am very sorry. Like I said, I was getting cross-eyed: the string I'm stuck on is "C:\S\" and not "C:\S\*". My RegEx actually works on "C:\S\*". The problem is that, in the tester, it also works on "C:\S\".

I'll also say that, though it is going to take me a bit to read through your version, it looks a bit more intelligent and I think I'll be able to learn something from it. I spotted an error in trimming a string, so I have updated my code above. I also see more problems down the line, which is fine and is why I am working through this exercise.

Casey
jchd Posted September 11, 2015

It is the + quantifier right at the end of the pattern that prevents a match on "C:\S\": it requires at least one character after the last backslash.
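To make the point about the trailing quantifier concrete, here is a minimal sketch that runs the thread's third pattern (ending in +) and the later variant ending in ? against "C:\S\". The variable names are illustrative only and are not part of the original script.

```autoit
; Sketch only: compares the thread's third pattern (ending in "+") with the
; variant ending in "?" on the directory string "C:\S\".
Local $sDir = "C:\S\"
Local $sPlus = "^\w:\\[\. \( \) \w \\ \-]+\\[^\w \\ \.-]+"
Local $sOptional = "^\w:\\[\. \( \) \w \\ \-]+\\[^\w \\ \.-]?"

; "+" needs one more character after the last backslash, so no match is
; expected here and @error should be non-zero.
Local $aPlus = StringRegExp($sDir, $sPlus, 2)
Local $iPlusError = @error
ConsoleWrite('"+" variant: @error = ' & $iPlusError & @CRLF)

; "?" allows zero characters after the last backslash, so the whole string
; should match and come back as element 0 of the result array.
Local $aOptional = StringRegExp($sDir, $sOptional, 2)
If @error = 0 Then ConsoleWrite('"?" variant matched: ' & $aOptional[0] & @CRLF)
```

Saving @error into a variable right after the call matters, because the next function call resets it.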
Casey (Author) Posted September 11, 2015

Oh, you're on it! I had to add another bit of logic after it failed out on the third RegEx: I added a fourth, replacing the + with a ?, to catch "C:\S\". I'll update my code in a minute. Now I just need to tweak the logic to get rows 8-10 and I'll be done.

Thanks a million for catching that.

Casey
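For the stated goal (a full file path, or a directory without a trailing backslash), one possible alternative to the chain of patterns is a single StringRegExpReplace call. The sketch below is not from the thread. It assumes that a last path segment containing * (such as \*, \*.* or \*.txt) should be dropped so that only the parent directory remains, which is what the script's trims of 2, 4 and 6 characters appear to aim for. The helper name _NormalizePath is hypothetical.

```autoit
; Hypothetical helper (not from the thread): strips a trailing backslash,
; optionally followed by a final segment that contains "*".
; Assumption: wildcard segments should collapse to the parent directory.
Func _NormalizePath($sPath)
    ; \\                     the last backslash
    ; (?:[^\\]*\*[^\\]*)?    optional final segment containing an asterisk
    ; $                      end of string
    Return StringRegExpReplace($sPath, "\\(?:[^\\]*\*[^\\]*)?$", "")
EndFunc   ;==>_NormalizePath

; A few of the sample paths from the original post
Local $aSamples[6] = ["C:\S\File.txt", "C:\S", "C:\S\", "C:\S\*", "C:\S\*.*", _
        "C:\Program Files (x86)\No Problem\*.txt"]

For $sSample In $aSamples
    ConsoleWrite($sSample & " -> " & _NormalizePath($sSample) & @CRLF)
Next
```

Paths that already name a file or a directory without a trailing backslash have nothing to match and pass through unchanged. One caveat: a bare drive root such as "C:\" would be reduced to "C:", so it may need special handling before the result is fed to DirGetSize.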