
Search files in folders


newhere2

Hi Guys,

I have a challenge.
I want to search a directory, including subfolders, for files of which I only know a part of the name.
As I am searching through 1.5 to 2 million files, it would of course be great if the search ran as fast as possible.
As a result, I would like to see a list of only those files that exist in this directory.

My (unsuccessful) attempt was to create two arrays: one holding all the known parts of the filenames, and a second holding the full directory listing (including all files in subfolders, files only).

Now I have ended up with this half-working solution. The function was taken from Matt.

#include <array.au3>
#include <File.au3>

Func _Find ($s, $d = @ScriptDir)
   If StringRight ($d, 1) <> "\" Then $d &= "\"
   Local $h = FileFindFirstFile ($d & "*")
   If $h = -1 Then Return 0
   while 1
      $t = FileFindNextFile ($h)
      If $t = $s Then Return $d & $t
      $t = $d & $t
      If @Error Then Return 0 * FileClose ($h)
      If StringInStr (FileGetAttrib ($t), "D") Then
         $tmp = _Find ($s, $t)
         If $tmp <> "0" Then Return $tmp
         ContinueLoop
      EndIf
   WEnd
   FileClose ($h)
   Return 0
EndFunc ; ==> _Find





$FileList = FileRead("C:\DP\Dropbox\input2.txt")
$TrimmedFileList = StringSplit($FileList, ",")
;If IsArray($TrimmedFileList) Then _ArrayDisplay($TrimmedFileList)

For $i = 1 to $TrimmedFileList[0]
   if ($TrimmedFileList[$i]) = 0 then
   else
   MsgBox (0, "", _Find($TrimmedFileList[$i]))
Next

Issue 1: It only returns the file path if the full filename is given in input2.txt. How can I modify this function to find files from only a part of the filename?

Issue 2: How can I deal with the "0" result of the search? I would like to handle it like this: if the result is "0", do nothing; otherwise, show me a MsgBox with the path.

 

Thanks for your help

Melba23

newhere2,

Quote

I want to search a directory incl. subfolders for files, where I only know a part of them.

_FileListToArrayRec will do this for you - you can define the pattern for the filename pretty loosely. If you need a regex to define the filename pattern then you could always try this Beta version of the function.

Quote

As I am searching up to 1.5 to 2 million files, of course, it would be great if the search works as fast as possible

I am afraid that you will have to accept a fairly lengthy execution time - whatever method you use will have to traverse the entire folder tree and one of that size is going to take a while. If you try _FileListToArrayRec - do NOT use the sort parameter as this adds significantly to the time taken to complete the call.
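
As a rough sketch of such a call (not a tuned solution): the part-name "234234" below is a made-up value for illustration, and the wildcard mask is just one way to express "filename contains this part". The sort parameter is left at $FLTAR_NOSORT, as advised above.

```autoit
#include <Array.au3>
#include <File.au3>

; Hypothetical part of a filename, for illustration only
Local $sPart = "234234"

; Files only, recurse into all subfolders, no sorting, return full paths
Local $aFiles = _FileListToArrayRec(@ScriptDir, "*" & $sPart & "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
If @error Then
    ; Nothing was returned (no matching files, or a problem with the path/parameters)
    ConsoleWrite("No file list returned, @extended = " & @extended & @CRLF)
Else
    ; Element [0] holds the count, the paths start at [1]
    _ArrayDisplay($aFiles, "Files containing " & $sPart)
EndIf
```

Calling this once per part-name means one full traversal of the folder tree per call, so with many part-names it is probably better to list the tree once and filter the resulting array.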

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

newhere2

Thanks for your reply.

I already use _FileListToArrayRec, and getting the results works fine.

But I do not know how to loop over those arrays... I have already tried several available scripts without success.

Melba23

newhere2,

Which "arrays"? Can you explain more clearly please.

M23


newhere2

Hello Melba23,

 

Thanks for your quick replies. Much appreciated.

I am searching for every entry in input.txt; the entries are separated by ",". These numbers are parts of filenames whose full names I do not know.

Sure, here is the code:

 

excerpt of input.txt

234234,234242342342,234234,234234,234234,234234234234

findFiles.au3

#cs ----------------------------------------------------------------------------

 AutoIt Version: 3.3.14.2
 Author:         myName

 Script Function:
    Template AutoIt script.

#ce ----------------------------------------------------------------------------

; Script Start - Add your code below here

#include <array.au3>
#include <File.au3>

$simpleList = FileRead("C:\DP\Dropbox\input.txt")
$TrimmedList = StringSplit($simpleList, ",")
If IsArray($TrimmedList) Then _ArrayDisplay($TrimmedList)


;For $i =1  to $res[0]
;   Msgbox (64,"Missing Assets in File", $res[$i])
;Next

;Read Contents of folder
$folderstructure = _FileListToArrayRec(@ScriptDir,Default,$FLTAR_FILES,1,Default,2)
If IsArray($folderstructure) Then _ArrayDisplay($folderstructure)

 

I think this script is working. Now I am stuck on comparing those 2 arrays.

The result should be a list of all missing files, i.e. entries that are in the array TrimmedList but not in folderstructure.

Melba23

newhere2,

So if I understand correctly you have an array of these numeric part-filenames and you want to compare this to the list of files within the folder structure so as to find all the files which do NOT match any of the part-names.

We could use a regular expression to match - otherwise we may have to go with a double looped array structure which is likely to be pretty slow - especially given the number of files you say you have to search.

Here are the 2 methods in action:

#include <Array.au3>

; Simulate an array read in from input.txt
Local $aTrimmedList[] = [6, "234234", "345345345345", "456456456", "567567", "678678678", "789789789789"]

; Simulate a file listing
Local $aFolderStructure[] = [4, _
                            "M:\blah234234blah.fil", _          ; Matches
                            "M:\Folder_1\blah356blah.fil", _    ; Does not match
                            "M:\Folder_2\blah567567blah.fil", _ ; Matches
                            "M:\Folder_3\blah763829blah.fil"]   ; Does not match


; Option 1 - Using a RegEx

; Create a pattern of all the possible matches
$sPattern = ""
For $i = 1 To $aTrimmedList[0]
    $sPattern &= $aTrimmedList[$i] & "|"
Next
$sPattern = StringTrimRight($sPattern, 1)
;ConsoleWrite($sPattern & @CRLF)

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file contains one of the possible patterns
    If StringRegExp($aFolderStructure[$i], $sPattern) Then
        ; Add the index of the file to the list
        $sDeletionIndices &= $i & ";"
    EndIf
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which do not match
_ArrayDisplay($aFolderStructure)

; Option 2 - Double loop

; Restore the full file listing
Local $aFolderStructure[] = [4, _
                            "M:\blah234234blah.fil", _          ; Matches
                            "M:\Folder_1\blah356blah.fil", _    ; Does not match
                            "M:\Folder_2\blah567567blah.fil", _ ; Matches
                            "M:\Folder_3\blah763829blah.fil"]   ; Does not match

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file contains one of the possible values
    For $j = 1 To $aTrimmedList[0]
        If StringInStr($aFolderStructure[$i], $aTrimmedList[$j]) Then
            ; Add the index of the file to the list
            $sDeletionIndices &= $i & ";"
            ; No point in looking further
            ExitLoop
        EndIf
    Next
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which do not match
_ArrayDisplay($aFolderStructure)
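
One thing to watch when the list comes from a real input.txt rather than a simulated array: a trailing comma or line break produces a blank entry, and a blank alternative in the pattern matches every file. A possible way to guard against that is sketched below. _BuildPattern is a hypothetical helper name, not part of any UDF, and the \Q...\E quoting is an extra precaution of mine in case a part-name ever contains a regex metacharacter.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: build a regex alternation from a comma-separated list,
; skipping blank entries so the pattern cannot match every filename
Func _BuildPattern($sList)
    Local $aParts = StringSplit($sList, ",")
    Local $sPattern = "", $sPart
    For $i = 1 To $aParts[0]
        ; Remove spaces, tabs and line breaks around the entry
        $sPart = StringStripWS($aParts[$i], $STR_STRIPALL)
        If $sPart = "" Then ContinueLoop
        ; \Q...\E makes the regex engine treat the entry as literal text
        $sPattern &= "\Q" & $sPart & "\E|"
    Next
    ; Drop the final "|"
    Return StringTrimRight($sPattern, 1)
EndFunc   ;==>_BuildPattern

; Simulated file content with a trailing comma and line break
Local $sPattern = _BuildPattern("234234, 567567," & @CRLF)
ConsoleWrite($sPattern & @CRLF)
```

The returned string can be used in place of $sPattern in Option 1. If it comes back empty, no usable part-names were read and the matching loop should be skipped.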

Please ask if you have any questions.

M23


Subz

I noticed that the partial names in input.txt can be part of larger numbers, for example 234234 and 234234234234. I also noticed duplicates, so I thought about removing the duplicates and then sorting the list in descending order. Anyway, this is what I came up with, but I would probably go with Melba23's code :)

#include <Array.au3>
#include <File.au3>

Local $sPartialNames = FileRead("C:\DP\Dropbox\input.txt")
;~ Get Unique List of Partial File Names
Local $aPartialNames = StringSplit($sPartialNames, ",")
If IsArray($aPartialNames) Then
    $aPartialNames = _ArrayUnique($aPartialNames, 0, 1)
    ;~ Sort $aPartialNames Descending so we search from large to small for example 11111, 1111, 111, 11, 1
    _ArraySort($aPartialNames, 1, 1)
    _ArrayDisplay($aPartialNames)
EndIf

Local $aFilesNotFound[1], $aFilesFound[1][2], $aFileList, $iFilesFound
;~ Loop through and find Files with Partial File Names
For $i = 1 To $aPartialNames[0]
    $aFileList = _FileListToArrayRec(@ScriptDir, "*" & $aPartialNames[$i] & "*", 1, 1, 0, 2)
    If @error Then
        ;~ No hits add Partial Name to $aFilesNotFound Array
        _ArrayAdd($aFilesNotFound, $aPartialNames[$i])
        ContinueLoop
    EndIf
    ;~ Add a Column to $aFileList
    _ArrayColInsert($aFileList, 1)

    For $j = $aFileList[0][0] To 1 Step - 1
        $iFilesFound = _ArraySearch($aFilesFound, $aFileList[$j][0], 1, 0, 0, 0, 1, 0)
        If @error Then
            ;~ No duplicate File Name was found so continue
            $aFileList[$j][1] = $aPartialNames[$i]
            ContinueLoop
        EndIf
        _ArrayDelete($aFileList, $j)
    Next
    _ArrayDelete($aFileList, 0)
    If UBound($aFileList) - 1 = -1 Then
        _ArrayAdd($aFilesNotFound, $aPartialNames[$i])
        ContinueLoop
    EndIf
    _ArrayAdd($aFilesFound, $aFileList)
Next
$aFilesFound[0][0] = UBound($aFilesFound) - 1
$aFilesNotFound[0] = UBound($aFilesNotFound) - 1
_ArrayDisplay($aFilesFound, "Partial File Names found")
_ArrayDisplay($aFilesNotFound, "Partial File Names not found")

 

Melba23

Subz,

I assumed the repeated patterns were just for ease of creating an example of the file format and that the actual values would be discrete, as in my example.

M23


newhere2

Thanks to both of you for your great replies and for spending time on finding a solution. I had already put in about half a day and gave up.

Yes, duplicate entries never happen. I will test and let you know. 

 

Update:

Instead of deleting the matches, I would like to display them. I tried the following:

; Now delete the matching files
;_ArrayDelete($aFolderStructure, $sDeletionIndices)
_ArrayDisplay($aFolderStructure, $sDeletionIndices)


but it still shows all files.

Edited by newhere2

Melba23

newhere2,

If you want to keep the matches, then you need to adjust the loop logic - like this:

#include <Array.au3>

; Simulate an array read in from input.txt
Local $aTrimmedList[] = [6, "234234", "345345345345", "456456456", "567567", "678678678", "789789789789"]

; Simulate a file listing
Local $aFolderStructure[] = [4, _
                            "M:\blah234234blah.fil", _          ; Matches
                            "M:\Folder_1\blah356blah.fil", _    ; Does not match
                            "M:\Folder_2\blah567567blah.fil", _ ; Matches
                            "M:\Folder_3\blah763829blah.fil"]   ; Does not match


; Option 1 - Using a RegEx

; Create a pattern of all the possible matches
$sPattern = ""
For $i = 1 To $aTrimmedList[0]
    $sPattern &= $aTrimmedList[$i] & "|"
Next
$sPattern = StringTrimRight($sPattern, 1)
;ConsoleWrite($sPattern & @CRLF)

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file does not contain one of the possible patterns
    If Not StringRegExp($aFolderStructure[$i], $sPattern) Then
        ; Add the index of the file to the list
        $sDeletionIndices &= $i & ";"
    EndIf
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the non-matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which match
_ArrayDisplay($aFolderStructure)

; Option 2 - Double loop

; Restore the full file listing
Local $aFolderStructure[] = [4, _
                            "M:\blah234234blah.fil", _          ; Matches
                            "M:\Folder_1\blah356blah.fil", _    ; Does not match
                            "M:\Folder_2\blah567567blah.fil", _ ; Matches
                            "M:\Folder_3\blah763829blah.fil"]   ; Does not match

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file contains one of the possible values
    For $j = 1 To $aTrimmedList[0]
        If StringInStr($aFolderStructure[$i], $aTrimmedList[$j]) Then
            ; It matches
            ExitLoop
        EndIf
    Next
    If $j > $aTrimmedList[0] Then
        ; No match so add the index of the file to the list
        $sDeletionIndices &= $i & ";"
    EndIf
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the non-matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which match
_ArrayDisplay($aFolderStructure)

I am sure you can spot the differences.
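
As for why the earlier attempt still showed everything: the second parameter of _ArrayDisplay is the window title, so passing the index list there does not filter the array. If deleting feels roundabout, another possibility is to collect the matching paths directly. This is only a sketch on shortened simulated data; the names $sMatches and $aMatches are mine, and "|" is used as the separator because it cannot occur in a Windows path.

```autoit
#include <Array.au3>

; Shortened simulated data in the same layout as above (count in element [0])
Local $aTrimmedList[] = [2, "234234", "567567"]
Local $aFolderStructure[] = [3, _
                            "M:\blah234234blah.fil", _          ; Matches
                            "M:\Folder_1\blah356blah.fil", _    ; Does not match
                            "M:\Folder_2\blah567567blah.fil"]   ; Matches

; Create a pattern of all the possible matches
Local $sPattern = ""
For $i = 1 To $aTrimmedList[0]
    $sPattern &= $aTrimmedList[$i] & "|"
Next
$sPattern = StringTrimRight($sPattern, 1)

; Collect every path that contains one of the part-names
Local $sMatches = ""
For $i = 1 To $aFolderStructure[0]
    If StringRegExp($aFolderStructure[$i], $sPattern) Then
        $sMatches &= $aFolderStructure[$i] & "|"
    EndIf
Next

If $sMatches = "" Then
    ConsoleWrite("No matching files" & @CRLF)
Else
    ; Split the collected paths back into an array with the count in [0]
    Local $aMatches = StringSplit(StringTrimRight($sMatches, 1), "|")
    _ArrayDisplay($aMatches, "Matching files")
EndIf
```

Building one string and splitting it once avoids resizing an array for every hit, which may matter with a very large listing.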

M23

