newhere2 Posted April 19, 2018

Hi Guys,

I have a challenge. I want to search a directory, including subfolders, for files where I only know a part of the filename. As I am searching 1.5 to 2 million files, it would of course be great if the search worked as fast as possible. As a result, I would like to see a list of only those files that exist in this directory.

My (unsuccessful) attempt was to create two arrays: one holding all the known parts of the filenames, and a second storing the full directory listing (including all files in subfolders, files only). I have ended up with this half-working solution. The function was taken from Matt.

```autoit
#include <array.au3>
#include <File.au3>

Func _Find ($s, $d = @ScriptDir)
    If StringRight ($d, 1) <> "\" Then $d &= "\"
    Local $h = FileFindFirstFile ($d & "*")
    If $h = -1 Then Return 0
    while 1
        $t = FileFindNextFile ($h)
        If $t = $s Then Return $d & $t
        $t = $d & $t
        If @Error Then Return 0 * FileClose ($h)
        If StringInStr (FileGetAttrib ($t), "D") Then
            $tmp = _Find ($s, $t)
            If $tmp <> "0" Then Return $tmp
            ContinueLoop
        EndIf
    WEnd
    FileClose ($h)
    Return 0
EndFunc ; ==> _Find

$FileList = FileRead("C:\DP\Dropbox\input2.txt")
$TrimmedFileList = StringSplit($FileList, ",")
;If IsArray($TrimmedFileList) Then _ArrayDisplay($TrimmedFileList)

For $i = 1 to $TrimmedFileList[0]
    if ($TrimmedFileList[$i]) = 0 then
    else
        MsgBox (0, "", _Find($TrimmedFileList[$i]))
Next
```

Issue 1: It only returns the file path if the full file path is given in input2.txt. How can I modify this function to find parts of a filename?

Issue 2: How can I deal with the "0" result of the search? I would like to handle it like this: if the result is "0", do nothing; otherwise, show me a MsgBox with the path.

Thanks for your help
Melba23 (Moderators) Posted April 19, 2018

newhere2,

Quote: "I want to search a directory incl. Subfolders for files, where I only know a part of them."

_FileListToArrayRec will do this for you - you can define the pattern for the filename pretty loosely. If you need a regex to define the filename pattern then you could always try this Beta version of the function.

Quote: "As I am searching up to 1.5 to 2 million files, of course, it would be great if the search works as fast as possible."

I am afraid that you will have to accept a fairly lengthy execution time - whatever method you use will have to traverse the entire folder tree, and one of that size is going to take a while. If you try _FileListToArrayRec, do NOT use the sort parameter, as this adds significantly to the time taken to complete the call.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ---------- Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area
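As a rough illustration of the _FileListToArrayRec suggestion above, a minimal sketch might look like the following. The part-name "234234" and the use of @ScriptDir as the root folder are assumptions made only for this example; the wildcard mask and the unsorted, files-only, recursive call follow the advice in the post.

```autoit
#include <Array.au3>
#include <File.au3>

; Hypothetical part-name, used here only for illustration
Local $sPart = "234234"

; Files only, recursive, NOT sorted (sorting is slow on large trees), full paths returned
Local $aHits = _FileListToArrayRec(@ScriptDir, "*" & $sPart & "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)

If @error Then
    ; The function sets @error when nothing was found or the path/parameters were invalid
    ConsoleWrite("Nothing found for *" & $sPart & "*" & @CRLF)
Else
    ; Element [0] holds the number of files returned
    _ArrayDisplay($aHits, "Files containing " & $sPart)
EndIf
```

Checking @error before touching the result is one way to "do nothing" when there is no hit, rather than comparing a return value against "0".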
newhere2 (Author) Posted April 19, 2018

Thanks for your reply. I already use _FileListToArrayRec, and getting the results works fine. But I do not know how to loop between those arrays... I have already tried several available scripts without success.
Melba23 (Moderators) Posted April 19, 2018

newhere2,

Which "arrays"? Can you explain more clearly please.

M23
newhere2 (Author) Posted April 20, 2018

Hello Melba23,

thanks for your quick replies. Much appreciated. I am searching for every entry in input.txt, separated by ",". These numbers are parts of filenames for which I do not know the full name. Sure, here is the code:

Excerpt of input.txt:

```
234234,234242342342,234234,234234,234234,234234234234
```

findFiles.au3:

```autoit
#cs ----------------------------------------------------------------------------

 AutoIt Version: 3.3.14.2
 Author:         myName

 Script Function:
    Template AutoIt script.

#ce ----------------------------------------------------------------------------

; Script Start - Add your code below here
#include <array.au3>
#include <File.au3>

$simpleList = FileRead("C:\DP\Dropbox\input.txt")
$TrimmedList = StringSplit($simpleList, ",")
If IsArray($TrimmedList) Then _ArrayDisplay($TrimmedList)

;For $i =1 to $res[0]
;    Msgbox (64,"Missing Assets in File", $res[$i])
;Next

;Read Contents of folder
$folderstructure = _FileListToArrayRec(@ScriptDir,Default,$FLTAR_FILES,1,Default,2)
If IsArray($folderstructure) Then _ArrayDisplay($folderstructure)
```

I think this script is working. Now I am stuck on comparing those 2 arrays. The result should be a list of all missing files (that is, entries that are in the array TrimmedList but not in folderstructure).
Melba23 (Moderators) Posted April 20, 2018

newhere2,

So if I understand correctly, you have an array of these numeric part-filenames and you want to compare this to the list of files within the folder structure so as to find all the files which do NOT match any of the part-names.

We could use a regular expression to match - otherwise we may have to go with a double-looped array structure, which is likely to be pretty slow, especially given the number of files you say you have to search. Here are the 2 methods in action:

```autoit
#include <Array.au3>

; Simulate an array read in from input.txt
Local $aTrimmedList[] = [6, "234234", "345345345345", "456456456", "567567", "678678678", "789789789789"]

; Simulate a file listing
Local $aFolderStructure[] = [4, _
        "M:\blah234234blah.fil", _ ; Matches
        "M:\Folder_1\blah356blah.fil", _ ; Does not match
        "M:\Folder_2\blah567567blah.fil", _ ; Matches
        "M:\Folder_3\blah763829blah.fil"] ; Does not match

; Option 1 - Using a RegEx

; Create a pattern of all the possible matches
$sPattern = ""
For $i = 1 To $aTrimmedList[0]
    $sPattern &= $aTrimmedList[$i] & "|"
Next
$sPattern = StringTrimRight($sPattern, 1)
;ConsoleWrite($sPattern & @CRLF)

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file contains one of the possible patterns
    If StringRegExp($aFolderStructure[$i], $sPattern) Then
        ; Add the index of the file to the list
        $sDeletionIndices &= $i & ";"
    EndIf
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which do not match
_ArrayDisplay($aFolderStructure)

; Option 2 - Double loop

; Restore the full file listing
Local $aFolderStructure[] = [4, _
        "M:\blah234234blah.fil", _ ; Matches
        "M:\Folder_1\blah356blah.fil", _ ; Does not match
        "M:\Folder_2\blah567567blah.fil", _ ; Matches
        "M:\Folder_3\blah763829blah.fil"] ; Does not match

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file contains one of the possible values
    For $j = 1 To $aTrimmedList[0]
        If StringInStr($aFolderStructure[$i], $aTrimmedList[$j]) Then
            ; Add the index of the file to the list
            $sDeletionIndices &= $i & ";"
            ; No point in looking further
            ExitLoop
        EndIf
    Next
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which do not match
_ArrayDisplay($aFolderStructure)
```

Please ask if you have any questions.

M23
Subz Posted April 20, 2018

I noticed that the partial names in input.txt can be part of larger numbers (for example 234234 and 234234234234), and I also noticed duplicates, so I thought about removing the duplicates and then sorting the list in descending order. Anyway, this is what I came up with, but I would probably go with Melba23's code.

```autoit
#include <Array.au3>
#include <File.au3>

Local $sPartialNames = FileRead("C:\DP\Dropbox\input.txt")
;~ Get Unique List of Partial File Names
Local $aPartialNames = StringSplit($sPartialNames, ",")
If IsArray($aPartialNames) Then
    $aPartialNames = _ArrayUnique($aPartialNames, 0, 1)
    ;~ Sort $aPartialNames Descending so we search from large to small for example 11111, 1111, 111, 11, 1
    _ArraySort($aPartialNames, 1, 1)
    _ArrayDisplay($aPartialNames)
EndIf

Local $aFilesNotFound[1], $aFilesFound[1][2], $aFileList, $iFilesFound
;~ Loop through and find Files with Partial File Names
For $i = 1 To $aPartialNames[0]
    $aFileList = _FileListToArrayRec(@ScriptDir, "*" & $aPartialNames[$i] & "*", 1, 1, 0, 2)
    If @error Then
        ;~ No hits add Partial Name to $aFilesNotFound Array
        _ArrayAdd($aFilesNotFound, $aPartialNames[$i])
        ContinueLoop
    EndIf
    ;~ Add a Column to $aFileList
    _ArrayColInsert($aFileList, 1)
    For $j = $aFileList[0][0] To 1 Step -1
        $iFilesFound = _ArraySearch($aFilesFound, $aFileList[$j][0], 1, 0, 0, 0, 1, 0)
        If @error Then
            ;~ No duplicate File Name was found so continue
            $aFileList[$j][1] = $aPartialNames[$i]
            ContinueLoop
        EndIf
        _ArrayDelete($aFileList, $j)
    Next
    _ArrayDelete($aFileList, 0)
    If UBound($aFileList) - 1 = -1 Then
        _ArrayAdd($aFilesNotFound, $aPartialNames[$i])
        ContinueLoop
    EndIf
    _ArrayAdd($aFilesFound, $aFileList)
Next
$aFilesFound[0][0] = UBound($aFilesFound) - 1
$aFilesNotFound[0] = UBound($aFilesNotFound) - 1
_ArrayDisplay($aFilesFound, "Partial File Names found")
_ArrayDisplay($aFilesNotFound, "Partial File Names not found")
```
Melba23 (Moderators) Posted April 20, 2018

Subz,

I assumed the repeated patterns were just for ease of creating an example of the file format and that the actual values would be discrete, as in my example.

M23
newhere2 (Author) Posted April 20, 2018 (edited April 20, 2018 by newhere2)

Thanks to both of you for your great replies and for spending time on finding a solution. I had already put in about half a day and gave up.

Yes, duplicate entries never happen. I will test and let you know.

Update: Instead of deleting the matches, I would like to display them. I tried the following:

```autoit
; Now delete the matching files
;_ArrayDelete($aFolderStructure, $sDeletionIndices)
_ArrayDisplay($aFolderStructure, $sDeletionIndices)
```

but it still shows all files.
Melba23 (Moderators) Posted April 20, 2018

newhere2,

If you want to keep the matches, then you need to adjust the loop logic - like this:

```autoit
#include <Array.au3>

; Simulate an array read in from input.txt
Local $aTrimmedList[] = [6, "234234", "345345345345", "456456456", "567567", "678678678", "789789789789"]

; Simulate a file listing
Local $aFolderStructure[] = [4, _
        "M:\blah234234blah.fil", _ ; Matches
        "M:\Folder_1\blah356blah.fil", _ ; Does not match
        "M:\Folder_2\blah567567blah.fil", _ ; Matches
        "M:\Folder_3\blah763829blah.fil"] ; Does not match

; Option 1 - Using a RegEx

; Create a pattern of all the possible matches
$sPattern = ""
For $i = 1 To $aTrimmedList[0]
    $sPattern &= $aTrimmedList[$i] & "|"
Next
$sPattern = StringTrimRight($sPattern, 1)
;ConsoleWrite($sPattern & @CRLF)

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file does not contain one of the possible patterns
    If Not StringRegExp($aFolderStructure[$i], $sPattern) Then
        ; Add the index of the file to the list
        $sDeletionIndices &= $i & ";"
    EndIf
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the non-matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which match
_ArrayDisplay($aFolderStructure)

; Option 2 - Double loop

; Restore the full file listing
Local $aFolderStructure[] = [4, _
        "M:\blah234234blah.fil", _ ; Matches
        "M:\Folder_1\blah356blah.fil", _ ; Does not match
        "M:\Folder_2\blah567567blah.fil", _ ; Matches
        "M:\Folder_3\blah763829blah.fil"] ; Does not match

; Now loop through the file structure and see which files match
$sDeletionIndices = ""
For $i = 1 To $aFolderStructure[0]
    ; If the file contains one of the possible values
    For $j = 1 To $aTrimmedList[0]
        If StringInStr($aFolderStructure[$i], $aTrimmedList[$j]) Then
            ; It matches
            ExitLoop
        EndIf
    Next
    ; If the inner loop ran to completion, $j is now one past the last index
    If $j > $aTrimmedList[0] Then
        ; No match so add the index of the file to the list
        $sDeletionIndices &= $i & ";"
    EndIf
Next
$sDeletionIndices = StringTrimRight($sDeletionIndices, 1)
;ConsoleWrite($sDeletionIndices & @CRLF)

; Now delete the non-matching files
_ArrayDelete($aFolderStructure, $sDeletionIndices)
; Reset the count
$aFolderStructure[0] = UBound($aFolderStructure) - 1

; And here we have the result - only files which match
_ArrayDisplay($aFolderStructure)
```

I am sure you can spot the differences.

M23
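For readers who want to see the pieces of this thread joined up, here is a hedged sketch that feeds a real input.txt and a real _FileListToArrayRec listing into the regex approach. It is not code from any poster: the input path is copied from the earlier posts, @ScriptDir as the search root is an assumption, and it assumes the list contains only digit part-names with no empty entries (an empty entry would make the pattern match every file). Matches are collected into a new list instead of being deleted from the big array.

```autoit
#include <Array.au3>
#include <File.au3>

; Read the comma-separated part-names and strip all whitespace (flag 8), e.g. a trailing line break
Local $sList = StringStripWS(FileRead("C:\DP\Dropbox\input.txt"), 8)
If $sList = "" Then Exit MsgBox(16, "Error", "input.txt is empty or could not be read")

; "234234,567567" becomes "234234|567567"
; Assumption: entries are plain digits, so nothing needs escaping for the regex
Local $sPattern = StringReplace($sList, ",", "|")

; Files only, recursive, unsorted, full paths
Local $aFiles = _FileListToArrayRec(@ScriptDir, "*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
If @error Then Exit MsgBox(16, "Error", "No files found under " & @ScriptDir)

; Collect the matching paths; "|" cannot occur in a Windows path, so it is a safe separator
Local $sMatches = ""
For $i = 1 To $aFiles[0]
    If StringRegExp($aFiles[$i], $sPattern) Then $sMatches &= $aFiles[$i] & "|"
Next

If $sMatches = "" Then
    MsgBox(64, "Result", "No file matches any of the part-names")
Else
    ; StringSplit puts the count in element [0], like the other arrays in this thread
    Local $aMatches = StringSplit(StringTrimRight($sMatches, 1), "|")
    _ArrayDisplay($aMatches, "Matching files")
EndIf
```

Because the test runs against the full path, a part-name that happens to appear in a folder name will also count as a match; matching against the file name alone would need an extra step.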