weaponx

FileListToArray comprehensive comparison

43 posts in this topic

#1 ·  Posted (edited)

All of these scripts were written by me, but they use methods seen in others' work. With so many different versions of FileListToArray in the wild, I think it's worth seeing why certain methods work better than others.

-Added results from Sm0ke_N and randallc, which forced a rearrangement of the rankings. Also note that some versions are faster with smaller numbers of files but slower with larger ones.

-Much like college football, the rankings mean very little. Sometimes you will only want a small amount of code that finds everything; sometimes you will include randallc's much larger code if you need the search to be very specific. (Of course, I'm biased toward #1 because I wrote it and it is very little code.)

Test #1: 2090 files / 253 folders @ 6.9 GB (*.*)
Test #2: 102,084 files / 8824 folders @ 205 GB (*.*)
Test #3: 102,084 files / 8824 folders @ 205 GB (*.mp3, 6720 found)
Test #4: 102,084 files / 8824 folders @ 205 GB (*.mp3 + *.exe, 9704 found)

#7 - Array based w/ Redimensioning

Test #1: 1.9365s
Test #2: DNF (did not finish)
#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.2.8.1
    Author: WeaponX
    Script Function: Recursive file search (array based)
    Notes: ReDim causes a big speed decrease, more so than StringSplit
#ce ----------------------------------------------------------------------------
#include <array.au3>

$timestamp = TimerInit()
$Array = RecursiveFileSearch("D:\")
MsgBox(0, "", (TimerDiff($timestamp) / 1000) & " seconds") ;1.9365s / 2090 files
;_ArrayDisplay($Array)

Func RecursiveFileSearch($startDir, $depth = 0)
    ;Element 0 of the array holds the file count
    If $depth = 0 Then Global $RFSarray[1] = [0]
    $search = FileFindFirstFile($startDir & "\*.*")
    If @error Then Return
    ;Search through all files and folders in directory
    While 1
        $next = FileFindNextFile($search)
        If @error Then ExitLoop
        ;If folder, recurse
        If StringInStr(FileGetAttrib($startDir & "\" & $next), "D") Then
            RecursiveFileSearch($startDir & "\" & $next, $depth + 1)
        Else
            ;Increment file count
            $RFSarray[0] += 1
            ;Grow array by one element (this per-file ReDim is the bottleneck)
            ReDim $RFSarray[$RFSarray[0] + 1]
            ;Store filename in array
            $RFSarray[$RFSarray[0]] = $startDir & "\" & $next
        EndIf
    WEnd
    FileClose($search)
    If $depth = 0 Then Return $RFSarray
EndFunc   ;==>RecursiveFileSearch

-ReDim-ing one element at a time causes the slowdown
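The per-file ReDim above copies the whole array on every single append. A common alternative is geometric growth: double the capacity when full and trim once at the end. A minimal sketch (the _Append helper is hypothetical, not part of the original script):

```autoit
;Illustrative sketch: grow the array geometrically instead of per element
Global $RFSarray[16] = [0] ;element 0 holds the count; the rest is spare capacity

Func _Append(ByRef $aArr, $sItem)
    ;Only ReDim when full, doubling capacity (amortized O(1) per append)
    If $aArr[0] + 1 >= UBound($aArr) Then ReDim $aArr[UBound($aArr) * 2]
    $aArr[0] += 1
    $aArr[$aArr[0]] = $sItem
EndFunc   ;==>_Append

For $i = 1 To 1000
    _Append($RFSarray, "file" & $i & ".txt")
Next
;Trim the unused capacity once at the end
ReDim $RFSarray[$RFSarray[0] + 1]
ConsoleWrite($RFSarray[0] & " items" & @CRLF)
```

With this scheme the number of ReDim calls grows with the logarithm of the file count instead of linearly; #1 below avoids the problem another way, by sizing the array up front with DirGetSize and trimming once at the end.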

#6 - Array based using FileSystemObject

Test #1: 0.8403s
Test #2: 47.508s
#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.2.8.1
    Author: WeaponX (source: http://www.microsoft.com/technet/scriptcen...4/hey1020.mspx)
    Script Function: Recursive file search using FileSystemObject (array based)
    Notes: -Slowest
#ce ----------------------------------------------------------------------------

#include <array.au3>

$timestamp = TimerInit()
$Array = _AdvancedFileListToArray("D:\temp")
MsgBox(0, "", (TimerDiff($timestamp) / 1000) & " seconds") ;0.8403s / 2090 files

_ArrayDisplay($Array)

Func _AdvancedFileListToArray($sPath, $rootFolder = '')
    ;Run once
    If Not IsDeclared("objFSO") Then
        Global $objFSO = ObjCreate("Scripting.FileSystemObject")
        Local $AFLTfilecount = DirGetSize($sPath, 1)
        Global $objArray[$AFLTfilecount[1] + 1]
        $objArray[0] = 0
        $rootFolder = $objFSO.GetFolder($sPath)
        ;Store all files in root folder first
        For $objFile In $rootFolder.Files
            $objArray[$objArray[0] + 1] = $sPath & "\" & $objFile.Name
            $objArray[0] += 1
        Next
    EndIf
    ;Loop through all subfolders in root folder
    For $Subfolder In $rootFolder.SubFolders
        $objFolder = $objFSO.GetFolder($Subfolder.Path)
        ;Loop through all files in folder
        For $objFile In $objFolder.Files
            $objArray[$objArray[0] + 1] = $Subfolder.Path & "\" & $objFile.Name
            $objArray[0] += 1
        Next
        _AdvancedFileListToArray($sPath, $Subfolder)
    Next
    Return $objArray
EndFunc   ;==>_AdvancedFileListToArray

-COM object communication is slow

-EDIT: Fixed incorrect file count

#5 - String based

Test #1: 0.0902s <- LOWEST
Test #2: 33.6384s
#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.2.8.1
    Author: WeaponX
    Script Function: Recursive file search (string based)
    Notes: -Fastest thus far
#ce ----------------------------------------------------------------------------

#include <array.au3>

$timestamp = TimerInit()
$Array = RecursiveFileSearch("D:\temp")
MsgBox(0, "", (TimerDiff($timestamp) / 1000) & " seconds") ;0.0902s / 2090 files
_ArrayDisplay($Array)

Func RecursiveFileSearch($startDir, $depth = 0)
    If $depth = 0 Then Global $RFSstring = ""
    $search = FileFindFirstFile($startDir & "\*.*")
    If @error Then Return
    ;Search through all files and folders in directory
    While 1
        $next = FileFindNextFile($search)
        If @error Then ExitLoop
        ;If folder, recurse
        If StringInStr(FileGetAttrib($startDir & "\" & $next), "D") Then
            RecursiveFileSearch($startDir & "\" & $next, $depth + 1)
        Else
            ;Append filename to master string
            $RFSstring &= $startDir & "\" & $next & "*"
        EndIf
    WEnd
    FileClose($search)
    If $depth = 0 Then Return StringSplit(StringTrimRight($RFSstring, 1), "*")
EndFunc   ;==>RecursiveFileSearch

-Fastest in test 1

#4 - String based w/ Helper function

Test #1: 0.1797s
Test #2: 8.5200s
Test #3: 8.5601s
#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.2.8.1
    Author: WeaponX
    Script Function: Recursive file search with helper function (string based)
    Notes: -StringSplit is faster than ReDim
#ce ----------------------------------------------------------------------------

#include <array.au3>

$timestamp = TimerInit()
$Array = FileListToArrayX("D:\temp")
MsgBox(0, "", (TimerDiff($timestamp) / 1000) & " seconds") ;0.1797s / 2090 files
_ArrayDisplay($Array)

Func FileListToArrayX($FLTAXstartDir)
    Local $FLTAXstring = ""
    ;Retrieve array of all folders
    $folderArray = RecursiveFolderSearch($FLTAXstartDir)
    ;Loop through all folders
    For $X = 1 To $folderArray[0]
        $search = FileFindFirstFile($folderArray[$X] & "\*.*")
        If @error Then ContinueLoop
        ;Search through all files and folders in directory
        While 1
            $next = FileFindNextFile($search)
            If @error Then ExitLoop
            ;Skip folders, append to string of all filenames
            If Not StringInStr(FileGetAttrib($folderArray[$X] & "\" & $next), "D") Then $FLTAXstring &= $folderArray[$X] & "\" & $next & "*"
        WEnd
        FileClose($search)
    Next
    ;Split string into array and return it
    Return StringSplit(StringTrimRight($FLTAXstring, 1), "*")
EndFunc   ;==>FileListToArrayX

Func RecursiveFolderSearch($startDir, $depth = 0)
    If $depth = 0 Then Global $RFSstring = $startDir & "*"
    $search = FileFindFirstFile($startDir & "\*.*")
    If @error Then Return
    ;Search through all files and folders in directory
    While 1
        $next = FileFindNextFile($search)
        If @error Then ExitLoop
        ;If folder, recurse
        If StringInStr(FileGetAttrib($startDir & "\" & $next), "D") Then
            ;Append foldername to string
            $RFSstring &= $startDir & "\" & $next & "*"
            ;Recurse
            RecursiveFolderSearch($startDir & "\" & $next, $depth + 1)
        EndIf
    WEnd
    FileClose($search)
    If $depth = 0 Then Return StringSplit(StringTrimRight($RFSstring, 1), "*")
EndFunc   ;==>RecursiveFolderSearch

-Similar to randallc's method with helper function http://www.autoitscript.com/forum/index.php?showtopic=49396

-The helper function is not really helpful here, since you still have to run FileGetAttrib against every single file anyway

#3 - String based w/ Helper functions (randallc)

Updated times using _FileListToArrayFaster1d

Test #1: 0.0936s
Test #2: 4.3418s <- LOWEST
Test #3: 5.1980s
Test #4: 6.1400s
Average: 3.9434s

http://www.autoitscript.com/forum/index.php?showtopic=49396

#2 - Command line based (dir) (Sm0ke_N)

Test #1: 0.2496s
Test #2: 11.1895s
Test #3: 1.6623s <- LOWEST
Test #4: 2.3218s <- LOWEST
Average: 3.8558s

http://www.autoitscript.com/forum/index.ph...t=0&start=0#

#1 - Array based

Test #1: 0.1063s
Test #2: 5.1020s
Test #3: 5.5785s
Test #4: 5.5986s
Average: 4.0964s
#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.2.10.0
    Author: WeaponX
    Updated: 2/21/08
    Script Function: Recursive file search
    2/21/08 - Added pattern for folder matching, flag for return type
    1/24/08 - Recursion is now optional
    Parameters:
        RFSstartdir: Path to starting folder
        RFSFilepattern: RegEx pattern to match
            "\.(mp3)" - Find all mp3 files - case sensitive (by default)
            "(?i)\.(mp3)" - Find all mp3 files - case insensitive
            "(?-i)\.(mp3|txt)" - Find all mp3 and txt files - case sensitive
        RFSFolderpattern:
            "(Music|Movies)" - Only match folders named Music or Movies - case sensitive (by default)
            "(?i)(Music|Movies)" - Only match folders named Music or Movies - case insensitive
            "(?!(Music|Movies)\:)\b.+" - Match folders NOT named Music or Movies - case sensitive (by default)
        RFSFlag: Specifies what is returned in the array
            0 - Files and folders
            1 - Files only
            2 - Folders only
        RFSrecurse: TRUE = Recursive, FALSE = Non-recursive
        RFSdepth: Internal use only
#ce ----------------------------------------------------------------------------

Func RecursiveFileSearch($RFSstartDir, $RFSFilepattern = ".", $RFSFolderpattern = ".", $RFSFlag = 0, $RFSrecurse = True, $RFSdepth = 0)
    ;Ensure starting folder has a trailing slash
    If StringRight($RFSstartDir, 1) <> "\" Then $RFSstartDir &= "\"
    If $RFSdepth = 0 Then
        ;Get count of all files in subfolders for initial array definition
        $RFSfilecount = DirGetSize($RFSstartDir, 1)
        ;File count + folder count (will be resized when the function returns)
        Global $RFSarray[$RFSfilecount[1] + $RFSfilecount[2] + 1]
    EndIf
    $RFSsearch = FileFindFirstFile($RFSstartDir & "*.*")
    If @error Then Return
    ;Search through all files and folders in directory
    While 1
        $RFSnext = FileFindNextFile($RFSsearch)
        If @error Then ExitLoop
        ;If folder and recurse flag is set and regex matches
        If StringInStr(FileGetAttrib($RFSstartDir & $RFSnext), "D") Then
            If $RFSrecurse And StringRegExp($RFSnext, $RFSFolderpattern, 0) Then
                RecursiveFileSearch($RFSstartDir & $RFSnext, $RFSFilepattern, $RFSFolderpattern, $RFSFlag, $RFSrecurse, $RFSdepth + 1)
                If $RFSFlag <> 1 Then
                    ;Append folder name to array
                    $RFSarray[$RFSarray[0] + 1] = $RFSstartDir & $RFSnext
                    $RFSarray[0] += 1
                EndIf
            EndIf
        ElseIf StringRegExp($RFSnext, $RFSFilepattern, 0) And $RFSFlag <> 2 Then
            ;Append file name to array
            $RFSarray[$RFSarray[0] + 1] = $RFSstartDir & $RFSnext
            $RFSarray[0] += 1
        EndIf
    WEnd
    FileClose($RFSsearch)
    If $RFSdepth = 0 Then
        ReDim $RFSarray[$RFSarray[0] + 1]
        Return $RFSarray
    EndIf
EndFunc   ;==>RecursiveFileSearch

-Same as #5 but uses an array instead of returning StringSplit()

-Fastest in test 2

-EDIT: Fixed incorrect file count

-EDIT: Code updated 2/21/08, removed autoit tags

Conclusions

  • ReDim is very, very slow with huge numbers of files. #7 never finished Test #2 even after about 3 minutes of waiting; memory usage in Task Manager was growing slowly at about 2 MB/s.
  • UBound is slower than storing the number of elements in index zero.
  • FileSystemObject is very slow, but I don't think that has to do with AutoIt.
  • StringRegExp is quite fast.
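The UBound point from the conclusions can be seen with a quick micro-benchmark. This sketch (not from the thread) times a loop whose condition calls UBound every pass against one that reads the count stored in element zero:

```autoit
Local $aFiles[100001] = [100000] ;element 0 holds the count

Local $t = TimerInit(), $i = 1
While $i <= UBound($aFiles) - 1 ;function call re-evaluated every iteration
    $i += 1
WEnd
ConsoleWrite("UBound():     " & Round(TimerDiff($t), 1) & " ms" & @CRLF)

$t = TimerInit()
$i = 1
While $i <= $aFiles[0] ;plain array read every iteration
    $i += 1
WEnd
ConsoleWrite("element zero: " & Round(TimerDiff($t), 1) & " ms" & @CRLF)
```

Note the difference only shows up in While loops like the ones above; a For loop evaluates its bound once, so it would hide the cost.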
Edited by Melba23
Fixed formatting




Explanations?

Sorry, wanted to see how it looked first.


#4 ·  Posted (edited)

Hi,

it's good to see a comparison by somebody else, and I would love to see a faster function;

Unfortunately, when I run your scripts, they only appear to be faster because they return only about 75% of the files returned by my script and Sm0ke_N's; I can't see why yet... [it may be my error with the parameters, or something on my computer?]

I'll look later.

Best, randall

Edited by randallc


#5 ·  Posted (edited)

Hi,

it's good to see a comparison by somebody else, and I would love to see a faster function;

Unfortunately, when I run your scripts, they only appear to be faster because they return only about 75% of the files returned by my script and Sm0ke_N's; I can't see why yet... [it may be my error with the parameters, or something on my computer?]

I'll look later.

Best, randall

I verified #1 by putting this at the top:

$RFSfilecount = DirGetSize("D:\", 1)
MsgBox(0, "", "Files: " & $RFSfilecount[1] & @CRLF & "Folders: " & $RFSfilecount[2])

$timestamp = TimerInit()
$Array = RecursiveFileSearch("D:\")
MsgBox(0, "", (TimerDiff($timestamp) / 1000) & " seconds" & @CRLF & "# of files: " & $Array[0])

The counts in the first and second boxes should match.

EDIT: For comparison, my function returned 102,109 files as well as Sm0ke_N's

EDIT again: Same count from yours as well

Edited by weaponx


#6 ·  Posted (edited)

Using your array function, $Array[0] gave me a smaller answer (because it has been set by the number of files found?), whereas UBound gave me the correct answer minus 1 (because it has been set by DirGetSize?).

hi, yes,

I can see it is working for you;

But I don't see why this is a problem on my machine and not yours;

I will have to look further

Best, Randall

Edited by randallc


Hi,

Can you see if I am doing something wrong with the parameters?

#include-once
#include <_FileListToArrayNew2m.au3>
#include <array.au3>

$timestamp = TimerInit()
$Array = RecursiveFileSearch(@ScriptDir & "\")
;~ MsgBox(0, "", (TimerDiff($timestamp) / 1000) & " seconds")
;~ _ArrayDisplay($Array)
ConsoleWrite((TimerDiff($timestamp) / 1000) & " seconds" & @LF)
ConsoleWrite("$Array[0]=" & $Array[0] & @LF)
ConsoleWrite("$UBound($Array)-1=" & (UBound($Array) - 1) & @LF)

$timestamp = TimerInit()
$Array = _FileListToArray3(@ScriptDir, "*.*", 1, 1, 1)
ConsoleWrite((TimerDiff($timestamp) / 1000) & " seconds" & @LF)
ConsoleWrite("$Array[0]=" & $Array[0] & @LF)
ConsoleWrite("$UBound($Array)-1=" & (UBound($Array) - 1) & @LF)
>Running:(3.2.10.0):C:\Program Files\AutoIt3\autoit3.exe "C:\Programs\SearchEngine\WeaponXRecursiveFileSearch.au3"    
0.662301192922661 seconds
$Array[0]=4471
$UBound($Array)-1=13151
3.21868774998747 seconds
$Array[0]=13152
$UBound($Array)-1=13152
+>14:15:01 AutoIT3.exe ended.rc:0
+>14:15:03 AutoIt3Wrapper Finished
>Exit code: 0    Time: 6.969
Randall


#8 ·  Posted (edited)

I see what you mean. I think I messed up something with my increments; I will check it out tomorrow. Still, it's only one off; I'm not sure why you are getting only 75% of the files.

Edited by weaponx


#9 ·  Posted (edited)

I see what you mean. I think I messed up something with my increments; I will check it out tomorrow. Still, it's only one off; I'm not sure why you are getting only 75% of the files.

Hi,

OK with

RecursiveFileSearch($startDir & "\" & $next & "\", $depth + 1)

If it is really twice as fast with arrays as with strings, I'll want to incorporate it in mine; but it is such a pain for multiple searches [i.e. multiple filter parameters], where it was so easy just to keep adding to the string and StringSplit at the very end...

Oh well, maybe you can help me with that coding?

Best, Randall

[Edit] See next post [hopefully!]; the array is no faster; the duplication, as you point out, of the attribute checking doubles the time taken.

Now how to fix it!

[I always knew I was being lazy with this, but, hey, forgive me! It was a better script than anything out there for retaining foreign characters, multiple options, and speed as a combined assessment; lots of scripts can do one thing or the other really well, even better...]

Edited by randallc


The only thing that worries me about all this benchmarking is that Windows caches stuff. The first time you run the script it can take a couple of minutes to do something, while the second run over the same files takes only seconds or so. So I'm wondering how you executed those tests. On the same files for all scripts? Rebooted for every test?


My little company: Evotec (PL version: Evotec)


The only thing that worries me about all this benchmarking is that Windows caches stuff. The first time you run the script it can take a couple of minutes to do something, while the second run over the same files takes only seconds or so. So I'm wondering how you executed those tests. On the same files for all scripts? Rebooted for every test?

Hi,

I agree, it is a major problem and a hassle to keep rebooting...

WeaponX has raised an issue with my script, though: I was re-checking attributes and doubling the times when the commonly used filter is "*" or "*.*", even though I had previously made a special case for that (a helper function which was never called!).

I have brought that helper back into use, and now match his speeds on "*", but there may be bugs... oh well, we try.

I don't think there is any way to speed up other filters, or multiple filters...

Best, randall


@randallc / weaponx

I just released a Duplicate File Finder.

In there I used a function called "_FileListToArrayEx"

This is not the fastest around.

If you guys agree on which function is fast and stable, I am willing to replace it with one of yours.

regards,

ptrex


What is the purpose of the duplicate file finder? Is that error control for the FileListToArray function, or just a separate function altogether?


What is the purpose of the duplicate file finder? Is that error control for the FileListToArray function, or just a separate function altogether?

Hi,

1. Inside "FileListToArray":

If I use a multiple filter, e.g.

"*.txt|*x*"

I will get some files twice, so I prefer to remove dupes before returning them [i.e. the same file duplicated in the array list].

2. @ptrex is just doing an all-purpose dupe finder as requested elsewhere, I believe, using MD5 checksums rather than names; a different issue altogether.

Best, randall
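Stripping that kind of duplicate before returning the array can be sketched like this (an illustrative helper assuming the element-zero count convention used throughout the thread; it is not randallc's actual code):

```autoit
#include <array.au3>

Func _RemoveDupes(ByRef $aArr) ;element 0 holds the count
    ;Sort from index 1 so identical paths become adjacent
    _ArraySort($aArr, 0, 1)
    Local $iKeep = 0, $sLast = ""
    For $i = 1 To $aArr[0]
        ;AutoIt's = / <> string comparison is case-insensitive,
        ;which suits Windows paths
        If $aArr[$i] <> $sLast Then
            $iKeep += 1
            $aArr[$iKeep] = $aArr[$i]
            $sLast = $aArr[$i]
        EndIf
    Next
    $aArr[0] = $iKeep
    ReDim $aArr[$iKeep + 1]
EndFunc   ;==>_RemoveDupes
```

Sorting first keeps the dedup pass to a single linear scan, at the cost of losing the original file order.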


@weaponx

What is the purpose of the duplicate file finder?

The purpose is to identify identical files on your system by means of an MD5 checksum.

Even if the names of the files are not identical, it will still identify the content as identical.

Regards,

ptrex
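A minimal sketch of that idea (this assumes the _Crypt_HashFile function from the Crypt.au3 UDF shipped with later AutoIt versions; it is not ptrex's actual code):

```autoit
#include <Crypt.au3>

Func _IsDuplicate($sFileA, $sFileB)
    ;Cheap pre-check: files of different sizes can never be identical
    If FileGetSize($sFileA) <> FileGetSize($sFileB) Then Return False
    ;Same size: compare MD5 checksums of the content
    Local $hA = _Crypt_HashFile($sFileA, $CALG_MD5)
    Local $hB = _Crypt_HashFile($sFileB, $CALG_MD5)
    Return String($hA) = String($hB)
EndFunc   ;==>_IsDuplicate
```

Grouping files by size first keeps the number of hashes small, since only same-sized files ever need to be checksummed.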


Alright, I just went through and verified all file counts and UBounds; a couple of the array versions were off. I had to ditch AutoIt tags in the forum because backslashes were getting stripped, giving incorrect file counts.

@randallc - I think your method will win out in the next test, and here's why:

The helper function you use looks redundant at first, but when I add a file filter into my #1, it also filters out folder names. That means I have to use a regex to check each filename, and I don't know how speed will be affected. Your helper function provides a list of unfiltered folder names.


Added benchmarks for #1 - #4 using a file filter. Very interesting results: #4 has a very distinct advantage. I will run more tests with multiple filters and excludes later.


Added benchmarks for #1 - #4 using a file filter. Very interesting results: #4 has a very distinct advantage. I will run more tests with multiple filters and excludes later.

Are you rebooting the computer for each test? Or how do you work it out?




Did anybody notice that #1 doesn't display the very first file found in the array listing?

After some experimenting with the code I managed to get it working.

This is a working example, maybe not the best solution, but it works.

Also, I compared the outcome with Microsoft's results from Disk Defragmenter (analyze).

There is a difference of 1387 files (defrag claims to find more files) on my PC (XP Pro SP2).

Who can explain this?

The edited code:

#cs ----------------------------------------------------------------------------
    AutoIt Version: 3.2.8.1
    Author: WeaponX (Modified by Scriptonize)
    Script Function: Recursive file search (array based, no redimensioning)
    Notes: -Second fastest by a slim margin (a few milliseconds)
#ce ----------------------------------------------------------------------------

#include <array.au3>

$timestamp = TimerInit()
$Array = RecursiveFileSearch("C:\")
MsgBox(0, "Search results", "Time needed: " & Round((TimerDiff($timestamp) / 1000), 2) & " seconds" & @CR & _
        "Number of files counted: " & $Array[0] - 1)
$Array[0] = $Array[0] - 1
_ArrayDisplay($Array)
Exit

;--------------------------------------------------------------------------------
Func RecursiveFileSearch($startDir, $depth = 0)
    $startDir = Check4BSlash($startDir)
    If $depth = 0 Then
        ;Get count of all files in subfolders
        $RFSfilecount = DirGetSize($startDir, 1)
        Global $RFSarray[$RFSfilecount[1] + 1]
    EndIf
    $search = FileFindFirstFile($startDir & "*.*")
    If @error Then Return
    ;Search through all files and folders in directory
    While 1
        $next = FileFindNextFile($search)
        If @error Then ExitLoop
        ;If folder, recurse ($startDir already has a trailing backslash)
        If StringInStr(FileGetAttrib($startDir & $next), "D") Then
            RecursiveFileSearch($startDir & $next, $depth + 1)
        Else
            ;Append filename to array
            If $RFSarray[1] = "" Then
                ;Very first file found: store it and start the count
                $RFSarray[1] = $startDir & $next
                $RFSarray[0] += 1
            Else
                $RFSarray[$RFSarray[0]] = $startDir & $next
            EndIf
            ;Increment filecount
            $RFSarray[0] += 1
        EndIf
    WEnd
    FileClose($search)
    If $depth = 0 Then Return $RFSarray
EndFunc   ;==>RecursiveFileSearch

Func Check4BSlash($FolderOrFile)
    If StringRight($FolderOrFile, 1) <> "\" Then $FolderOrFile &= "\"
    Return $FolderOrFile
EndFunc   ;==>Check4BSlash


If you learn from It, it's not a mistake


@MadBoy - I haven't been rebooting because I wasn't seeing different results. I will look into it.

@Scriptonize - I'm not seeing a problem. Did you use the most recent version posted? I fixed the same issue you are showing this morning.

