orbs

_FileListToArrayRec() memory usage question


this is a simple example of memory usage by _FileListToArrayRec() to recursively list all files in a drive:

#include <File.au3>
Global $aFile = _FileListToArrayRec('C:\', Default, Default, $FLTAR_RECUR)
Global $aProcessStats = ProcessGetStats()
ConsoleWrite('WorkingSetSize = ' & $aProcessStats[0] & @CRLF)
ConsoleWrite('PeakWorkingSetSize = ' & $aProcessStats[1] & @CRLF)

the output:

WorkingSetSize = 81596416
PeakWorkingSetSize = 104312832

(b.t.w. when set to return the full path, these figures increase by 2%-3%)

this is for a regular home PC, drive C:\ with ~60GB used, ~131,000 files (in 83,700 directories).

for an external drive with ~200GB used, ~176,000 files (in 43,246 directories), the output:

WorkingSetSize = 99545088
PeakWorkingSetSize = 116850688

obviously, the memory usage increases as file count increases. (also, it seems the memory usage / file count ratio also increases as file count increases).
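To put rough numbers on that ratio, here is a small sketch that divides the working set figures above by the number of entries. It assumes that with `Default` for the return parameter the function lists folders as well as files, so both counts go into the divisor. It also ignores the baseline memory of the AutoIt process itself, and the file counts are approximate, so treat the results as estimates only. `_PrintRatio` is a hypothetical helper, not part of any UDF.

```autoit
; Rough bytes-per-entry estimate from the figures quoted above.
; Assumption: folders are returned too, so entries = files + directories.
_PrintRatio('C:', 81596416, 131000, 83700)
_PrintRatio('external', 99545088, 176000, 43246)

; Hypothetical helper: prints working set divided by the number of entries
Func _PrintRatio($sLabel, $iWorkingSet, $iFiles, $iDirs)
    ConsoleWrite($sLabel & ': ' & Round($iWorkingSet / ($iFiles + $iDirs)) & ' bytes per entry' & @CRLF)
EndFunc   ;==>_PrintRatio
```

By this measure the two drives come out at roughly 380 and 454 bytes per entry, which is consistent with the impression that the ratio grows. Dividing by the file count alone would give the opposite trend, so the conclusion depends on the assumption about what is being returned.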

now, suppose i need to list files in much larger volumes.

at what point am i supposed to begin worrying, and what should i be worrying about?

i mean, will Windows handle excessive memory requirements for me with the page file (or other "virtual memory" methods i'm not aware of)? does AutoIt itself have a relevant limitation? if so, what happens at that limitation? i know of the "16,777,216 Maximum number of elements for an array" limitation, but that's too far away to worry about... i think.

 

 




#2 ·  Posted (edited)

ok, so i did not test it in production (yet), but i came up with a way to test without having lots of files clogging my system. i can create empty files! as many as i want! :P
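A minimal sketch of how such a test set could be generated, for anyone who wants to repeat the experiment. The test location, folder count and files-per-folder count are all hypothetical values chosen for illustration; the original post does not say how the files were created.

```autoit
#include <FileConstants.au3>

; Hypothetical test location and sizes - adjust to taste
Global Const $sTestRoot = @TempDir & '\FLTAR_Test'
Global Const $iFolders = 100
Global Const $iFilesPerFolder = 1000

For $i = 1 To $iFolders
    For $j = 1 To $iFilesPerFolder
        ; Opening in overwrite mode creates a zero-byte file;
        ; $FO_CREATEPATH creates the parent folder if it is missing
        Local $hFile = FileOpen($sTestRoot & '\' & $i & '\' & $j, $FO_OVERWRITE + $FO_CREATEPATH)
        If $hFile = -1 Then ContinueLoop
        FileClose($hFile)
    Next
Next
```

Empty files still need a directory entry each, so creating millions of them takes time and some disk space even though the files hold no data.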

so, for now, a bit over 2 million files (albeit very short names), the results are:

WorkingSetSize = 438026240
PeakWorkingSetSize = 608473088

so it seems i have a longer way to go before i break anything...

 

EDIT: for a bit over 3 million files, now with longer full path, it's:

WorkingSetSize = 1147662336
PeakWorkingSetSize = 1656619008

"breaking something" attempts still in progress...


oh yeah!

finally, at a bit over 7.5 million files, the memory usage shown in Task Manager stalls at just under 4GB (4,108,232 K), and...

[image: AutoIt "Error allocating memory" message box]

when i acknowledge the MsgBox, the script terminates with exit code 1.

this was run by AutoIt 32-bit, on a 64-bit Windows 7 with 16GB RAM.

next (obvious) step: run by AutoIt 64-bit. no crash, results:

WorkingSetSize = 4596903936
PeakWorkingSetSize = 6896627712

so now i'm trying to break it for 64-bit.
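For reference, a short sketch of how a script can request and confirm the 64-bit interpreter. It assumes the script is launched through AutoIt3Wrapper (as installed with SciTE4AutoIt3), which is what honors the directive; when running the interpreter directly, the 64-bit executable has to be chosen by hand.

```autoit
; Ask AutoIt3Wrapper to run/compile this script with the 64-bit AutoIt
#AutoIt3Wrapper_UseX64=y

; @AutoItX64 is 1 when the script runs under the native x64 interpreter
ConsoleWrite('Running as ' & (@AutoItX64 ? '64-bit' : '32-bit') & ' AutoIt' & @CRLF)
```

Logging this next to the memory figures makes it clear which address space limit applies to a given run.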


Is there a point to all of this?


If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


yes, there is.

there are two scripts i'm using. one performing a specialized analysis of the file store, including backup of specific files and folders. and the other (which i released as open-source, b.t.w) - i'm using to look for paths which exceed the 260 characters limitation in length. true that this limitation is something i have already overcome; not so true for some legacy application still in use.

until now, these scripts were using recursion to traverse the filesystem. which works just fine, as long as there are only so many files to check. but file count increases over time, and i came to realize that _FileListToArrayRec() beats recursion. but, it has its limits; and those limits i'm now checking.

so, it comes to this: run by AutoIt 64-bit, for ~12.5 million files, the results are:

WorkingSetSize = 8168189952
PeakWorkingSetSize = 12210753536

so i'm getting closer to my hardware limits, as well as to AutoIt array size limits.

i'm starting to think the best way to go would be to acknowledge that the storing of the results in an array is something that should be avoided, if possible. i will look inside _FileListToArrayRec() and see if - instead of storing results in an array (and operating on those results later) - i can perform the operation immediately. something like a callback function, or even a complete custom rewrite.

still, an opinion from a more professional programmer than myself is something i'd appreciate.


#6 ·  Posted (edited)

When are you ever going to encounter a drive with that many files on it?

Also, if you know there's a memory limit, search just the root folder and return just folders. Then you can recurse through each folder individually which should limit the number of files/folders returned.

BTW, don't forget that there's a limit to the number of rows an array can have in AutoIt, you're limited to 16,777,216.
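The per-folder approach suggested above could look something like the following sketch. The root path is a hypothetical example, and the processing step is left as a comment. It keeps only one top-level folder's worth of results in memory at a time, so the peak array size depends on the largest top-level folder rather than on the whole drive.

```autoit
#include <File.au3>

Global $sRoot = 'C:\' ; hypothetical root to scan

; Step 1: list only the top-level folders, without recursion
Global $aFolders = _FileListToArrayRec($sRoot, '*', $FLTAR_FOLDERS, $FLTAR_NORECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
If @error Then Exit 1

; Step 2: recurse into each top-level folder separately
For $i = 1 To $aFolders[0]
    Local $aFiles = _FileListToArrayRec($aFolders[$i], '*', $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
    If @error Then ContinueLoop ; e.g. no files found or folder not accessible
    ConsoleWrite($aFolders[$i] & ': ' & $aFiles[0] & ' files' & @CRLF)
    ; ... process $aFiles here; the array is replaced on the next iteration
Next
```

Files that sit directly in the root are not covered by this loop and would need one extra non-recursive call. A single very large top-level folder can also still run into the same limits.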


orbs,

_FileListToArrayRec only stores all of the matching files in arrays so it can display (and perhaps sort) them later. If all you need is to access each file and do some form of immediate manipulation, then the only array required is the one holding the found folders to search, which is a significantly smaller number than the total number of files - unless you have a very strange file structure!

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area


thanks BrewManNH and Melba23 for your replies.

i seem to have struck the AutoIt array limitation first. at ~16 million files, the script crashes with exit code 3221225477 (a.k.a. 0xc0000005 - STATUS_ACCESS_VIOLATION).

so, i will follow the advice by Melba23, which coincides with my last insight (that storing file names in array should be avoided). after reading inside the function (and taking the time to understand it), i will edit the function for custom operation to be performed immediately when a matching file is encountered.


orbs,

Let me know if I can help in any way - the function is not difficult to understand but is a little complex at first glance!

M23



#10 ·  Posted (edited)

12 hours ago, Melba23 said:

... a little complex at first glance!

indeed. or, at 390 lines, at least very long.

it offers impressive functionality, but for which, for the specific task at hand, i have no need. some of it (the sort feature) also collides with my intent to avoid storing files in an array. so i opted to rewrite its core functionality, which turns out to be 20 lines of code (+10 lines of comments).

then i figured that if i'm going to use it for other purposes (which i am), i better form it as a fully-qualified UDF. as a general-purpose UDF, it accepts a callback function to operate on every file it encounters. this is it, and any comments are welcome (especially since i'm not certain that calling a function so many times is the most efficient way :think:).

the UDF "Traverse.au3":

#include-Once

; #INDEX# =======================================================================================================================
; Title .........: Traverse
; AutoIt Version : 3.3.14.2
; UDF Version ...: 0.1
; Status ........: Production
; Language ......: English
; Description ...: Traverses a directory tree and performs a user-defined function on every file.
; Author(s) .....: orbs
; ===============================================================================================================================

; #CURRENT# =====================================================================================================================
;_Traverse
; ===============================================================================================================================

; #FUNCTION# ====================================================================================================================
; Name ..........: _Traverse
; Description ...: Traverses a directory tree and performs a user-defined function on every file.
; Syntax ........: _Traverse($sRootPath, $sCallback)
; Parameters ....: $sRootPath - The root path to begin traversing from.
;                  $sCallback - Function name to be called when a file is encountered.
; Return values .: None
; Author ........: orbs
; Modified ......:
; Remarks .......: This function calls the callback function with one parameter, which is the full path of the file. That full
;                  path does not include the unicode prefix, unless the unicode prefix was specified for the root path.
; Related .......:
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _Traverse($sRootPath, $sCallback)
    ; declare vars
    Local $sCurrentPath, $hSearch, $sName
    ; init folders array
    Local $asFolderSearchList[100] = [1]
    ; start with root, accumulate subfolders when encountered
    $asFolderSearchList[1] = $sRootPath
    ; loop folders until no more folders exist in array
    While $asFolderSearchList[0] > 0
        ; Set path to search
        $sCurrentPath = $asFolderSearchList[$asFolderSearchList[0]]
        ; Reduce folder search list count
        $asFolderSearchList[0] -= 1
        ; Open a search handle on the folder; if that fails (e.g. folder is empty) move to next in list
        $hSearch = FileFindFirstFile(__Traverse_SetUnicodePrefix($sCurrentPath) & '\*')
        If $hSearch = -1 Then ContinueLoop
        ; Search folder - use code matched to required listing
        While True
            $sName = FileFindNextFile($hSearch, 1)
            ; Check for end of folder
            If @error Then ExitLoop
            ; If folder then add to search list, if file then callback
            If StringInStr(@extended, 'D') Then
                __Traverse_ReDim($asFolderSearchList, $sCurrentPath & '\' & $sName)
            Else
                Call($sCallback, $sCurrentPath & '\' & $sName)
            EndIf
        WEnd
        ; Release the search handle before moving to the next folder
        FileClose($hSearch)
    WEnd
EndFunc   ;==>_Traverse

; #INTERNAL_USE_ONLY# ===========================================================================================================
;__Traverse_ReDim
;__Traverse_SetUnicodePrefix
; ===============================================================================================================================

; #FUNCTION# ====================================================================================================================
; Name ..........: __Traverse_ReDim
; Description ...: Applies ReDim on a 1-D array in an efficient way.
; Syntax.........: __Traverse_ReDim(ByRef $aArray[, $xData=''])
; Parameters ....: $aArray - The array to ReDim
;                  $xData  - [optional] Data to add to the array
; Return values .: None. The array is passed ByRef and modified in place.
; Author ........: guinness
; Modified.......: orbs
; Remarks .......: The returned array probably has empty cells at the end. the value at element [0] is the count of used cells,
;                  not the count of total cells. To work with the returned array, use element [0] to determine the last used cell
;                  instead of UBound.
;                  To truncate the array to used size: ReDim $aArray[$aArray[0]+1]
; Related .......:
; Dependencies ..:
; Link ..........: http://www.autoitscript.com/forum/topic/129689-redim-the-most-efficient-way-so-far-when-you-have-to-resize-an-existing-array/?hl=array+2d
; Example .......: No
; ===============================================================================================================================
Func __Traverse_ReDim(ByRef $aArray, $xData = '')
    Local $iNewRealSize = UBound($aArray)
    If $aArray[0] >= UBound($aArray) - 1 Then ; if used size = real size then must redim, otherwise do nothing
        $iNewRealSize = Ceiling((UBound($aArray) + 1) * 1.5)
        ReDim $aArray[$iNewRealSize]
    EndIf
    $aArray[0] += 1 ; anyway used size increased
    If $xData <> '' Then $aArray[$aArray[0]] = $xData
EndFunc   ;==>__Traverse_ReDim

; #FUNCTION# ====================================================================================================================
; Name ..........: __Traverse_SetUnicodePrefix
; Description ...: Adds the unicode prefix to a given path, unless already present or some conditions are met - see Remarks.
; Syntax ........: __Traverse_SetUnicodePrefix($sRootPath)
; Parameters ....: $sRootPath - The given path to add the unicode prefix to.
; Return values .: Success - Returns the given path with the unicode prefix.
;                  Failure - Returns the given path and sets @error to 1.
; Author ........: orbs
; Modified ......:
; Remarks .......: The unicode prefix is added only if not already present, and one of the following conditions applies:
;                    given path is a full path starting with a drive letter and a colon
;                    given path is a UNC network path starting with two backslashes
;                  Any other case (e.g. relative path) fails the function, returns the given path and sets @error to 1.
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func __Traverse_SetUnicodePrefix($sRootPath)
    Select
        Case StringLeft($sRootPath, 4) = '\\?\'
            Return $sRootPath
        Case StringMid($sRootPath, 2, 1) = ':'
            Return '\\?\' & $sRootPath
        Case StringLeft($sRootPath, 2) = '\\'
            Return '\\?\UNC\' & StringTrimLeft($sRootPath, 2)
        Case Else
            Return SetError(1, 0, $sRootPath)
    EndSelect
EndFunc   ;==>__Traverse_SetUnicodePrefix

example:

#AutoIt3Wrapper_Au3Check_Parameters=-q -d -w 1 -w 2 -w 3 -w 4 -w 5 -w 6 -w 7
#include 'Traverse.au3'

_Traverse(@TempDir, '_DoSomeActionOnEveryFile')

Func _DoSomeActionOnEveryFile($sFile)
    ConsoleWrite($sFile & @CRLF)
EndFunc   ;==>_DoSomeActionOnEveryFile
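As a slightly more practical illustration, here is a sketch of a callback for the long-path check mentioned earlier in the thread. It relies on the "Traverse.au3" UDF above. The constant `$g_iMaxPathLen`, the function name `_ReportLongPath` and the exact threshold are assumptions made for this example, not part of the UDF.

```autoit
#include 'Traverse.au3'

; Assumed threshold: the 260-character limit mentioned earlier in the thread
Global Const $g_iMaxPathLen = 260

_Traverse(@TempDir, '_ReportLongPath')

; Hypothetical callback: report only paths at or above the threshold
Func _ReportLongPath($sFile)
    If StringLen($sFile) >= $g_iMaxPathLen Then ConsoleWrite(StringLen($sFile) & ' chars: ' & $sFile & @CRLF)
EndFunc   ;==>_ReportLongPath
```

Because nothing is accumulated in an array, memory use here should stay close to the size of the pending folder list, whatever the total file count.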

 
