Modify

Opened 17 years ago

Closed 17 years ago

#599 closed Feature Request (Completed)

_FileListToArray costs O(N^2) instead of O(NlogN)

Reported by: code65536 Owned by: Gary
Milestone: 3.2.13.10 Component: Standard UDFs
Version: Severity: None
Keywords: Cc:

Description

Memory re-allocation is an expensive operation and as such, should be avoided. For _FileListToArray to read a directory with N entries, it will need to perform N ReDim operations, and with each ReDim operation costing O(N) (assuming that ReDim is implemented efficiently), _FileListToArray will cost O(N2).

A very common way to mitigate this is to double the array size with each re-allocation instead of just incrementing it. Under such a system, you will greatly reduce the number of ReDim operations, down to log2(N), bringing the final asymptotic cost of the function down to O(NlogN), which is optimal.

I am attaching a patch that will fix this problem.

With large directories with about 300 files, implementing this simple change reduced the time spent by over half (from ~10s for 100 runs down to ~4.8s for 100 runs in my benchmark). Proportionally, the performance win will be even larger for directories with more entries.

Attachments (2)

bug599.patch (705 bytes ) - added by anonymous 17 years ago.
patch
bug599-v2.patch (902 bytes ) - added by code65536 17 years ago.
patch v2

Download all attachments as: .zip

Change History (9)

by anonymous, 17 years ago

Attachment: bug599.patch added

patch

comment:1 by anonymous, 17 years ago

Note that with this patch, UBound will no longer correspond to the number of entries found, but since the first element of the result array contains the number of entries, I don't think that this would be a problem.

If people do think that this is a problem--that people are using UBound instead of the first array element, you can add a final ReDim to the end, right before the return to "trim" away any excess size from the array. The function will still retain O(NlogN) performance.

comment:2 by Valik, 17 years ago

The function must trim to the correct size. People do use UBound() because it's the correct way. The size in element 0 is a carry-over convention from before UBound() existed and is thus largely pointless and redundant.

by code65536, 17 years ago

Attachment: bug599-v2.patch added

patch v2

comment:3 by Gary, 17 years ago

Type: BugFeature Request

comment:4 by TicketCleanup, 17 years ago

Version: 3.2.12.0

Automatic ticket cleanup.

comment:5 by J-Paul Mesnage, 17 years ago

Milestone: 3.2.13.10
Resolution: Fixed
Status: newclosed

comment:6 by J-Paul Mesnage, 17 years ago

Resolution: Fixed
Status: closedreopened

comment:7 by J-Paul Mesnage, 17 years ago

Resolution: Completed
Status: reopenedclosed

Modify Ticket

Action
as closed The owner will remain Gary.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.