Modify

Opened 11 years ago

Closed 11 years ago

#599 closed Feature Request (Completed)

_FileListToArray costs O(N^2) instead of O(NlogN)

Reported by: code65536 Owned by: Gary
Milestone: 3.2.13.10 Component: Standard UDFs
Version: Severity: None
Keywords: Cc:

Description

Memory re-allocation is an expensive operation and as such, should be avoided. For _FileListToArray to read a directory with N entries, it will need to perform N ReDim operations, and with each ReDim operation costing O(N) (assuming that ReDim is implemented efficiently), _FileListToArray will cost O(N2).

A very common way to mitigate this is to double the array size with each re-allocation instead of just incrementing it. Under such a system, you will greatly reduce the number of ReDim operations, down to log2(N), bringing the final asymptotic cost of the function down to O(NlogN), which is optimal.

I am attaching a patch that will fix this problem.

With large directories with about 300 files, implementing this simple change reduced the time spent by over half (from ~10s for 100 runs down to ~4.8s for 100 runs in my benchmark). Proportionally, the performance win will be even larger for directories with more entries.

Attachments (2)

bug599.patch (705 bytes) - added by anonymous 11 years ago.
patch
bug599-v2.patch (902 bytes) - added by code65536 11 years ago.
patch v2

Download all attachments as: .zip

Change History (9)

Changed 11 years ago by anonymous

patch

comment:1 Changed 11 years ago by anonymous

Note that with this patch, UBound will no longer correspond to the number of entries found, but since the first element of the result array contains the number of entries, I don't think that this would be a problem.

If people do think that this is a problem--that people are using UBound instead of the first array element, you can add a final ReDim to the end, right before the return to "trim" away any excess size from the array. The function will still retain O(NlogN) performance.

comment:2 Changed 11 years ago by Valik

The function must trim to the correct size. People do use UBound() because it's the correct way. The size in element 0 is a carry-over convention from before UBound() existed and is thus largely pointless and redundant.

Changed 11 years ago by code65536

patch v2

comment:3 Changed 11 years ago by Gary

  • Type changed from Bug to Feature Request

comment:4 Changed 11 years ago by TicketCleanup

  • Version 3.2.12.0 deleted

Automatic ticket cleanup.

comment:5 Changed 11 years ago by Jpm

  • Milestone set to 3.2.13.10
  • Resolution set to Fixed
  • Status changed from new to closed

comment:6 Changed 11 years ago by Jpm

  • Resolution Fixed deleted
  • Status changed from closed to reopened

comment:7 Changed 11 years ago by Jpm

  • Resolution set to Completed
  • Status changed from reopened to closed

Guidelines for posting comments:

  • You cannot re-open a ticket but you may still leave a comment if you have additional information to add.
  • In-depth discussions should take place on the forum.

For more information see the full version of the ticket guidelines here.

Add Comment

Modify Ticket

Action
as closed The owner will remain Gary.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.