_ArrayTo2DArray() - Parse Large Text Files To 2D Array Quickly [With Chunk Size]


DJKMan



 

This script is fairly straightforward. If you have ever worked with large files before, it may be of help. By large I mean files of 2 MB or so. Granted, that doesn't sound very big, but reading the file and parsing it into a 2D array all at once took an astronomical amount of time, so I wrote my own function to handle it. I discovered that chunking a large array can boost the performance of iterating through its elements, and in theory this should keep performance steady no matter how large the array gets. I know there is room for improvement, so please feel free to contribute!
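To illustrate the idea (this is a minimal sketch, not the attached UDF itself; the chunk size of 1000 and the tab delimiter are assumptions for demonstration), the trick is to grow the array in fixed-size blocks with ReDim instead of resizing once per row, since every ReDim copies the whole array:

; Minimal sketch of the chunking idea: grow the 2D array in blocks, not per row.
Func _ChunkedParseSketch(ByRef $aLines, $iChunk = 1000)
    Local $iRows = UBound($aLines)
    Local $iCols = 1, $iCapacity = $iChunk
    Local $aResult[$iCapacity][$iCols]

    For $i = 0 To $iRows - 1
        Local $aFields = StringSplit($aLines[$i], @TAB, 2) ; 2 = no count element
        If UBound($aFields) > $iCols Then ; widen columns only when a longer row appears
            $iCols = UBound($aFields)
            ReDim $aResult[$iCapacity][$iCols]
        EndIf
        If $i >= $iCapacity Then ; grow rows one chunk at a time
            $iCapacity += $iChunk
            ReDim $aResult[$iCapacity][$iCols]
        EndIf
        For $j = 0 To UBound($aFields) - 1
            $aResult[$i][$j] = $aFields[$j]
        Next
    Next

    ReDim $aResult[$iRows][$iCols] ; trim the unused chunk slack
    Return $aResult
EndFunc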

Note: I wasn't able to fully test this on larger files, such as 200 MB, because AutoIt reported an error allocating memory while executing _FileReadToArray(). Any help is appreciated.

Features:

  • Chunking (performance does not degrade as input grows; parsing 200 lines or 20,000 incurs no per-line slowdown)
  • Automatically resizes to fit a dynamic number of columns
  • Preserves columns while parsing
  • FAST! (A file of 24,000 lines with up to 8 variable columns parses in under a second.)

Script:

_ArrayTo2DArray.au3

Example usage:

#include <File.au3> ; Provides _FileReadToArray()
#include "_ArrayTo2DArray.au3"

Local $aExport ; Initialize array
_FileReadToArray("LARGE TEXT.txt", $aExport) ; Returns a 1D array of the file's lines
Local $aSheet = _ArrayTo2DArray($aExport) ; Converts it to a 2D array
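Once converted, the result can be iterated like any 2D array; a quick sketch (assuming the usual AutoIt convention that UBound($aSheet) gives the row count and UBound($aSheet, 2) the column count):

; Walk the parsed sheet and print each row to the console
For $iRow = 0 To UBound($aSheet) - 1
    Local $sLine = ""
    For $iCol = 0 To UBound($aSheet, 2) - 1
        $sLine &= $aSheet[$iRow][$iCol] & @TAB
    Next
    ConsoleWrite($sLine & @CRLF)
Next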

Example Text File:

LARGE TEXT.txt

 

This script was inspired by this post.

*Updated attachment: Minor bug fixes*

 

*UPDATE June 6, 2013: I apologize! I just realized I made a complete mess of the algorithm. I'm working on a fix now.*

*UPDATE June 6, 2013: Bug fixed! It's attached to the post now.*


 

 


I'm glad you like it! I have an idea to improve on it: have the function time itself and automatically adjust the chunk size for the best performance. This should let the algorithm tune itself toward the best possible performance, and therefore even faster parsing!

Plus, it will become much easier to use, since we will no longer have to figure out the optimal chunk size manually on different platforms! :)
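A rough sketch of how that self-tuning could work (purely illustrative; _ChunkedParseSketch is the hypothetical parse routine from the sketch above, and the doubling strategy is an assumption): time a trial run at each candidate chunk size with TimerInit()/TimerDiff() and keep the fastest.

; Illustrative auto-tuning: try several chunk sizes on a sample and keep the fastest.
Func _FindBestChunkSize(ByRef $aSample)
    Local $iBestChunk = 0, $fBestTime = -1
    Local $iChunk = 250
    While $iChunk <= 16000
        Local $hTimer = TimerInit()
        _ChunkedParseSketch($aSample, $iChunk) ; hypothetical parser from the earlier sketch
        Local $fElapsed = TimerDiff($hTimer)
        If $fBestTime < 0 Or $fElapsed < $fBestTime Then
            $fBestTime = $fElapsed
            $iBestChunk = $iChunk
        EndIf
        $iChunk *= 2 ; double the candidate size each round
    WEnd
    Return $iBestChunk
EndFunc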

 

 

