DJKMan

_ArrayTo2DArray() - Parse Large Text Files To 2D Array Quickly [With Chunk Size]

3 posts in this topic

#1 ·  Posted (edited)


This script is fairly straightforward. If you have ever worked with large files, it may be of help. By large I mean files of 2 MB or so. Granted, that doesn't sound very big, but going through such a file and parsing it into a 2D array all at once took so long that I wrote my own function to handle it. I discovered that processing a large array in chunks speeds up iterating through its elements, and in theory this should keep performance steady no matter how large the array is. I know there is room for improvement, so please feel free to contribute!

Note: I wasn't able to fully test this on much larger files (around 200 MB) because AutoIt reports a memory allocation error while executing _FileReadToArray(). Any help is appreciated.

Features:

  • Chunking (in my tests the time per line does not degrade as the file grows, i.e. it parses 200 lines or 20,000 lines without a noticeable performance hit)
  • Automatically resizes the array to fit a varying number of columns
  • Preserves columns while parsing
  • FAST! (I can parse a file of 24,000 lines with up to 8 columns per line, and it finishes in under a second.)
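
For anyone who wants to see the general idea without opening the attachment, here is a minimal sketch of one way chunking could work. It is not the attached _ArrayTo2DArray.au3 and may differ from it. It assumes that "chunking" means growing the output array a block of rows at a time instead of calling ReDim for every line. The function name _Sketch_LinesTo2D, the comma delimiter, and the default chunk size of 1000 are all made up for illustration.

```autoit
; Hypothetical sketch, NOT the attached script.
; Assumptions: $aLines is a 1D array with the line count in element [0]
; (the layout _FileReadToArray() produces by default), fields are separated
; by $sDelim, and $iChunk is the number of rows added per ReDim.
Func _Sketch_LinesTo2D(Const ByRef $aLines, $sDelim = ",", $iChunk = 1000)
    Local $iRows = $aLines[0]
    If $iRows < 1 Then Return SetError(1, 0, 0) ; Nothing to convert

    Local $iCols = 1, $iAlloc = $iChunk
    Local $aOut[$iAlloc][$iCols], $aFields

    For $i = 1 To $iRows
        ; Grow the row count one chunk at a time instead of once per line.
        If $i > $iAlloc Then
            $iAlloc += $iChunk
            ReDim $aOut[$iAlloc][$iCols]
        EndIf

        $aFields = StringSplit($aLines[$i], $sDelim) ; Field count is in [0]

        ; Widen the array when a line has more fields than any line so far.
        If $aFields[0] > $iCols Then
            $iCols = $aFields[0]
            ReDim $aOut[$iAlloc][$iCols]
        EndIf

        For $j = 1 To $aFields[0]
            $aOut[$i - 1][$j - 1] = $aFields[$j]
        Next
    Next

    ; Trim the unused rows left over from the last chunk.
    ReDim $aOut[$iRows][$iCols]
    Return $aOut
EndFunc
```

With this layout the number of row ReDim calls is roughly the line count divided by $iChunk, so a larger chunk trades some temporarily unused memory for fewer array copies. The best value probably depends on the machine and the file.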

Script:

_ArrayTo2DArray.au3

Example usage:

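#include <File.au3> ; Provides _FileReadToArray()
#include "_ArrayTo2DArray.au3" ; The attached script

Local $aExport ; Receives the file's lines
_FileReadToArray("LARGE TEXT.txt", $aExport) ; Fills $aExport with a 1D array of the file's lines
Local $aSheet = _ArrayTo2DArray($aExport) ; Converts the 1D array to a 2D array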

Example Text File:

LARGE TEXT.txt

 

This script was inspired by this post.

*Updated attachment: Minor bug fixes*

 

*UPDATE June 6, 2013: I apologize! I just realized I made a complete mess of the algorithm. I'm working on a fix now.*

*UPDATE June 6, 2013: Bug fixed! It's attached in the post now.*

Edited by DJKMan

My work in AutoIt (Not many yet):

Parse Large Text Files To 2D Array Quickly [With Chunk Size]

 

My artificial intelligence project coded entirely in AutoIt. Meet Alice Assistant: http://facebook.com/ProjectAliceAI

 




I'm glad you like it! I have an idea for improving it: have the function time itself and automatically adjust the chunk size for the best performance. That way the algorithm keeps tuning itself toward the fastest setting, which should make parsing even faster!

Plus, it will become much easier to implement as we will no longer have to manually figure out the optimum chunk size on different platforms! :)


