
Out of memory error 270MB csv sorting arrays and modifying data



Good Morning All, :sorcerer:

I have an application that takes CSV files, sorts and searches them, and then tries to write out its own individual CSV files. Everything runs smoothly until I get to "_myCSV2DCreator" on extremely large CSV files, like 200+ MB with 100,000+ lines. I get an "out of memory" error from AutoIt when I convert the 2D array to a CSV file using this function, and I also get an "out of memory" error when running _ArrayInsert($aNewArray, "0", $columnheaders, "", ","). I believe it takes whatever array is in memory and doubles its size just to write one row of data. Any ideas on how to get around this? I'm stumped...

Func GEN_SeparateOutCSV03()

SendAndLog("GEN_SeparateOutCSV03", $tempzipdir & '\' &  $LogFileName01, True)

SplashTextOn($ProgramTitle, 'Separating CSV files from data...', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")
Sleep ($sleeptime)

$array01 = $twoDarray

$aUniqueHostname = _ArrayUnique ($array01, 1)

;_ArrayDisplay ($aUniqueHostname, "UniqueHostname ")

For $i01 = Ubound($aUniqueHostname) - 1 to 0 Step - 1
    For $j01 = Ubound($array01) - 1 to 0 Step - 1
        If $array01[$j01][1] == $aUniqueHostname[$i01] and StringRegExp($array01[$j01][5], "MY_VALUE.txt") then
            ;MsgBox(0, "Computer and MY_VALUE.txt",  $aUniqueHostname[$i01] & " : " & $array01[$j01][5])
            $FileName01 = $tempzipdir & "\" & $array01[$j01][3] & "_" & $aUniqueHostname[$i01] & "_" & $array01[$j01][2] & ".csv"
            ; MsgBox(0, "File Name", $FileName01)

                Local $avResult = _ArrayFindAll($array01, $aUniqueHostname[$i01], 0, 0, 0, 0, 1)
                ;_ArrayDisplay($avResult, "$avResult")
                
                Local $aNewArray = ""
                
                Local $aNewArray[UBound($avResult)][UBound($array01, 2)]

                SplashTextOn($ProgramTitle, 'Loop - Array search for unique hostname', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")
                Sleep ($sleeptime)

                    For $i = 0 To UBound($avResult) -1 ; Loop through the returned index numbers.
                        For $j = 0 To UBound($array01, 2) -1 ; Loop through each of the columns.
                            $aNewArray[$i][$j] = $array01 [$avResult[$i]] [$j] ; Populate the new array.
                        Next
                    Next



                SplashTextOn($ProgramTitle, 'Loop - Column header modification', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")
                Sleep ($sleeptime)
                ;_ArrayDisplay($twoDarray, "Removed 'User-defined Rules:' in column 3")
                ;_ArrayInsert($aNewArray, "0", $columnheaders, "", ",")
                _ArrayInsert($aNewArray, "0", $columnheaders, "", ",")
                ;_ArrayDisplay($aNewArray, "Inserted 0")


                ;MsgBox (0, "Out of the loop", "Out of the loop - File Write From Array")
                _myCSV2DCreator($tempzipdir & "\beta_" & $array01[$j01][3] & "_" & $aUniqueHostname[$i01] & "_" & $array01[$j01][2] & ".csv", $aNewArray, True)
                ;_FileWriteFromArray($FileName01, $aNewArray)

            ExitLoop
        EndIf
    Next
Next
SplashTextOn($ProgramTitle, 'Please wait a moment...', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")
Sleep ($sleeptime)
SplashOff()

SendAndLog("GEN_SeparateOutCSV03 - Okay", $tempzipdir & '\' &  $LogFileName01, True)

MsgBox(262192, $ProgramTitle, "All files stored here:" & @CR & @CR & $tempzipdir)

SendAndLog("Final Message Box - Exit Okay", $tempzipdir & '\' &  $LogFileName01, True)
Exit

EndFunc



Func _myCSV2DCreator($hFile, $avArray, $bEraseCreate = True)
    SplashTextOn($ProgramTitle, '2D to CSV file create', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")
    Sleep ($sleeptime)

    If $bEraseCreate Then FileClose(FileOpen($hFile, 2))
    Local $sHoldString = ""
    For $x = 0 To UBound($avArray) - 1
        For $i = 0 To UBound($avArray, 2) - 1
            $sHoldString &= $avArray[$x][$i] & ","
        Next
        $sHoldString &= @CRLF
    Next

    Return FileWrite($hFile, StringTrimRight($sHoldString, 3))

EndFunc

:oops: (out of memory)

Is there a better method / solution that I should be using?  :idea:

Thanks Everyone!

:ILA2:

 

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut


Also, I'm using about 1.7 GB of memory before I even get to the loops...

How do I find out what area of the script is getting out of control? - Is there a memory logging function?

Please let me know and thank you again.
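On the memory-logging question: one option is AutoIt's built-in ProcessGetStats(), which reports the working set of a process. The sketch below is only an illustration. The helper name _LogMemory and the test array are made up for this example, and the working set it reports is not necessarily the same figure as the "Private Working Set" column watched elsewhere in this thread.

```autoit
; Hypothetical helper (not part of the original script): prints the current
; and peak working set of this script to the console.
Func _LogMemory($sLabel)
    Local $aStats = ProcessGetStats() ; defaults: current process, memory statistics
    If Not IsArray($aStats) Then Return SetError(1, 0, 0)
    ; $aStats[0] = WorkingSetSize, $aStats[1] = PeakWorkingSetSize (both in bytes)
    ConsoleWrite($sLabel & ": working set " & Round($aStats[0] / 1024) & " K, peak " & Round($aStats[1] / 1024) & " K" & @CRLF)
    Return $aStats[0]
EndFunc   ;==>_LogMemory

_LogMemory("Before building array")
Local $aTest[100000][16] ; stand-in for a large 2D array
_LogMemory("After building array")
$aTest = 0 ; release the array
_LogMemory("After releasing array")
```

Calling the helper before and after each stage (file read, 2D conversion, column deletes, file write) should show which step is responsible for a jump.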

Well, when I look at your code I can see several loops nested within loops, and all of them have quadratic runtimes!

This causes the enormous memory consumption.

Maybe you should rethink what you want to achieve and use functions that are more efficient with memory. One possibility is to read the file in parts and perform the operation you want on each part.
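Reading the file in parts could look roughly like the sketch below, which processes one line at a time with FileReadLine instead of loading everything via _FileReadToArray. The file names are placeholders, and the "MY_VALUE.txt" filter is borrowed from the original script purely as an example operation.

```autoit
; Sketch: process a large CSV one line at a time instead of loading it whole.
; The input and output file names are placeholders for illustration.
Local $sInput = @ScriptDir & "\input.csv"
Local $sOutput = @ScriptDir & "\filtered.csv"

Local $hIn = FileOpen($sInput, 0) ; 0 = read mode
If $hIn = -1 Then Exit MsgBox(16, "Error", "Cannot open " & $sInput)

Local $hOut = FileOpen($sOutput, 2) ; 2 = write mode, erase previous contents
If $hOut = -1 Then
    FileClose($hIn)
    Exit MsgBox(16, "Error", "Cannot create " & $sOutput)
EndIf

Local $sLine
While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; -1 = end of file (the loop also stops on a read error)
    ; Only the current line is held in memory; matching lines go straight to disk.
    If StringInStr($sLine, "MY_VALUE.txt") Then FileWriteLine($hOut, $sLine)
WEnd

FileClose($hIn)
FileClose($hOut)
```

With this shape, memory use depends on the longest line rather than on the size of the file.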

 

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


Initial function where the file gets read to an array:

If Not _FileReadToArray($aFilesFolders[$i01], $array01) Then

Memory grows to... 524 MB

Stripping out commas, quotes and bars... = 524 MB

Converting to 2D array = 1.8 GB

Deleting 8 Columns... grows to 2.4 GB

fluctuates back down to 1.6, 1.5, 1.7...

Done with deleting columns = 1.445 GB

Deleting 1 column and deleting 1 row ... grows to 1.66 GB

Sorting columns stable at 1.67 GB

however 100% of physical memory is used.

4 GB internal memory on system.

Can you post a runnable script and a CSV file that causes the problem?

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


Sorry, I'll try to upload some of the code - but I'll have to tweak it to keep it private etc.

I'm looking at wiping my variables along the way - what is the best way to free up memory?

$array01 = ""

What will free up memory best? Is that how I should do it? $array01 = "" ?
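For what it is worth, assigning any scalar value to an array variable replaces the array, so both $array01 = "" and $array01 = 0 drop the old contents. A minimal sketch, using a made-up array as a stand-in for the real data:

```autoit
; Sketch: releasing a large array by overwriting the variable with a scalar.
Local $aBig[100000][16] ; stand-in for a large array such as $array01
ConsoleWrite("Before: IsArray = " & IsArray($aBig) & @CRLF)

$aBig = 0 ; the variable now holds a number; the array it held is discarded
ConsoleWrite("After: IsArray = " & IsArray($aBig) & @CRLF)
```

Whether the process's reported memory drops immediately afterwards is a separate matter and is worth measuring rather than assuming.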

_FileReadToArray($aFilesFolders[$i01], $array01) = 1.7 GB

Stable at 524 MB after

Converting csv data into a 2D array...

$columnsCounter = StringSplit($array01[1], ",")
Dim $twoDarray[$array01[0] + 1][$columnsCounter[0] + 1]
For $x = 1 To $array01[0]
    $oneRow = StringSplit($array01[$x], ",")
    For $y = 1 To $columnsCounter[0]
        $twoDarray[$x][$y] = $oneRow[$y]
    Next
Next


$twoDarray = 1.8 GB

Wiped $array01 = "" and went back down to 1.393 GB :D

Deleting columns rockets to 2.4 GB then is stable at 1.426 GB.

Example: _ArrayColDelete ($twoDarray, 15) ; 15th column

Thanks!!!


First 2D to CSV...

2D to CSV file create is stable at 1.4 GB

Things rocket off the charts again with..._ArrayInsert = 2.446 GB

_ArrayInsert($aNewArray, "0", $columnheaders, "", ",")

Then goes back down to 1.6 GB

Second 2D to CSV...

2D to CSV again is stable now at 1.639 GB

Separating CSV files data goes down to 785 MB

... then Array search for unique hostname climbs to 2.6 GB

SplashTextOn($ProgramTitle, 'Loop - Array search for unique hostname', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")
                Sleep ($sleeptime)

                    For $i = 0 To UBound($avResult) -1 ; Loop through the returned index numbers.
                        For $j = 0 To UBound($array01, 2) -1 ; Loop through each of the columns.
                            $aNewArray[$i][$j] = $array01 [$avResult[$i]] [$j] ; Populate the new array.
                        Next
                    Next

Column header modification ~2.6 GB then 2D to CSV = memory failure...


After skipping the _ArrayInsert($aNewArray, "0", $columnheaders, "", ",") I believe I've made progress.
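If the header row still has to end up in the output, one way to avoid _ArrayInsert altogether is to write the header straight to the file and then write the rows one at a time. The function below is a hypothetical variant of _myCSV2DCreator, not the original author's code. It assumes $columnheaders is already a comma-separated string, and it also avoids building the whole file in one $sHoldString.

```autoit
; Hypothetical variant of _myCSV2DCreator: header first, then one row per write.
; Assumes $sHeader is a ready-made comma-separated header line.
Func _WriteCSVWithHeader($sFile, ByRef $avArray, $sHeader)
    Local $hFile = FileOpen($sFile, 2) ; 2 = write mode, erase previous contents
    If $hFile = -1 Then Return SetError(1, 0, 0)

    FileWriteLine($hFile, $sHeader)

    Local $iLastCol = UBound($avArray, 2) - 1
    Local $sRow
    For $x = 0 To UBound($avArray) - 1
        $sRow = ""
        For $i = 0 To $iLastCol
            If $i > 0 Then $sRow &= "," ; separator between fields, none at the end
            $sRow &= $avArray[$x][$i]
        Next
        FileWriteLine($hFile, $sRow) ; only one row is held as a string at a time
    Next

    FileClose($hFile)
    Return 1
EndFunc   ;==>_WriteCSVWithHeader
```

A call might look like _WriteCSVWithHeader($FileName01, $aNewArray, $columnheaders). Passing the array ByRef is meant to avoid handing the function a second copy of it.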


"Well, when I look at your code I can see several loops nested within loops, and all of them have quadratic runtimes!

This causes the enormous memory consumption.

Maybe you should rethink what you want to achieve and use functions that are more efficient with memory. One possibility is to read the file in parts and perform the operation you want on each part."

 

​This.

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


How about using a powershell script or some linuxy cygwin parsing tools (awk/perl/egrep/sort/uniq...) to do the preparsing? I find that much more convenient than writing it all out in script code.

Roses are FF0000, violets are 0000FF... All my base are belong to you.


Before 2D Array Conversion - 514,192 K

$columnsCounter = StringSplit($array01[1], ",")
Dim $twoDarray[$array01[0] + 1][$columnsCounter[0] + 1]
For $x = 1 To $array01[0]
    $oneRow = StringSplit($array01[$x], ",")
    For $y = 1 To $columnsCounter[0]
        $twoDarray[$x][$y] = $oneRow[$y]
    Next
Next

After 2D Array Conversion - 1,621,424 K

Before Wiping Variables - 1,621,424 K

$array01 = ""
$columnsCounter = ""
$oneRow = ""
$x = ""
$y = ""

After Wiping Variables - 1,239,132 K

Before 8 column deletes - 1,239,132 K

_ArrayColDelete ($twoDarray, 15) ; 15th column
_ArrayColDelete ($twoDarray, 14) ; 14th column
_ArrayColDelete ($twoDarray, 13) ; 13th column
_ArrayColDelete ($twoDarray, 12) ; 12th column
_ArrayColDelete ($twoDarray, 11) ; 11th column
_ArrayColDelete ($twoDarray, 9) ; 9th column
_ArrayColDelete ($twoDarray, 8) ; 8th column
_ArrayColDelete ($twoDarray, 7) ; 7th column

After 8 column deletes - 1,415,788 K

This is kinda strange to me... it looks as though the amount of memory used has gone up after 8 x _ArrayColDelete


Sorry, this was REALLY strange! I wasn't ignoring anyone... all I could see were my own posts and yesterday's posts! I had to hit REFRESH before I saw anyone's answers! That is not typical. I'm guessing it was forum upgrade behavior, but I can't be sure.

Can anyone tell me how to do array management with a 2D array?
I believe that's my real issue...

Why would the amount of memory used go UP after deleting 8 columns from an array?

_ArrayColDelete

Yes, I've read the original suggestions... but that's not the root cause.

I'm trying to get an answer so I can possibly do better array management.

Before 8 column deletes - 1,239,132 K

_ArrayColDelete ($twoDarray, 15) ; 15th column
_ArrayColDelete ($twoDarray, 14) ; 14th column
_ArrayColDelete ($twoDarray, 13) ; 13th column
_ArrayColDelete ($twoDarray, 12) ; 12th column
_ArrayColDelete ($twoDarray, 11) ; 11th column
_ArrayColDelete ($twoDarray, 9) ; 9th column
_ArrayColDelete ($twoDarray, 8) ; 8th column
_ArrayColDelete ($twoDarray, 7) ; 7th column

After 8 column deletes - 1,415,788 K

This is prior to any For > Next loops.

Thank you JohnOne, SadBunny, BrewManNH and UEZ. :cheer:


I'm still looking for answers in the forum posts... I came across _ReDim.
Does anyone know how to use this function by guinness?

Func _ReDim2D_Ubound(ByRef $aArray, ByRef $iDimension, ByRef $iCount) ; Using Ubound($aArray, 1) to find the Row size and Ubound($aArray, 2) for the Column size.
    $iCount += 1
    If ($iCount + 1) >= $iDimension Then
        $iDimension = Ceiling((UBound($aArray, 1) + 1) * 1.5)
        ReDim $aArray[$iDimension][UBound($aArray, 2)]
        Return 1
    EndIf
EndFunc   ;==>_ReDim2D_Ubound

My issue - just to even test the function - is I don't know how to pass my array $twoDarray to that function.

Instead of populating a very large array with multiple unused columns that you delete right afterwards, perhaps you should only populate rows with the minimum columns you actually need.

Also don't forget that strings are Unicode (~ UTF16), so 'abc' takes three 16-bit words (6 bytes).

Finally how do you get memory use values?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I would suggest that you provide some input data that is close to your real data and show what your output should look like.

Maybe we can find a better solution for you.


"Instead of populating a very large array with multiple unused columns that you delete right afterwards, perhaps you should only populate rows with the minimum columns you actually need."

Believe me, I've thought about it, but I would have to rewrite everything... and I'd have to somehow write an expression to find the 15th column (comma) and delete the comma and the string between there... and the 14th comma and so on and so forth. YOU ALL have very good points...

What I'm hearing from everyone here is: stay away from arrays and that many loops.

"Finally how do you get memory use values?"

I'm simply watching the "Memory" (Private Working Set) of my system and the AutoIt exe.

"I would suggest that you provide some input data that is close to your real data and show what your output should look like.

Maybe we can find a better solution for you."

I wish I could share ALL the data... but let me give you an example.

This file would be csv and greater than 200 MB.
The columns would be A-O in Excel, or 0-15 in my array (column 0 is empty).

4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109

There are 725,600 lines exactly like the one above.

FYI - Opening a file like this in Excel takes about 2-3 minutes.

I use _FileReadToArray to read the CSV into an array of strings. The extra commas, quotes and bars are stripped from the strings with StringRegExpReplace, the data is pushed into a 2D array ($twoDarray), 8 columns are deleted with _ArrayColDelete, and finally I look for unique values so that each one can be saved out to its own CSV file.
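The same pipeline can be sketched as a single pass over the file that never builds a 2D array: read a line, pick out the hostname, and append the line to that hostname's output file. Everything specific here is an assumption made for illustration: the paths are placeholders, the hostname is taken to be the 2nd field (ZZ123123 in the sample line), and the output files are named after the hostname only rather than using the original naming scheme.

```autoit
; Sketch: split a large CSV into one file per hostname in a single pass.
; Assumptions: hostname is the 2nd field; the paths below are placeholders.
Local $sInput = @ScriptDir & "\input.csv"
Local $sOutDir = @ScriptDir & "\out"
DirCreate($sOutDir)

Local $hIn = FileOpen($sInput, 0) ; 0 = read mode
If $hIn = -1 Then Exit MsgBox(16, "Error", "Cannot open " & $sInput)

Local $oHandles = ObjCreate("Scripting.Dictionary") ; hostname -> output file handle
Local $sLine, $aFields, $sHost

While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; end of file or read error
    $aFields = StringSplit($sLine, ",") ; [0] = field count, fields start at [1]
    If $aFields[0] < 2 Then ContinueLoop ; skip blank or malformed lines
    $sHost = $aFields[2]
    If Not $oHandles.Exists($sHost) Then
        ; First time this hostname is seen: create its output file.
        $oHandles.Add($sHost, FileOpen($sOutDir & "\" & $sHost & ".csv", 2))
    EndIf
    FileWriteLine($oHandles.Item($sHost), $sLine)
WEnd
FileClose($hIn)

; Close every output file that was opened along the way.
Local $aKeys = $oHandles.Keys()
For $i = 0 To UBound($aKeys) - 1
    FileClose($oHandles.Item($aKeys[$i]))
Next
```

Each distinct hostname keeps one file handle open until the end, so a very large number of hostnames could run into a limit on open files. The sketch also does no quoting or header handling, and it does not check hostnames for characters that are invalid in file names.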


Okay but with:

#include <Array.au3> ; needed for _ArrayTranspose and _ArrayDisplay

$s = "4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109"
$a = StringSplit($s, ',', 2) ; flag 2: no count in element [0]
_ArrayTranspose($a)
_ArrayDisplay($a)

which columns do you need and what are your criteria for discarding rows, if any?
 


Thank you jchd. I appreciate the help! I see where we're going with this so I sincerely appreciate the help...

I would delete columns 15,13,12,11,9,8, and 7 from the array. I had a 0 column with empty data in the array which I wipe as well...

But if we're going after a string with commas... then column 0 doesn't exist ;) so just 15,13,12,11,9,8, and 7.

I don't really know what the regular expression would look like and I believe you do... so thank you.
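A regular expression is not strictly required for this. As a rough sketch, each line can be split on commas and rebuilt without the unwanted fields. The function name _DropColumns and the drop-list format are invented for this example, and the approach assumes no field contains an embedded comma or quote.

```autoit
; Sketch: drop unwanted columns from one CSV line without a 2D array.
; Assumes fields never contain embedded commas or quotes.
Func _DropColumns($sLine, $sDropList)
    Local $aFields = StringSplit($sLine, ",") ; [0] = field count, fields start at [1]
    Local $sOut = "", $bFirst = True
    For $i = 1 To $aFields[0]
        ; $sDropList looks like ",7,8,9,", so ",1," cannot match inside ",11,"
        If StringInStr($sDropList, "," & $i & ",") Then ContinueLoop
        If Not $bFirst Then $sOut &= ","
        $sOut &= $aFields[$i]
        $bFirst = False
    Next
    Return $sOut
EndFunc   ;==>_DropColumns

Local $sLine = "4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109"
ConsoleWrite(_DropColumns($sLine, ",7,8,9,11,12,13,15,") & @CRLF)
```

With the drop list above (columns 7, 8, 9, 11, 12, 13 and 15), fields 1 to 6, 10 and 14 of the sample line remain. Applied to each line as it is read, this removes the need for _ArrayColDelete on a full-size array.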


That means your input looks like this:

2urkn51.png

 

and your output should look like this:

34yyyx2.png

 

Correct?

 

Edited by UEZ
Typo

