Jump to content

Out of memory error 270MB csv sorting arrays and modifying data


Recommended Posts

Watch out for lines that contain commas in fields. Like filenames containing commas. (They're possible, so Murphy dictates that they will happen.) Make sure you know how that ends up in the log, as it may mean that the number of commas can vary which will make a number of fields going into the wrong columns if you just do StringSplit or regex-parse without any special processing for those scenarios. You may want to do a "quick" regex search in your log file for ^(.*?,){15}[^,]*$ to look for any lines with 15 or more commas to see if that ever happens (or pump it into excel and list all unique values in column 16 or something :)).

Also on a separate note, your code really isn't really all that readable. Variables like $array01 and $array03, $i01, global arrays used in your function, exact duplicate lines of code of which one is commented out, no idea how your data-holding array is currently built... Hard to debug like that. I suggest you clean up that code a bit, or preferably, write a complete reproducer so others may see anything else going wrong in how you work with the array.

 

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

OK, I'll do my best to help you but I need to move right now. I'll be back in few hours.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

@SadBunny  - FYI: I have a line of code that strips all the extra commas that could be found within the csv

For $i = UBound($array01) - 1 To 0 Step -1
$array01[$i] = Execute('"' & StringRegExpReplace($array01[$i], '(".+?")', '" & StringReplace(""\1"",",","") & "') & '"')
Next

I understand, most all my code is in beta - so it does look horrible. However, it does work now at least lol... :P Thanks everyone for your feedback.

I've been scouring the forums not for a nth comma replace... etc.

UEZ  - I can't see what you wrote there... don't know if it is a forum issue or not.
FYI - FIGURED OUT THIS WAS A PICTURE LINKED EXTERNALLY or OUTSIDE OF AUTOITSCRIPT.COM - BLOCKED BY MY FIREWALL

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Beta should not mean horrible looking code (your words, luckily :) ) - it should mean unresolved bugs, incomplete functionality and such. IMHO.

Replacing an nth comma is very doable, see this thread from a month ago. But why would you want to do that in this case?

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

This is what I plan on using to replace ANYTHING AFTER the nth comma prior to making the two dimensional array.

For $i = UBound($array01) - 1 To 0 Step -1
$array01[$i] = StringRegExpReplace($array01[$i], )
Next

I'm currently trying to find an example to follow like one's I've used before...
 

$array01[$i] = Execute('"' & StringRegExpReplace($array01[$i], '(".+?")', '" & StringReplace(""\1"",",","") & "') & '"')

 

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

This is my first idea:

#include <Array.au3>
#include <File.au3>
#include <WinAPIProc.au3>
#include <WinAPIMisc.au3>


#cs create test data
$s = ",4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109" & @CRLF
Global $sTestCSV
For $i = 1 To 725600
    $sTestCSV &= $s
Next

$hFile = FileOpen(@ScriptDir & "\Test.csv", 2)
FileWrite($hFile, $sTestCSV)
FileClose($hFile)
#ce


Global $sFile = @ScriptDir & "\Test.csv"

ConsoleWrite("Peak working set size: " & _WinAPI_StrFormatByteSize(_WinAPI_GetProcessMemoryInfo()[1]) & @CRLF)

Global $fEnd, $fTimer = TimerInit()
Global $aResult = TextSplit2D($sFile)
$fEnd = TimerDiff($fTimer)
ConsoleWrite("Test file loaded and converted to a 2D array in " & Round($fEnd / 1000, 2) & " seconds." & @CRLF)

ConsoleWrite("Peak working set size: " & _WinAPI_StrFormatByteSize(_WinAPI_GetProcessMemoryInfo()[1]) & @CRLF)

;~ _ArrayDisplay($aResult)


Func TextSplit2D($sFile, $sDelim = ",")
    Local $aFile = FileReadToArray($sFile)
    Local $iWidth = 8, $iHeight = UBound($aFile)
    Local $aResult[$iHeight][$iWidth], $aTemp, $i, $j
    For $i = 0 To UBound($aFile) - 1
        $aTemp = StringSplit($aFile[$i], ",", 2)
        For $j = 1 To 6
            $aResult[$i][$j - 1] = $aTemp[$j]
        Next
        $aResult[$i][6] = $aTemp[10]
        $aResult[$i][7] = $aTemp[14]
    Next
    Return $aResult
EndFunc

Result:

Peak working set size: 10,4 MB
Test file loaded and converted to a 2D array in 23.28 seconds.
Peak working set size: 1,14 GB

 

The result is an array.

Edited by UEZ
Removed the removal of @CRLF because it is not needed in this case

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

Link to comment
Share on other sites

So the idea is to strip all commas in quoted strings? Well... I guess that's one way of writing it... :) Took me some time to understand though. Does that contraption actually work? :D  But I still don't see what that has to do with replacing an nth character or why you would want to do that.

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

Thanks UEZ :idea:

I'm going to try and break down (comprehend) your solution.

:huh2:

... And HEY!!! Your autoit code tag actually has colors! LOL, gotta see what I'm doing wrong in the forum code.

Right, SadBunny... I'm going to try and limit the size of the array (delete anything after nth comma) hopefully before I convert to a 2D array.

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Good Afternoon Everyone,

Can anyone tell me if what I have here could cause any long term issues? I am starting to go through my string removing what would be unnecessary columns...

$string = "4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109"

$replace = StringRegExpReplace($string, '^(.*),.*$' , '\1')

ConsoleWrite($replace & @LF)

Thanks again!

If anyone has an idea how to delete anything between "the nth" comma from the $string - that would be helpful as well... I'll keep going - find it soon, but if you feel like helping LOL, I could use it! (kinda need it)

:idiot:

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Getting there..

$string = "4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109"

ConsoleWrite($string & @LF)

; 15th Column Delete

$replace = StringRegExpReplace($string, '^(.*),.*$' , '\1')

ConsoleWrite($replace & @LF)

; Attempting 13th Column Delete

$replace = StringRegExpReplace($replace, '^(.*),(.*),.*$' , '\1')

ConsoleWrite($replace & @LF)

... but not quite

Need to wipe out ",SOL"

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Does that work like you want (on a limited sample)?
 

#cs

I would delete columns 15,13,12,11,9,8, and 7 from the array (marked * below)

    1   4/26/15 6:40:07 AM
    2   ZZ123123
    3   10.0.0.1
    4   U-def R:rulename
    5   deny create
    6   C:\Users\username\AppData\Roaming\value.exe
*   7
*   8   _
*   9   10.0.0.1
    10  C:\Users\username\AppData\Local\Temp\A6F5.tmp
*   11  'File' access
*   12  Notice
*   13  SOL
    14  My Site\Workstation\SITENAME Workstation\
*   15  109"

#ce

$s = "4/26/15 6:40:07 AM,ZZ123123,10.0.0.1,U-def R:rulename,deny create,C:\Users\username\AppData\Roaming\value.exe,,_,10.0.0.1,C:\Users\username\AppData\Local\Temp\A6F5.tmp,'File' access,Notice,SOL,My Site\Workstation\SITENAME Workstation\,109"
$s = StringRegExpReplace($s, '(?U)^(.*,)(.*,)(.*,)(.*,)(.*,)(.*,)(?:.*,){3}(.*,)(?:.*,){3}(.*),.*$', '$1$2$3$4$5$6$7$8')
$a = StringSplit($s, ',', 2)
_ArrayDisplay($a)

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

I believe my coding / regular expression prayers have been answered! o:)

Thank you very much everyone!

I'm going to test it out but from what I see here - it all makes perfect sense now.

Thank you UEZ and jchd - I would be racking my brain for days if it weren't for your insight and help!!!

:thumbsup::huggles::sorcerer::ninja:

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Well, after all this, very specials thanks goes out to UEZ and jchd. :graduated:

I now have two totally different ideas and options to split the code up in my app - using arrays and / or strings :D

Sadly, the amount of memory required won't decrease any time soon... so it's upgrade city for me.  :robot:

Very impressive code everyone, again thank you all for your help and support! :ILA2:

If by chance anyone get's any deep details into array management and memory usage - please let me know - I'm always around!

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Let me suggest one more time to do some preprocessing with like powershell or cygwin tools. It will be fast and it will save a lot of memory. What you don't load into your array and/or memory, you don't have to take out of it again.

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Link to comment
Share on other sites

I agree with your logic, but making this application any more complex than it already is just adds to more possibilities of it not working during any small changes. I'd like to keep things simple and inclusive if at all possible and I believe most folks prefer that as well.  I'll check out "cygwin tools" as powershell is a cluster imo. Thank you for the suggestion SadBunny.

What I meant by "If by chance anyone get's any deep details into array management and memory usage" was that if by chance an AutoIT developer here has details on built in AutoIT functions and why I'm experiencing "doubling" of memory usage during functions like _FileReadToArray _ArrayColDelete or _ArrayInsert  - I would be very interested in knowing more about the inner workings, plan b, plan c, etc.

:construction:

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

FYI - Checking out forums - looking for au3's - came across one from DJKMan _ArrayTo2DArray

$twoDarray =  _ArrayTo2DArray($array01, ",") ;Converts it to 2D

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Okay, I believe this is my final question...

I find unique computer names in $array01 and store them in $avResult

Now my question is, is there a way to populate a new 2D array based on results from a _ArrayFindAll without having to populate the new array 1 column at a time?

Is there a "row" element I could populate the entire array row from instead of looping through row 0, element 0, element 1, element 2... row 1, element 0, element 1, element 2 etc.

$iSubItem? $fRow?

Local $avResult = _ArrayFindAll($array01, $aUniqueHostname[$i01], 0, 0, 0, 0, 1)
_ArrayDisplay($avResult, "$avResult")

Local $aNewArray[UBound($avResult)][UBound($array01, 2)]

SplashTextOn($ProgramTitle, 'CF3 Loop - Array build new array on unique hostname', 400, 60, (@DesktopWidth / 2) - 200, 10, "", "")

For $i = 0 To UBound($avResult) -1 ; Loop through the returned index numbers.
    For $j = 0 To UBound($array01, 2) -1 ; Loop through each of the columns.
        $aNewArray[$i][$j] = $array01 [$avResult[$i]] [$j] ; Populate the new array.
    Next
Next

Thanks everyone!

Edited by souldjer777

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Sincirely and without willing to make seemingly simple things look more complex than expected, I now strongly believe that you'd better use a database engine to achieve your goal.

What I'd do in your place is to load the raw .CSV in an SQLite table and start processing from there. You then benefit of unlimited work space and also from the superior power of SQL to extract data actually needed in the form you really need.

If SQL and such looks like chineese to you I'm willing to offer help (openly or in private since the data seems pretty sensible) to build a robust machinery to that effect. I'm used to MDA digitally or physically signed and I stand behind my words.

OTOH if that is a one-time job, maybe there is no need to use SQL caterpillar.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

You guys are right... I can only take a 270 MB excel sheet so far before I break the pc LOL.  YOU HAVE HELPED ME GREATLY!!!  Thank you all again!!!

Now, how do I mark this thing as solved...#$#%

"Maybe I'm on a road that ain't been paved yet. And maybe I see a sign that ain't been made yet"
Song Title: I guess you could say
Artist: Middle Class Rut

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...