Text File to Memory size correlation?



Is there a correlation between a text file size and the memory usage when loading that file with FileReadToArray?

Since that command seems to fully lock my script (not even Adlib works), I created a second helper script that just displays a popup showing the % loaded. It reads the memory usage of the main script and compares it against the target file size, but I'm having trouble figuring out the correct math to calculate a good %.

On one file where the lines are not very long but there are > 1 million rows, (Mem Size / (File Size * 2)) * 100 was pretty close.

On another file that has really long lines but only around 400k rows, I actually needed ((Mem Size * 2) / File Size) * 100 to get close to the right percentages.

 

Is there some other way that I could use to get a better % no matter the file?

Thanks,

Mike


The ratio depends heavily on which encoding the text file uses and on its actual content. AutoIt strings use 16 bits per character; add something for array overhead.
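To make the encoding point concrete, here is a minimal sketch of the arithmetic, written in Python only because the thread has no code of its own. It assumes every character ends up as a 2-byte UTF-16 code unit in memory and ignores array overhead, which the thread does not quantify. The function name and the sample data are made up for illustration.

```python
def estimate_utf16_bytes(raw: bytes, encoding: str) -> int:
    """Bytes needed to hold the decoded text as 16-bit (UTF-16) code units."""
    text = raw.decode(encoding)
    return len(text.encode("utf-16-le"))  # 2 bytes per code unit, no BOM


# Two hypothetical files with identical text but different on-disk encodings.
line = "x" * 50 + "\r\n"
ansi_like_file = line.encode("utf-8") * 1000      # 1 byte per character on disk
utf16_file = line.encode("utf-16-le") * 1000      # 2 bytes per character on disk

for name, raw, enc in (("single-byte", ansi_like_file, "utf-8"),
                       ("utf-16", utf16_file, "utf-16-le")):
    mem = estimate_utf16_bytes(raw, enc)
    print(f"{name}: {len(raw)} bytes on disk, ~{mem} bytes of string data, "
          f"ratio {mem / len(raw):.1f}")
```

So the string data alone can be anywhere from about equal to the file size to about double it, before any per-element array overhead. That is why a single fixed formula based on memory usage does not hold across files.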


@BigDaddyO,

i know that is not what you asked, but...

5 hours ago, BigDaddyO said:

... that command seems to fully lock my script ...

if you need to handle large files, perhaps you should be reading and parsing them line-by-line, rather than reading the entire text and parsing it. no doubt that will decrease your script memory footprint.


If you really have to read a text file into an array while preserving AdLib functionality, prefer _FileReadToArray over FileReadToArray. The former is a UDF, hence an interruptible piece of AutoIt code, while the latter is a single built-in (uninterruptible) instruction.


_FileReadToArray mostly just uses FileReadToArray unless you're using the Delimiter parameter, or one of the flags that returns the result as an array of arrays or puts the count in the [0] element. So it's only interruptible if you're using one of those.


Well spotted.


@orbs

I ended up doing it your way. I'm parsing the file line by line as I'm validating. Overall it added about 10 minutes to the 3-hour-long process, but as it gives the users a good progress bar, they seem to prefer it. Though as the file is on a network share, hopefully they don't lose connectivity while they are running, else it will probably error out.


3 minutes ago, BigDaddyO said:

hopefully they don't lose connectivity while they are running, else it will probably error out.

you ought to introduce proper error checking then. after each FileReadLine() check the @error status and the return value. you may want to record the file size before you start reading, and increment a counter as you read it line-by-line, so you can verify the entire file was parsed.
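A rough sketch of that pattern, in Python rather than AutoIt: record the file size up front, count the bytes consumed while reading line by line, report progress from that count, and verify at the end that the whole file was seen. The function name, the reporting interval and the demo data are all hypothetical. In AutoIt the error check would be on @error after each FileReadLine(); here a failed read raises OSError instead.

```python
import os
import tempfile


def process_with_progress(path, handle_line, report_every=1000):
    """Read `path` line by line, report % done by bytes consumed, verify completeness."""
    total = os.path.getsize(path)          # recorded before reading starts
    consumed = 0
    lines = 0
    with open(path, "rb") as fh:           # binary mode: byte counts match the size on disk
        for raw in fh:                     # a failed read raises OSError here
            consumed += len(raw)
            lines += 1
            handle_line(raw.rstrip(b"\r\n"))
            if total and lines % report_every == 0:
                print(f"{consumed * 100 // total}% done")
    if consumed != total:                  # truncated read, or the file changed underneath us
        raise OSError(f"read {consumed} of {total} bytes")
    return lines


# Demo on a throwaway file (hypothetical data).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"".join(b"row %d\r\n" % i for i in range(2500)))
seen = []
count = process_with_progress(tmp.name, seen.append)
os.remove(tmp.name)
print(count, "lines processed")
```

Because progress is measured in bytes against the file size, no separate pass over the file is needed to learn the line count first.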


5 hours ago, orbs said:

you ought to introduce proper error checking then. after each FileReadLine() check the @error status and the return value. you may want to record the file size before you start reading, and increment a counter as you read it line-by-line, so you can verify the entire file was parsed.

I did use _FileCountLines() so I know the total row count, but I missed adding the error check on the FileReadLine(). I'll add that. Thanks!


@BigDaddyO,

On 6/13/2016 at 9:32 PM, BigDaddyO said:

I did use _FileCountLines() ...

i was wondering how _FileCountLines() could determine the count of lines without reading the entire file. so i looked at the UDF. and guess what?

it does.

_FileCountLines() reads the entire file, and then counts the line breaks.

so you are actually still reading the entire file. and at the origin of this topic, you were actually reading the entire file twice - once at _FileCountLines(), and once in your main script! no wonder it took so long.

you should drop the _FileCountLines() and stick to file size indicator, which does not require reading the file at all.

 


I didn't have _FileCountLines() initially as I was just using the array size to know the count.  The _FileCountLines() may read the file, but it doesn't take long at all, about 15 seconds for the 1 million rows in my test file.

I did find out that _FileCountLines() does not seem to work for text files in Binary format, as it only ever comes back with 1 for the line count. I tried updating the UDF to open the file in Binary mode but that didn't work. So, for now I have split my script into 2 different files: one for the majority of the files, and the original one, without much of a progress indicator, for the Binary file.


Could the network be slowing you down? Have you tried simply copying the file to the local drive and then reading it from the HD? If it is still slow, you might try memory-mapped files. That is what the OS uses to map exe images into RAM. User programs can also create memory-mapped files to share the same data across processes.

 

Where it may help you is the MapViewOfFile call, which lets you map a chunk of the file into a memory range. With some experience, or if you can find a library for memory-mapped files, you can use that call to create a window into the file. Both the position in the file used to fill the buffer and the size of the window are adjustable.
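As an illustration of the "window into the file" idea, here is a small Python sketch using the standard mmap module, which is built on the OS file-mapping calls. It is not AutoIt, and the helper name and demo file are invented. The one detail worth noting is that the mapping offset has to be a multiple of the system's allocation granularity, so the helper rounds down and slices out the requested bytes.

```python
import mmap
import os
import tempfile


def read_window(path, offset, length):
    """Return `length` bytes starting at `offset`, mapping only a window of the file."""
    size = os.path.getsize(path)
    if length <= 0 or offset >= size:
        return b""
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran              # mapping offset must be granularity-aligned
    delta = offset - aligned                       # where the requested bytes start in the view
    map_len = min(delta + length, size - aligned)  # never map past the end of the file
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), map_len, access=mmap.ACCESS_READ,
                       offset=aligned) as view:
            return view[delta:delta + length]


# Demo on a throwaway 300,000-byte file (hypothetical data).
data = bytes(i % 251 for i in range(300_000))
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(data)
chunk = read_window(tmp.name, 200_000, 16)
tail = read_window(tmp.name, 299_990, 100)         # request runs past EOF, gets clipped
os.remove(tmp.name)
print(len(chunk), len(tail))
```

Only the mapped window is brought into the address space, so the window can be moved through a large file without loading all of it at once.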

 

Edited by MilesAhead
fix typo

I wonder why it takes 3 hours to process 1 million lines of a text file. How do you process the lines? Do you write the result to Excel, Word, ...?
Maybe your script could be enhanced to run much faster.

