Can this be made to run faster?


If I were you, I'd run it on a small extract (say 100 MB or so) to check output conformance before launching the horse on the actual race lane.

Edit: Oh yes, to save my ass from being off the chair: even if it's not that AutoItish, you'll probably need AutoIt to run it, passing your fine arguments on the command line :unsure:

Forgot to mention how to use:

> conv <input file> <output file>

but as you probably see/know, it's easy to change that convention.

Edit2: what Tvern just posted below is true. If the output format is correct, you're only slightly above 1/3 done.

@Tvern, your counts seem wrong but the idea is there.

Input lines are 49 chars, e.g.

00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15 (plus 2 chars CR LF)

output lines are 130 chars:

000000000000000100000002000000030000000400000005000000060000000700000008000000090000000A0000000B0000000C0000000D0000000E0000000F (plus 2 chars CR LF)
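The per-line conversion implied by these two samples can be sketched as follows. This is only an illustration in Python, not the posted conv tool: the function name `convert_line` is made up, and it assumes each input field is a decimal number that gets widened to eight uppercase hex digits (which is what 10 -> 0000000A in the sample suggests).

```python
def convert_line(line: str) -> str:
    """Turn '00,01,...,15' into sixteen 8-digit uppercase hex fields."""
    fields = line.rstrip("\r\n").split(",")
    # Assumption: every field is a decimal number (10 -> 0000000A in the sample).
    return "".join(f"{int(field, 10):08X}" for field in fields)


sample = ",".join(f"{n:02d}" for n in range(16))  # "00,01,...,15"
converted = convert_line(sample)
print(len(sample), len(converted))  # 47 128 (CR LF not counted)
```

Adding the two bytes of CR LF to each length gives the 49 and 130 quoted above.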

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Correct me if I'm wrong, but wouldn't the output file be over three times as big as the input file?

The input file has 16 numbers, 7 commas, a CR and an LF on each line. (19 characters)

The output file will have 8 x 8 numbers, a CR and an LF on each line. (66 characters)

As far as I can see, both files are ASCII.

If I were you, I'd pause the running script from the tray icon and try the one jchd posted.
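For reference, the size estimate can be worked out with the line lengths jchd gives in the edit above (49 bytes in, 130 bytes out), rather than the counts in this post. A small sketch, assuming plain ASCII with one byte per character; the names are illustrative only.

```python
INPUT_LINE_BYTES = 49    # 16 two-digit fields + 15 commas + CR LF
OUTPUT_LINE_BYTES = 130  # 16 eight-digit fields + CR LF


def expected_output_size(input_size: int) -> int:
    """Predict the output size in bytes for a given input size in bytes."""
    lines, leftover = divmod(input_size, INPUT_LINE_BYTES)
    if leftover:
        raise ValueError("input size is not a whole number of 49-byte lines")
    return lines * OUTPUT_LINE_BYTES


growth = OUTPUT_LINE_BYTES / INPUT_LINE_BYTES
print(f"growth factor: {growth:.2f}")  # growth factor: 2.65
print(expected_output_size(49_000))    # 130000
```

So with those lengths the output is a bit more than two and a half times the input, not over three times.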


Absolutely forget about FileReadLine() if you need a fast script!

Instead, read chunks of data (32 KB, for example) in a loop,

parse/process the lines of each chunk inside that loop,

and at the end of each iteration write the output data from a temporary variable to the output file.

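pseudo code ($hFileIn and $hFileOut are handles from FileOpen(), SomeConversion() is a placeholder):

While True
  $sData = FileRead($hFileIn, 32000) ; read the next chunk
  If $sData = '' Then ExitLoop ; end of file
  $aLines = StringSplit($sData, @CRLF, 1) ; flag 1: split on the whole CR LF string
  $sOutput = ''
  For $i = 1 To $aLines[0]
    If $aLines[$i] = '' Then ContinueLoop ; skip the empty piece after a trailing CR LF
    $sOutput &= SomeConversion($aLines[$i]) & @CRLF ; process each line
  Next
  FileWrite($hFileOut, $sOutput) ; one write per chunk
WEnd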

You should probably also handle appending the rest of the last line of each chunk to the next chunk, because the last line of a chunk can be cut off, with the remainder sitting at the start of the next chunk. But if all lines in your file have the same length, you can make the chunk size a multiple of that length to avoid cutting lines between chunks.
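The carry-over idea can be sketched like this. It is a Python illustration rather than AutoIt, and the function name `convert_in_chunks` and the `convert` callback are made up for the example; the point is only that the incomplete tail of one chunk is prepended to the next.

```python
import io
from typing import BinaryIO, Callable


def convert_in_chunks(src: BinaryIO, dst: BinaryIO,
                      convert: Callable[[bytes], bytes],
                      chunk_size: int = 32_000) -> None:
    """Read src in fixed-size chunks, convert whole lines, write to dst."""
    carry = b""  # incomplete line left over from the previous chunk
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of file
        lines = (carry + chunk).split(b"\r\n")
        carry = lines.pop()  # last piece has no CR LF yet; keep it for later
        if lines:
            dst.write(b"".join(convert(line) + b"\r\n" for line in lines))
    if carry:
        dst.write(convert(carry) + b"\r\n")  # final line lacking a CR LF


src = io.BytesIO(b"ab\r\ncd\r\nef\r\n")
dst = io.BytesIO()
convert_in_chunks(src, dst, bytes.upper, chunk_size=5)  # 5 cuts lines on purpose
print(dst.getvalue())  # b'AB\r\nCD\r\nEF\r\n'
```

Because the carry is joined to the next chunk before splitting, a CR LF pair that straddles a chunk boundary is also handled.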

EDIT:

FileReadLine() is a function that is in AutoIt only for beginners,

and advanced users should not use it in any situation (except maybe for small files)!!!

Internally it reads the whole file from the beginning until it reaches the given line number.

So if the number of lines is big, it's useless.
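Taking that description at face value (every call with an explicit line number rescans from line 1), the amount of work can be modelled as below. This is a back-of-the-envelope model in Python with made-up function names, not a measurement of AutoIt.

```python
def lines_scanned_by_number(total_lines: int) -> int:
    """Lines touched when line k is fetched by rescanning lines 1..k each time."""
    return sum(range(1, total_lines + 1))  # 1 + 2 + ... + n = n(n + 1) / 2


def lines_scanned_sequentially(total_lines: int) -> int:
    """Lines touched when each read continues from the current file position."""
    return total_lines


n = 1_000_000
print(lines_scanned_by_number(n))     # 500000500000
print(lines_scanned_sequentially(n))  # 1000000
```

Under this model the cost grows quadratically with the number of lines in one case and linearly in the other.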

Edited by Zedna

@Zedna,

I fully agree with the spirit of your remark, but its phrasing may lead to confusion between FileReadLine with and without an explicit line number parameter. FileReadLine without line number parameter will start reading at current filepos, taking decent benefit of system/HDD cache and without much overhead compared to a bulk read (line processing + write time is likely to dominate in most use cases).

Then, using your example script on a random real-world text file will produce erroneous output, because chunks of N bytes (N = any fixed value, 32000 or otherwise) will rarely coincide with a line break, thus artificially splitting the last line most of the time.

Since we're lucky that the OP has fixed 49-byte lines, say we read 49000 bytes. But then again, this will fail if the file is in some UTF encoding, where the number of bytes read may differ significantly from the number of characters read.
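A fixed-record read along those lines might look like the sketch below. It is Python, not AutoIt, the name `iter_records` is hypothetical, and it works on raw bytes, so it only holds if every character really occupies one byte (ASCII/ANSI); a UTF-16 file, for instance, would have a different record length.

```python
import io
from typing import BinaryIO, Iterator

RECORD_BYTES = 49  # assumed: 47 single-byte characters + CR LF


def iter_records(src: BinaryIO, records_per_read: int = 1000) -> Iterator[bytes]:
    """Yield fixed-length records, reading a whole number of them per call."""
    while True:
        # Assumes a buffered binary stream that returns short reads only at EOF.
        block = src.read(RECORD_BYTES * records_per_read)
        if not block:
            return
        if len(block) % RECORD_BYTES:
            raise ValueError("file is not made of 49-byte records")
        for start in range(0, len(block), RECORD_BYTES):
            record = block[start:start + RECORD_BYTES]
            if not record.endswith(b"\r\n"):
                raise ValueError("record does not end with CR LF")
            yield record[:-2]  # strip the line terminator


line = b",".join(b"%02d" % n for n in range(16)) + b"\r\n"
records = list(iter_records(io.BytesIO(line * 2500)))
print(len(records), len(records[0]))  # 2500 47
```

The two checks make the sketch fail loudly instead of silently producing shifted output when the fixed-length assumption does not hold.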


@jchd

1) You are probably right about using FileReadLine with/without the line parameter. Anyway, in this particular case I think it's a bad idea to use it in any way.

2) The chunk size can be increased for very large files. I think the optimal size should be comparable to the size of the CPU cache (and maybe also the disk cache), so that a chunk fits into the CPU's L2/L3 cache and is processed quickly. Nowadays CPU cache sizes are commonly in the megabytes (2 MB - 6 MB), so you can try, for example, a 1 or 2 MB chunk size for such a big file.
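If the chunk is sized around 1 or 2 MB, it can still be kept a whole multiple of the 49-byte line length so that no line is cut. A small Python sketch with a hypothetical helper name; whether a cache-sized chunk actually helps is untested here.

```python
def aligned_chunk_size(record_bytes: int, budget_bytes: int) -> int:
    """Largest multiple of record_bytes that does not exceed budget_bytes."""
    if record_bytes <= 0 or budget_bytes < record_bytes:
        raise ValueError("budget must hold at least one record")
    return (budget_bytes // record_bytes) * record_bytes


for budget_mb in (1, 2):
    size = aligned_chunk_size(49, budget_mb * 1024 * 1024)
    print(budget_mb, size, size // 49)  # budget in MB, chunk bytes, lines per chunk
```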


I fully agree, but syncing line breaks with actual # bytes and (ANSI or UTF-*) chars read is far from obvious and certainly difficult to handle for beginners.

Modern Windows versions do just like Unices and have more or less smart strategies for adaptive read-ahead caching, and HDDs have similar performance optimizations. Hence I believe the burden of a simple readline loop is not that much, knowing that processing time and write time may dominate.

Finding and managing line breaks with AutoIt code in a few megs of buffer is certainly much slower than letting low-level C code do the job (FileReadLine is probably thinly mapped to the suitable call in the C/C++ lib).

Anyway, comparing simple-minded AutoIt code with simple-minded plain C throughputs shows that AutoIt is just not the tool of choice for heavy loads like this one.


True but the filesystem cache is standing there as well. All in all, the number of high-level reads probably doesn't matter that much.


Don't hold your breath until then :unsure:
