jchd Posted May 11, 2011 (edited)

If I were you, I'd run it on a small extract (say 100 MB or so) to check output conformance before launching the horse on the actual race lane.

Edit: Oh yes, to save my ass from being off the chair: even if it is not that much AutoItish, you will probably need AutoIt to run it by passing your fine arguments on the command line. I forgot to mention how to use it:

> conv <input file> <output file>

but as you probably see/know, it's easy to change that convention.

Edit2: what Tvern just posted below is true. If the output format is correct, you're only slightly above 1/3 done.

@Tvern, your counts seem wrong but the idea is there. Input lines are 49 chars, i.e. 47 chars of data such as
00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15
plus 2 chars CR LF. Output lines are 130 chars, i.e. 128 chars of data such as
000000000000000100000002000000030000000400000005000000060000000700000008000000090000000A0000000B0000000C0000000D0000000E0000000F
plus 2 chars CR LF.

Edited May 11, 2011 by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here.
RegExp tutorial: enough to get started.
PCRE v8.33 regexp documentation: latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well).
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt).
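The line lengths quoted in jchd's post can be checked with a few lines of code. The following is only a sketch in Python (not AutoIt, and not the `conv` tool itself): it assumes, based on the sample lines alone, that each comma-separated field is a decimal number written out as 8 uppercase hex digits. The name `convert_line` is made up for illustration.

```python
def convert_line(line: str) -> str:
    """Turn '00,01,...,15' into one string of 8-digit uppercase hex fields.

    Assumption: fields are decimal numbers, as the sample lines suggest.
    """
    fields = line.split(",")
    return "".join(f"{int(field, 10):08X}" for field in fields)


# Rebuild the sample input line from the post: 00,01,...,15
sample_in = ",".join(f"{n:02d}" for n in range(16))
sample_out = convert_line(sample_in)

# Line lengths including the trailing CR LF, as counted in the post.
in_len = len(sample_in) + 2    # 47 data chars + CR LF
out_len = len(sample_out) + 2  # 128 data chars + CR LF
growth = out_len / in_len      # how much bigger the output file gets
```

Under these assumptions the output file is a bit more than two and a half times the size of the input, which is worth knowing before estimating how far along a run is from the output size alone.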
Tvern Posted May 11, 2011

Correct me if I'm wrong, but wouldn't the output file be over three times as big as the input file? The input file has 16 numbers, 7 commas, a CR and a LF on each line (19 characters). The output file will have 8 x 8 numbers, a CR and a LF on each line (66 characters). As far as I can see, both files are ASCII.

If I were you, I'd pause the running script from the tray icon and try the one jchd posted.
Clark (Author) Posted May 12, 2011

Oh dear. Yes, you are correct. This is going to run for a very long time, but I'm not going to stop it now. At the very least it will support my argument for a more powerful PC.
Zedna Posted May 12, 2011 (edited)

Absolutely forget about FileReadLine() if you need a fast script! Instead, read chunks of data (for example 32 KB each) in a loop, parse/process the lines of that chunk inside the loop, and at the end of each iteration write the output data from a temporary variable to the output file.

pseudo code:

    While True
        $data = FileRead($file_in, 32000)
        If $data = '' Then ExitLoop ; end of file
        $lines = StringSplit($data, @CRLF, 1) ; flag 1: split on the whole CR LF pair
        $output = ''
        For $i = 1 To $lines[0]
            $output &= SomeConversion($lines[$i]) & @CRLF ; process each line
        Next
        FileWrite($file_out, $output)
    WEnd

You should probably also handle appending the rest of the last line in each chunk to the next chunk, because the last line of a chunk can be cut and the remainder of that line will then be in the next chunk. But if all lines in your file have the same length, you can make the chunk size a multiple of that length to avoid cutting lines between chunks.

EDIT: FileReadLine() is a function which is in AutoIt only for beginners, and advanced users should not use it in any situation (except maybe for small files)! It internally reads the whole file from the beginning until it reaches the given line number, so if the number of lines is big it's useless.

Edited May 12, 2011 by Zedna

Resources UDF | ResourcesEx UDF | AutoIt Forum Search
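The carry-over of a cut last line that Zedna describes can be sketched as follows. This is Python rather than AutoIt, and only an illustration of the idea: `some_conversion` is a hypothetical stand-in for the real per-line conversion, and `convert_stream` and the tiny demo chunk size are assumptions made for the example.

```python
import io

CHUNK_SIZE = 32000  # characters per read, the size suggested in the post


def some_conversion(line: str) -> str:
    """Hypothetical stand-in for the real per-line conversion."""
    return line.upper()


def convert_stream(src, dst, chunk_size: int = CHUNK_SIZE) -> None:
    """Read src chunk by chunk and write converted CR LF lines to dst.

    Both text streams should be opened with newline="" so that CR LF
    pairs are neither translated on read nor on write.
    """
    carry = ""  # incomplete line left over from the previous chunk
    while True:
        data = src.read(chunk_size)
        if data == "":
            break  # end of file
        lines = (carry + data).split("\r\n")
        carry = lines.pop()  # last piece has no line terminator yet
        if lines:
            dst.write("".join(some_conversion(line) + "\r\n" for line in lines))
    if carry:
        # The file did not end with CR LF: convert the final line as well.
        dst.write(some_conversion(carry) + "\r\n")


# Tiny demo with a deliberately awkward chunk size that cuts lines
# (and even a CR LF pair) in the middle.
demo_in = io.StringIO("ab\r\ncd\r\nef", newline="")
demo_out = io.StringIO(newline="")
convert_stream(demo_in, demo_out, chunk_size=3)
```

Prepending the leftover piece to the next chunk before splitting also covers the case where a chunk ends between the CR and the LF, since the pair is only recognised once both characters are in the same string.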
jchd Posted May 12, 2011

@Zedna, I fully agree with the spirit of your remark, but its phrasing may lead to confusion between FileReadLine with and without an explicit line number parameter. FileReadLine without a line number parameter starts reading at the current file position, taking decent benefit of the system/HDD cache and without much overhead compared to a bulk read (line processing + write time is likely to dominate in most use cases).

Then, using your example script on a random real-world text file will produce erroneous output, because chunks of N bytes (N = any fixed value, 32000 or else) will rarely coincide with a line break, thus splitting the last line artificially most of the time.

Since we are lucky that the OP has fixed 49-byte lines, say we read 49000 bytes. But then again, this will fail if the file is in some UTF encoding where the number of bytes read may differ significantly from the number of characters read.
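For the fixed-length case, one way to sidestep the bytes-versus-characters problem is to read the file as raw bytes and slice it into 49-byte records. The sketch below is Python, not AutoIt, and rests on two assumptions taken from the thread: every line is exactly 49 bytes including CR LF, and the data is plain ASCII. The function name `iter_records` is invented for the example.

```python
import io

RECORD_LEN = 49                  # 47 data bytes + CR LF, per the thread
CHUNK_BYTES = RECORD_LEN * 1000  # 49000 bytes, a whole number of records


def iter_records(src, record_len: int = RECORD_LEN,
                 chunk_bytes: int = CHUNK_BYTES):
    """Yield each fixed-length line (without CR LF) from a binary stream."""
    if chunk_bytes % record_len != 0:
        raise ValueError("chunk size must be a multiple of the record length")
    while True:
        chunk = src.read(chunk_bytes)
        if not chunk:
            break  # end of file
        if len(chunk) % record_len != 0:
            raise ValueError("input is not made of whole fixed-length records")
        for start in range(0, len(chunk), record_len):
            record = chunk[start:start + record_len]
            if not record.endswith(b"\r\n"):
                raise ValueError("record does not end with CR LF")
            # Decoding fails loudly if the data is not ASCII after all.
            yield record[:-2].decode("ascii")


# Demo: three copies of the sample line, read two records at a time.
sample = ",".join(f"{n:02d}" for n in range(16)).encode("ascii") + b"\r\n"
records = list(iter_records(io.BytesIO(sample * 3),
                            chunk_bytes=RECORD_LEN * 2))
```

The checks are deliberately strict: if the file turns out to be in a multi-byte encoding or contains a line of a different length, the sketch raises an error instead of silently producing shifted output.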
Zedna Posted May 12, 2011

@jchd
1) You are probably right about using FileReadLine with/without the line parameter. Anyway, in this particular case I think it's a bad idea to use it in any way.
2) The chunk size can be increased for very large files. I think its optimal size should be comparable to the size of the CPU (and maybe also disk) cache, so that a chunk fits into the L2/L3 cache of the CPU and is processed quickly. Nowadays CPU cache sizes are commonly in the megabytes (2 MB - 6 MB), so you can try, for example, a 1 or 2 MB chunk for such a big file.
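If a chunk of roughly 1 or 2 MB is wanted while still keeping it a multiple of the 49-byte line length mentioned earlier in the thread, the size can simply be rounded down to a whole number of lines. A minimal Python sketch follows; the helper name `aligned_chunk_size` is hypothetical.

```python
def aligned_chunk_size(target_bytes: int, record_len: int) -> int:
    """Largest multiple of record_len that fits in target_bytes.

    Falls back to a single record if the target is smaller than one record.
    """
    records_per_chunk = max(1, target_bytes // record_len)
    return records_per_chunk * record_len


# About 1 MB and 2 MB, aligned on 49-byte lines.
one_mb_chunk = aligned_chunk_size(1024 * 1024, 49)
two_mb_chunk = aligned_chunk_size(2 * 1024 * 1024, 49)
```

Whether such a size is actually faster than 32 KB or than a plain line-by-line loop is something only a timing run on the real file can tell.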
jchd Posted May 12, 2011

I fully agree, but syncing line breaks with the actual number of bytes and (ANSI or UTF-*) chars read is far from obvious and certainly difficult to handle for beginners.

Modern Windows versions do just like Unices and have more or less smart strategies for adaptive read-ahead caching, and HDDs have similar performance optimizations. Hence I believe the burden of a simple readline loop is not that much, knowing that processing time and write time may dominate.

Finding and managing line breaks with AutoIt code in a few megs of buffer is certainly much slower than letting low-level C code do the job (FileReadLine is probably thinly mapped to the suitable call in the C/C++ lib).

Anyway, comparing the throughput of simple-minded AutoIt code with that of simple-minded plain C shows that AutoIt is just not the tool of choice for heavy loads like this one.
Zedna Posted May 12, 2011 (edited)

The chunk size inside FileReadLine is hardcoded and probably too small, so if you write your own reading/parsing with a bigger chunk size you can minimize the number of I/O operations (reads/writes to file).

Edited May 12, 2011 by Zedna
jchd Posted May 12, 2011

True, but the filesystem cache is standing there as well. All in all, the number of high-level reads probably doesn't matter that much.
Clark (Author) Posted May 13, 2011

Interesting stuff. I can see some experimentation and trials will be required at the end of the current run.
jchd Posted May 13, 2011

Don't hold your breath until then.