Clark (author), posted May 6, 2011

Hi all,

Pretty self-explanatory really. Can this code be made to run faster?

Notes: I have built it to an exe. I have put the exe in the same directory as the input and output files.

    While 1
        $line = FileReadLine($hFile)
        if @error = -1 then ExitLoop
        for $i = 1 to 16 step 3
            $iNum = StringMid($line,$i,2)
            $oHex = Hex($iNum, 8)
            FileWrite($OutputFile, $oHex)
        next
        FileWrite ($OutputFile, @LF)
    WEnd
sahsanu, posted May 6, 2011

Could you please show your entire script? I suspect you are using filenames instead of file handles, but I can't tell without seeing the entire script.
Clark (author), posted May 6, 2011

Sure. Here you go.

    ; Open up the input and output files ======
    $message = "Please select a file to convert"
    $initialDir = "W:\Testing\Tools\Diehard"
    $InputFile = FileOpenDialog($message, $initialDir & "\", "All (*.*)", 1 )
    If @error Then
        MsgBox(4096,"","No File(s) chosen")
        exit
    Else
        $InputFile = StringReplace($InputFile, "|", @CRLF)
    EndIf
    $message="Please select a file to write to"
    $OutputFile = FileSaveDialog($message, $initialDir & "\", "All (*.*)", 2 )
    If @error Then
        MsgBox(4096,"","No File(s) chosen")
        exit
    Else
        $OutputFile = StringReplace($OutputFile, "|", @CRLF)
    EndIf
    ; === end of file management ==============

    ; process the file ========================
    $hFile = FileOpen($InputFile)
    $j=1
    $intCntr = 0
    ProgressOn("Progress Meter", "Converting file...", "0 percent",1,1,16)
    While 1
        $line = FileReadLine($hFile)
        if @error = -1 then ExitLoop
        for $i = 1 to 16 step 3
            $iNum = StringMid($line,$i,2)
            $oHex = Hex($iNum, 8)
            FileWrite($OutputFile, $oHex)
        next
        FileWrite ($OutputFile, @LF)
    WEnd
    ProgressSet(100 , "Done", "Complete")
    Sleep(500)
    ProgressOff()
    FileClose($hFile)
    FileClose($OutputFile)
sahsanu, posted May 6, 2011 (edited)

I think you forgot to open $OutputFile in write mode with FileOpen ;-)

    $hOutputFile=FileOpen($OutputFile,1) ;use 1 to open the file in write mode without erasing previous data.

Take a look at the FileOpen help to see the modes. Then change your FileWrite/FileClose calls to use the handle ($hOutputFile) instead of $OutputFile.
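A minimal sketch of the loop with sahsanu's fix applied, so every FileReadLine/FileWrite/FileClose call goes through a handle opened once. The two paths are hypothetical stand-ins for the FileOpenDialog/FileSaveDialog results. It uses mode 2 (erase previous contents) rather than mode 1 so that a re-run does not append to an old output file; switch back to 1 if appending is what you want.

```autoit
; Minimal sketch: open both files once and pass the handles to every File* call.
; Hypothetical paths stand in for the FileOpenDialog/FileSaveDialog results.
Local $InputFile = @ScriptDir & "\InFile.txt"
Local $OutputFile = @ScriptDir & "\OutFile.txt"

Local $hFile = FileOpen($InputFile, 0)        ; 0 = read mode
Local $hOutputFile = FileOpen($OutputFile, 2) ; 2 = write mode, erase previous contents
If $hFile = -1 Or $hOutputFile = -1 Then
    ConsoleWrite("Could not open the input or output file" & @CRLF)
    Exit 1
EndIf

Local $line, $i
While 1
    $line = FileReadLine($hFile)
    If @error Then ExitLoop ; -1 = end of file; any other error also stops the loop
    For $i = 1 To 16 Step 3
        ; Same character positions as the original loop; Number() makes the
        ; string-to-integer conversion explicit before Hex().
        FileWrite($hOutputFile, Hex(Number(StringMid($line, $i, 2)), 8))
    Next
    FileWrite($hOutputFile, @LF)
WEnd

FileClose($hFile)
FileClose($hOutputFile)
```

Checking `If @error` instead of `If @error = -1` also ends the loop if FileReadLine fails for a reason other than end of file, which avoids a possible endless loop.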
Clark (author), posted May 6, 2011 (edited)

Thanks for picking up that error. Unfortunately it is still quite slow. It has a 4 MB file to process and is only getting through about 20k a minute (maybe less).

Edit: I might try copying everything to a local disk to see if that improves things. I don't think it will, though.
hannes08, posted May 6, 2011

How 'bout this? Fewer function calls (one StringSplit per line instead of six StringMid calls):

    While 1
        $line = FileReadLine($hFile)
        if @error = -1 then ExitLoop
        ; "" as the delimiter splits the line into single characters; [0] holds the count
        $a_temp = StringSplit($line, "")
        for $i = 1 to 16 step 3
            $iNum = $a_temp[$i] & $a_temp[$i+1]
            $oHex = Hex($iNum, 8)
            FileWrite($OutputFile, $oHex)
        next
        FileWrite ($OutputFile, @LF)
    WEnd

Regards,
Hannes
sahsanu, posted May 6, 2011

That's really strange, because I've just created a file with 400,000 lines, each line 108 characters long (44.3 MB), and with your same script using handles it takes 1 minute 44 seconds. The output file has over 415,000 lines and is 18 MB. Are you sure you put the handles in correctly?
bwochinski, posted May 6, 2011

If the input and output files are on a network drive rather than the local machine, copying them to a local disk could make quite a difference.
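A minimal sketch of staging the input on the local disk first, so the conversion loop never touches the network. The folder is the one from Clark's script; "input.txt" is a hypothetical file name. For a multi-gigabyte input, @TempDir needs at least that much free space.

```autoit
; Minimal sketch: stage the input on the local disk before converting it.
; "input.txt" is a hypothetical file name; the folder is the one from Clark's script.
Local $sNetIn = "W:\Testing\Tools\Diehard\input.txt"
Local $sLocalIn = @TempDir & "\input.txt"

Local $iTimer = TimerInit()
; Flag 1 = overwrite an existing copy in the temp folder.
If Not FileCopy($sNetIn, $sLocalIn, 1) Then
    ConsoleWrite("Copy to local disk failed" & @CRLF)
    Exit 1
EndIf
ConsoleWrite("Copied in " & Round(TimerDiff($iTimer) / 1000, 1) & " s" & @CRLF)
; Next: run the conversion against $sLocalIn, write the output to a local
; path as well, and copy the finished file back to the share in one go.
```

One bulk copy replaces millions of small network reads, which is usually where the time goes; timing the copy shows whether the network was the bottleneck.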
water, posted May 6, 2011

To make your script faster you could read the whole file into an array in one go and then process the records in a loop. Please check the example script for _FileReadToArray in the help file.
Tvern, posted May 6, 2011 (edited)

As posted above, limiting read/write operations speeds things up. The effect will be more dramatic when the read/write operations are slower (which will be the case with a network drive).

This reads the entire file at once, assembles an output string and then writes it all at once. A 6 MB, 300k-line file takes 7 s on my PC.

    #include <File.au3>

    Global $iTimer = TimerInit()
    Global $sOutPath = @ScriptDir & "\OutFile.txt"
    Global $sInPath = @ScriptDir & "\InFile.txt"
    Global $sOutFile, $aInFile

    _FileReadToArray($sInPath, $aInFile)
    ConsoleWrite(TimerDiff($iTimer) & " - File read. (" & Round(FileGetSize($sInPath)/1048576,2) & "MB - " & $aInFile[0] & " lines)" & @CRLF)

    For $iLine = 1 To $aInFile[0]
        For $i = 1 To 16 Step 3
            $sOutFile &= Hex(StringMid($aInFile[$iLine],$i,2), 8)
        Next
        $sOutFile &= @LF
    Next
    ConsoleWrite(TimerDiff($iTimer) & " - Output string ready." & @CRLF)

    FileWrite($sOutPath,$sOutFile)
    ConsoleWrite(TimerDiff($iTimer) & " - Done." & @CRLF)
jchd, posted May 6, 2011

Maybe it can run even faster using a single (well, almost!) regexp instead of the inner For loop:

    #include <File.au3>

    Global $iTimer = TimerInit()
    Global $sOutPath = @ScriptDir & "\OutFile.txt"
    Global $sInPath = @ScriptDir & "\InFile.txt"
    Global $sOutFile, $aInFile

    _FileReadToArray($sInPath, $aInFile)
    ConsoleWrite(TimerDiff($iTimer) & " - File read. (" & Round(FileGetSize($sInPath)/1048576,2) & "MB - " & $aInFile[0] & " lines)" & @CRLF)

    For $iLine = 1 To $aInFile[0]
        $sOutFile &= Execute(StringRegExpReplace($aInFile[$iLine], "(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).*", _
                "Hex($1,8)&Hex($2,8)&Hex($3,8)&Hex($4,8)&Hex($5,8)&Hex($6,8)&Hex($7,8)&Hex($8,8)&" & _
                "Hex($9,8)&Hex($10,8)&Hex($11,8)&Hex($12,8)&Hex($13,8)&Hex($14,8)&Hex($15,8)&Hex($16,8)&@LF"))
    Next
    ConsoleWrite(TimerDiff($iTimer) & " - Output string ready." & @CRLF)

    FileWrite($sOutPath,$sOutFile)
    ConsoleWrite(TimerDiff($iTimer) & " - Done." & @CRLF)

The difference shouldn't be much anyway.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. 
Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Clark (author), posted May 9, 2011

Thanks all for your input. I left the job processing over the weekend. It is almost finished (I think).

Actually, I misread the size of the input file: it is 3,946,991 KB. So that isn't 4 MB but 4 TB, I am thinking. Anyway, after over 48 hours of processing my script's output is up to 2,213,250 KB. Given that it only converts ASCII numbers to 8-bit hex numbers, I am assuming the file size will remain relatively constant, so it looks like it is on the home stretch.

I think the size of the file rules out reading in the entire file first? Let me know if you think differently. I will try a few of the other ideas after the job has finished to see what effect they have.

Thx again,
Clark
jchd, posted May 9, 2011

Ouch! Indeed 4 GB (no, not 4 TB) isn't the same game!

You hit at least one limitation: the help file under Arrays says "The total number of entries cannot be greater than 2^24 (16 777 216)." That means about 805 MB with 48-byte records, well below your input size. Then there is also a limit on the memory Windows allows a single process.

So forget about reading the whole baby at once.
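A minimal sketch of a middle ground between one FileReadLine/FileWrite pair per value and loading the whole file: read a few megabytes at a time with FileRead, carry any incomplete last line over to the next block, and write each block's output with a single FileWrite. The paths and the 4 MB block size are assumptions, and it keeps the original loop's character positions. It has not been benchmarked against the other versions in this thread.

```autoit
; Minimal sketch: process a very large file in fixed-size blocks.
; Paths and block size are assumptions, not taken from the thread.
Local Const $iBlock = 4 * 1048576 ; characters per read (about 4 MB)
Local $hIn = FileOpen(@ScriptDir & "\InFile.txt", 0)   ; 0 = read
Local $hOut = FileOpen(@ScriptDir & "\OutFile.txt", 2) ; 2 = overwrite
If $hIn = -1 Or $hOut = -1 Then Exit 1

Local $sCarry = "", $sChunk, $iCut, $aLines, $sOut, $iLine, $i, $bEof = False
While Not $bEof
    $sChunk = FileRead($hIn, $iBlock)
    If @error Then $bEof = True ; -1 = end of file reached
    $sChunk = $sCarry & $sChunk
    If $bEof Then
        $sCarry = ""
    Else
        ; Keep any incomplete last line for the next block.
        $iCut = StringInStr($sChunk, @LF, 0, -1) ; last LF, searching from the right
        $sCarry = StringTrimLeft($sChunk, $iCut)
        $sChunk = StringLeft($sChunk, $iCut)
    EndIf
    $aLines = StringSplit(StringStripCR($sChunk), @LF)
    $sOut = ""
    For $iLine = 1 To $aLines[0]
        If $aLines[$iLine] = "" Then ContinueLoop ; skip the empty tail after the last LF
        For $i = 1 To 16 Step 3
            $sOut &= Hex(Number(StringMid($aLines[$iLine], $i, 2)), 8)
        Next
        $sOut &= @LF
    Next
    FileWrite($hOut, $sOut) ; one write per block
WEnd
FileClose($hIn)
FileClose($hOut)
```

Memory use stays at a few blocks' worth regardless of the input size, so neither the 2^24 array limit nor the per-process memory limit comes into play.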
Tvern, posted May 9, 2011

So, ehm... discarding every third number was intentional, right?

If I were you I'd rewrite the conversion routine in C, have it accept a source and destination as parameters, and write an AutoIt frontend. Write it multithreaded, or just split the file into chunks and run multiple background processes. For a file this size it should make a pretty noticeable difference.
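A minimal sketch of the AutoIt side of Tvern's suggestion: the frontend only builds a command line and waits for the external converter. The executable name conv.exe and its "<input> <output>" argument order are hypothetical; adapt them to whatever interface the C program actually has.

```autoit
; Minimal sketch of an AutoIt frontend for an external converter.
; conv.exe and its "<input> <output>" argument order are hypothetical.
Local $sExe = @ScriptDir & "\conv.exe"
Local $sIn = @ScriptDir & "\InFile.txt"
Local $sOut = @ScriptDir & "\OutFile.txt"

Local $iTimer = TimerInit()
; Quote every path in case it contains spaces; @SW_HIDE keeps the console window hidden.
Local $iExit = RunWait('"' & $sExe & '" "' & $sIn & '" "' & $sOut & '"', @ScriptDir, @SW_HIDE)
If @error Then
    ConsoleWrite("Could not start " & $sExe & @CRLF)
    Exit 1
EndIf
ConsoleWrite("Exit code " & $iExit & " after " & Round(TimerDiff($iTimer) / 1000, 1) & " s" & @CRLF)
```

For the multi-process variant, Run (which does not wait) could start one converter per chunk file, followed by ProcessWaitClose on each returned PID.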
Clark (author), posted May 9, 2011

Tvern wrote: "So ehm.. discarding every third number was intentional right?"

Ummm, no? Are you referring to the "step 3" in the code? That is because the input file is of the form:

    01,23,23,09,34,12
    34,01,78,23,04,66
    etc.
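Since every third character is a comma, "step 3" skips the separators, not data. A minimal sketch of an alternative that splits on the comma instead of relying on fixed character positions, assuming one line looks like the sample above:

```autoit
; Minimal sketch: split on the comma instead of relying on fixed character positions.
; Assumes one line looks like the sample above, e.g. "01,23,23,09,34,12".
Local $sLine = "01,23,23,09,34,12"
Local $aNums = StringSplit($sLine, ",") ; [0] holds the count, [1]..[n] the numbers
Local $sHex = "", $i
For $i = 1 To $aNums[0]
    $sHex &= Hex(Number($aNums[$i]), 8) ; e.g. "23" -> 23 -> "00000017"
Next
ConsoleWrite($sHex & @LF)
```

This keeps working if a line ever holds a different count of numbers, at the cost of one StringSplit call per line.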
Tvern, posted May 9, 2011 (edited)

I should have written "every third character". It would have been a bummer if you'd found out the output you got so far was invalid.
jchd, posted May 9, 2011

Clark,

What is the exact format of your file? Is it always a series of 16 two-digit ASCII numbers (00 to 99), each followed by a comma except the 16th, and then CR LF?

Is the format stable over time and immune to formatting errors? (Adding checks would easily double the processing time compared to "blind" code.)
Clark (author), posted May 10, 2011

Exactly: the input file is exactly as I posted above (numbers 01 to 45, actually). Immune to all errors.
jchd, posted May 10, 2011

If so, please run a little benchmark with this executable. It converts a 128 MB test input file (exactly 2,699,865 input lines) in 8.61 s, most of which is spent in I/O (user time = 0.015 s).

The code is _really_ bare bones. Feel free to add bells, whistles, Christmas decorations, comments, ... It can be made even faster by performing I/O in larger blocks and also by using fewer operations in the conversion, at the expense of a much larger lookup table.

I compile it with gcc as "gcc conv.c -O2 -o conv.exe".

Convert.zip
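The C source itself is only in the attachment, so the following is not jchd's code. It is a minimal AutoIt sketch of the same lookup-table idea: build the 8-digit hex string for every two-digit value 00 to 99 once, then index into the table instead of calling Hex for every value. `_ConvertLine` is a hypothetical helper name, and whether this is measurably faster in AutoIt has not been tested.

```autoit
; Minimal sketch of the lookup-table idea: precompute Hex() for 00..99 once.
Global $aHex[100]
For $n = 0 To 99
    $aHex[$n] = Hex($n, 8)
Next

; Hypothetical helper: same character positions as the original loop (1, 4, ..., 16).
Func _ConvertLine($sLine)
    Local $sOut = "", $i
    For $i = 1 To 16 Step 3
        $sOut &= $aHex[Number(StringMid($sLine, $i, 2))] ; table lookup instead of Hex()
    Next
    Return $sOut & @LF
EndFunc

ConsoleWrite(_ConvertLine("01,23,23,09,34,12"))
```

The table stays tiny here because the input values are limited to two decimal digits; a table keyed on larger pieces of the input (as jchd hints) trades memory for fewer operations per line.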
Clark (author), posted May 11, 2011

Thanks very much for this. I will need to wait for the current file conversion to complete before running your test; otherwise the timing will not be fair. My output file is currently at 3,781,066 KB and my input file is 3,946,991 KB, so hopefully this will be soon. It will be very interesting to see the results from your program.