Clark Posted May 6, 2011

Hi all,

Pretty self-explanatory really. Can this code be made to run faster?

Notes: I have built it to an exe. I have put the exe in the same directory as the input and output files.

While 1
    $line = FileReadLine($hFile)
    If @error = -1 Then ExitLoop
    For $i = 1 To 16 Step 3
        $iNum = StringMid($line, $i, 2)
        $oHex = Hex($iNum, 8)
        FileWrite($OutputFile, $oHex)
    Next
    FileWrite($OutputFile, @LF)
WEnd
sahsanu Posted May 6, 2011

Could you please show your entire script? I suppose you are using filenames instead of file handles, but I can't tell without checking the entire script...
Clark Posted May 6, 2011 (Author)

Sure. Here you go.

; Open up the input and output files ======
$message = "Please select a file to convert"
$initialDir = "W:\Testing\Tools\Diehard"
$InputFile = FileOpenDialog($message, $initialDir & "\", "All (*.*)", 1)
If @error Then
    MsgBox(4096, "", "No File(s) chosen")
    Exit
Else
    $InputFile = StringReplace($InputFile, "|", @CRLF)
EndIf

$message = "Please select a file to write to"
$OutputFile = FileSaveDialog($message, $initialDir & "\", "All (*.*)", 2)
If @error Then
    MsgBox(4096, "", "No File(s) chosen")
    Exit
Else
    $OutputFile = StringReplace($OutputFile, "|", @CRLF)
EndIf
; === end of file management ==============

; process the file ========================
$hFile = FileOpen($InputFile)
$j = 1
$intCntr = 0
ProgressOn("Progress Meter", "Converting file...", "0 percent", 1, 1, 16)
While 1
    $line = FileReadLine($hFile)
    If @error = -1 Then ExitLoop
    For $i = 1 To 16 Step 3
        $iNum = StringMid($line, $i, 2)
        $oHex = Hex($iNum, 8)
        FileWrite($OutputFile, $oHex)
    Next
    FileWrite($OutputFile, @LF)
WEnd
ProgressSet(100, "Done", "Complete")
Sleep(500)
ProgressOff()
FileClose($hFile)
FileClose($OutputFile)
sahsanu Posted May 6, 2011 Share Posted May 6, 2011 (edited) I think you have missed to FileOpen in write mode the $OutputFile ;-) $hOutputFile=FileOpen($OutputFile,1) ;use 1 open the file in write mode without erasing previous data. Take a look to FileOpen help to see the modes And then change your FileWrite/FileClose functions to use the handle ($hOutputFile) instead of the $OutputFile. Edited May 6, 2011 by sahsanu Link to comment Share on other sites More sharing options...
Clark Posted May 6, 2011 (Author, edited)

Thanks for picking up that error. Unfortunately it is still quite slow. It has a 4 MB file to process, and is only processing about 20 KB a minute (or maybe less).

edit: I might try copying everything to local disk and see if that improves things. I don't think it will though.

Edited May 6, 2011 by Clark
hannes08 Posted May 6, 2011

How 'bout this? Fewer function calls (one StringSplit instead of six StringMids):

While 1
    $line = FileReadLine($hFile)
    If @error = -1 Then ExitLoop
    $a_temp = StringSplit($line, "") ; split the line into individual characters
    For $i = 1 To 16 Step 3
        $iNum = $a_temp[$i] & $a_temp[$i + 1]
        $oHex = Hex($iNum, 8)
        FileWrite($OutputFile, $oHex)
    Next
    FileWrite($OutputFile, @LF)
WEnd
sahsanu Posted May 6, 2011

Unfortunately it is still quite slow. It has a 4meg file to process, and is only processing about 20k a minute (or maybe less).

That's really strange, because I've just created a file with 400,000 lines, each line with 108 characters (44.3 MB), and using your same script with handles it takes 1 minute 44 seconds. The output file has over 415,000 lines and is 18 MB. Are you sure you put the handles in correctly?
bwochinski Posted May 6, 2011

edit: I might try copying everything to local disk and see if that improves things.

If you're saying that these files are on a network drive and not the local machine, that could make quite a difference.
water Posted May 6, 2011

To make your script faster you could read the whole file in one go into an array and then process the records in a loop. Please check the example script for _FileReadToArray in the help file.
Tvern Posted May 6, 2011 (edited)

As posted above, limiting read/write operations speeds things up. The result will be more dramatic when the read/write operations are slower (which will be the case with a network drive). This reads the entire file at once, assembles an output string, and then writes that all at once. A 6 MB, 300k-line file takes 7 seconds on my PC.

#include <File.au3>
Global $iTimer = TimerInit()
Global $sOutPath = @ScriptDir & "\OutFile.txt"
Global $sInPath = @ScriptDir & "\InFile.txt"
Global $sOutFile, $aInFile

_FileReadToArray($sInPath, $aInFile)
ConsoleWrite(TimerDiff($iTimer) & " - File read. (" & Round(FileGetSize($sInPath) / 1048576, 2) & "MB - " & $aInFile[0] & " lines)" & @CRLF)
For $iLine = 1 To $aInFile[0]
    For $i = 1 To 16 Step 3
        $sOutFile &= Hex(StringMid($aInFile[$iLine], $i, 2), 8)
    Next
    $sOutFile &= @LF
Next
ConsoleWrite(TimerDiff($iTimer) & " - Output string ready." & @CRLF)
FileWrite($sOutPath, $sOutFile)
ConsoleWrite(TimerDiff($iTimer) & " - Done." & @CRLF)

edit: forgot the example >.<

Edited May 6, 2011 by Tvern
jchd Posted May 6, 2011

Maybe it can run even faster using a single (well, almost!) regexp instead of the inner For loop:

#include <File.au3>
Global $iTimer = TimerInit()
Global $sOutPath = @ScriptDir & "\OutFile.txt"
Global $sInPath = @ScriptDir & "\InFile.txt"
Global $sOutFile, $aInFile

_FileReadToArray($sInPath, $aInFile)
ConsoleWrite(TimerDiff($iTimer) & " - File read. (" & Round(FileGetSize($sInPath) / 1048576, 2) & "MB - " & $aInFile[0] & " lines)" & @CRLF)
For $iLine = 1 To $aInFile[0]
    $sOutFile &= Execute(StringRegExpReplace($aInFile[$iLine], "(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).*", _
            "Hex($1,8)&Hex($2,8)&Hex($3,8)&Hex($4,8)&Hex($5,8)&Hex($6,8)&Hex($7,8)&Hex($8,8)&" & _
            "Hex($9,8)&Hex($10,8)&Hex($11,8)&Hex($12,8)&Hex($13,8)&Hex($14,8)&Hex($15,8)&Hex($16,8)&@LF"))
Next
ConsoleWrite(TimerDiff($iTimer) & " - Output string ready." & @CRLF)
FileWrite($sOutPath, $sOutFile)
ConsoleWrite(TimerDiff($iTimer) & " - Done." & @CRLF)

The difference shouldn't be much anyway.
Clark Posted May 9, 2011 (Author)

Thanks all for your input. I left the job processing over the weekend. It is almost finished (I think). Actually, I misread the size of the input file: it is 3,946,991 KB. So that isn't 4 MB but 4 TB, I am thinking. Anyway, after over 48 hours of processing my script output is up to 2,213,250 KB. Given that it only converts ASCII numbers to 8-digit hex numbers, I am assuming the file size will remain relatively constant, so it looks like it is on the home stretch.

I think the size of the file will obviate the usefulness of reading in the entire file first? Let me know if you think different. I will try a few of the other ideas after the job has finished to see what effect they have.

Thx again
Clark
jchd Posted May 9, 2011

Ouch! Indeed 4 GB (no, not 4 TB) isn't the same game! You hit at least one limitation: the help file under Arrays says "The total number of entries cannot be greater than 2^24 (16 777 216)." That means 805 MB with 48-byte records, well below your input size. Then there is also a limit on the memory Windows allows a single process. Thus forget about reading the whole baby at once.
Tvern Posted May 9, 2011

So ehm.. discarding every third number was intentional, right?

If I were you I'd rewrite the conversion routine in C, have it accept a source and destination as parameters, and write an AutoIt frontend. Write it multi-threaded, or just split the file up into chunks and run multiple background processes. For a file this size it should make a pretty noticeable difference.
Clark Posted May 9, 2011 (Author)

So ehm.. discarding every third number was intentional right?

Ummm, no? Are you referring to the "step 3" in the code? That is because the input file is of the form:

01,23,23,09,34,1234,01,78,23,04,66

etc
Tvern Posted May 9, 2011 Share Posted May 9, 2011 (edited) I should have written "every third character". Would have been a bummer if you found out the output you got so far was invalid. Edited May 9, 2011 by Tvern Link to comment Share on other sites More sharing options...
jchd Posted May 9, 2011

Clark, what is the exact format of your file? Is it always a series of 16 two-ASCII-digit numbers (00 to 99), each followed by a comma except the 16th, followed by CR LF? Is the format stable over time and immune to formatting errors (adding checks would easily double processing time compared to "blind" code)?
Clark Posted May 10, 2011 (Author)

Exactly, the input file is exactly as I posted above (numbers 01 to 45 actually). Immune to all errors.
jchd Posted May 10, 2011

If so, please run a little benchmark with this executable. It converts a 128 MB test input file (exactly 2,699,865 input lines) in 8.61 s, most of which is spent in I/O (user time = 0.015 s). The code is _really_ bare bones. Feel free to add bells, whistles, Christmas decorations, comments, ... It can be made even faster by performing I/O in larger blocks and also by using fewer operations in the conversion at the expense of a much larger lookup table. I compile it with gcc as "gcc conv.c -O2 -o conv.exe"

Convert.zip
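[The attached Convert.zip is not reproduced in the thread. As a rough sketch only (my own, not jchd's actual source), the core of such a C converter for the format described above — 16 two-digit decimal numbers per line, comma-separated — might look like this; convert_line mirrors the AutoIt loop's Hex($iNum, 8) zero-padded output:]

```c
#include <stdio.h>

/* Convert one input line of the form "nn,nn,...,nn" (16 two-digit
 * decimal numbers, comma-separated) into sixteen zero-padded 8-digit
 * hex values followed by a newline, mirroring AutoIt's Hex($n, 8). */
void convert_line(const char *line, FILE *out)
{
    for (int i = 0; i < 16; i++) {
        const char *p = line + i * 3;      /* stride 3: two digits + comma */
        int val = (p[0] - '0') * 10 + (p[1] - '0');
        fprintf(out, "%08X", val);
    }
    fputc('\n', out);
}

/* Stream an entire file through convert_line, line by line. */
void convert_stream(FILE *in, FILE *out)
{
    char buf[256];
    while (fgets(buf, sizeof buf, in))
        convert_line(buf, out);
}
```

[A main() would simply fopen the source and destination paths passed on the command line and call convert_stream, matching Tvern's earlier suggestion of a C routine driven by an AutoIt frontend. Because it streams line by line it never holds more than one record in memory, so the 16M-entry array limit and per-process memory cap discussed above do not apply.]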
Clark Posted May 11, 2011 (Author)

Thanks very much for this. I will need to wait for the current file conversion to complete before running your test, otherwise the timing will not be fair. My output file is currently at 3,781,066 KB and my input file is 3,946,991 KB, so hopefully this will be soon. It will be very interesting to see the results from your program.