Can this be made to run faster?



Hi all,

Pretty self-explanatory really.

Can this code be made to run faster?

Notes

I have built it to an exe.

I have put the exe in the same directory as the input and output files.

While 1
    $line = FileReadLine($hFile)
    if @error = -1 then ExitLoop
    for $i = 1 to 16 step 3                     
        $iNum = StringMid($line,$i,2)
        $oHex = Hex($iNum, 8)
        FileWrite($OutputFile, $oHex)
    next
    FileWrite ($OutputFile, @LF)    
WEnd

Sure. Here you go.

; Open up the input and output files ======
$message = "Please select a file to convert"
$initialDir = "W:\Testing\Tools\Diehard"
$InputFile = FileOpenDialog($message, $initialDir & "\", "All (*.*)", 1 )
If @error Then
    MsgBox(4096,"","No File(s) chosen")
    exit
Else
    $InputFile = StringReplace($InputFile, "|", @CRLF)
EndIf

$message="Please select a file to write to"
$OutputFile = FileSaveDialog($message, $initialDir & "\", "All (*.*)", 2 )
If @error Then
    MsgBox(4096,"","No File(s) chosen")
    exit
Else
    $OutputFile = StringReplace($OutputFile, "|", @CRLF)
EndIf
; === end of file management ==============


; process the file ========================
$hFile = FileOpen($InputFile)
$j=1    
$intCntr = 0
ProgressOn("Progress Meter", "Converting file...", "0 percent",1,1,16)
While 1
    $line = FileReadLine($hFile)
    if @error = -1 then ExitLoop
    for $i = 1 to 16 step 3                     
        $iNum = StringMid($line,$i,2)
        $oHex = Hex($iNum, 8)
        FileWrite($OutputFile, $oHex)
    next
    FileWrite ($OutputFile, @LF)    
WEnd
ProgressSet(100 , "Done", "Complete")
Sleep(500)
ProgressOff()
FileClose($hFile)
FileClose($OutputFile)

I think you missed opening $OutputFile with FileOpen in write mode ;-)

$hOutputFile = FileOpen($OutputFile, 1) ; use mode 1 to open the file for writing without erasing previous data. Take a look at the FileOpen help to see the available modes

Then change your FileWrite/FileClose calls to use the handle ($hOutputFile) instead of $OutputFile.
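
A minimal sketch of that change (the input/output paths here are just placeholders), keeping the same 16-character record loop as the original script:

$hFile = FileOpen("InFile.txt")           ; default mode 0 = read
$hOutputFile = FileOpen("OutFile.txt", 1) ; mode 1 = write without erasing previous data
While 1
    $line = FileReadLine($hFile)
    If @error = -1 Then ExitLoop
    For $i = 1 To 16 Step 3
        FileWrite($hOutputFile, Hex(StringMid($line, $i, 2), 8)) ; write through the handle, not the file name
    Next
    FileWrite($hOutputFile, @LF)
WEnd
FileClose($hFile)
FileClose($hOutputFile)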

Edited by sahsanu

Thanks for picking up that error.

Unfortunately it is still quite slow. It has a 4 MB file to process, and it is only getting through about 20 KB a minute (or maybe less).

edit: I might try copying everything to a local disk to see if that improves things. I don't think it will, though.

Edited by Clark

Can this code be made to run faster?

While 1
    $line = FileReadLine($hFile)
    if @error = -1 then ExitLoop
    $a_temp = StringSplit($line, "") ; empty delimiter splits the line into single characters
    for $i = 1 to 16 step 3
        $iNum = $a_temp[$i] & $a_temp[$i+1]
        $oHex = Hex($iNum, 8)
        FileWrite($OutputFile, $oHex)
    next
    FileWrite($OutputFile, @LF)
WEnd

How 'bout this? Fewer function calls (one StringSplit instead of six StringMids).

Regards, Hannes

Unfortunately it is still quite slow. It has a 4 MB file to process, and it is only getting through about 20 KB a minute (or maybe less).

That's really strange, because I've just created a file with 400,000 lines, each line with 108 characters (44.3 MB), and using your same script with handles it took 1 minute 44 seconds. The output file has over 415,000 lines and is 18 MB. Are you sure you put the handles in correctly?


edit: I might try copying everything to a local disk to see if that improves things. I don't think it will, though.

If you're saying that these files are on a network drive and not the local machine, that could make quite a difference.


To make your script faster you could read the whole file into an array in one go and then process the records in a loop.

Please check the example script for _FileReadToArray in the help file.
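
A bare skeleton of that approach (the input path is a placeholder; the fuller examples in the posts below build on the same pattern):

#include <File.au3>
Global $aInFile
_FileReadToArray("InFile.txt", $aInFile) ; one read: $aInFile[0] holds the line count
For $iLine = 1 To $aInFile[0]
    ; process $aInFile[$iLine] here instead of calling FileReadLine per line
Next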


As posted above, limiting read/write operations speeds things up. The result will be more dramatic when the read/write operations are slower (which will be the case with a network drive).

This reads the entire file at once, assembles an output string and then writes that all at once. A 6 MB, 300k-line file takes 7 seconds on my PC.

#include <File.au3>
Global $iTimer = TimerInit()
Global $sOutPath = @ScriptDir & "\OutFile.txt"
Global $sInPath = @ScriptDir & "\InFile.txt"
Global $sOutFile, $aInFile
_FileReadToArray($sInPath, $aInFile)
ConsoleWrite(TimerDiff($iTimer) & " - File read. (" & Round(FileGetSize($sInPath)/1048576,2) & "MB - " & $aInFile[0] & " lines)" & @CRLF)
For $iLine = 1 To $aInFile[0]
    For $i = 1 To 16 Step 3
        $sOutFile &= Hex(StringMid($aInFile[$iLine],$i,2), 8)
    Next
    $sOutFile &= @LF
Next
ConsoleWrite(TimerDiff($iTimer) & " - Output string ready." & @CRLF)
FileWrite($sOutPath,$sOutFile)
ConsoleWrite(TimerDiff($iTimer) & " - Done." & @CRLF)

edit: forgot the example >.<

Edited by Tvern

Maybe it can run even faster using a single (well almost!) regexp instead of the inner For loop:

#include <File.au3>
Global $iTimer = TimerInit()
Global $sOutPath = @ScriptDir & "\OutFile.txt"
Global $sInPath = @ScriptDir & "\InFile.txt"
Global $sOutFile, $aInFile
_FileReadToArray($sInPath, $aInFile)
ConsoleWrite(TimerDiff($iTimer) & " - File read. (" & Round(FileGetSize($sInPath)/1048576,2) & "MB - " & $aInFile[0] & " lines)" & @CRLF)
For $iLine = 1 To $aInFile[0]
    $sOutFile &= Execute(StringRegExpReplace($aInFile[$iLine], "(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).(..).*", _
                            "Hex($1,8)&Hex($2,8)&Hex($3,8)&Hex($4,8)&Hex($5,8)&Hex($6,8)&Hex($7,8)&Hex($8,8)&" & _
                            "Hex($9,8)&Hex($10,8)&Hex($11,8)&Hex($12,8)&Hex($13,8)&Hex($14,8)&Hex($15,8)&Hex($16,8)&@LF"))
Next
ConsoleWrite(TimerDiff($iTimer) & " - Output string ready." & @CRLF)
FileWrite($sOutPath,$sOutFile)
ConsoleWrite(TimerDiff($iTimer) & " - Done." & @CRLF)

The difference shouldn't be much anyway.


Thanks all for your input.

I left the job processing over the weekend. It is almost finished (I think).

Actually, I misread the size of the input file: it is 3,946,991 KB. So that isn't 4 MB but 4 TB, I am thinking. Anyway, after over 48 hours of processing my script's output is up to 2,213,250 KB. Given that it only converts ASCII numbers to 8-digit hex numbers, I am assuming the file size will remain relatively constant, so it looks like it is on the home stretch.

I think the size of the file rules out reading the entire file in first? Let me know if you think differently.

I will try a few of the other ideas after the job has finished to see what effect they have.

Thx again

Clark


Ouch! Indeed 4 GB (no, not 4 TB) isn't the same game! You hit at least one limitation: the help file, under Arrays, says "The total number of entries cannot be greater than 2^24 (16,777,216)." That means about 805 MB with 48-byte records (2^24 × 48 bytes), well below your input size. There is also a limit on the memory Windows will allow a single process. So forget about reading the whole baby in at once.
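
For a file that size, a middle ground is still possible: keep reading line by line, but batch the converted output and only write it out every few thousand lines, so the slow write operations happen in large blocks. A rough sketch, assuming the same record layout as before (the paths and the 10000-line flush size are arbitrary placeholders):

$hFile = FileOpen("InFile.txt")
$hOutputFile = FileOpen("OutFile.txt", 2) ; mode 2 = write, erasing any previous contents
Global $sBuffer = "", $iCount = 0
While 1
    $line = FileReadLine($hFile)
    If @error = -1 Then ExitLoop
    For $i = 1 To 16 Step 3
        $sBuffer &= Hex(StringMid($line, $i, 2), 8)
    Next
    $sBuffer &= @LF
    $iCount += 1
    If $iCount = 10000 Then       ; flush the buffer in one large write
        FileWrite($hOutputFile, $sBuffer)
        $sBuffer = ""
        $iCount = 0
    EndIf
WEnd
FileWrite($hOutputFile, $sBuffer) ; write whatever is left in the buffer
FileClose($hFile)
FileClose($hOutputFile)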


So ehm... discarding every third number was intentional, right?

If I were you I'd rewrite the conversion routine in C, have it accept a source and a destination as parameters, and write an AutoIt frontend. Write it multi-threaded, or just split the file up into chunks and run multiple background processes.

For a file this size it should make a pretty noticeable difference.
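
A hypothetical sketch of that frontend idea: it assumes the big file has already been split into part files (InFile.001 ... InFile.004) and that a converter executable exists which takes an input and an output path on its command line. The names, the chunk count and the command-line format are all assumptions, not any real tool's interface.

Global $aPID[4]
For $i = 1 To 4
    ; launch one background converter per chunk
    $aPID[$i - 1] = Run('"' & @ScriptDir & '\conv.exe" "InFile.00' & $i & '" "OutFile.00' & $i & '"', @ScriptDir, @SW_HIDE)
Next
For $i = 0 To 3
    ProcessWaitClose($aPID[$i]) ; wait for every background process to finish
Next
; The OutFile.00x parts can then be concatenated in order.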


So ehm... discarding every third number was intentional, right?

Ummm, no?

Are you referring to the "step 3" in the code?

That is because the input file is of the form:

01,23,23,09,34,12

34,01,78,23,04,66

etc


Clark,

What is the exact format of your file? Is it always a series of 16 two-digit ASCII numbers (00 to 99), each followed by a comma except the 16th, then followed by CR LF?

Is the format stable over time and immune to formatting errors (adding checks would easily double the processing time compared to "blind" code)?


What is the exact format of your file? Is it always a series of 16 two-digit ASCII numbers (00 to 99), each followed by a comma except the 16th, then followed by CR LF? Is the format stable over time and immune to formatting errors?

Exactly, the input file is just as I posted above (numbers 01 to 45 actually). Immune to all errors.


If so, please run a little benchmark with this executable. It converts a 128 MB test input file (exactly 2,699,865 input lines) in 8.61 s, most of which is spent in I/O (user time = 0.015 s).

The code is _really_ bare bones. Feel free to add bells, whistles, Christmas decorations, comments, ...

It can be made even faster by performing I/O in larger blocks and also by using fewer operations in the conversion, at the expense of a much larger lookup table.
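
The lookup-table idea can also be borrowed on the AutoIt side: since every field is a two-digit number (per the format confirmed above), the 8-digit hex strings can be precomputed once and simply indexed, instead of calling Hex() for every field of every line. A small sketch (the array name is made up):

Global $aHexLUT[100]
For $n = 0 To 99
    $aHexLUT[$n] = Hex($n, 8) ; e.g. $aHexLUT[45] = "0000002D"
Next
; In the conversion loop, Hex(StringMid($line, $i, 2), 8) then becomes:
; $aHexLUT[Number(StringMid($line, $i, 2))]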

I compile it with gcc as "gcc conv.c -O2 -o conv.exe"

Convert.zip


Thanks very much for this.

I will need to wait for the current file conversion to complete before running your test. Otherwise the timing will not be fair.

My output file is currently at 3,781,066 KB and my input file is 3,946,991 KB, so hopefully this will be soon. :unsure:

It will be very interesting to see the results from your program.

