Kerros

Looking for a script Speed increase

16 posts in this topic

I'm looking for a way to speed up stripping text from a file, both to make it more readable and so that a script someone else wrote can work correctly with it. Most of the text is in CSV format, but the current release of the software inserts a separator line between every data line. That may be useful to some people, but it breaks the other program.

A small file has 52,000 lines, half of them separator lines. A larger file has 300,000+ lines.

What I currently have is a text file that looks similar to this:

CODE

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<

< File Type: Sample viewer text file

< Company: Software company name here

< Software: Version number of software

< Create Date: Nov-12-07 on Monday at 15:29:42

<

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

Time Stamp,Relative Time,Delta Time,Port,Frame,Command,Junk,Junk,Junk,Speed,Status,Error Output,

====================================================================================================

====================================================================================================

====================================================================================================

====================================================================================================

===========================================================================

7.330.488.506 (s),0 (ns),0 (ns),H1,COMRESET,,,,,3 G,,,

====================================================================================================

====================================================================================================

====================================================================================================

====================================================================================================

===========================================================================

7.340.412.746 (s),9.924.240 (ms),9.924.240 (ms),D1,COMINIT,,,,,3 G,,,

====================================================================================================

====================================================================================================

====================================================================================================

====================================================================================================

===========================================================================

7.340.415.453 (s),2.706 (us),2.706 (us),H1,COMWAKE,,,,,3 G,,,

====================================================================================================

====================================================================================================

====================================================================================================

====================================================================================================

===========================================================================

7.340.417 (s),1.546 (us),1.546 (us),D1,COMWAKE,,,,,3 G,,,

====================================================================================================

====================================================================================================

====================================================================================================

====================================================================================================

===========================================================================

What I am currently doing is this:

CODE

#include <File.au3> ; needed for _FileCreate

$newFilename = "Sample"
$file = FileOpen(@ScriptDir & "\" & $newFilename & ".txt", 0)
$csvfile = @ScriptDir & "\" & $newFilename & ".csv"
_FileCreate($csvfile)

While 1
    $line = FileReadLine($file)
    If @error = -1 Then ExitLoop
    ; The separator is one long run of "=" characters, joined here with line continuations
    If $line = "===============================================================================================" & _
            "====================================================================================================" & _
            "====================================================================================================" & _
            "====================================================================================================" & _
            "================================================================================" Then
        ; separator line: skip it
    Else
        FileWriteLine($csvfile, $line)
    EndIf
WEnd

FileClose($file)

While this works, it seems to take a while. I've thought about trying arrays, but the part that takes the longest is writing back to the file.

Anyone have a suggestion?

<== Sample file if anyone is interested in trying it out.

Thanks

Kerros


Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.




#2 ·  Posted (edited)

Ways that come to mind that would likely be faster.

Preprocess the file to strip the separator lines.

Inquire of the vendor if there is a 'legacy mode' switch that would generate the file the 'old way'.

Use a faster string comparison for your logic, i.e. If StringLeft($line,1) = "=" Then ....

Read the file in its entirety, then perform the looping / logic on the file contents.

If you know that you only need odd, or even lines, ignore the ones you don't want.

edit: If you know that the rightmost char of valid lines is always your comma, then key off that:

if stringright($line,1) = "," then _dostuff()
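
A minimal sketch of the line-by-line approach with the cheaper StringLeft test. It also assumes something not stated in this post: that both files are opened once and passed around as handles, since the AutoIt help file notes that FileWriteLine given a filename opens and closes the file on every call. The file names are placeholders.

```autoit
; Sketch only: the file names below are placeholders, not taken from the thread.
Local $sInput = @ScriptDir & "\Sample.txt"
Local $sOutput = @ScriptDir & "\Sample.csv"

Local $hIn = FileOpen($sInput, 0)   ; 0 = read mode
Local $hOut = FileOpen($sOutput, 2) ; 2 = write mode, erases any previous contents
If $hIn = -1 Or $hOut = -1 Then
    MsgBox(16, "Error", "Unable to open the input or output file.")
    Exit 1
EndIf

Local $sLine
While 1
    $sLine = FileReadLine($hIn)
    If @error = -1 Then ExitLoop ; end of file reached
    ; Separator lines start with "="; every other line is copied through.
    If StringLeft($sLine, 1) <> "=" Then FileWriteLine($hOut, $sLine)
WEnd

FileClose($hIn)
FileClose($hOut)
```

Whether the handle or the comparison matters more would need timing on a real export; the sketch only shows where each change goes.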
Edited by flyingboz

Reading the help file before you post... Not only will it make you look smarter, it will make you smarter.


#3 ·  Posted (edited)

Flyingboz Thank you for your reply.

This is the preprocessing step for the file: it's exported from the vendor's software to the text file, then I run this to make the csv, and then run the next script.

We did talk to the vendor a while back and asked them to get rid of the line separators. They said they would look at it for a future release, but that hasn't materialized. Time to talk to them again, I guess.

I'll try the different types of string comparisons again, but when I originally wrote this script a while back I did some timer trials, and there really didn't seem to be any difference.

For example, I just ran the old way three times for an average of 53 secs on the 53,000 line file. Using StringRight I got 52 secs for the same file, a 1 second improvement over the old way. So I will make this change, but I was really hoping for more improvement.

As for reading in the file all at once, I don't think AutoIt can handle 300,000+ lines in an array, as far as I know. If there is a UDF or something that can handle this many lines I would love to know about it.

As the file always follows the same format I could read just the lines I need, but it is not always the even or odd lines that contain the text. I could figure out where the text I need starts and then read every other line, with an error check to make sure I'm getting the correct text. I'll have to play around with this one.

Edited by Kerros

Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.


Maybe... _FileWriteFromArray()

#include <Array.au3>
#include <File.au3>
Dim $a_Test
; Read file into array
_FileReadToArray("test.txt", $a_Test)
; Reverse records
_ArrayReverse($a_Test, 1)
; Write reversed array to file
_FileWriteFromArray("test2.txt", $a_Test, 1)

8)



I put your sample text in Test.txt and ran this:

#include <array.au3>
$sFile = "C:\Temp\Test.txt"
$Timer = TimerInit()
$avData = StringSplit(StringRegExpReplace(FileRead($sFile), "={3,}\r\n", ""), @CRLF, 1)
$Timer = Round(TimerDiff($Timer) / 1000, 3)
_ArrayDisplay($avData, "Time = " & $Timer & "sec")

Seems fairly quick: 0.001sec for your 8KB sample file. Try it on your files and see what kind of times you get.
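
To see what the one-liner above does, here is a small sketch on a made-up string instead of a file. The sample data is invented for illustration and assumes CRLF line endings, which is what the pattern expects.

```autoit
; Toy input standing in for the exported file (made-up data, CRLF line endings).
Local $sText = "a,b," & @CRLF & "=====" & @CRLF & "c,d," & @CRLF & "=====" & @CRLF

; Remove every run of three or more "=" together with the CRLF that ends it.
Local $sClean = StringRegExpReplace($sText, "={3,}\r\n", "")

; Flag 1 splits on the whole @CRLF pair rather than on @CR and @LF separately.
Local $aLines = StringSplit($sClean, @CRLF, 1)

; $aLines[0] holds the element count. The text ends with @CRLF,
; so the last element is an empty string.
For $i = 1 To $aLines[0]
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```

Because the pattern requires "\r\n" after the run of "=", a file saved with LF-only line endings would come back unchanged.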

:P


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


Thanks PsaltyDS

This seems to be what I was looking for. It cut the time for my sample file with 52,000 lines from 55 secs to 1 sec. ;)

I'll try this on some other files and see if it works for everything.

Kerros


Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.


Global $Data = ''
Global $CleanedData = ''
Global $InputFile = "C:\data\Sample.txt"
Global $OutputFile = "C:\data\Cleaned.txt"

$CleanedData = StringRegExpReplace(FileRead($InputFile), "\n=*\r", '')
FileWrite($OutputFile, $CleanedData)

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


Thank you for your suggestion. It does work well on most of the files. I started debugging with one of the largest files we have, which has 544654 lines of good text once converted, so about 1 million lines with all the junk in there.

I ran the conversion script on this file and received a memory allocation error. So I'm guessing that AutoIt isn't allocating as much memory as needed for a file this large. The original file size is 312MB.

I was looking through the help file and noticed _MemGlobalAlloc, which I'm guessing is where I need to go to increase the amount of memory that AutoIt uses. My question is: what is the default limit on the memory AutoIt uses, so that I only allocate memory if the file size is larger than the limit?

Thanks

Kerros


Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.


The contents of the 312MB file are being read into a single string variable for parsing. In AutoIt, a string can be up to 2GB, and an array can have 16M elements. Both of those limits have been tested before, and you are not pushing either of them.

I have no idea where your failure occurred, but it may have had more to do with the long string input to the RegExp or another function than with the string itself.
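
If the failure does come from handing one very long string to the RegExp, one possible workaround is to feed it smaller pieces. The following is only a sketch under assumptions: the paths and the block size are placeholders, the pattern is the one posted earlier in the thread, and nothing here has been confirmed as the cause of the allocation error.

```autoit
; Sketch only: paths and block size are placeholders.
Local $sInput = "C:\Temp\input.txt"
Local $sOutput = "C:\Temp\cleaned.txt"
Local $iBlock = 16 * 1024 * 1024 ; characters to read per pass (arbitrary choice)

Local $hIn = FileOpen($sInput, 0)   ; 0 = read mode
Local $hOut = FileOpen($sOutput, 2) ; 2 = write mode, erases any previous contents
If $hIn = -1 Or $hOut = -1 Then Exit 1

Local $sCarry = "" ; unfinished last line of the previous block
Local $sBlock, $iPos
While 1
    $sBlock = FileRead($hIn, $iBlock)
    If StringLen($sBlock) = 0 Then ExitLoop ; nothing more to read
    $sBlock = $sCarry & $sBlock

    ; Cut at the last line feed so the RegExp only ever sees whole lines.
    $iPos = StringInStr($sBlock, @LF, 0, -1)
    If $iPos = 0 Then
        $sCarry = $sBlock ; no complete line yet, keep reading
        ContinueLoop
    EndIf
    $sCarry = StringMid($sBlock, $iPos + 1)
    $sBlock = StringLeft($sBlock, $iPos)

    ; Same separator pattern as earlier in the thread, applied block by block.
    FileWrite($hOut, StringRegExpReplace($sBlock, "={3,}\r\n", ""))
WEnd

; A final line without a trailing line break is still in $sCarry.
If $sCarry <> "" And StringLeft($sCarry, 1) <> "=" Then FileWrite($hOut, $sCarry)

FileClose($hIn)
FileClose($hOut)
```

The carry-over step is there because a block boundary can fall in the middle of a line, and a separator split across two blocks would otherwise slip through the pattern.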

:P


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


#10 ·  Posted (edited)


I think you should track down the developer who thought it was a good idea to add all those unnecessary lines of '===', presumably because they thought it looked good, and make them remove every one of them using 'C:\Windows\System32\edit.com'. When they have finished, ask them if they still think it was a good idea. :P

Edited by Bowmore

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


Give him the printout on 11x17 tractor-feed green-striped paper... and a gallon jug of White-Out.

:P


Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


I would LOVE to get my hands on that guy, and string him up by something left to the imagination.

We had asked them why they put the separators there in the first place, and he said it was for readability. Who in their right mind is going to read all this junk in that form? I don't even like reading it after it's parsed, let alone beforehand.


Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.


I'm back working on this script after being away from it for a week. I'm still receiving an "Error Allocating memory" message, even if I run just this on my test file without any other parts of the script running.

CODE

$sfile = 'C:\temp\input.txt'

$cleaneddata = StringRegExpReplace(FileRead($sfile),"\n=*\r",'')

The file that I am trying to open is 906MB. If someone is willing to try converting the file I'll upload to an FTP site. Zipped the file is only 25MB.


Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.


Did you try using a command line tool to do the replacement?

http://www.regular-expressions.info/powergrep.html


#15 ·  Posted (edited)

findstr is a program that comes with Windows XP....

findstr /v =.* Sample.txt > Cleaned.txt

Does this give you the desired output? If this is not fast enough, you could try:

findstr /v ^= Sample.txt > Cleaned.txt

although not as precise. How fast does this run on your large data file?

findstr /v ^=.*$ Sample.txt might be the most accurate.

-John

Edited by jftuga


Before I started scripting the conversion process with AutoIt, we did use a findstr batch file to remove the extra lines. While this is quicker than what I originally scripted, it was the main reason I started looking for a way to speed up my script with a pure AutoIt solution, so that I don't have to call an external program unless I really have to.

The findstr example that I use is

Findstr /i /v /c:"=====" %1 >> %2

I did some time trials with the same files using all three methods, RegEx, Findstr, and my process. RegEx was the quickest followed by FindStr and then my method. I'll take a look at PowerGREP and see if I can't get that to work for me. Thanks for all your suggestions.
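
A time trial of this kind could be set up roughly as follows. This is a sketch, not the script that was actually used: the paths are placeholders, and only the findstr and RegEx methods are shown.

```autoit
; Sketch only: placeholder paths, two of the three methods.
Local $sIn = "C:\Temp\Sample.txt"
Local $sOutFindstr = "C:\Temp\Cleaned_findstr.txt"
Local $sOutRegEx = "C:\Temp\Cleaned_regex.txt"

; Method 1: findstr, run through the command interpreter so ">" redirection works.
Local $hTimer = TimerInit()
RunWait(@ComSpec & ' /c findstr /i /v /c:"=====" "' & $sIn & '" > "' & $sOutFindstr & '"', "", @SW_HIDE)
ConsoleWrite("findstr: " & Round(TimerDiff($hTimer) / 1000, 3) & " sec" & @CRLF)

; Method 2: read the whole file, strip the separators with one RegExp, write it back out.
$hTimer = TimerInit()
Local $hOut = FileOpen($sOutRegEx, 2) ; 2 = write mode, erases any previous contents
FileWrite($hOut, StringRegExpReplace(FileRead($sIn), "={3,}\r\n", ""))
FileClose($hOut)
ConsoleWrite("RegEx:   " & Round(TimerDiff($hTimer) / 1000, 3) & " sec" & @CRLF)
```

The two methods are not strictly equivalent: findstr drops any line that contains "=====", while the RegExp only removes the run of "=" and its line break, so the outputs are worth comparing once before trusting the timings.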

Kerros


Kerros===============================================================How to learn scripting: Figure out enough to be dangerous, then ask for assistance.
