More Efficient Tool to Search File for String [Resolved]


ken82m

I have a current script that processes spool files. One of the first things it does is search the file for two strings containing the username and title, which I use later in the script and for a log file I append to.

I just use the Windows "find" command for this.

However, the engineering department recently upgraded their software and has been generating very large spool files (sometimes 300-500 MB). I know this is going to take some time no matter what I do, but I thought someone might know of a tool that could do it a little faster.

I know from past experience that just using a different utility can cut a lot of time off a process; it depends on how the tools were written, I guess.

Thanks,

Kenny


 "I believe that when we leave a place, part of it goes with us and part of us remains... Go anywhere, when it is quiet, and just listen.. After a while, you will hear the echoes of all our conversations, every thought and word we've exchanged.... Long after we are gone our voices will linger in these walls for as long as this place remains."


...I just use the windows "Find" command for this...

Please post your code or a small sample/test script that shows what you are using. I'm not positive what you mean by the Windows "find". Is that a GUI interface to the DOS find? Would the DOS find be faster?



Actually, I just found the real problem (still involves find :P)

This section of code:

$EOF = ""
Do
    $EOF = RunWait(@SystemDir & '\find "%%EOF" ' & $PDFRoot & $Source, "", @SW_HIDE)
    If $EOF = 1 Then
        Sleep(1000)
    EndIf
Until $EOF = 0

It searches the entire file for %%EOF, which indicates the Windows print spooler is done and the script can process the file.

Does anyone know how to retrieve the number of lines in a file, so I can just read that one line instead of searching the whole file?

Thanks,

Kenny


...Does anyone know how to retrieve the number of lines in a file so I can just read the one line instead of searching the whole file?...

Check out _FileCountLines() in the help file.

Hope this helps,

JPC :P
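For what it's worth, a quick sketch of how _FileCountLines() could be combined with FileReadLine() to check only the last line (the spool path is a made-up example; File.au3 is the standard UDF include):

```autoit
#include <File.au3>

Local Const $sSpool = "C:\CreatePDF\spool.psf" ; hypothetical spool file
Local $iLines = _FileCountLines($sSpool)       ; counts lines (still one full read)
If $iLines > 0 Then
    Local $sLast = FileReadLine($sSpool, $iLines) ; read just the final line
    If StringStripWS($sLast, 3) = "%%EOF" Then
        ; the spooler has finished writing; safe to process the file
    EndIf
EndIf
```

Note that _FileCountLines() still has to read the whole file once, so on a 300-500 MB file it won't be free.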


Hi,

I think there are examples in my link (in my signature) to DosCOM.au3.

Someone was able to delete lines matching certain strings using only one read of a 500 MB file, for instance (45 seconds or so).

Also check out the line-number retrieval in the DOS "findstr" command.

Let me know..

Randall


I found a tool from the Windows 2003 Resource Kit called TAIL.EXE.

It starts reading files from the end instead of the beginning. I have to redirect the output to a file and then read it, but this code will process that big file in 2 or 3 seconds versus minutes with the find command.

I did try _FileCountLines; it was faster, but it still took a while and hogged the CPU.

Thanks for the help everyone

$PDFRoot = "C:\CreatePDF\"
Func Rename($sName)
    $Random = Random(100, 900)
    $Random = StringLeft($Random, 3)
    $SearchFile = "Search" & $Random & ".OUT"
    $Source = $SourceFile & $sName & ".PSF"
    $ToConvert = "ToConvert" & $Random & ".PSF"
    $Converted = "ConvertedPDF" & $Random & ".PDF"
    Sleep(500)

    Do
        $EOF = ""
        $EOF2 = ""
        FileDelete($SearchFile)
        RunWait(@ComSpec & ' /C "' & $PDFRoot & 'tail -2 ' & $PDFRoot & $Source & '>' & $SearchFile & '"', @ScriptDir)
        $EOF = FileReadLine($SearchFile, 1)
        $EOF2 = FileReadLine($SearchFile, 2)
        FileDelete($SearchFile)
        If $EOF <> "%%EOF" And $EOF2 <> "%%EOF" Then Sleep(2000)
    Until $EOF = "%%EOF" Or $EOF2 = "%%EOF"
EndFunc

...I found a tool from the Windows 2003 Resource Kit called TAIL.EXE... this code will process that big file in 2 or 3 seconds versus minutes with the find command...
There are a few ways you could try to do this faster with AutoIt.

1) If there are constant strings that identify the username and title in the file, you could read in the entire file, then use StringInStr() to find what you're looking for...

$GinormousString = FileRead("c:\BigArseFile.txt", FileGetSize("C:\BigArseFile.txt"))
$un = StringInStr($GinormousString, "Username: ") ;finds the constant that signals the username is the next item
$crap = StringMid($GinormousString, $un + StringLen("Username: ")) ;grabs everything after the constant
$username = StringLeft($crap, StringInStr($crap, " ") - 1) ;grabs the whole word before the first space, regardless of the username's length

2) Instead of _FileCountLines, maybe check out _FileReadToArray(). You could have the file assigned to an array; $array[0] holds the line count, so the last line is $array[$array[0]]. You could also then use _ArraySearch() to locate your constant strings throughout the array, or write your own search function. If you decide to go that route, though, don't bother with a recursive search, because AutoIt only allows 384 levels of recursive calls, and with a file that size you'll probably need more than that...
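A minimal sketch of option 2, assuming the standard File.au3 UDF; the spool path is invented for illustration:

```autoit
#include <File.au3>

Local $aLines
If _FileReadToArray("C:\CreatePDF\spool.psf", $aLines) Then ; hypothetical path
    ; $aLines[0] holds the line count; lines are stored in 1..$aLines[0],
    ; so the last line is $aLines[$aLines[0]]
    If StringStripWS($aLines[$aLines[0]], 3) = "%%EOF" Then
        ; spooler is done writing; safe to process
    EndIf
EndIf
```

Keep in mind this loads the entire file into memory, which may be a lot for a 300-500 MB spool file.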


Cool, I'll give that a try. I hadn't even noticed all those new user functions in the help file :P

Hey, as a backup, one more question.

I was doing all this along with some other updates because of the problems processing some of the larger files. Once in a while, the conversion program I use just goes wild and sits there processing a file, getting nowhere, until I manually kill it.

The script in this thread renames the spool file and writes the information to a txt file, where it is queued up for processing. Another script checks periodically and processes files in the queue.

I made two separate scripts because I want to grab these spool files as quickly as possible, before Windows decides to overwrite them.

Anyway, the script grabs the file from the queue, calls a conversion program, and performs a few more functions. Then everything starts again.

Anyway to the question-

Currently I do a RunWait; I was thinking of doing a Run instead and polling the CPU usage and CPU time (or running time). Just CPU usage would probably work if it came down to it.

If it exceeds certain thresholds, I would just kill it and delete the source file, and everything would naturally start processing the next file.

Now I know Run will return the process ID, so any suggestions on collecting the CPU Usage etc...?
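A hedged sketch of that Run-and-poll idea, using wall-clock running time as the threshold (the converter command, file variable, and 10-minute timeout are all invented for illustration; Run, ProcessExists, ProcessClose, TimerInit, and TimerDiff are standard AutoIt functions):

```autoit
Local $sSourceFile = "C:\CreatePDF\queued.psf" ; hypothetical queued file
Local $iPid = Run('"C:\CreatePDF\converter.exe" "' & $sSourceFile & '"') ; hypothetical converter
Local $hTimer = TimerInit()
While ProcessExists($iPid)
    If TimerDiff($hTimer) > 10 * 60 * 1000 Then ; no result after 10 minutes
        ProcessClose($iPid)      ; kill the runaway conversion
        FileDelete($sSourceFile) ; drop the source so the next file gets processed
        ExitLoop
    EndIf
    Sleep(2000)
WEnd
```

Running time is the simplest threshold to check from the PID; per-process CPU usage would need an external source such as WMI.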

Thanks,

Kenny


...Now I know Run will return the process ID, so any suggestions on collecting the CPU usage etc...?...

I don't have anything off hand, but if you check out Scripts and Scraps, at least one of the system info tools will have a CPU monitor; just check out the code and use what you need.
