The Kandie Man

Need More Efficient String Search

16 posts in this topic

#1 ·  Posted (edited)

EDIT This problem has been resolved. I have another new question here:

http://www.autoitscript.com/forum/index.ph...ndpost&p=135219

I am trying to search for a string in a text file. Right now I am using this code to search the file:

$lines = _FileCountLines($filepath)

For $line = 1 To $lines
    $readlinestring = FileReadLine($filepath, $line)
    $stringsearch = StringInStr($readlinestring, $searchquery)
    If $stringsearch <> 0 Then
        FileWriteLine(@ScriptDir & "\results.txt", $readlinestring)
    EndIf
Next

What it does (in case it isn't clear what I am trying to do) is read large text files and return the text of every line that contains the search query.

Right now I have text files with over 50 thousand lines of text and this is taking forever. I notice that when I try this in Notepad (searching for a string in my huge text files) it finds the text in less than 1 second. I don't understand how it does this, but I need to optimize my script in the same way so that it finds the text much faster.

I have tried many of the built-in functions but they are too slow. Does anyone have any ideas on how I could speed up the search?

Edited by The Kandie Man

"So man has sown the wind and reaped the world. Perhaps in the next few hours there will no remembrance of the past and no hope for the future that might have been." & _"All the works of man will be consumed in the great fire after which he was created." & _"And if there is a future for man, insensitive as he is, proud and defiant in his pursuit of power, let him resolve to live it lovingly, for he knows well how to do so." & _"Then he may say once more, 'Truly the light is sweet, and what a pleasant thing it is for the eyes to see the sun.'" - The Day the Earth Caught Fire

#2 ·  Posted (edited)

You may find that reading the file once will improve speed:

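; Read the whole file into memory in one go
Local $Data = FileRead($Path, FileGetSize($Path))
; Split into lines (flag 1 = treat @CRLF as one delimiter)
$Data = StringSplit($Data, @CRLF, 1)
Local $Output = ''
For $I = 1 To $Data[0]
    If StringInStr($Data[$I], $Search) Then $Output = $Output & $Data[$I] & @CRLF
Next
; Append all matches to the results file with a single write (mode 1 = append)
Local $Handle = FileOpen(@ScriptDir & '\Results.txt', 1)
If $Handle <> -1 Then
    FileWrite($Handle, $Output)
    FileClose($Handle)
EndIf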

Edit: One more thing that I noticed is that since you supply a path to FileReadLine() instead of a handle, the file is being opened and closed each time you read a line. Obtaining a handle using FileOpen() may itself be a huge speed improvement to your code.

Edited by LxP


Here's another approach which may be faster again:

Local $InFile = '...'
Local $OutFile = '...'
Local $Search = '...'

Local $InHandle = FileOpen($InFile, 0)   ; 0 = read
Local $OutHandle = FileOpen($OutFile, 1) ; 1 = append
If $InHandle <> -1 And $OutHandle <> -1 Then
    While 1
        Local $Line = FileReadLine($InHandle)
        If @Error Then ExitLoop ; end of file reached
        If StringInStr($Line, $Search) Then FileWriteLine($OutHandle, $Line)
    WEnd
    FileClose($InHandle)
    FileClose($OutHandle)
EndIf
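
Both suggestions above are only described as possibly faster, so it is worth timing them on your own files. A rough sketch of a timing harness: the file path, the search term and the helper name CountMatches() are placeholders I made up for illustration, while TimerInit() and TimerDiff() are the built-in timer functions.

```autoit
; Hypothetical timing harness: the path and search term are placeholders.
Local $InFile = @ScriptDir & '\index.txt'   ; assumption: file to search
Local $Search = 'song'                      ; assumption: text to look for

Local $Timer = TimerInit()
Local $Hits = CountMatches($InFile, $Search)
ConsoleWrite($Hits & ' matching lines in ' & Round(TimerDiff($Timer)) & ' ms' & @CRLF)

; Hypothetical helper: counts matching lines using a single file handle.
Func CountMatches($Path, $Text)
    Local $Handle = FileOpen($Path, 0)      ; 0 = read mode
    If $Handle = -1 Then Return SetError(1, 0, 0)
    Local $Count = 0
    While 1
        Local $Line = FileReadLine($Handle)
        If @error Then ExitLoop             ; end of file (or read error)
        If StringInStr($Line, $Text) Then $Count += 1
    WEnd
    FileClose($Handle)
    Return $Count
EndFunc
```

Swapping the body of CountMatches() for the read-everything-then-split version should give a like-for-like comparison on the same file.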


#4 ·  Posted (edited)

OK, I am going to try that. I have to modify the code first to fit my own, much less condensed, version.

Keep in mind that I am writing a program that not only reads a text file but also shows the search results in a GUI and writes them to a text file.

EDIT

It works!! I am writing a search engine. Basically I have been working on a program that indexes the hard drives on your computer. Once your computer has been indexed, you can search for any file by entering part of its filename. It then returns the results faster than the Explorer Search Companion, which is slow and a bit buggy in my opinion.

Edited by The Kandie Man

$doscmd = @COMSPEC & ' /c find /i "' & $searchquery & '" "' & $filepath
$doscmd = $doscmd & '" > "' & @SCRIPTDIR & '\results.txt"'
RunWait($doscmd, "", @SW_HIDE)

:P


I wish I had asked you a couple of hours ago how to remove lines in a file that are just equals signs.

[asdf]
=
=
[asdfgasdg]
=
[jhjlk]
=
=
=

Would become:

[asdf]
[asdfgasdg]
[jhjlk]

But I managed to find and modify a snippet, though it took me about 15 minutes.

@Echo Off
Set B==
For /F "tokens=* delims=" %%A in (title.index) Do If NOT "%%A"=="%B%" Echo %%A >> title2.index

So I got it to work. :P


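If StringLeft($StringName, 1) <> '=' Then FileWriteLine($OutHandle, $StringName)

(Assuming that the first character of the line you don't want is always going to be an '=' sign, that $StringName holds the line you just read, and that $OutHandle is a handle you got from FileOpen().)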
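
For completeness, here is roughly what the whole loop could look like in AutoIt instead of the batch file. This is only a sketch: the file names are taken from the batch snippet above, but placing them in @ScriptDir is my assumption, and unlike the batch version it overwrites title2.index rather than appending to it.

```autoit
; Sketch: copy every line of title.index that does not start with '='
; into title2.index. Locating both files in @ScriptDir is an assumption.
Local $InHandle = FileOpen(@ScriptDir & '\title.index', 0)    ; 0 = read
Local $OutHandle = FileOpen(@ScriptDir & '\title2.index', 2)  ; 2 = overwrite
If $InHandle <> -1 And $OutHandle <> -1 Then
    While 1
        Local $Line = FileReadLine($InHandle)
        If @error Then ExitLoop   ; end of file (or read error)
        If StringLeft($Line, 1) <> '=' Then FileWriteLine($OutHandle, $Line)
    WEnd
EndIf
; Close whichever handles were opened successfully.
If $InHandle <> -1 Then FileClose($InHandle)
If $OutHandle <> -1 Then FileClose($OutHandle)
```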


[center]Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.[/center]


#8 ·  Posted (edited)

Yeah, I was going to use AutoIt, but I am using the IniWrite function. You see, when I index the computer I use an INI format to keep track of the files and their locations. I use this format:

[pagefile.sys]
C:\=2147483648

Here the filename is the section name, the location of the file is a key (files can have the same name in multiple locations, so there is a different key for each location), and the value is the size of the file in bytes.
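
As a small illustration of that layout, the sketch below writes one entry and reads the section back. The index file name files.index is a made-up placeholder; IniWrite() and IniReadSection() are the standard INI functions.

```autoit
; Sketch of the layout described above, using a hypothetical index file.
Local $IndexFile = @ScriptDir & '\files.index'   ; assumption: file name

; Section = file name, key = folder, value = size in bytes.
IniWrite($IndexFile, 'pagefile.sys', 'C:\', '2147483648')

; Read back every location recorded for that file name.
Local $Locations = IniReadSection($IndexFile, 'pagefile.sys')
If Not @error Then
    For $I = 1 To $Locations[0][0]
        ; [$I][0] holds the key (folder), [$I][1] the value (size).
        ConsoleWrite($Locations[$I][0] & ' -> ' & $Locations[$I][1] & ' bytes' & @CRLF)
    Next
EndIf
```

Reading a whole section like this is what makes it cheap to list every location of a given filename.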

I also have it create a title index file where only the names of the files are written:

[pagefile.sys]
=

Please note that even though I don't supply any keys or values, the equals sign remains. So I created a bat file that deletes the equals signs once my program has finished indexing the computer.

I was originally going to have it create the title index using the regular FileWriteLine, but that would write the same filename multiple times (if the same filename existed in different locations), slowing the search.

Any ideas on how to optimize my "indexing engine", as I call it, would be appreciated.

I also want to note that the IniWrite function slows down when it has to write a new key to a section that already exists in a large file. I say this because my index files can be 14MB of INI data, and I am assuming IniWrite slows down because it has to search the file for the section name before it can write the key.

Edited by The Kandie Man

#9 ·  Posted (edited)

$Split = StringSplit($LineNumber, '=')

If $Split[0] > 1 Then FileWriteLine($FilePath, $Split[2])

Edit: I'm re-reading your post, are you only wanting the [section names]? Because the above will only write the found information 'after' the '=' sign, not the section names at all.

Edited by SmOke_N


OK, I bet I sound a bit confusing. What I am making is a search/index engine. The program has two options: index the computer, or search it (assuming you have already created an index of all the files on your computer using the index option). Right now the index engine finds all fixed drives, then a series of For loops scans every file in every subdirectory on those drives. When the indexing engine finds a file, it writes it to the index file in an INI format where the section name is the file's name (e.g. song.mp3), the key is the file's location (e.g. C:\Documents and settings\<Username>\My Documents\My Music\), and the value is the file size in bytes. So an indexed file would look like this:

[song.mp3]
C:\Documents and settings\<Username>\My Documents\My Music\=1654916

I have encountered only one problem. If the section name [song.mp3] already exists, the IniWrite function takes a while to find that section when the engine comes across another song.mp3 in a different location on the drive.

It would take a while to add the second key:

[song.mp3]
C:\Documents and settings\<Username>\My Documents\My Music\=1654916
C:\Backup\ My Music\=1654916

I don't know why this is, but I am pretty sure it is because it is searching the file for the section name. If the section name doesn't exist, it writes the information to the file instantly and uses 100% CPU because it is writing so many entries so fast. Every now and then it hits a backup folder and slows to a crawl. It is really annoying. Do you guys know of any way I could optimize this?


I would suggest storing all of the information in an array while scanning the drives, and then looking into using standard file operations to assemble the INI file.
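
One possible reading of that suggestion is sketched below. Note that it swaps the array for a Scripting.Dictionary COM object (a Windows scripting component, not an AutoIt built-in), purely because a dictionary makes grouping by file name easy; it assumes a version of AutoIt with COM support. The helper name RememberFile(), the variable names and the output file files.index are all made up for illustration.

```autoit
; Sketch: group entries by file name in memory, then write the INI once.
; Assumption: the Scripting.Dictionary COM object can be created.
Local $Index = ObjCreate('Scripting.Dictionary')
If Not IsObj($Index) Then Exit 1
$Index.CompareMode = 1   ; 1 = text compare, so 'Song.mp3' and 'song.mp3' share a section

RememberFile($Index, 'C:\Windows\Notepad.exe')
RememberFile($Index, 'C:\Windows\System32\Calc.exe')

; Assemble the whole file as one string and write it with a single FileWrite().
Local $Output = ''
For $Name In $Index   ; enumerating the dictionary yields its keys
    $Output &= '[' & $Name & ']' & @CRLF & $Index.Item($Name)
Next
Local $Handle = FileOpen(@ScriptDir & '\files.index', 2)   ; 2 = overwrite
If $Handle <> -1 Then
    FileWrite($Handle, $Output)
    FileClose($Handle)
EndIf

; Hypothetical helper: append "folder=size" under the file's name.
Func RememberFile($Dict, $Path)
    Local $Slash = StringInStr($Path, '\', 0, -1)   ; last backslash
    If $Slash = 0 Then Return SetError(1, 0, False)
    Local $Name = StringTrimLeft($Path, $Slash)
    Local $Entry = StringLeft($Path, $Slash) & '=' & FileGetSize($Path) & @CRLF
    If $Dict.Exists($Name) Then
        $Dict.Item($Name) = $Dict.Item($Name) & $Entry
    Else
        $Dict.Add($Name, $Entry)
    EndIf
    Return True
EndFunc
```

The point of the design is that nothing has to search the file on disk for an existing section: duplicates are merged in memory and the file is written once at the end.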


Yes, that is a good idea, but I haven't the slightest idea how to make it work. Does anyone have an idea for setting up an array that could hold the same information in the format I described above? I looked at _FileListToArray but that isn't what I want. I want the filename held by one dimension of the array, the location by another, and the file size by another. I have no idea how to create an array like this. Does anyone have any ideas?
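
A two-dimensional array with one row per file and three columns (name, folder, size) is probably the closest match to what is being asked for here. The sketch below is only an illustration: the helper StoreFile(), the starting capacity of 100 rows and the sample sizes are all made up.

```autoit
; Sketch: one row per file; columns hold name, folder and size.
Local $Count = 0
Local $Files[100][3]   ; starting capacity of 100 rows is an arbitrary assumption

StoreFile($Files, $Count, 'Notepad.exe', 'C:\Windows\', 12345)           ; made-up size
StoreFile($Files, $Count, 'Calc.exe', 'C:\Windows\System32\', 67890)     ; made-up size

For $I = 0 To $Count - 1
    ConsoleWrite($Files[$I][0] & ' | ' & $Files[$I][1] & ' | ' & $Files[$I][2] & @CRLF)
Next

; Hypothetical helper: append a row, doubling the array when it is full.
Func StoreFile(ByRef $Array, ByRef $Used, $Name, $Folder, $Size)
    If $Used >= UBound($Array) Then ReDim $Array[UBound($Array) * 2][3]
    $Array[$Used][0] = $Name
    $Array[$Used][1] = $Folder
    $Array[$Used][2] = $Size
    $Used += 1
EndFunc
```

Doubling the size on each ReDim keeps the number of resizes small even when hundreds of thousands of files are added.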


This actually sounds like your INI file could be quite large. I don't know if you are aware that the IniRead() function only has a 64 KB capacity? But it sounds like it doesn't 'have' to be an .ini to you; you're just looking for some kind of mini database, so to speak?



#14 ·  Posted (edited)

Yes, I am looking for a mini database that I could use to organize filenames and file locations.

EDIT

IniRead is working at the moment; it is IniWrite that is giving me problems. It's really slow when the file it is writing to has lots of entries. I was hoping for a faster way to create an index file using the IniWrite function.

New question starts here:

http://www.autoitscript.com/forum/index.ph...ndpost&p=135219

Edited by The Kandie Man

Try this code on for size. Calling AddEntry() involves passing a file path and a variable to which to append the data. The remainder of the code manipulates that collected data.

#Include <Array.au3>

Local $Input = ''
AddEntry('C:\Windows\Notepad.exe', $Input)
AddEntry('C:\Windows\System32\Calc.exe', $Input)

; Sort the input to aid outputting
$Input = StringSplit(StringTrimRight($Input, 1), @LF)
_ArraySort($Input, False, 1)

; Prepare output
Local $Output = ''
Local $CurrentFile = ''
For $I = 1 To $Input[0]
    Local $Record = StringSplit($Input[$I], @TAB)
    If $Record[1] <> $CurrentFile Then
        $Output &= '[' & $Record[1] & ']' & @LF
        $CurrentFile = $Record[1]
    EndIf
    $Output &= $Record[2] & '=' & $Record[3] & @LF
Next

; Test output
MsgBox(0, '', $Output)

Func AddEntry($Path, ByRef $Input)
; Split path into folder and file
    Local $LastSlashLoc = StringInStr($Path, '\', False, -1)
    If $LastSlashLoc = 0 Then
        SetError(1)
        Return False
    EndIf
    Local $Folder = StringLeft($Path, $LastSlashLoc)
    Local $File = StringTrimLeft($Path, $LastSlashLoc)
; Get size of file
    Local $Size = FileGetSize($Path)
    If @Error Then
        SetError(2)
        Return False
    EndIf
    $Input &= $File & @TAB & $Folder & @TAB & $Size & @LF
    Return True
EndFunc

Warning: As with your previous method of using INIWrite(), expect problems if the folder path contains an equals sign.
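
To illustrate the problem: as far as I know, an INI line is split at the first '=', so a folder name containing one would be cut in the wrong place. If you end up reading the index with your own code, one possible workaround is to split at the last '=' instead, since the size never contains one. A sketch with a made-up folder name:

```autoit
; Sketch: split a "folder=size" line at the LAST '=' rather than the first.
Local $Line = 'C:\a=b\=1654916'   ; hypothetical folder name containing '='
Local $Pos = StringInStr($Line, '=', 0, -1)   ; -1 = search from the right
If $Pos > 0 Then
    Local $Folder = StringLeft($Line, $Pos - 1)
    Local $Size = StringTrimLeft($Line, $Pos)
    ConsoleWrite('Folder: ' & $Folder & @CRLF & 'Size: ' & $Size & @CRLF)
EndIf
```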


#16 ·  Posted (edited)

Well, I rewrote it so that it just does a plain FileOpen and then writes the full filenames to the index file. It works, but with a reduction in speed, because my previous method would read an INI section and could therefore get the locations of multiple files with the same name instantly. I like what you have done though. I will try it out. Thanks for all your help Alex. :P

Edited by The Kandie Man