Search content of multiple files for multiple strings


Hi all,

I have been using AutoIt on and off for automating little things for a while now, but I'm by no means an expert. Usually I pick up what I need from the posts here on the forum, but just now I have to build something that's a little more complex than I'm used to and I need some advice.

Here's what I need to do:

I will have a text file (c:\strings.txt) which contains a list of values I want to search for, each on a different line:

value1

value2

value3

etc

Next I want to build a script that will search every file in the folder the script is run in (or any folder specified) for each of the search values in the text file above, then display the list of search values with a count beside it of the total number of instances found across all files searched. The files I will be searching are .sql files.

The search file might include up to 100 values to search for, and the folder containing the files being searched might have 1000 files, so things could take a hell of a long time and I need to ensure I write it as efficiently as possible. One potential workaround is that I could select only the .sql files to search that are up to 3 months old - that would at least narrow it down a little.

Can any of you offer some advice on available functions, advice on best performance for such a script, and/or any other pointers to get me going?

Thanks a million!

Le Kant


In fact, I've just put this together... The skeleton I guess.

#include <WindowsConstants.au3>
#Include <File.au3>
#Include <Array.au3>

Global $classes
Global $counter
Local  $searchfile
Local  $file

; Shows the filenames of all files in the current directory
$search = FileFindFirstFile('*.sql')  

If Not _FileReadToArray('C:\Documents and Settings\me\Desktop\Autoit Scripts\classes.txt',$classes) Then
   MsgBox(4096,"Class Counter has fallen over", 'Error reading class names. Check classes.txt     Error:' & @error)
   Exit
EndIf

; Check if the search was successful
If $search = -1 Then
    MsgBox(0, 'Error', 'No files/directories matched the search pattern')
    Exit
EndIf

While 1
    $file = FileFindNextFile($search) ; Move through each file in the folder
    If @error Then ExitLoop
    
        $counter = 1  ; Repeat the whole process for however many classes there are
        $LoopEnd = UBound($classes)

        Do

            $currentclass = $classes[$counter] ; Use the next class in the array
            $bigstring = FileRead($file)
            $result = StringRegExp($bigstring,$currentclass,3)
            MsgBox(0,'Found class','Found ' & $currentclass & ' in ' & $file)
        
        $counter = $counter + 1
        
        Until $counter = $LoopEnd; Next class, back to start of do while loop
    
WEnd

; Close the search handle
FileClose($search)

I need to somehow collate the number of instances of each string found on each pass and store them with the string value instead of displaying a msgbox each time like I'm doing here to test, i.e.

string1 234

string2 123

string3 352

For display at the end of the process.

I just discovered another thing too. I will need to be able to search sub-directories, as well as the one the script is in...

Sorry guys, thinking aloud!


Is there any chance the values you're searching are ambiguous, like looking for a string in a specific context?

Example of hard to find string: table as an SQL keyword in the following SQL statement: 'create table "my table" (...);'

If so you need to go thru an SQL parser and that will make the job much harder.

I left editing this reply, being distracted elsewhere. I now see that the search is easy and the files are not too big and fit in memory. Fine. There are a large number of examples of subdir recursion in the help forum or example scripts; use the search feature to find working code.
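For the subdir recursion mentioned above, here is a minimal sketch. It assumes an AutoIt version whose File.au3 ships _FileListToArrayRec (3.3.10 or later, as far as I know); on older versions you would need one of the hand-written recursive functions from the forum. The function name _ListSqlFiles and the start folder are only placeholders for illustration.

```autoit
#include <File.au3> ; _FileListToArrayRec and the $FLTAR_* constants

; Hypothetical helper: print every .sql file under $sRoot, subfolders included
Func _ListSqlFiles($sRoot)
    ; Files only, recurse into subfolders, no sorting, return full paths
    Local $aFiles = _FileListToArrayRec($sRoot, '*.sql', $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH)
    If @error Then
        ConsoleWrite('No .sql files found under ' & $sRoot & @CRLF)
        Return
    EndIf

    ; $aFiles[0] holds the count; the paths start at index 1
    For $i = 1 To $aFiles[0]
        ConsoleWrite($aFiles[$i] & @CRLF)
    Next
EndFunc

_ListSqlFiles(@ScriptDir)
```

Because full paths come back, each entry can be passed straight to FileRead or FileGetTime without worrying about the working directory.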

To accumulate hit counts, just build a 2D array with one row per value to search, holding the value and its count, then add UBound($result) to the corresponding row's count whenever $result is an array (i.e. when matches are found).
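A minimal sketch of that accumulation, under a few assumptions: the search values are plain literals (so they are wrapped in \Q...\E to stop regex metacharacters from being interpreted), matching is case-insensitive, and the sample text stands in for the contents of one file. The names _CountHits and _Example are made up for this illustration.

```autoit
; Count non-overlapping occurrences of a literal value in $sText.
; Assumes $sValue does not itself contain the sequence \E.
Func _CountHits($sText, $sValue)
    If $sValue = '' Then Return 0 ; an empty pattern would match everywhere
    Local $aMatches = StringRegExp($sText, '(?i)\Q' & $sValue & '\E', 3) ; 3 = array of all matches
    If IsArray($aMatches) Then Return UBound($aMatches)
    Return 0 ; no match: StringRegExp did not return an array
EndFunc

Func _Example()
    ; Stand-in for the list read from the strings file
    Local $aValues[3] = ['value1', 'value2', 'value3']

    ; One row per search value: [n][0] = value, [n][1] = running total
    Local $aTotals[UBound($aValues)][2]
    For $i = 0 To UBound($aValues) - 1
        $aTotals[$i][0] = $aValues[$i]
        $aTotals[$i][1] = 0
    Next

    ; Stand-in for the contents of one file; repeat this loop for every file read
    Local $sText = 'select * from value1; delete from value2; update value1 set x = 1;'
    For $i = 0 To UBound($aTotals) - 1
        $aTotals[$i][1] += _CountHits($sText, $aTotals[$i][0])
    Next

    ; Report the totals once all files have been processed
    For $i = 0 To UBound($aTotals) - 1
        ConsoleWrite($aTotals[$i][0] & ' ' & $aTotals[$i][1] & @CRLF)
    Next
EndFunc

_Example()
```

Note that this is a substring match: value1 would also be counted inside value10. If that matters for your table names, the pattern would need word boundaries or similar.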

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Yes jchd, it's simple table names I'm looking for rather than keywords.

Thanks for the advice on the hit counts, that's the bit I'm just about to try and tackle. It's probably not the cleanest code but if it works I'll be happy enough! I'll have to look more into the subdir recursion - I've searched and saw a few things about using DOS with comspec but not sure if I understand it all.

Anyway, in regard to my code above - it doesn't work. I used StringRegExp when I just needed StringInStr. Here it is fixed up. I've also added the bit to look only at files less than 3 months old.

#include <File.au3> ; _FileReadToArray
#include <Date.au3> ; _DateAdd, _NowCalcDate

Global $classes

$search = FileFindFirstFile(@ScriptDir & '\*.sql') ; Find the .sql files in the script directory

If Not _FileReadToArray('C:\classes.txt', $classes) Then
   MsgBox(4096, "Class Counter has fallen over", 'Error reading class names. Check classes.txt     Error:' & @error)
   Exit
EndIf

If $search = -1 Then ; Check if the search was successful
    MsgBox(0, 'Error', 'No files/directories matched the search pattern')
    Exit
EndIf

; FileGetTime(..., 0, 1) returns YYYYMMDDHHMMSS, so build the cutoff in the same format
$cutoff = StringReplace(_DateAdd('m', -3, _NowCalcDate()), '/', '') & '000000'

While 1
    $file = FileFindNextFile($search) ; Move through each file in the folder
    If @error Then ExitLoop

    $filetime = FileGetTime($file, 0, 1) ; Get the modified time of the file
    If $filetime > $cutoff Then ; If the file is less than 3 months old continue

        $counter   = 1
        $loopend   = UBound($classes)
        $bigstring = FileRead($file) ; Read the contents of the file into a string

        Do
            $currentclass = $classes[$counter] ; Use the next class in the array
            $result       = StringInStr($bigstring, $currentclass) ; Search for the class in the file string

            If $result <> 0 Then
                MsgBox(0, 'Current Class', 'Found ' & $currentclass & ' in ' & $file) ; Test with MsgBox
            EndIf

            $counter = $counter + 1
        Until $counter = $loopend ; Next class, back to start of the Do loop

    EndIf
WEnd

; Close the search handle
FileClose($search)

Chime again if you hit a wall.


Cool, cheers.

P.S. I notice you're in South of France.... was over there last week. South-west perchance? I'm hoping to move there for good very very soon...


Halfway between Dax and Bayonne in département des Landes. Endless pine forests, corn to feed horses, 15km from the ocean, 1h from the Pyrénées mountains, 30' from Spain for low-VAT food, fags, gas and other goods: call that a misery. You're welcome to settle any time.


And that's exactly why I'm going! That and the fact that my partner's from Bagneres-de-Bigorre but she's living here (Ireland) at the minute (very temporarily she informs me)....

Only problem is finding work, so if you've any tips!


The job is the hard part. But since you're going to become fluent in AutoIt in a few moments, you'll have a better chance! That or guarding sheep.


I think I might be fluent in AutoIt before I'm fluent in French!

Have a little problem here.... I made a few changes, then moved from testing test data to testing real data and something is amiss...

#include <WindowsConstants.au3>
#Include <File.au3>
#Include <Array.au3>
#Include <Date.au3>
            
Global $classes, $paths

If Not _FileReadToArray(@ScriptDir & '\ClassList.ini',$classes) Then
   MsgBox(4096,'Error reading class names', 'Error ' & @error & ': ClassList.ini does not exist or is empty')
   Exit
EndIf

If Not _FileReadToArray(@ScriptDir & '\SearchPaths.ini',$paths) Then
   MsgBox(4096,'Error reading paths to search', 'Error ' & @error & ': SearchPaths.ini does not exist or is empty')
   Exit
EndIf

$loopend  = UBound($classes)
$endall   = UBound($paths)

Global $loopend, $bigstring, $filetime, $maincounter, $counter, $search, $currentclass, $searchfile, $file, $result
Global $output[$loopend][2]

$maincounter  = 1
Do ; Loop through each folder
    
    ToolTip('Searching ' & $paths[$maincounter] & '...')
    $search = FileFindFirstFile($paths[$maincounter] & '\*.sql') ; Shows the filenames of all files in the current directory
    
    If $search = -1 Then ; Check if the search was successful
        MsgBox(0, 'Error', 'No files/directories matched the search pattern')
        Exit
    EndIf
    
    While 1 ; Loop through each file

    $file = FileFindNextFile($search)
        If @error Then ExitLoop     

            $filetime = FileGetTime($file, 0, 1) ; Get the time of the file
            
            ;If $filetime > _DateAdd('m', -300, _NowCalcDate()) Then ; If the file is less than 3 months old continue
                $counter   = 1 
                $bigstring = FileRead($file) ; Read the contents of the file into a string
                    
                If $bigstring = '' Then ; Check if the file was read
                    MsgBox(0, 'Error', 'Cannot read contents of file: ' & $file & ' or file is empty.')
                Exit
                EndIf
                
                    Do ; Loop through each search string and see if it's in the current file
                        $output[$counter][0] = $classes[$counter] ; Insert the full class list to the output array
                        
                        $currentclass = $classes[$counter] ; Assign the next class in the array
                        $result       = StringInStr($bigstring, $currentclass) ; Search for the class in the file contents string
                    ;MsgBox(0,$file,'Result for ' & $currentclass & ' in ' & $file & ' is ' & $result) ; TEST
                    
                        If $result <> 0 Then ; If the class is found...
                            $output[$counter][1] = $output[$counter][1] + 1 ; ...add 1 to the total for that class in the array
                        Else                 ; And if it isn't...
                            $output[$counter][1] = $output[$counter][1] + 0 ; ...add 0 to the total for that class in the array
                        EndIf
            
                    $counter = $counter + 1
                                
                    Until $counter = $loopend ;Next class, back to start of do while loop until none left in list
            ;EndIf
            
    WEnd ; While there are still files in this folder continue, else go to next folder

$maincounter = $maincounter + 1
    
Until $maincounter = $endall ; Next folder to search, back to start of do while loop until none left

_ArrayDisplay($output,"Output") ;Display the output array

; Close the search handle
FileClose($search)

There should be two ini files (ClassList.ini and SearchPaths.ini) in the same directory as the script when running it. One has the list of paths where the sql files are, the other has the list of strings to look for.

After some debugging it seems the line

$bigstring = FileRead($file)
is not reading the file. Is there something I've missed?

Been looking at it a while now and can't see the problem... :unsure:

Edit: I just noticed it's also not picking up the file date now either, I've commented that out above...


I think I've spotted the problem... and it's really odd (to me at least).

I've cut the code down to this for debugging:

#include <WindowsConstants.au3>
#Include <File.au3>
            
Global $bigstring, $search, $file, $folder

$search = FileFindFirstFile(@ScriptDir & '\*.txt')

    If $search = -1 Then
        MsgBox(0, 'Error', 'No files/directories matched the search pattern')
        Exit
    EndIf
    
    While 1

    $file = FileFindNextFile($search)
        If @error Then ExitLoop     

                $bigstring = FileRead($file)
                    
                If $bigstring = '' Then 
                    MsgBox(0, 'Error', 'Cannot read contents of file: ' & $file & ' or file is empty.')
                Exit
                EndIf
            MsgBox(0,'File Contents','Contents of ' & $file & ': ' & $bigstring)
    WEnd

FileClose($search)

The above works when there is a txt file in the script directory that has some words in it. But, if you change Line 6 to

$search = FileFindFirstFile(@ScriptDir & '\Test Folder\*.txt')
and create a 'Test Folder' in the script directory with another text file in it with contents, it will find the file but it will not be able to read its contents....

Why is that!? It's driving me nuts.... Have I missed something really simple here?

I've also tried this:

$folder = @ScriptDir & '\Test Folder'
$search = FileFindFirstFile($folder & '\*.txt')

And this:

$folder = @ScriptDir & '\Test Folder\*.txt'
$search = FileFindFirstFile($folder)

HELP!!!! :unsure:


Sorry for delay couldn't do otherwise.

FileFindNextFile will return a filename, but you have to prepend the path to reach it with FileOpen.
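To illustrate the point, a minimal sketch: the name that comes back from FileFindNextFile is joined to the folder that was searched before the file is read. The function name _ReadAll and the local variable names are invented for this example, and 'Test Folder' is just the folder from the post above.

```autoit
; Hypothetical helper: read every .txt file in $sFolder using its full path
Func _ReadAll($sFolder)
    Local $hSearch = FileFindFirstFile($sFolder & '\*.txt')
    If $hSearch = -1 Then Return ; nothing matched

    While 1
        Local $sName = FileFindNextFile($hSearch) ; file name only, no folder part
        If @error Then ExitLoop

        Local $sFullPath = $sFolder & '\' & $sName ; prepend the folder that was searched
        Local $sText = FileRead($sFullPath)
        If @error = 1 Then ; treat @error = 1 as a failed open/read
            ConsoleWrite('Could not read ' & $sFullPath & @CRLF)
            ContinueLoop
        EndIf

        ConsoleWrite($sFullPath & ': ' & StringLen($sText) & ' characters' & @CRLF)
    WEnd

    FileClose($hSearch) ; release the search handle
EndFunc

_ReadAll(@ScriptDir & '\Test Folder')
```

A bare file name only resolves when the file happens to sit in the current working directory, which would explain why the version searching @ScriptDir itself appeared to work.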


Wow.... I knew it had to be simple.

So after the line

$file = FileFindNextFile($search)
I added the line
$file = $folder & '\' & $file

And now it works perfectly.

Thanks a million mate.

Hopefully see you soon in the south of France!
