myspacee

Find how many times a word is repeated in a lot of files

3 posts in this topic

Hello to all,

For work I must count how many times a word occurs in a large number of txt files.

This is part of my list:

pavia
milano
roma
firenze
torino
venezia
pistoia
catania
sassari

I'm looking for suggestions on how to script this:

In a given directory:

- the program takes the first value [pavia]

- opens the first file and searches for [pavia]

- if found, writes to the log [pavia +1]; if not found, skips to the next file

- when the files are finished, restarts with the second list entry [milano]

- and goes on until the list is finished.

If [word] occurs more than once in a file, I must count it only once.

Example:

Pavia is a small city. But Pavia was important in italy history.

This finds [Pavia] 2 times, but I must count it as only 1 find.

I need some ideas on how to store the info, and which functions I should use for this kind of search.

Thank you all for your time,

m.




StringReplace - "...the number of replacements performed is stored in @extended."

Replacing your current word with itself will give you how many times the word was used.
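
A minimal sketch of that trick, in case it helps. The sample text and the variable names are made up for illustration; only StringReplace and @extended come from the help file.

```autoit
; Hypothetical sample text and search word, made up for this illustration
Local $sText = "Pavia is a small city. But Pavia was important in italy history."
Local $sWord = "pavia"

; Replace the word with itself: the returned text is not needed,
; only the number of replacements that StringReplace stores in @extended.
; Copy @extended right away, because the next function call resets it.
StringReplace($sText, $sWord, $sWord)
Local $iCount = @extended

ConsoleWrite($sWord & " found " & $iCount & " time(s)" & @CRLF)
```

StringReplace is not case-sensitive by default, so "pavia" also matches "Pavia". It matches substrings as well, so a short word can be counted inside a longer one.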

Share on other sites

I think something like this:

Create the folder '\01_IN_txt\' in the script directory and put the txt files in it.

Create the file 'nomicom.txt' in the script directory with the wanted words, one per line.

#include <File.au3>

$sourceFolder = @ScriptDir & "\01_IN_txt\"

; Gather the .txt files into an array ($fileList[0] holds the count)
$fileList = _FileListToArray($sourceFolder, "*.txt", 1)
; Check if the search was successful
If @error Then
    ConsoleWrite("No files matched the search pattern." & @CRLF)
    Exit
EndIf

; Open the log file (mode 10 = overwrite + create the directory if missing)
$file_log = FileOpen(@ScriptDir & "\LOG.txt", 10)

; Loop through the files
For $X = 1 To $fileList[0]

    ; Read the whole file into memory
    $foo = FileOpen($sourceFolder & $fileList[$X], 0)
    $bar = FileRead($foo)

    ; Initialize the line counter for the word list
    $line_number = 0

    While 1
        ; Move to the next word in the list
        $line_number = $line_number + 1
        $line = FileReadLine(@ScriptDir & "\nomicom.txt", $line_number)
        If @error Then ExitLoop ; end of the list, or the list could not be read
        If $line = "" Then ContinueLoop ; skip blank lines in the list

        ; Use the fake StringReplace trick ;) the count ends up in @extended
        StringReplace($bar, $line, $line)
        $numreplacements = @extended

        ; Log only when something was found
        If $numreplacements <> 0 Then
            FileWriteLine($file_log, $fileList[$X] & "---> [Search for : " & $line & " ]  [Find : " & $numreplacements & "]")
        EndIf

    WEnd

    ; Close the file and move on to the next one
    FileClose($foo)

Next

FileClose($file_log)

The script reads every .txt file and searches it for each entry in the list.
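
To count each word only once per file, as asked in the first post, a variation could look like the sketch below. It is only a rough idea under some assumptions: the same 'nomicom.txt' and '\01_IN_txt\' layout as above, one word per line, and StringInStr swapped in for the StringReplace trick, since here we only need to know whether the word is present. The variable names are made up.

```autoit
#include <File.au3>

; Assumed layout: word list and input folder sit next to the script
Local $sFolder = @ScriptDir & "\01_IN_txt\"

; Read the word list into an array ($aWords[0] holds the number of lines)
Local $aWords
If Not _FileReadToArray(@ScriptDir & "\nomicom.txt", $aWords) Then
    ConsoleWrite("Cannot read nomicom.txt" & @CRLF)
    Exit
EndIf

; List the .txt files ($aFiles[0] holds the number of files)
Local $aFiles = _FileListToArray($sFolder, "*.txt", 1)
If @error Then
    ConsoleWrite("No .txt files found" & @CRLF)
    Exit
EndIf

; One counter per word, all starting at 0
Local $aHits[$aWords[0] + 1]
For $i = 1 To $aWords[0]
    $aHits[$i] = 0
Next

Local $sText
For $f = 1 To $aFiles[0]
    ; Read each file once, then test every word against it
    $sText = FileRead($sFolder & $aFiles[$f])
    For $i = 1 To $aWords[0]
        If $aWords[$i] = "" Then ContinueLoop ; skip blank lines in the list
        ; StringInStr returns 0 when the word is absent, so a file adds
        ; at most 1 to the counter however often the word appears in it
        If StringInStr($sText, $aWords[$i]) > 0 Then $aHits[$i] += 1
    Next
Next

; Report the number of files that contain each word
For $i = 1 To $aWords[0]
    If $aWords[$i] <> "" Then ConsoleWrite($aWords[$i] & " : " & $aHits[$i] & " file(s)" & @CRLF)
Next
```

Like StringReplace, StringInStr is not case-sensitive by default and matches substrings, so [roma] would also be found inside a word like "romano".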

m.
