
Find most recurring word within a file?


Gui

Essentially, my goal is to retrieve the most popular (most repeated) word within a list of data, such as the contents of a text file. Are there any existing, super efficient ways of accomplishing this? Thanks

GUI


Hi,

A quick reply, since someone is probably already building you a snippet that does what you want:

Split the file into words, then loop through the words: if a word has not been seen yet, add it to an array with a count of 1; otherwise increment the count stored next to it.
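
The steps above could be sketched roughly as follows. This is only an illustration, not anyone's posted solution: it swaps the plain array for a Scripting.Dictionary COM object to hold the counts, the helper name _CountWords and the file name words.txt are made up for the example, and it assumes a "word" is a run of \w characters with case ignored.

#include <Array.au3>

; Hypothetical helper (not from this thread): returns a 2D array of [word, count],
; sorted by count in descending order.
Func _CountWords($sText)
    ; Assumption: a "word" is a run of \w characters; case is ignored.
    Local $aWords = StringRegExp(StringLower($sText), "\w+", 3)
    If @error Then Return SetError(1, 0, 0) ; no words found

    ; Swapped-in technique: a dictionary instead of searching an array each time.
    Local $oDict = ObjCreate("Scripting.Dictionary")
    For $sWord In $aWords
        If $oDict.Exists($sWord) Then
            $oDict.Item($sWord) = $oDict.Item($sWord) + 1 ; seen before: increment
        Else
            $oDict.Add($sWord, 1) ; first occurrence
        EndIf
    Next

    ; Copy the keys and counts into a 2D array so it can be sorted and displayed.
    Local $aKeys = $oDict.Keys()
    Local $aResult[UBound($aKeys)][2]
    For $i = 0 To UBound($aKeys) - 1
        $aResult[$i][0] = $aKeys[$i]
        $aResult[$i][1] = $oDict.Item($aKeys[$i])
    Next
    _ArraySort($aResult, 1, 0, 0, 1) ; descending, sorted on column 1 (the count)
    Return $aResult
EndFunc

; Usage sketch: "words.txt" is a placeholder file name.
Local $aCounts = _CountWords(FileRead("words.txt"))
If Not @error Then _ArrayDisplay($aCounts, "word / count")

After sorting, row 0 holds one of the most frequent words; ties are not ordered in any particular way.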

Br, FireFox.

Hmm, thanks, I think that'll do the trick! I'll just have to split everything into words.

Correct, and to do so you first need to define precisely what "word" means for the problem at hand.
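
To see why the definition matters, here is a small sketch comparing two possible definitions of "word". Both patterns and the sample string are assumptions picked for illustration; neither is "the" right answer.

#include <Array.au3>

Local $sSample = "Don't over-think it: e.g. out-of-the-box, v3, 2nd."

; Assumption 1: a word is any run of letters, digits or underscores.
Local $aLoose = StringRegExp($sSample, "\w+", 3)

; Assumption 2: letters only, but inner apostrophes and hyphens stay part of the word.
Local $aStrict = StringRegExp($sSample, "(?i)[a-z]+(?:['-][a-z]+)*", 3)

_ArrayDisplay($aLoose, "\w+")
_ArrayDisplay($aStrict, "letters with inner ' or -")

With the first pattern "Don't" is split into "Don" and "t", while the second keeps "Don't" and "out-of-the-box" as single words, so the word counts you end up with depend directly on which definition you pick.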

Something like this here?

#include <Array.au3>

$sText = 'AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting.' & @CRLF &  _
                    'It uses a combination of simulated keystrokes, mouse movement and window/control manipulation in order to automate tasks in a way not possible or reliable with other languages (e.g. VBScript and SendKeys).' & @CRLF &  _
                    'AutoIt is also very small, self-contained and will run on all versions of Windows out-of-the-box with no annoying "runtimes" required!'

$aTest = MostRepeatedWords($sText)
_ArrayDisplay($aTest)

Func MostRepeatedWords($sText)
    Local $aSplit = StringRegExp($sText, "(\w+)", 3) ; flag 3: array of all matches; \w+ = one run of word characters
    Local $aUnique = _ArrayUnique($aSplit)
    Local $aResult[UBound($aUnique)][2], $i, $c, $aTmp
    For $i = 1 To $aUnique[0]
        $aResult[$i][0] = $aUnique[$i]
        $aTmp = _ArrayFindAll($aSplit, $aUnique[$i], 0, 0, 2)
        $aResult[$i][1] = UBound($aTmp)
    Next
    $aResult[0][0] = $aUnique[0]
    _ArraySort($aResult, 1, 1, 0, 1)
    Return $aResult
EndFunc

Br,

UEZ

Don't have time to refine it, I'm off to a Huskers versus Buckeyes party, but here's a conceptual start:

#include <Array.au3>

Global $str = "Now is the time for all good men to come to the aid of their country"
Global $array = StringSplit($str, " ")
Global $count, $idx

For $x = 0 To UBound($array) - 1 ; count 'em up
    $y = "__" & $array[$x]
    If IsDeclared($y) Then
        $z = Eval($y) + 1
        If $z > $count Then $count = $z
        Assign($y, $z)
    Else
        Assign($y, 1)
    EndIf
Next

For $x = 0 To UBound($array) - 1 ; crunch 'em
    If Eval("__" & $array[$x]) = $count Then
        $idx += 1
        $array[$idx] = $array[$x]
    EndIf
Next
Redim $array[$idx + 1]
$array[0] = $count
_ArrayDisplay($array)

A cleaned-up version:

#include <Array.au3>

Global $str = "I am the eggman, they are the eggmen, I am the walrus, coo coo ka choo."
$str = StringRegExpReplace($str, "[.,?]", "") ; remove punctuation
Global $array = StringSplit($str, " ")
Global $count = 1, $idx

For $x = 1 To $array[0] ; count 'em up, drop dupes
    $y = "__" & $array[$x]
    If IsDeclared($y) Then
        $z = Eval($y) + 1
        If $z > $count Then $count = $z
        Assign($y, $z)
    Else
        Assign($y, 1)
        $idx += 1
        $array[$idx] = $array[$x]
    EndIf
Next
Redim $array[$idx + 1]
$array[0] = $idx

$idx = 0
For $x = 1 To $array[0] ; pick the winners
    If Eval("__" & $array[$x]) = $count Then
        $idx += 1
        $array[$idx] = $array[$x]
    EndIf
Next
Redim $array[$idx + 1]
$array[0] = $idx
_ArrayDisplay($array, "occurs " & $count & " times")

I think the Assign() / IsDeclared() trick I first saw Yashied use in an alternate ArrayUnique() function is the closest thing to "super efficient" you'll find.

Edited by Spiff59