Gui

Find most recurring word within a file?

Gui

Essentially, my goal is to retrieve the most popular (most repeated) word within a list of data, such as the contents of a text file. Are there any already super-efficient ways of accomplishing this? Thanks.

FireFox

Hi,

A quick reply, since someone may already be building you a snippet of what you want:

Split the file into words, then go through the words: add each one to an array if it has not been added yet, otherwise increment the count stored alongside it (the subindex).

Br, FireFox.
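
The steps described in the post above can be sketched roughly as follows. This is only an illustration, not code from the thread: `_CountWords` is a made-up name, `\w+` is an assumed definition of a word, and the sample text is a placeholder for whatever `FileRead()` would return for your file.

```autoit
#include <Array.au3>

; Hypothetical helper: returns a 2D array, [n][0] = word, [n][1] = count.
Func _CountWords($sText)
    Local $aWords = StringRegExp($sText, "\w+", 3) ; 3 = return all matches as a 0-based array
    If @error Then Return SetError(1, 0, 0) ; no words found
    Local $aCount[UBound($aWords)][2] ; worst case: every word is distinct
    Local $iUsed = 0, $iFound
    For $i = 0 To UBound($aWords) - 1
        $iFound = -1
        For $j = 0 To $iUsed - 1
            If $aCount[$j][0] = $aWords[$i] Then ; "=" compares strings case-insensitively
                $iFound = $j
                ExitLoop
            EndIf
        Next
        If $iFound = -1 Then
            $aCount[$iUsed][0] = $aWords[$i] ; first time seen: add it
            $aCount[$iUsed][1] = 1
            $iUsed += 1
        Else
            $aCount[$iFound][1] += 1 ; seen before: increment its count
        EndIf
    Next
    ReDim $aCount[$iUsed][2] ; drop the unused rows
    Return $aCount
EndFunc

; Placeholder text; in practice this could be the result of FileRead() on your file.
Local $aCount = _CountWords("the cat and the dog and the bird")
_ArraySort($aCount, 1, 0, 0, 1) ; descending, sorted on the count column
_ArrayDisplay($aCount, "Most frequent word is in row 0")
```

The inner loop is a plain linear search, so this is easy to follow but slows down as the number of distinct words grows.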

Gui

Hmm, thanks I think that'll do the trick! I'll just have to split everything into words.

jchd

Correct, and to do so you first need to define precisely what "word" means to you in the problem at hand.
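
To illustrate why the definition matters, here is a small sketch that splits the same sentence under two different definitions of a word. The sample sentence and both patterns are assumptions chosen for the example, not something prescribed in this thread.

```autoit
#include <Array.au3>

Local $sText = "Don't over-think it: it's a well-known 2-step job."

; Definition 1: a word is any run of letters, digits or underscores.
Local $aPlain = StringRegExp($sText, "\w+", 3)

; Definition 2: a word is a run of letters that may contain inner apostrophes or hyphens.
Local $aWords = StringRegExp($sText, "[[:alpha:]]+(?:['-][[:alpha:]]+)*", 3)

_ArrayDisplay($aPlain, "\w+")
_ArrayDisplay($aWords, "letters with inner ' or -")
```

The first pattern breaks "Don't" into "Don" and "t" and counts "2" as a word, while the second keeps "Don't" and "well-known" whole and skips the digit, so the frequency counts can differ depending on which definition you pick.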


UEZ

Something like this here?

#include <Array.au3>

$sText = 'AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting.' & @CRLF &  _
                    'It uses a combination of simulated keystrokes, mouse movement and window/control manipulation in order to automate tasks in a way not possible or reliable with other languages (e.g. VBScript and SendKeys).' & @CRLF &  _
                    'AutoIt is also very small, self-contained and will run on all versions of Windows out-of-the-box with no annoying "runtimes" required!'

$aTest = MostRepeatedWords($sText)
_ArrayDisplay($aTest)

Func MostRepeatedWords($sText)
    Local $aSplit = StringRegExp($sText, "(\w+)", 3) ; flag 3: return an array of all matches
    Local $aUnique = _ArrayUnique($aSplit)
    Local $aResult[UBound($aUnique)][2], $i, $c, $aTmp
    For $i = 1 To $aUnique[0]
        $aResult[$i][0] = $aUnique[$i]
        $aTmp = _ArrayFindAll($aSplit, $aUnique[$i], 0, 0, 2)
        $aResult[$i][1] = UBound($aTmp)
    Next
    $aResult[0][0] = $aUnique[0]
    _ArraySort($aResult, 1, 1, 0, 1)
    Return $aResult
EndFunc

Br,

UEZ

Spiff59

Don't have time to refine it, off to a Huskers versus Buckeyes party, but here's a conceptual start:

#include <Array.au3>

Global $str = "Now is the time for all good men to come to the aid of their country"
Global $array = StringSplit($str, " ")
Global $count = 1, $idx = 0

For $x = 1 To $array[0] ; count 'em up ($array[0] holds the word count, so start at 1)
    $y = "__" & $array[$x]
    If IsDeclared($y) Then
        $z = Eval($y) + 1
        If $z > $count Then $count = $z
        Assign($y, $z)
    Else
        Assign($y, 1)
    EndIf
Next

For $x = 1 To $array[0] ; crunch 'em: keep only the words that hit the top count
    If Eval("__" & $array[$x]) = $count Then
        $idx += 1
        $array[$idx] = $array[$x]
    EndIf
Next
ReDim $array[$idx + 1]
$array[0] = $count
_ArrayDisplay($array)

A cleaned-up version:

#include <Array.au3>

Global $str = "I am the eggman, they are the eggmen, I am the walrus, coo coo ka choo."
$str = StringRegExpReplace($str, "[.,?]", "") ; remove punctuation
Global $array = StringSplit($str, " ")
Global $count = 1, $idx = 0

For $x = 1 to $array[0] ; count 'em up, drop dupes
    $y = "__" & $array[$x]
    If IsDeclared($y) Then
     $z = Eval($y) + 1
     If $z > $count Then $count = $z
     Assign($y, $z)
    Else
     Assign($y, 1)
     $idx += 1
     $array[$idx] = $array[$x]
    EndIf
Next
Redim $array[$idx + 1]
$array[0] = $idx

$idx = 0
For $x = 1 to $array[0] ; pick the winners
    If Eval("__" & $array[$x]) = $count Then
     $idx += 1
     $array[$idx] = $array[$x]
    EndIf
Next
Redim $array[$idx + 1]
$array[0] = $idx
_ArrayDisplay($array, "occurs " & $count & " times")

I think the Assign() / IsDeclared() trick I first saw Yashied use in an alternate ArrayUnique() function is the closest thing to "super efficient" you'll find.
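
For comparison, a different keyed-lookup technique (not the Assign() / IsDeclared() trick above) is the Windows `Scripting.Dictionary` COM object. The sketch below is an assumption-laden illustration rather than a benchmarked alternative: the `\w+` pattern and the variable names are made up for the example, and no claim is made here about which approach is faster.

```autoit
; Sketch: counting words with a Scripting.Dictionary instead of Assign()/Eval().
Local $sText = "I am the eggman, they are the eggmen, I am the walrus, coo coo ka choo."
Local $aWords = StringRegExp($sText, "\w+", 3) ; all matches, 0-based array

Local $oDict = ObjCreate("Scripting.Dictionary")
$oDict.CompareMode = 1 ; 1 = text compare, so "The" and "the" share a key

For $i = 0 To UBound($aWords) - 1
    If $oDict.Exists($aWords[$i]) Then
        $oDict.Item($aWords[$i]) = $oDict.Item($aWords[$i]) + 1
    Else
        $oDict.Add($aWords[$i], 1)
    EndIf
Next

; Find the highest count, then collect every word that reaches it.
Local $iMax = 0, $sWinners = ""
For $sKey In $oDict.Keys()
    If $oDict.Item($sKey) > $iMax Then $iMax = $oDict.Item($sKey)
Next
For $sKey In $oDict.Keys()
    If $oDict.Item($sKey) = $iMax Then $sWinners &= $sKey & " "
Next
ConsoleWrite(StringStripWS($sWinners, 2) & " occurs " & $iMax & " times" & @CRLF)
```

One design difference: the dictionary keeps its keys in its own object, so the counts do not end up as variables in the script's namespace.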

czardas

I have created a word frequency stats function, but I haven't turned it into a UDF that I can quickly post. It's a follow-up to [link]. I thought perhaps the link might be helpful.
