Count the number of words in a text?

  • Moderators

bleh,

Just one of the threads that appeared when I searched the forum. ;)

M23

P.S. The "Search" facility is at top right. ;)

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

What do you mean by "text"? A string with space-separated words, a Word document...?

My UDFs and Tutorials:

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

Sorry, Melba23.

What do you mean by "text"? A string with space separated words, a Word document ...?

It can be a string, a text file or whatever. It is normal text with punctuation etc., just like a Wikipedia entry, for example.

Search the forum for one of the word count examples as Melba suggested. Then decide which characters should denote a new word (space, hyphen, @CRLF, @LF ...).
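
A minimal sketch of that idea, assuming a word is any run of characters that is not a space, tab, CR, LF or hyphen (the separator set is only an example, and CountWordsBySeparators is a hypothetical helper name):

```autoit
#include <StringConstants.au3>

; Hypothetical helper: counts runs of characters that are NOT one of the chosen
; separators (space, tab, CR, LF, hyphen). Edit the character class to change
; what starts a new word.
Func CountWordsBySeparators($sText)
    ; [^ \t\r\n\-]+ = one or more characters that are not a separator
    Local $aWords = StringRegExp($sText, "[^ \t\r\n\-]+", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return 0 ; no match at all: zero words
    Return UBound($aWords)
EndFunc

ConsoleWrite(CountWordsBySeparators("well-formed text" & @CRLF & "second line") & @CRLF) ; 5
```

Punctuation that touches a word stays part of it ("line." is one word), and a lone "..." between spaces would be counted as a word, so the separator set has to fit the texts you expect.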

I don't know how accurate this code is:

$sText = "GDI + allows you to easily manipulate 2D, without having to select from the dc, the pen, the font, so the brush can restore the DC to its original state before returning (it's hard enough with GDI). We can manipulate images, scale, rotate, translate, shear, or mix these functions quite easily. The pen has several functions to make indents. You can define preset tips for lines such as the tips of arrows or creating custom tips. The brushes are quite numerous and diverse, we can make very easily degraded by passing the color of departure and arrival."
MsgBox(0, "Test", $sText & @CRLF & @CRLF & "Number of words: " & CountWords($sText))

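; Counts each run of \w characters (a-z, A-Z, 0-9, _) as one word
Func CountWords($sText)
    Local $aResult = StringRegExp($sText, "(\w+)", 3) ; 3 = return an array of all matches
    If @error Then Return SetError(1, 0, 0) ; no match found: return 0 and set @error
    Return UBound($aResult) ; number of matches = number of words
EndFunc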

Br,

UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

According to the StringRegExp help file, with "(\w+)" a word is a run of consecutive characters from a-z, A-Z, 0-9 or underscore (_).
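
A quick way to see what that definition means in practice. CountWordsW is a hypothetical name for the same \w+ approach as UEZ's CountWords, renamed so both can live in one script:

```autoit
#include <StringConstants.au3>

; Every run of \w characters counts as one word; anything else
; (apostrophe, hyphen, "+", punctuation) only separates words.
Func CountWordsW($sText)
    Local $aWords = StringRegExp($sText, "\w+", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return 0 ; no match: zero words
    Return UBound($aWords)
EndFunc

ConsoleWrite(CountWordsW("it's") & @CRLF)        ; 2: "it" and "s"
ConsoleWrite(CountWordsW("well-formed") & @CRLF) ; 2: "well" and "formed"
ConsoleWrite(CountWordsW("2D GDI+") & @CRLF)     ; 2: "2D" and "GDI"
```

So contractions and hyphenated compounds are split into several words, digits count as word characters, and a standalone "+" (as in "GDI + allows" in the sample text) is not counted at all.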

Let's see how the OP defines "word".

Are these 3 words: " * * *."?

As there is neither a space nor an apostrophe, I would say one...

What about StringRegExp($sText, "[^\s']+", 3)?

Edit

assuming there is no typo or grammatical mistake in the text :D
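
mikell's pattern dropped into a counting function looks roughly like this (CountWordsNoApos is a hypothetical name; the pattern is the one from the post):

```autoit
#include <StringConstants.au3>

; A word is a run of characters that are neither whitespace (\s) nor an
; apostrophe, so punctuation stays attached to the word it touches.
Func CountWordsNoApos($sText)
    Local $aWords = StringRegExp($sText, "[^\s']+", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return 0 ; no match: zero words
    Return UBound($aWords)
EndFunc

ConsoleWrite(CountWordsNoApos("One, two and three.") & @CRLF)         ; 4
ConsoleWrite(CountWordsNoApos("It's well-formed, isn't it?") & @CRLF) ; 6
```

Splitting on the apostrophe makes "It's" and "isn't" two words each (an elision such as "l'homme" also gives two), while "well-formed," stays a single word together with its comma.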

Edited by mikell

We haven't yet seen what "word" means to the OP. OTOH, the current PCRE implementation is compiled without PCRE_UCP, sadly (yes, I do insist heavily on that). As a consequence, the easy way, \b, can't be used in the general case.

Fortunately, PCRE is kind enough to provide Unicode-wide \h and \v (and their negations, \H and \V respectively) to match horizontal and vertical "spaces", but the problem then shifts to detecting punctuation, which would require UCP support to be of general use.

So once again we're stuck with half-baked solutions which work for (?i)[a-z] English only, despite AutoIt claiming to support Unicode (which it doesn't really).
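
As a sketch of the \h/\v route: this counts whitespace-delimited tokens only and deliberately ignores punctuation, which is exactly the part that would still need UCP (CountTokensHV is a hypothetical name):

```autoit
#include <StringConstants.au3>

; Counts tokens separated by horizontal (\h) or vertical (\v) white space.
; PCRE documents \h and \v as covering Unicode spaces such as U+00A0
; (no-break space). Punctuation is not handled at all.
Func CountTokensHV($sText)
    Local $aTokens = StringRegExp($sText, "[^\h\v]+", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return 0 ; no token at all
    Return UBound($aTokens)
EndFunc

ConsoleWrite(CountTokensHV("one two" & @TAB & "three" & @CRLF & "four") & @CRLF) ; 4
; ChrW(0x00A0) builds a no-break space; \h should treat it as a separator
ConsoleWrite(CountTokensHV("one" & ChrW(0x00A0) & "two") & @CRLF)
```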

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Let's see how the OP defines "word".

"A single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence"

I don't know what's unclear. I want to count the number of words in texts using AutoIt.

For example: The quote above are seven words. The punctuation (" and .) would not count as a word.

The link of Melba23 seems to work fine, btw.

Thanks to you all.

Edited by bleh

How about StringSplit, with the delimiter string being a combination of common separators like space & @TAB & @CR & @CRLF & @LF and whatever?
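
A minimal StringSplit version of that idea, assuming the delimiter set below (only an example; CountWordsSplit is a hypothetical name). With the default flag every character of the delimiter string is a delimiter of its own, so @CR and @LF already cover @CRLF:

```autoit
; Hypothetical helper: splits on each single character of $sDelims and
; counts the non-empty pieces.
Func CountWordsSplit($sText)
    Local $sDelims = " " & @TAB & @CR & @LF & ",.;:!?()" & '"'
    Local $aParts = StringSplit($sText, $sDelims) ; default flag: each character is a delimiter
    Local $iCount = 0
    For $i = 1 To $aParts[0] ; element [0] holds the number of pieces
        ; consecutive delimiters (", " or @CRLF) leave empty pieces: skip them
        If $aParts[$i] <> "" Then $iCount += 1
    Next
    Return $iCount
EndFunc

ConsoleWrite(CountWordsSplit("One, two and three.") & @CRLF) ; 4
```

Counting only the non-empty elements is what keeps ", " or a line break from inflating the total.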

Signature - my forum contributions:

UDF:

LFN - support for long file names (over 260 characters)

InputImpose - impose valid characters in an input control

TimeConvert - convert UTC to/from local time and/or reformat the string representation

AMF - accept multiple files from Windows Explorer context menu

DateDuration -  literal description of the difference between given dates

Apps:

Touch - set the "modified" timestamp of a file to current time

Show For Files - tray menu to show/hide files extensions, hidden & system files, and selection checkboxes

SPDiff - Single-Pane Text Diff

"A single distinct meaningful element of speech or writing, used with others (or sometimes alone) to form a sentence"

I don't know what's unclear. I want to count the number of words in texts using AutoIt.

If that's the definition you're going to use, then you're in deeper trouble than you seem to think.

Please apply your own word-count definition to the following translations of the very same sentence (modulo Google Translate accuracy!), all of which match your definition.

"This is a well-formed sentence that meets the definition." (English)

"Voilà une phrase bien formée qui répond à la définition." (French)

"一個結構完整的句子符合定義" (Traditional Chinese)

"זהו משפט בנוי היטב שעונה על ההגדרה." (Hebrew)

"นี่คือประโยคที่ดีขึ้นที่ตรงตามคำนิยาม" (Thai)

"これは、定義を満たす整形文です。" (Japanese)

"இந்த வரையறையை சந்திக்கிறது என்று ஒரு நன்கு வடிவமைக்கப்பட்ட சொற்றொடர் உள்ளது." (Tamil)

... and so many others I won't bother to list, but you get the idea.

Seriously? Why pick it apart? What's the point?

Here you go:

I want to count the words of formatted, punctuated and well-written texts based on the Latin script.

What's the "Latin script"? http://en.wikipedia.org/wiki/Latin_script

What's a "word"? https://en.wikipedia.org/wiki/Word

What's "punctuation"? http://en.wikipedia.org/wiki/Punctuatin

"What does a text with words look like?" Like this: One, two and three.

"But how many words would that be?" 4.
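
If "Latin script" is taken to mean ASCII letters plus the accented letters of the Latin-1 Supplement and Latin Extended-A/B blocks, one possible sketch follows. It assumes StringRegExp accepts \x{hhhh} code points above 0xFF (Unicode mode), treats an inner apostrophe or hyphen as part of the word, and ignores digits; CountLatinWords is a hypothetical name:

```autoit
#include <StringConstants.au3>

; Hypothetical helper: a word is a run of Latin letters (A-Z, a-z and
; U+00C0-U+024F minus the signs U+00D7 and U+00F7), optionally joined by an
; inner apostrophe or hyphen as in "it's" or "well-formed".
Func CountLatinWords($sText)
    Local Const $sLetter = "[A-Za-z\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{024F}]"
    ; (?:...) is non-capturing, so flag 3 returns whole matches
    Local $aWords = StringRegExp($sText, $sLetter & "+(?:['\-]" & $sLetter & "+)*", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return 0 ; no match (or a pattern error): zero words
    Return UBound($aWords)
EndFunc

ConsoleWrite(CountLatinWords("One, two and three.") & @CRLF)          ; 4
ConsoleWrite(CountLatinWords("It's a well-formed sentence.") & @CRLF) ; 4
```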

  • Moderators

bleh,

Why pick it apart? What's the point?

Because we get so many people who ask a question and then complain that the proposed solutions do not meet the special cases they omitted to mention at the beginning. ;)

Thank you for explaining clearly what it is you are asking. Does the thread to which I linked you in post #2 not do what you need? :huh:

M23

Does the thread to which I linked you in post #2 not do what you need?

From my earlier reply (#13):

The link of Melba23 seems to work fine, btw.

Thanks to you all.

Edited by bleh

Well, if by Latin script you mean a-z and A-Z, then yes. But your ironic answer is the first time "Latin" was mentioned.

So much for precision.

Please recognize that your previous wordings were overly ambiguous. I wasn't picky, just trying to read your mind at a distance.

This forum receives posts from worldwide users so unless a precise context is clear, answers should be as general as possible.

Well, maybe you should read before you reply. Anyone with common sense could understand what I meant by "words" and "text" after reading what I wrote, especially the example in #13, and even more so after I wrote that Melba's example worked.

Please recognize that it's also safe to assume I would have said so if I wanted to count words in Traditional Chinese, Egyptian hieroglyphs or Martian.

Edited by bleh

This topic is now closed to further replies.