sebgg

I need a list of all known English words.

22 posts in this topic

So just a little project I want to do: find the longest/most frequent word in a massive (several billion characters) list of seemingly random letters. So I need a list of known words to compare against.

I wondered if anyone has one compiled already, or if anyone knows where I could get hold of one?

Cheers,

Sebastian.


GC - Program to rapidly manipulate DNA Sequences
RotaMol - Program to measure Protein Size




#2 ·  Posted (edited)

http://www.manythings.org/vocabulary/lists/l/

I imagine you may want to find a better way to store them... And I'd do a lot of reading into string search algorithms (the one I know of is by three guys, one called Pratt). I'd also consider a language other than AutoIt (not something I suggest very often :D ). Given that the task is pretty simple, avoiding the overhead you'll get with AutoIt should be easy.
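To make the idea concrete, here is a minimal sketch in Python (not AutoIt, and not anything posted in this thread). It assumes the word list is whitespace separated and small enough to hold in a set, and it uses plain set lookups over bounded-length windows rather than a dedicated string search algorithm. The function names and the `min_len` parameter are made up for illustration. For several billion characters you would probably want to read the input in chunks and look at a multi-pattern matcher instead.

```python
from collections import Counter


def load_words(text, min_len=2):
    """Parse a whitespace-separated word list into a lowercase set.

    min_len is a hypothetical knob for skipping one-letter entries.
    """
    return {w.lower() for w in text.split() if len(w) >= min_len}


def scan(letters, words):
    """Count every dictionary word that occurs as a substring of `letters`."""
    if not words:
        return Counter()
    max_len = max(len(w) for w in words)
    min_len = min(len(w) for w in words)
    counts = Counter()
    for start in range(len(letters)):
        # Only try window lengths that could match a dictionary word.
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(letters):
                break
            chunk = letters[start:end]
            if chunk in words:
                counts[chunk] += 1
    return counts


def longest_and_most_frequent(counts):
    """Return (longest word, most frequent word), or (None, None) if empty."""
    if not counts:
        return None, None
    # Ties are broken alphabetically so the result is deterministic.
    longest = max(counts, key=lambda w: (len(w), w))
    most_frequent = max(counts, key=lambda w: (counts[w], w))
    return longest, most_frequent


if __name__ == "__main__":
    words = load_words("cat at tea eat seat a")
    counts = scan("xcatqseatzcat", words)
    print(longest_and_most_frequent(counts))
```

The set gives constant-time lookups, so the cost is roughly the input length times the spread of word lengths. That is fine for a quick experiment, but it is the part to replace first if it turns out to be too slow.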

Edited by Mat


#3 ·  Posted (edited)

sebgg,

You cannot be arsed to search - why should we? :D

But you might want to click here. :oops:

M23

Edit: typo.

Edited by Melba23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ---------- Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


#4 ·  Posted (edited)

Just follow the link to my website and you will find it on there in both zip and txt formats.

It's in the Miscellaneous section.

Edited by GEOSoft

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


Melba23 said: "You cannot be arsed to search - why should we? :D"

Because then I wouldn't have the chance to chat to all you lovely people!

Thanks all for the help. In the end I went with a 17x,xxx word list, space separated. Perfect!

sebs



#6 ·  Posted (edited)

If there are only 170K words it's nowhere near complete.

EDIT: Out of curiosity where did you find your word list (link)? Every time I find a new list I just merge it into mine. Sometimes I actually find a few words that are missing from my list.

Edited by GEOSoft

George


GEOSoft,

You said: "If there are only 170K words it's nowhere near complete."

The list at your site has approx. 121K words... what would you consider a complete list?

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


You could circumbobulate and obambulate around this topic, but it would be a case of acrasia. You are better off aucupating... Let's face it, any dictionary is going to be macilent, given that our language is motatorious.

It's all rather ostrobogulous... (pandiculates due to delassation). I imagine this would be a good topic for a deipnosophist though :D


Thank you, Mat, precisely what I was thinking.

kylomas



#10 ·  Posted (edited)

Mat said: "You could circumbobulate and obambulate around this topic, but it would be a case of acrasia. You are better off aucupating... Let's face it, any dictionary is going to be macilent, given that our language is motatorious. It's all rather ostrobogulous... (pandiculates due to delassation). I imagine this would be a good topic for a deipnosophist though :D"

I think my avatar pretty much summed up my face when I read this...

Edited by Thornhunt

Budweiser + room = warm beer
warm beer + fridge = too long!
warm beer + CO2 fire extinguisher = Perfect!
"Protect the easily offended ... BAN EVERYTHING"
^^ hmm works for me :D


GEOSoft said: "If there are only 170K words it's nowhere near complete. EDIT: Out of curiosity where did you find your word list (link)? Every time I find a new list I just merge it into mine. Sometimes I actually find a few words that are missing from my list."

I went with this one as it was longer than the one on your webpage, but combining them might add a few, I have no idea.

And yes, 170k is far from complete, but it's a nice start for fun.

http://homepage.ntlworld.com/adam.bozon/Dictionary.htm

seb



#12 ·  Posted (edited)

Damn! I thought I had updated that file.

It should be just over 188K words and should not include any single-character words, since it was written to update my wife's Scrabble program.

I'll update it in a few minutes.

EDIT: It's been updated.

I also have a copy of the list that does include single character words if you need it.

I know what happened with that original list too: it only included words that my script was able to verify on a couple of dictionary sites.

I knew that when I was working on the script I was over 188K words because I remember mentioning that figure to SmOke_N at the time in an IM.

Edited by GEOSoft

George


You can also check this site for a word list that might be of use. I looked at the SCOWL list and there are over 290,000 lines in it, but a lot of the entries are plurals or possessives of other words in the list, so it might take some culling to get a good list out of it.
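As a rough idea of what that culling could look like, here is a small Python sketch. It assumes one entry per line, which may not match the actual SCOWL file layout, and `cull`, `drop_capitalized` and `min_len` are names invented for this example. It only drops possessives, capitalised entries and non-letter entries. Plurals are left alone, because a naive "strip the trailing s" rule would throw away real words.

```python
def cull(lines, drop_capitalized=True, min_len=2):
    """Return a sorted list of entries with possessives and other noise removed."""
    kept = set()
    for line in lines:
        word = line.strip()
        if not word:
            continue
        # Possessives such as "dog's" duplicate the base word.
        if word.endswith(("'s", "\u2019s")):
            continue
        # Capitalised entries are usually proper nouns or names.
        if drop_capitalized and word[0].isupper():
            continue
        # Skip anything containing digits, hyphens or other non-letters.
        if not word.isalpha():
            continue
        # Optionally skip very short entries (e.g. single letters).
        if len(word) < min_len:
            continue
        kept.add(word.lower())
    return sorted(kept)


if __name__ == "__main__":
    sample = ["dog", "dogs", "dog's", "Zebra", "a", "co-op", "tree"]
    print(cull(sample))
```

Whether capitalised entries should go depends on what the list is for, so that rule is a switch rather than hard-coded.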


If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


It will take a lot of culling, since it also contains things like Acer, which is a tree genus and would not normally be included in a word list.

Thanks for the link though. I'll run through them with a different script and see if I can add any to my list.


George


@BrewManNH

Thanks to a link I found via that page you linked to, my new list is 255,329 words. I'll be uploading it later today.

Thanks


George


I think the Oxford English Dictionary only has that many words in it.



Probably correct, but with thousands or even tens of thousands of word lists available it takes time to generate a new one. Every once in a while I come across another list that is worth checking, and this time it was thanks to you.


George


Cheers Geo, I'll try with the updated word list you've got.

Thanks all for the help.

Seb



It seems like your compiled list of words is quite massive and this information may no longer assist you, but check out the spell-check dictionaries and language packs provided by the Mozilla Foundation. The en-US dictionary contains around 62,000 words, some of which you may find are missing from your list.

Also, what are you using to compare and import text so duplicates are not brought in?
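For what it's worth, one common way to keep duplicates out when merging is to normalise each word and track what has been seen in a set. The Python sketch below illustrates that idea only. It is not a description of the script GEOSoft uses, and `merge_word_lists` is a made-up name.

```python
def merge_word_lists(master, new_words):
    """Merge `new_words` into `master`.

    Returns (merged sorted list, list of words that were actually added).
    Words are compared case-insensitively after trimming whitespace.
    """
    seen = {w.strip().lower() for w in master if w.strip()}
    added = []
    for raw in new_words:
        word = raw.strip().lower()
        if word and word not in seen:
            seen.add(word)
            added.append(word)
    return sorted(seen), added


if __name__ == "__main__":
    merged, added = merge_word_lists(["apple", "Pear"], ["pear", "fig", "kiwi"])
    print(len(merged), "words after merge;", len(added), "new")
```

Returning the added words separately makes it easy to see how much a new source contributed before committing it to the master list.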
