Albertxu

How to sort all words by alphabet in a word list?

34 posts in this topic

Hi, I have a word list that is not in alphabetical order, and I need to sort it. How can I do it with AutoIt? Also, how can I delete the duplicate entries that begin with capital letters?

Thank you again for your kind help.




Load the list into an array, use _ArraySort to sort it, then use _ArrayUnique to return the list with no duplicates.
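A minimal sketch of those three steps, assuming the words are separated by spaces. The sample string and variable names are made up for illustration; in practice the list would come from a file.

```autoit
#include <Array.au3>

; Hypothetical sample list, including entries that differ only by case.
Local $sWords = "pear Apple apple fig Pear banana fig"

; Flag 2 returns a 0-based array with no count in element 0.
Local $aWords = StringSplit($sWords, " ", 2)

; Sort in place (ascending by default).
_ArraySort($aWords)

; Remove duplicates; element 0 of the result holds the number of unique items.
Local $aUnique = _ArrayUnique($aWords)

ConsoleWrite(_ArrayToString($aUnique, @CRLF) & @CRLF)
```

With its default settings _ArrayUnique compares without regard to case, so "Apple" and "apple" collapse into a single entry. Which spelling survives depends on the order after sorting, so check the result if you specifically need the lower-case form kept.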


If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator



Nice tip on the _ArrayUnique. I didn't know about that one.


#include <ByteMe.au3>



By the way, the help file description for that function isn't exactly accurate as written.

It does not only return the unique elements of a 1D array. It returns a 1D array containing the elements of a 1D or 2D array with all duplicate entries removed. So if you feed it an array that contains 1,3,4,2,2,4,5,5,6, the returned array will contain 6,1,3,4,2,5,6, where the first entry is the number of elements returned.
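A small sketch of reading that result, assuming the default settings where element 0 holds the count. The variable names are illustrative only.

```autoit
#include <Array.au3>

Local $aInput[9] = [1, 3, 4, 2, 2, 4, 5, 5, 6]
Local $aUnique = _ArrayUnique($aInput)

; $aUnique[0] is the count, so the unique values start at index 1.
For $i = 1 To $aUnique[0]
    ConsoleWrite($aUnique[$i] & @CRLF)
Next
```

Starting the loop at 1 keeps the count out of whatever you do with the values afterwards.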



#6 ·  Posted (edited)

Let me digress to the untold hard part of the topic.

What do you call "alphabetical ordering"? There is no such thing in a general sense.

_ArraySort and similar approaches only sort a large part of Unicode (but not all of it) lexicographically, which is unsuitable for almost all known languages.

Edit: that's utter bullshit. _ArraySort is case insensitive and uses your current locale (as does =).

What completely ignores your locale is the binary string comparison operator ==.
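A quick way to see the difference between the two operators, as a sketch with made-up variable names:

```autoit
; = compares without regard to case; == is an exact, case-sensitive comparison.
Local $bLoose = ("Word" = "word")
Local $bStrict = ("Word" == "word")

; StringCompare returns 0 when the strings are considered equal
; (case-insensitive by default).
Local $iCmp = StringCompare("Word", "word")

ConsoleWrite("=  : " & $bLoose & @CRLF)
ConsoleWrite("== : " & $bStrict & @CRLF)
ConsoleWrite("StringCompare: " & $iCmp & @CRLF)
```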

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I was not aware of that. It seems to work fine for English.

#include <Array.au3>

Local $sString = "fur suede conker ball knuckle zebra far abc ade cool carl collected calm"

; StringSplit stores the word count in element 0
Local $aString = StringSplit($sString, " ")
_ArrayDisplay($aString)

; Sort from index 1 so the count stays in element 0
_ArraySort($aString, 0, 1)
_ArrayDisplay($aString)

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


Try mixing in Irish, Chinese, Farsi, ... strings.

Worse: it depends on what "order" you need, even if you restrict yourself to a given language. For instance, the German collation is different for dictionaries and phonebooks.

Et cetera.

So in the English case, you're lucky that the 7-bit ASCII charset matches the natural order. That's a rare exception.



I wonder though (not being qualified) if the word 'alphabetic' means anything in languages other than English?

As far as I know it's adapted from the Greek language, so one would expect it to mean something there.



There are indeed languages with no alphabet, i.e. not written as a succession of individual letters. Even if you set them apart (which means billions of individuals), alphabet still has a meaning in Cyrillic, Greek, Estonian, etc.

Talking about the latter, did you know that in Estonian, 'y' sorts between 'i' and 'j'? Or that in German phonebook sort, 'oe' sorts as if it were a single letter between 'o' and 'p', since it's a common way of writing 'ö'?

There is no 'Estonian y' codepoint in Unicode, just one 'y', so how are you going to collate it? In German, 'ss' also represents 'ß', ...

These examples are only the easy cases of the issues that arise when one doesn't consider internationalization at large.



Norwegian has æ, ø and å...

Just to add to the list...




Yeah, and some Scandinavian languages collate 'å' differently: I seem to recall that Swedish treats it as an 'a' with a diacritic while Norwegian insists on having it sorted after 'z'. Note that I may have this wrong, but still.

My purpose in extending this topic was to warn people that, when it comes to language terms and conventions, what we take for granted is very often nonsense to our neighbours, partners, friends, customers, ...



Another point is that _ArraySort does not differentiate between some characters. For example, it is not case sensitive. While this does not normally cause problems with English (as jchd pointed out), there are circumstances where it may become important. I thought this was worth a mention.

#include <Array.au3>

Local $aArray[4] = ["B", "a", "b", "c"]
_ArraySort($aArray) ; Places upper case 'B' at 2nd position.
_ArrayDisplay($aArray)

; Swap elements 1 and 3 by hand (the _ArraySwap signature differs between AutoIt releases)
Local $vTemp = $aArray[1]
$aArray[1] = $aArray[3]
$aArray[3] = $vTemp

_ArraySort($aArray) ; Places lower case 'b' at 2nd position.
_ArrayDisplay($aArray)


Fascinating indeed. I can hardly learn computer languages properly, so spoken ones are out of the question for me.

The next logical question I would ask regarding this subject is whether Windows sorts things correctly, meaning as one would expect in their native language.

For example, when you sort a list of files by name in Windows Explorer in the English version, it sorts as I would expect. If that's the case in other languages, then perhaps there is an API which could be used to fashion a UDF.



Something out there must be sorting these languages correctly, or else creating a dictionary or phone book would be painstaking hand work these days. So maybe JohnOne has the right idea: find the API that does it natively and use that to sort.
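One possible starting point, offered only as a sketch: the Windows API function CompareStringW compares two strings according to a given locale, and it can be reached through DllCall. The helper names _LocaleCompare and _LocaleSort are hypothetical, and the simple insertion sort is there just to keep the example short; it is not how _ArraySort works.

```autoit
; Compare two strings using the Windows API CompareStringW.
; Returns -1, 0 or 1. The default 0x0400 is LOCALE_USER_DEFAULT.
Func _LocaleCompare($sA, $sB, $iLCID = 0x0400)
    Local $aRet = DllCall("kernel32.dll", "int", "CompareStringW", _
            "dword", $iLCID, "dword", 0, _
            "wstr", $sA, "int", -1, _
            "wstr", $sB, "int", -1)
    If @error Or $aRet[0] = 0 Then Return SetError(1, 0, 0)
    ; The API returns 1 (less than), 2 (equal) or 3 (greater than).
    Return $aRet[0] - 2
EndFunc

; Insertion sort of a 0-based 1D array using the comparison above.
Func _LocaleSort(ByRef $aList)
    Local $sKey, $j
    For $i = 1 To UBound($aList) - 1
        $sKey = $aList[$i]
        $j = $i - 1
        While $j >= 0
            If _LocaleCompare($aList[$j], $sKey) <= 0 Then ExitLoop
            $aList[$j + 1] = $aList[$j]
            $j -= 1
        WEnd
        $aList[$j + 1] = $sKey
    Next
EndFunc

Local $aNames[3] = ["zebra", "apple", "mango"]
_LocaleSort($aNames)
ConsoleWrite($aNames[0] & ", " & $aNames[1] & ", " & $aNames[2] & @CRLF)
```

Passing a different locale identifier as the third argument of _LocaleCompare would be the way to try another language's collation, but which identifiers and flags suit a particular case is something to check against the Windows documentation.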



I found this MSDN article, which explains the process Windows uses, for anyone who might be interested in further study. It's far above my pay grade though, so I don't understand much of it.



#17 ·  Posted (edited)

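Anyone fancy trying this in some language other than English?

#include <Array.au3>

Local $sString = "chicken cheese chocolate chalk abc"

; Flag 3 returns a 0-based array without a count in element 0
Local $aArray = StringSplit($sString, " ", 3)
_ArrayDisplay($aArray)

; Convert each word to its binary representation before sorting
For $i = 0 To UBound($aArray) - 1
    $aArray[$i] = StringToBinary($aArray[$i])
Next

_ArraySort($aArray)

; Convert back to strings (loop over the whole array, not a hard-coded 0 To 4)
For $i = 0 To UBound($aArray) - 1
    $aArray[$i] = BinaryToString($aArray[$i])
Next

_ArrayDisplay($aArray)

Probably stupid though.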

Edited by JohnOne



I have a funny feeling _ArraySort() will put them in order.

Of course that means putting the list into an array.

The next part would probably be to loop through the array, removing those you don't want.

As a beginner, I just wonder how to "load" a word list into an array? I know the tutorial is a good thing, but I couldn't see anything helpful there.

Thanks.


#19 ·  Posted (edited)


There are different methods. To begin with, I suggest you look at the help file example for _ArrayAdd. This shows the easiest method to start with. Don't worry about using the function _ArrayAdd. Just look at how the array is declared. To offer a brief explanation:

First an array containing 10 elements is declared using the keyword Local. Then each element is given a value (in this case a name). Notice that the first element is counted as element 0. Run the script and try making some alterations to it.
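Roughly the same idea in a few lines. This is not the help file example itself, just a sketch with made-up words and a shorter array:

```autoit
; Declare a 5-element array and give each element a value.
; The first element is element 0, the last is element 4.
Local $aWords[5] = ["fur", "suede", "conker", "ball", "knuckle"]

; Overwrite the first element.
$aWords[0] = "zebra"

; UBound returns the number of elements, so the last index is UBound - 1.
For $i = 0 To UBound($aWords) - 1
    ConsoleWrite($i & ": " & $aWords[$i] & @CRLF)
Next
```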

Edit

Another method would be to use _FileReadToArray. This may suit your purpose more readily; however, I strongly advise that you also familiarize yourself with the first method I mentioned.
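A sketch of the file-based approach, assuming one word per line. The file name wordlist_demo.txt is hypothetical, and the script writes it first only so the example can run on its own.

```autoit
#include <File.au3>
#include <Array.au3>

; Create a small demo file with one word per line (hypothetical name).
Local $sPath = @TempDir & "\wordlist_demo.txt"
FileDelete($sPath)
FileWrite($sPath, "pear" & @CRLF & "apple" & @CRLF & "fig")

; The array variable must exist before it is passed to _FileReadToArray.
Local $aWords
If Not _FileReadToArray($sPath, $aWords) Then
    ConsoleWrite("Could not read the file, @error = " & @error & @CRLF)
    Exit 1
EndIf

; Element 0 holds the line count, so sort from index 1 onward.
_ArraySort($aWords, 0, 1)

For $i = 1 To $aWords[0]
    ConsoleWrite($aWords[$i] & @CRLF)
Next
```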

Edited by czardas


How you load the array will depend on the source of the words you need to sort. If it's a file, then you can use FileReadToArray(). I also have a StringReadToArray that I use instead of FileReadToArray(). It lets me pass either a file or a string to the function.

If you are just getting bits and pieces of a file or web page, then one of the better methods is StringRegExp(), which will return an array.
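For instance, something along these lines pulls the words out of a piece of text. The sample text and the pattern \w+ are only assumptions for illustration; real input will probably need a pattern of its own.

```autoit
; Hypothetical text with mixed separators.
Local $sText = "fur, suede; conker ball"

; Flag 3 returns every match of the pattern as a 0-based array.
Local $aWords = StringRegExp($sText, "\w+", 3)

If @error Then
    ConsoleWrite("No words found" & @CRLF)
Else
    For $i = 0 To UBound($aWords) - 1
        ConsoleWrite($aWords[$i] & @CRLF)
    Next
EndIf
```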

The best thing you can do right now is dig into the help file and start to learn about arrays. There are several examples given that show how to use the various Array functions.


George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"
