yton Posted May 19, 2011
Greetings, I have a string containing English and non-English (Spanish, German, etc.) words. Is there any way to fetch only the English words and delete all the rest? Please help, I am stuck. Thanks.
jchd Posted May 19, 2011
As soon as you come up with an unambiguous (I mean algorithmically unambiguous) definition of an English word vs. a non-English word, chime in again and we should be able to come up with a workable script.
Edited May 19, 2011 by jchd
yton (Author) Posted May 19, 2011
English words contain English letters? Perhaps I can decode the string somehow to get different character codes for the English and non-English alphabets?
BrewManNH Posted May 19, 2011
French, German, Italian, Spanish... words all contain "English" letters, so that is not a valid criterion. By the way, they are considered Latin letters, not "English" letters.
dufran3 Posted May 19, 2011
yton wrote: "english words contain english letters?"
That is hilarious. So is the letter "a" English or non-English? lol
yton (Author) Posted May 19, 2011
Okay, I found out that after turning the string into an array, the English and non-English rows alternate one by one. Is there any way to get only the even or only the odd rows?
jchd Posted May 19, 2011
Je produis un parfait exemple de phrase utilisant des mots non anglais et en employant uniquement des lettres latines sans diacritiques. [In English: "I am producing a perfect example of a sentence that uses non-English words and employs only Latin letters without diacritics."]
Does this count as a valid sequence of English words? Hint: Google-translate that from French into English!
jchd Posted May 19, 2011
If your input is bilingual on a base-2 "basis" (every other row is in the other language), then the problem is entirely different. Use Step 2 in the For loop that processes your input.
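A minimal sketch of the Step 2 idea, assuming the input is a block of text whose lines strictly alternate and that the English line comes first. The sample text and variable names are made up for illustration and are not from the thread.

```autoit
; Hypothetical input: lines alternate between English and another language.
Local $sText = "Good morning" & @CRLF & "Guten Morgen" & @CRLF & "Thank you" & @CRLF & "Danke"

; Flag 2 makes StringSplit return a 0-based array without a count in element 0.
Local $aLines = StringSplit(StringStripCR($sText), @LF, 2)

Local $sEnglish = ""
; Step 2 visits indexes 0, 2, 4, ... Start at 1 instead to keep the other rows.
For $i = 0 To UBound($aLines) - 1 Step 2
    $sEnglish &= $aLines[$i] & @CRLF
Next
ConsoleWrite($sEnglish)
```

If the first row turns out to be the non-English one, only the loop's start index needs to change.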
PowerCat Posted May 19, 2011
Make a text file containing every English word in a dictionary.
http://wordlist.sourceforge.net/
http://www.mieliestronk.com/wordlist.html
Load that text file into however many arrays you need to fit the words. Then compare your word list with the dictionary arrays, and if a word has no match, dump it. Efficient? No. Will it work? I have no clue.
Edited May 19, 2011 by PowerCat
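A rough sketch of this dictionary comparison, swapping the plain arrays for a Windows Scripting.Dictionary COM object so that each lookup does not have to scan the whole list. The file name wordlist.txt, its one-word-per-line layout, and the sample sentence are assumptions for illustration only.

```autoit
; Assumption: wordlist.txt (hypothetical name) holds one English word per line.
Local $sWordlist = FileRead(@ScriptDir & "\wordlist.txt")
If @error Then Exit MsgBox(16, "Error", "Unable to read the word list.")

; Scripting.Dictionary is used here purely as a set of known words.
Local $oWords = ObjCreate("Scripting.Dictionary")
$oWords.CompareMode = 1 ; 1 = text compare, so lookups ignore case
For $sWord In StringSplit(StringStripCR($sWordlist), @LF, 2)
    If $sWord <> "" Then $oWords.Item($sWord) = True
Next

; Keep only the tokens that appear in the word list.
Local $sInput = "This is some string with a spellink mistake"
Local $sKept = ""
For $sToken In StringSplit($sInput, " ", 2)
    If $oWords.Exists($sToken) Then $sKept &= $sToken & " "
Next
ConsoleWrite(StringStripWS($sKept, 2) & @CRLF)
```

This only tests membership in the list: a word that is spelled the same in English and in another language still passes, and punctuation attached to a token would have to be stripped before the lookup.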
trancexx Posted May 19, 2011
English benglish. Use American as normal people do.
yton (Author) Posted May 19, 2011
Thank you so much for your replies and help. I found the solution for my issue.
jchd Posted May 19, 2011
PowerCat, you seem to believe that the set of all possible words in a given language doesn't intersect with that of any other language. Not only does that not hold water, it also ignores that in some countries more than one language is widely used (easy examples: Canada, Belgium, ...).
PowerCat Posted May 20, 2011
jchd wrote: "PowerCat, You seem to believe that the set of all possible words in a given language doesn't intersect with any other. Not only that doesn't hold water but that also ignores that in some countries more than one language is widely used (easy examples: Canada, Belgium, ...)."
Even if an English word is of French origin, it should still be found in an English dictionary. I'm not too sure I understand what you mean. If a word is not found in an English dictionary, isn't it a word in another language?
Edited May 20, 2011 by PowerCat
jchd Posted May 20, 2011
But when a word is found in both French and English dictionaries, then what is it actually? Anyway, the OP found that the sentences were alternating, so the point is moot.
GEOSoft Posted May 20, 2011

    #Include<array.au3> ;; For _ArrayDisplay purposes only
    $sFile = @ScriptDir & "\wordlist.txt" ;; file attached
    $sWordlist = FileRead($sFile)
    If @Error Then
        MsgBox(0, "Error", "Unable to read the word list.")
        Exit
    EndIf
    $sStr = "This is some string enthält sowohl englischen und deutschen wörtern et quelques mots français followed by a spellink mistake."
    ;; Split on whitespace. Change this to StringRegExp($sStr, "\S{2,}", 3) to ignore single letter words like "a"
    $aStr = StringRegExp($sStr, "\S+", 3)
    If NOT @Error Then
        _ArrayDisplay($aStr, "Full String")
        $sValid = ""
        For $i = 0 To UBound($aStr) - 1
            $aStr[$i] = StringRegExpReplace($aStr[$i], "[.!?,]", "") ;; Just in case we pick up punctuation
            ;; Keep the word if it appears on a line of its own in the word list (case-insensitive)
            If StringRegExp($sWordlist, "(?i)(?m:^)" & $aStr[$i] & "(?:\s|$)+", 0) Then $sValid &= $aStr[$i] & "|"
        Next
        $aStr = StringSplit(StringTrimRight($sValid, 1), "|", 2)
        _ArrayDisplay($aStr, "English Words")
    EndIf

Someone will complain about me not allowing for punctuation in the initial array, but it was done for a reason. One of those Bindar Dundat© things.
Edit: You could also read the wordlist file into an SQLite database and then query that for the result.
Edit 2: The wordlist can be found here
Edited June 17, 2011 by GEOSoft
Shaarad Posted May 21, 2011
Hey guys, can't we use the _StringExplode function to break the string into separate words and look each word up in the English word collection? If it is there, save it separately to a temporary file, and then show all the data collected in the temporary file as the output.
GEOSoft Posted May 21, 2011
Why? _StringExplode just returns an array of the contents of whatever string you send to it. That's easily done anyway. The trick is to get all the words into an array, which can be accomplished with

    $aStr = StringRegExp($sStr, "\S+", 3)

Then you compare each of those to the text file that contains all the English words and put them into a string:

    If StringRegExp($sWordlist, "(?i)(?m:^)" & $aStr[$i] & "(?:\s|$)+", 0) Then $sValid &= $aStr[$i] & "|"

After that you use StringSplit to change that string into an array if you want to.
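One caveat with building the pattern by plain concatenation is that a token containing regular expression metacharacters (for example "a.b" or "(test") is interpreted as pattern syntax. A hedged variant of the lookup is sketched below; the helper name _IsListedWord is hypothetical and not part of the thread's script, and it assumes the word list has one word per line.

```autoit
; Hypothetical helper: reports whether $sWord sits on a line of its own
; in the newline-separated word list $sWordlist (case-insensitive).
Func _IsListedWord($sWordlist, $sWord)
    ; An empty token would otherwise match any blank line in the list.
    If $sWord = "" Then Return False
    ; \Q...\E makes PCRE treat the word literally. A word that itself
    ; contains "\E" would still break out of the quoting; ignored here.
    Local $sPattern = "(?im)^\Q" & $sWord & "\E\s*$"
    Return StringRegExp($sWordlist, $sPattern) = 1
EndFunc

; Usage with a tiny made-up list:
Local $sList = "apple" & @CRLF & "banana" & @CRLF
ConsoleWrite(_IsListedWord($sList, "Apple") & @CRLF)
```

The quoting only changes how the token is read as a pattern; the matching rule is otherwise the same idea as the line-anchored test above.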
Shaarad Posted May 21, 2011
Hey, this is what I have made! Check it out; I think this is what you were looking for. Please reply about its usefulness, and guys, please check for bugs and tell me!
Attachment: English Words Filter.rar
GEOSoft Posted May 21, 2011
You still made it far more complicated than was required. Why would I want 26 files when I can do it in 1? _StringExplode() is getting a little outdated; there are several other ways to skin the cat since that was written. Besides, why include that whole file for the sake of 1 function? Speaking of outdated: the 1990s called and they want their file archiver back.
Rogue5099 Posted May 21, 2011
GEOSoft wrote: "Speaking of outdated; the 1990s called and they want their file archiver back."
Agree. Compression of GEOSoft's file wordlist.txt, size on disk (original: 958,464 bytes):
.rar = 266,240 bytes
.zip = 245,760 bytes
.7z = 200,704 bytes
Even though 7-Zip compresses the most, why use a third-party compression utility when Windows can already compress .zip files? The reason I have the others installed is for people who use them, so I can decompress their files, while 7-Zip is my choice for handling everything. The only thing I have found WinRAR useful for is automatically running a program after self-extracting, kind of like an installer.
Edited May 21, 2011 by rogue5099