
Are there English words in a text file?



Hey, 

I'm a QA Engineer. We localize our product into 13 languages. Before I invest a lot of time in an idea I have, I need an answer.

My idea is to invoke dialogs and inspect menus, tooltips and the status bar text: basically, get all the text in our UI. Then I would either write all of that text to a text file or inspect it on the fly. Is there a way to tell whether a string is in English?
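For the "write all the text to a file" half of the plan, a rough AutoIt starting point might look like the sketch below. `_DumpWindowText` and `ui_text.txt` are hypothetical names, not anything from this thread. It assumes that `WinGetText` is enough for your dialogs; that function only returns text the window's controls expose, so menus, tooltips and custom-drawn controls will likely need other approaches.

```autoit
; Hypothetical helper: append the title and control text of a window to a log file.
; WinGetText only returns text exposed by the window's controls; menus and
; tooltips are likely not included.
Func _DumpWindowText($sTitle, $sLogFile)
    If Not WinExists($sTitle) Then Return SetError(1, 0, 0)
    Local $sText = WinGetText($sTitle)
    ; 1 = append mode, 128 = UTF-8 with BOM, so localized characters survive
    Local $hFile = FileOpen($sLogFile, 1 + 128)
    If $hFile = -1 Then Return SetError(2, 0, 0)
    FileWriteLine($hFile, "[" & WinGetTitle($sTitle) & "]")
    FileWriteLine($hFile, $sText)
    FileClose($hFile)
    Return 1
EndFunc

; Example: dump the currently active window into ui_text.txt next to the script
_DumpWindowText("[ACTIVE]", @ScriptDir & "\ui_text.txt")
```

Opening the file as UTF-8 matters here: with 13 languages, an ANSI log would mangle many of the very characters you want to detect.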

In Ruby or Python, I can add a gem or library to a script and check the text against an English dictionary. I want to do the same with AutoIt. Can you #include an English dictionary in an AutoIt script to inspect the text and verify whether it is English or not?

Thank you.

Cygnus


If your purpose is to determine whether a given text is in English or not, you can get pretty close by checking for non-English characters.

As a preparatory step, get familiar with which characters count as valid English, a.k.a. "Basic Latin": the printable ones are Unicode U+0020 to U+007E (the range covers the usual punctuation as well).

Now, if your text contains anything outside that range, it's probably not English.

When testing, you may at first discover some common characters outside that range that are used in English too (typographic quotes, for example); adapt your code to compensate.

After a few such tests and compensations, you ought to get pretty close to 100% certainty. Keep in mind that this only catches non-ASCII characters: a string in another Latin-script language that happens to contain no accented letters (Spanish "Abrir", for example) will still pass.
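A minimal AutoIt sketch of the character-range check above. `_IsBasicLatin` is a hypothetical name, and the sketch assumes tab, CR and LF should also be allowed, since dialog text is often multi-line. The accented letter is built with `ChrW` so the example does not depend on how the script file itself is encoded.

```autoit
; Hypothetical helper: True if $sText contains only printable Basic Latin
; characters (U+0020 to U+007E) plus tab, CR and LF, otherwise False.
Func _IsBasicLatin($sText)
    ; With the default flag, StringRegExp returns 1 if the pattern matches anywhere,
    ; so any character outside the allowed set makes the result False.
    Return StringRegExp($sText, "[^\x20-\x7E\t\r\n]") = 0
EndFunc

ConsoleWrite(_IsBasicLatin("Save As...") & @CRLF)          ; True
ConsoleWrite(_IsBasicLatin(ChrW(0xD6) & "ffnen") & @CRLF)  ; False ("Öffnen", German for "Open")
ConsoleWrite(_IsBasicLatin("Abrir") & @CRLF)               ; True, although it is Spanish
```

Characters that turn out to be legitimate in your English strings can be added to the character class, e.g. `\x{2019}` for the curly apostrophe.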

But if you don't mind your script taking forever to complete, you can download a full English dictionary to check your text against.
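If you do go the dictionary route, here is a rough AutoIt sketch. It assumes a hypothetical plain-text word list `english_words.txt` (one word per line) next to the script; `_LoadDictionary` and `_EnglishRatio` are made-up helper names. Loading the list once into a `Scripting.Dictionary` means each lookup is a hash check rather than a scan of the whole file.

```autoit
; Hypothetical word list: one English word per line (file name is an assumption)
Global Const $g_sWordList = @ScriptDir & "\english_words.txt"

; Load the word list once into a Scripting.Dictionary for fast lookups
Func _LoadDictionary($sPath)
    Local $aWords = FileReadToArray($sPath)
    If @error Then Return SetError(1, 0, 0)
    Local $oDict = ObjCreate("Scripting.Dictionary")
    If @error Then Return SetError(2, 0, 0)
    $oDict.CompareMode = 1 ; 1 = text compare, so "Save" matches "save"
    Local $sClean
    For $sWord In $aWords
        $sClean = StringStripWS($sWord, 3) ; 3 = strip leading and trailing whitespace
        If $sClean <> "" And Not $oDict.Exists($sClean) Then $oDict.Add($sClean, 0)
    Next
    Return $oDict
EndFunc

; Return the fraction (0 to 1) of words in $sText that appear in the dictionary
Func _EnglishRatio($sText, $oDict)
    Local $aTokens = StringRegExp($sText, "[A-Za-z']+", 3) ; 3 = array of all matches
    If @error Then Return 0 ; no Latin-letter words at all
    Local $iHits = 0
    For $sToken In $aTokens
        If $oDict.Exists($sToken) Then $iHits += 1
    Next
    Return $iHits / UBound($aTokens)
EndFunc

Local $oEnglish = _LoadDictionary($g_sWordList)
If @error Then
    ConsoleWrite("Could not read " & $g_sWordList & @CRLF)
Else
    ConsoleWrite(_EnglishRatio("Save the current file", $oEnglish) & @CRLF)
EndIf
```

Returning a ratio instead of a yes/no answer lets you choose a threshold; product names and abbreviations will usually be missing from a generic word list.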

 

 


Hey, 

Thanks for the responses! 

I'm currently pursuing this avenue: there is a website, https://detectlanguage.com/, that can detect the language.

It returns a JSON object like this:

{"data":{"detections":[{"language":"sv","isReliable":false,"confidence":0.01}]}}

You pass in your string and your API key:

http://ws.detectlanguage.com/0.2/detect?q=buenos+dias+señor&key=demo
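To use `isReliable` and `confidence` as well as the language code, the fields can be pulled out of a response like the one above with regular expressions. This is a sketch that assumes the flat layout shown (it only reads the first detection); `_ParseDetection` is a hypothetical helper, and a proper JSON parser would be more robust.

```autoit
; Hypothetical helper: extract language, isReliable and confidence from a
; detectlanguage.com response. Returns True on success, False if no language found.
Func _ParseDetection($sJson, ByRef $sLang, ByRef $bReliable, ByRef $fConfidence)
    Local $a = StringRegExp($sJson, '"language"\s*:\s*"([^"]+)"', 1)
    If @error Then Return SetError(1, 0, False)
    $sLang = $a[0]

    $bReliable = False
    $a = StringRegExp($sJson, '"isReliable"\s*:\s*(true|false)', 1)
    If Not @error Then $bReliable = ($a[0] = "true")

    $fConfidence = 0
    $a = StringRegExp($sJson, '"confidence"\s*:\s*([0-9.]+)', 1)
    If Not @error Then $fConfidence = Number($a[0])
    Return True
EndFunc

Local $sLang, $bReliable, $fConf
Local $sSample = '{"data":{"detections":[{"language":"sv","isReliable":false,"confidence":0.01}]}}'
If _ParseDetection($sSample, $sLang, $bReliable, $fConf) Then
    ConsoleWrite($sLang & " reliable=" & $bReliable & " confidence=" & $fConf & @CRLF)
EndIf
```

With short UI strings, checking `isReliable` before trusting the language code is probably worthwhile; the sample above is flagged unreliable with very low confidence.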

It's pretty sweet.

Thanks again.

Brad 

Cygnus


One way, using detectlanguage.com:

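ConsoleWrite(_GetLanguage("buenos dias señor") & @CRLF)

; Ask detectlanguage.com for the language of $sText; returns the language code or "" on error
Func _GetLanguage($sText)
    Local $sUrl = "https://ws.detectlanguage.com/0.2/detect"
    ; Form values must be percent-encoded (spaces, accented letters, "&", ...)
    Local $sRequest = "key=demo&q=" & _URLEncode($sText)

    Local $oHTTP = ObjCreate("Microsoft.XMLHTTP")
    If @error Then Return SetError(1, 0, "")
    $oHTTP.open("POST", $sUrl, False)
    $oHTTP.setRequestHeader("Content-Type", "application/x-www-form-urlencoded")
    ; Content-Length is set by the object itself (StringLen counts characters, not bytes)
    $oHTTP.send($sRequest)
    If $oHTTP.status <> 200 Then Return SetError(3, $oHTTP.status, "")

    Local $aLang = StringRegExp($oHTTP.responseText, '"language":"([^"]+)"', 1)
    If @error Then Return SetError(2, 0, "")
    Return $aLang[0]
EndFunc

; Percent-encode a string as UTF-8, leaving only unreserved characters as-is
Func _URLEncode($sData)
    Local $sHex = Hex(StringToBinary($sData, 4)) ; 4 = UTF-8 bytes
    Local $sOut = "", $sByte, $iByte
    For $i = 1 To StringLen($sHex) Step 2
        $sByte = StringMid($sHex, $i, 2)
        $iByte = Dec($sByte)
        If StringRegExp(Chr($iByte), "[A-Za-z0-9_.~-]") Then
            $sOut &= Chr($iByte)
        Else
            $sOut &= "%" & $sByte
        EndIf
    Next
    Return $sOut
EndFunc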

 


Hello. "días" requires an accent mark.

 

Regards

