CygnusX1

Are there English words in a text file?

6 posts in this topic

Hey, 

I'm a QA Engineer. We localize our product to 13 languages. Before I invest a lot of time in an idea I have, I need an answer.

My idea is to invoke dialogs, inspect menus, inspect tooltips, look at the status bar text, basically get all the text of our UI. Either write all the text to a text file or inspect the text on the fly. Is there a way to see if the string is in English?

In Ruby or Python I can add a gem or library to a script and check the text against an English dictionary. I want to do the same with AutoIt. Can you #include an English dictionary in an AutoIt script to inspect the text and verify whether it is English or not?

Thank you.


Cygnus

If your purpose is to determine whether a given text is in English or not, you can get pretty close by checking for non-English characters.

As a preparatory step, get familiar with which characters are considered English, a.k.a. "Basic Latin" (disregard punctuation): the printable ones are Unicode 0020-007E.

Now, if your text contains anything outside that range, it's probably not English.

When testing, you may at first discover some common characters outside that range that are used in English too; adapt your code to compensate.

After a few such tests and compensations, you ought to get pretty close to 100% certainty.
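
The range check described above could be sketched roughly like this. `_LooksEnglish` and its `$sAllowed` parameter are hypothetical names made up for illustration, and the list of extra characters (curly quotes, ellipsis) is only an assumed starting point for the "compensation" step. Keep in mind that text in another language that happens to use only Basic Latin letters passes this check too, so a True result only means "no foreign characters found".

```autoit
; Hypothetical helper: True if every character of $sText is printable
; Basic Latin (U+0020..U+007E), a tab/CR/LF, or listed in $sAllowed.
Func _LooksEnglish($sText, $sAllowed = "")
    Local $sChar, $iCode
    For $i = 1 To StringLen($sText)
        $sChar = StringMid($sText, $i, 1)
        $iCode = AscW($sChar)
        If $iCode >= 0x20 And $iCode <= 0x7E Then ContinueLoop
        If $iCode = 9 Or $iCode = 10 Or $iCode = 13 Then ContinueLoop ; tab, LF, CR
        If StringInStr($sAllowed, $sChar, 1) Then ContinueLoop ; 1 = case-sensitive
        Return False
    Next
    Return True
EndFunc

; Assumed "compensation" list: curly quotes and the ellipsis character.
Local $sExtra = ChrW(0x2018) & ChrW(0x2019) & ChrW(0x201C) & ChrW(0x201D) & ChrW(0x2026)

ConsoleWrite(_LooksEnglish("Save file as" & ChrW(0x2026), $sExtra) & @CRLF)
ConsoleWrite(_LooksEnglish("se" & ChrW(0xF1) & "or", $sExtra) & @CRLF)
```

The first call contains only allowed characters; the second contains U+00F1, which is outside the range and not in the extra list.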

But if you don't mind your script taking forever to complete, you can download a full English dictionary to check your text against.
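
For the dictionary route, here is a minimal sketch, assuming you have a plain word list with one word per line. The file name `words.txt` and the helper names `_LoadWordList` and `_EnglishWordRatio` are hypothetical. It loads the words into a `Scripting.Dictionary` so each lookup is a keyed lookup rather than a scan of the whole list, and it only tokenizes on ASCII letters and apostrophes, which is a simplification.

```autoit
; Hypothetical helper: loads a word list (one word per line) into a
; Scripting.Dictionary with case-insensitive keys.
Func _LoadWordList($sPath)
    Local $oWords = ObjCreate("Scripting.Dictionary")
    If @error Then Return SetError(1, 0, 0)
    $oWords.CompareMode = 1 ; 1 = text compare, must be set while empty
    Local $aLines = FileReadToArray($sPath)
    If @error Then Return SetError(2, 0, 0)
    Local $sWord
    For $i = 0 To UBound($aLines) - 1
        $sWord = StringStripWS($aLines[$i], 3) ; 3 = strip leading and trailing
        If $sWord = "" Then ContinueLoop
        If Not $oWords.Exists($sWord) Then $oWords.Add($sWord, 1)
    Next
    Return $oWords
EndFunc

; Hypothetical helper: fraction (0..1) of tokens in $sText found in $oWords.
Func _EnglishWordRatio($sText, $oWords)
    Local $aTokens = StringRegExp($sText, "[A-Za-z']+", 3) ; 3 = all matches
    If @error Then Return 0
    Local $iHits = 0
    For $i = 0 To UBound($aTokens) - 1
        If $oWords.Exists($aTokens[$i]) Then $iHits += 1
    Next
    Return $iHits / UBound($aTokens)
EndFunc

Local $oWords = _LoadWordList(@ScriptDir & "\words.txt") ; assumed path
If Not @error Then ConsoleWrite(_EnglishWordRatio("Open the file", $oWords) & @CRLF)
```

A ratio rather than a yes/no answer leaves room for product names and abbreviations that no dictionary will contain; where to put the threshold is something you would have to tune against your own UI strings.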

 

 


Hey, 

Thanks for the responses! 

I'm currently pursuing this avenue: there is a website that can detect the language, https://detectlanguage.com/.

It returns a JSON object like this:

{"data":{"detections":[{"language":"sv","isReliable":false,"confidence":0.01}]}}

You pass in your string and API key:

http://ws.detectlanguage.com/0.2/detect?q=buenos+dias+señor&key=demo
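
Since the sample response above comes back with "isReliable":false and a confidence of 0.01, it is probably worth reading those fields and not just the language code. A rough sketch, assuming the fields always arrive in the order shown in the sample; `_ParseDetection` is a hypothetical helper, and a real JSON UDF would be more robust than a regular expression:

```autoit
; Sample response copied from the post above.
Local $sJson = '{"data":{"detections":[{"language":"sv","isReliable":false,"confidence":0.01}]}}'

Local $aDetect = _ParseDetection($sJson)
If Not @error Then
    ConsoleWrite("language=" & $aDetect[0] & " reliable=" & $aDetect[1] & " confidence=" & $aDetect[2] & @CRLF)
EndIf

; Hypothetical helper: returns [language, isReliable, confidence] for the
; first detection, assuming the field order of the sample response.
Func _ParseDetection($sJson)
    Local $sPattern = '"language":"([^"]+)","isReliable":(true|false),"confidence":([\d.]+)'
    Local $aMatch = StringRegExp($sJson, $sPattern, 1) ; 1 = groups of the first match
    If @error Then Return SetError(1, 0, 0)
    Local $aResult[3] = [$aMatch[0], ($aMatch[1] = "true"), Number($aMatch[2])]
    Return $aResult
EndFunc
```

With the reliability flag in hand you can decide to treat unreliable detections as "unknown" instead of trusting the reported language.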

It's pretty sweet.

Thanks again.

Brad 


Cygnus


One way, using detectlanguage.com:

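ConsoleWrite(_GetLanguage("buenos dias señor") & @CRLF)

; Returns the language code of the first detection, or "" with @error set.
Func _GetLanguage($sText)
    Local $sRet, $aLang
    Local $sUrl = "https://ws.detectlanguage.com/0.2/detect"
    ; URL-encode the text so spaces, "&" and non-ASCII characters survive the form post.
    Local $sRequest = "key=demo&q=" & _UrlEncode($sText)

    Local $oHTTP = ObjCreate("Microsoft.XMLHTTP")
    If @error Then Return SetError(1, 0, "")
    $oHTTP.open("POST", $sUrl, False)
    $oHTTP.setRequestHeader("Content-Type", "application/x-www-form-urlencoded")
    ; No Content-Length header here: StringLen() counts characters, not bytes.
    $oHTTP.send($sRequest)
    $sRet = $oHTTP.responseText
    $aLang = StringRegExp($sRet, '"language":"([^"]+)"', 1)
    If @error Then Return SetError(2, 0, "")

    Return $aLang[0]
EndFunc

; Percent-encodes the UTF-8 bytes of $sText (a space becomes "+").
Func _UrlEncode($sText)
    Local $sOut = "", $iByte
    Local $sHex = Hex(StringToBinary($sText, 4)) ; 4 = UTF-8
    For $i = 1 To StringLen($sHex) Step 2
        $iByte = Dec(StringMid($sHex, $i, 2))
        Switch $iByte
            Case 45, 46, 95, 126, 48 To 57, 65 To 90, 97 To 122
                $sOut &= Chr($iByte)
            Case 32
                $sOut &= "+"
            Case Else
                $sOut &= "%" & Hex($iByte, 2)
        EndSwitch
    Next
    Return $sOut
EndFunc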

 


#6 ·  Posted (edited)

Hello. "días" requires an accent mark.

 

Saludos

Edited by Danyfirex
