Sori

SAPI (Windows Speech Recognition API) recognition help


Sori

Modified version of code here:

#include <File.au3>
#include <Misc.au3>

;Only allow one instance of the program to run.
If _Singleton("Voice Commands", 1) = 0 Then
    Exit
EndIf

Dim $spokenWords
Dim $voiceCommands = _FileListToArray(@WorkingDir & "\Voice Commands")
Dim $voiceCommandsCap = $voiceCommands[0]
Dim $voiceCommandName
Dim $splitCommand
Dim $splitRecognition
Dim $parameter
Dim $sendParameter
Dim $count
Dim $skipSearch
Dim $transcriptionMode

Global $h_Context = ObjCreate("SAPI.SpInProcRecoContext")
Global $h_Recognizer = $h_Context.Recognizer
Global $h_Grammar = $h_Context.CreateGrammar(1)
$h_Grammar.DictationLoad
$h_Grammar.DictationSetState(1)

;Create a token for the default audio input device and set it
Global $h_Category = ObjCreate("SAPI.SpObjectTokenCategory")
$h_Category.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\")
Global $h_Token = ObjCreate("SAPI.SpObjectToken")
$h_Token.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\")
$h_Recognizer.AudioInput = $h_Token

Global $i_ObjInitialized = 0

Global $h_ObjectEvents = ObjEvent($h_Context, "SpRecEvent_")
If @error Then
    ConsoleWrite("ObjEvent error: " & @error & @CRLF)
    $i_ObjInitialized = 0
Else
    ConsoleWrite("ObjEvent created Successfully!" & @CRLF)
    $i_ObjInitialized = 1
EndIf

While $i_ObjInitialized
    Sleep(5000)
    ;Allow the Audio In to finalize processing on the last 5 second capture
    $h_Context.Pause
    ;Resume audio in processing
    $h_Context.Resume
    ;Reset event function allocation (what is this? I think it's garbage collection or something, needs clarification)
    $h_ObjectEvents = ObjEvent($h_Context, "SpRecEvent_")
WEnd

Func SpRecEvent_Hypothesis($StreamNumber, $StreamPosition, $Result)
    ConsoleWrite("Hypothesis(): Hypothesized text is: " & $Result.PhraseInfo.GetText & @CRLF)
EndFunc   ;==>SpRecEvent_Hypothesis

Func SpRecEvent_Recognition($StreamNumber, $StreamPosition, $RecognitionType, $Result)
    ConsoleWrite($RecognitionType & "||" & $Result.PhraseInfo.GetText & @CRLF)
    $spokenWords = $Result.PhraseInfo.GetText
    CheckCommands()
EndFunc   ;==>SpRecEvent_Recognition

Func SpRecEvent_SoundStart($StreamNumber, $StreamPosition)
    ConsoleWrite("Sound Started" & @CRLF)
EndFunc   ;==>SpRecEvent_SoundStart

Func SpRecEvent_SoundEnd($StreamNumber, $StreamPosition)
    ConsoleWrite("Sound Ended" & @CRLF)
EndFunc   ;==>SpRecEvent_SoundEnd

Func CheckCommands()
    ;=== Special Voice Commands ===

    ;-- Transcription Mode--
    If $spokenWords = "Transcription Mode" Then
        If $transcriptionMode = 0 Then
            $transcriptionMode = 1
            ConsoleWrite("Transcription Mode On" & @CRLF)
            $skipSearch = 1
        Else
            $transcriptionMode = 0
            ConsoleWrite("Transcription Mode Off" & @CRLF)
            $skipSearch = 1
        EndIf
    EndIf

    If $transcriptionMode = 1 Then
        If $spokenWords <> "Transcription Mode" Then
            Send($spokenWords)
            $skipSearch = 1
        EndIf
    Else
        $skipSearch = 0
    EndIf
    ;---------

    ;==============================

    If $skipSearch = 0 Then
        ;=== Voice Command Search ===
        ;%% in the file name denotes that whatever is said after the command should be sent as a parameter
        $count = 1
        While $count <= $voiceCommandsCap
            ConsoleWrite("count: " & $count & @CRLF)
            ConsoleWrite($voiceCommands[$count] & @CRLF)
            If StringInStr($voiceCommands[$count], " %%") <> 0 Then
                ConsoleWrite("found %%" & @CRLF)
                ;Flag 1 splits on the whole " %%" string; the default flag splits on every space and % character
                $splitCommand = StringSplit($voiceCommands[$count], " %%", 1)
                $voiceCommandName = $splitCommand[1]
                ;Whatever was said after the command name is the parameter
                $parameter = StringReplace($spokenWords, $voiceCommandName & " ", "")
                $sendParameter = 1
                ConsoleWrite("spokenWords: " & $spokenWords & @CRLF)
                ConsoleWrite("voiceCommandName: " & $voiceCommandName & @CRLF)
                ConsoleWrite("Parameter: " & $parameter & @CRLF)
            Else
                ;Strip the file extension to get the command name
                $voiceCommandName = StringRegExpReplace($voiceCommands[$count], "\.[^.]*$", "")
                $sendParameter = 0
            EndIf

            ;Test every command in the list, not only the last one
            ConsoleWrite("Checking Command to List" & @CRLF)
            If StringInStr($spokenWords, $voiceCommandName) <> 0 Then
                If $sendParameter = 1 Then
                    ;Quote the program path because it contains spaces
                    Run('"' & @WorkingDir & "\Voice Commands\" & $voiceCommandName & ' %%.exe" ' & $parameter)
                    ;Run("AutoIt3.exe " & $voiceCommandName &"%%.au3" & $parameter)
                Else
                    ShellExecute(@WorkingDir & "\Voice Commands\" & $voiceCommandName & ".exe")
                    ;Run("AutoIt3.exe " & $voiceCommandName &".au3")
                EndIf
                ExitLoop
            EndIf
            $count = $count + 1
        WEnd
        ;==============================
    EndIf
    $skipSearch = 0
EndFunc   ;==>CheckCommands

My issue is with the recognition itself.

Too often, the engine does not recognize what I'm saying.

I've tried searching for information on recognition, training, etc., but I'm not finding what I need.

I think I can use the last hypothesized entry to check the commands, but...

I'm not sure how to get only the last entry for the hypothesis.

If you look in the console as you speak, the hypothesis is constantly changing until you stop speaking. So I'm fairly certain I need to check for a pause in speech to use this method.
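One way to approach the "last hypothesis after a pause" idea is sketched below. It is only a sketch under assumptions: the names $lastHypothesis, $hypothesisTimer and CheckForPause() are made up for illustration, the 1500 ms pause length is a guess to tune, and the three functions are meant to replace or join the handlers in the script above (they rely on its $spokenWords and CheckCommands()).

```autoit
;Hypothetical globals: newest hypothesis text and the time it arrived
Global $lastHypothesis = ""
Global $hypothesisTimer = 0

Func SpRecEvent_Hypothesis($StreamNumber, $StreamPosition, $Result)
    ;Overwrite on every event so only the newest guess is kept
    $lastHypothesis = $Result.PhraseInfo.GetText
    $hypothesisTimer = TimerInit()
EndFunc   ;==>SpRecEvent_Hypothesis

Func SpRecEvent_Recognition($StreamNumber, $StreamPosition, $RecognitionType, $Result)
    ;A full recognition supersedes any pending hypothesis
    $lastHypothesis = ""
    $spokenWords = $Result.PhraseInfo.GetText
    CheckCommands()
EndFunc   ;==>SpRecEvent_Recognition

;Hypothetical helper: call it regularly from the main loop
Func CheckForPause()
    ;No new hypothesis for 1500 ms (assumed value): treat the last one as final
    If $lastHypothesis <> "" And TimerDiff($hypothesisTimer) > 1500 Then
        $spokenWords = $lastHypothesis
        $lastHypothesis = ""
        CheckCommands()
    EndIf
EndFunc   ;==>CheckForPause
```

For this to work, the main loop would need to call CheckForPause() much more often than every 5 seconds, for example with Sleep(250) instead of Sleep(5000). A hypothesis is only the engine's running guess, so commands triggered this way may be less reliable than ones that reach the Recognition event.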

Is the voice recognition engine shared amongst all programs?
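On the sharing question: the script above uses SAPI.SpInProcRecoContext, which creates a recognizer inside the script's own process. SAPI also provides SAPI.SpSharedRecoContext, which attaches to the shared recognizer that other programs can use as well. The following is a minimal sketch of that variant, not a tested replacement; the $h_Shared... variable names and the "Shared_" event prefix are chosen here only for illustration.

```autoit
;Attach to the shared recognizer instead of creating a private one
Global $h_SharedContext = ObjCreate("SAPI.SpSharedRecoContext")
If Not IsObj($h_SharedContext) Then
    ConsoleWrite("Could not create SAPI.SpSharedRecoContext" & @CRLF)
    Exit
EndIf

;Load and activate the dictation grammar, as in the in-process version
Global $h_SharedGrammar = $h_SharedContext.CreateGrammar(1)
$h_SharedGrammar.DictationLoad
$h_SharedGrammar.DictationSetState(1)

;Route the context's events to functions starting with "Shared_"
Global $h_SharedEvents = ObjEvent($h_SharedContext, "Shared_")

;Keep the script alive so events can arrive
While 1
    Sleep(100)
WEnd

Func Shared_Recognition($StreamNumber, $StreamPosition, $RecognitionType, $Result)
    ConsoleWrite("Shared recognizer heard: " & $Result.PhraseInfo.GetText & @CRLF)
EndFunc   ;==>Shared_Recognition
```

With the shared context the audio input is handled by the shared recognizer, so the sketch leaves out the AudioInput token setup that the in-process version needs.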

Is there a good application to train the voice recognition?

Side note:

The code is terribly inefficient at the moment. It's a work in progress. After I get it to recognize my voice, then I'll work on making the actual commands better optimized.

Edited by Sori

If you need help with your stuff, feel free to get me on my Skype.

I often get bored and enjoy helping with projects.

Surya

I think you can use my UDF, Utter.au3. It lets you add the specific words you want the computer to recognize, and free recognition is also available. Here is the link: https://www.autoitscript.com/forum/topic/175719-utter-utilizing-more-of-sapi/ I hope this helps, Sori. If you need any more help, just ask.


No matter what the challenge may be, control of the outcome is on you; it always has been.

My UDFs: Transpond UDF (send variables to programs), Utter UDF (speech recognition)

