SAPI (Windows Speech Recognition API) recognition help


Modified version of code here:

#include <File.au3>
#include <Misc.au3>

;Only allow one instance of the program to run.
If _Singleton("Voice Commands", 1) = 0 Then
    Exit
EndIf

Dim $spokenWords
Dim $voiceCommands = _FileListToArray(@WorkingDir & "\Voice Commands")
Dim $voiceCommandsCap = $voiceCommands[0]
Dim $voiceCommandName
Dim $splitCommand
Dim $splitRecognition
Dim $parameter
Dim $sendParameter
Dim $count
Dim $skipSearch
Dim $transcriptionMode

Global $h_Context = ObjCreate("SAPI.SpInProcRecoContext")
Global $h_Recognizer = $h_Context.Recognizer
Global $h_Grammar = $h_Context.CreateGrammar(1)
$h_Grammar.Dictationload
$h_Grammar.DictationSetState(1)

;Create a token for the default audio input device and set it
Global $h_Category = ObjCreate("SAPI.SpObjectTokenCategory")
$h_Category.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\")
Global $h_Token = ObjCreate("SAPI.SpObjectToken")
$h_Token.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\")
$h_Recognizer.AudioInput = $h_Token

Global $i_ObjInitialized = 0

Global $h_ObjectEvents = ObjEvent($h_Context, "SpRecEvent_")
If @error Then
    ConsoleWrite("ObjEvent error: " & @error & @CRLF)
    $i_ObjInitialized = 0
Else
    ConsoleWrite("ObjEvent created Successfully!" & @CRLF)
    $i_ObjInitialized = 1
EndIf

While $i_ObjInitialized
    Sleep(5000)
    ;Allow the Audio In to finalize processing on the last 5 second capture
    $h_Context.Pause
    ;Resume audio in processing
    $h_Context.Resume
    ;Reset event function allocation (what is this? I think its garbage collection or something, needs clarification)
    $h_ObjectEvents = ObjEvent($h_Context, "SpRecEvent_")
WEnd

Func SpRecEvent_Hypothesis($StreamNumber, $StreamPosition, $Result)
    ConsoleWrite("Hypothesis(): Hypothesized text is: " & $Result.PhraseInfo.GetText & @CRLF)
EndFunc   ;==>SpRecEvent_Hypothesis

Func SpRecEvent_Recognition($StreamNumber, $StreamPosition, $RecognitionType, $Result)
    ConsoleWrite($RecognitionType & "||" & $Result.PhraseInfo.GetText & @CRLF)
    $spokenWords = $Result.PhraseInfo.GetText
    CheckCommands()
EndFunc   ;==>SpRecEvent_Recognition

Func SpRecEvent_SoundStart($StreamNumber, $StreamPosition)
    ConsoleWrite("Sound Started" & @CRLF)
EndFunc   ;==>SpRecEvent_SoundStart

Func SpRecEvent_SoundEnd($StreamNumber, $StreamPosition)
    ConsoleWrite("Sound Ended" & @CRLF)
EndFunc   ;==>SpRecEvent_SoundEnd

Func CheckCommands()
    ;=== Special Voice Commands ===

    ;-- Transcription Mode--
    If $spokenWords = "Transcription Mode" Then
        If $transcriptionMode = 0 Then
            $transcriptionMode = 1
            ConsoleWrite("Transcription Mode On" & @CRLF)
            $skipSearch = 1
        Else
            $transcriptionMode = 0
            ConsoleWrite("Transcription Mode Off" & @CRLF)
            $skipSearch = 1
        EndIf
    EndIf

    If $transcriptionMode = 1 Then
        If $spokenWords <> "Transcription Mode" Then
            Send($spokenWords)
            $skipSearch = 1
        EndIf
    Else
        $skipSearch = 0
    EndIf
    ;---------

    ;==============================

    If $skipSearch = 0 Then
        ;=== Voice Command Search ===
        ;%% in the file name denotes that whatever is said after the command, should be sent as a parameter
        $count = 1
        While $count <= $voiceCommandsCap
            ConsoleWrite("count: " & $count & @CRLF)
            ConsoleWrite($voiceCommands[$count] & @CRLF)
            If StringInStr($voiceCommands[$count], " %%") <> 0 Then
                ConsoleWrite("found %%" & @CRLF)
                ;Flag 1 splits on the whole " %%" string instead of on each of its characters
                $splitCommand = StringSplit($voiceCommands[$count], " %%", 1)
                $voiceCommandName = $splitCommand[1]
                ;Whatever was said after the command name becomes the parameter
                $parameter = StringReplace($spokenWords, $voiceCommandName & " ", "")
                $sendParameter = 1
                ConsoleWrite("spokenWords: " & $spokenWords & @CRLF)
                ConsoleWrite("voiceCommandName: " & $voiceCommandName & @CRLF)
                ConsoleWrite("Parameter: " & $parameter & @CRLF)
            Else
                ;Strip the file extension to get the command name
                $voiceCommandName = StringRegExpReplace($voiceCommands[$count], "\.[^.]*$", "")
                $sendParameter = 0
            EndIf

            ;Check each command inside the loop; checking after the loop only ever tests the last file
            ConsoleWrite("Checking Command to List" & @CRLF)
            If StringInStr($spokenWords, $voiceCommandName) <> 0 Then
                If $sendParameter = 1 Then
                    ;Quote the path because the folder and file names contain spaces
                    Run('"' & @WorkingDir & '\Voice Commands\' & $voiceCommandName & ' %%.exe" ' & $parameter)
                Else
                    ShellExecute(@WorkingDir & "\Voice Commands\" & $voiceCommandName & ".exe")
                EndIf
                ExitLoop
            EndIf
            $count = $count + 1
        WEnd
        ;==============================
    EndIf
    $skipSearch = 0
EndFunc   ;==>CheckCommands

My issue is with the recognition itself.

Too often, the engine does not recognize what I'm saying.

I've tried searching for information on recognition, training, etc., but I'm not finding what I need.

I think I can use the last hypothesized entry to check the commands, but I'm not sure how to get only the last entry for the hypothesis.

If you look in the console as you speak, the hypothesis is constantly changing until you stop speaking. So I'm fairly certain I need to check for a pause in speech to use this method.
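A rough, untested sketch of how that might look: keep only the newest hypothesis in a variable, and fall back to it when the engine rejects the final result. `$g_sLastHypothesis` is a hypothetical helper name, and the sketch assumes SAPI raises a `FalseRecognition` event on the context in that situation, so it may need adjusting. The first function would replace the existing `SpRecEvent_Hypothesis` handler in the script above.

```autoit
;Sketch only: $g_sLastHypothesis is a hypothetical helper variable, not part of the original script
Global $g_sLastHypothesis = ""

;Replaces the existing SpRecEvent_Hypothesis handler: remember only the newest hypothesis
Func SpRecEvent_Hypothesis($StreamNumber, $StreamPosition, $Result)
    $g_sLastHypothesis = $Result.PhraseInfo.GetText
EndFunc   ;==>SpRecEvent_Hypothesis

;New handler: assumed to fire when the engine hears speech but rejects the final result
Func SpRecEvent_FalseRecognition($StreamNumber, $StreamPosition, $Result)
    If $g_sLastHypothesis <> "" Then
        $spokenWords = $g_sLastHypothesis
        $g_sLastHypothesis = "" ;do not reuse the same text for the next utterance
        CheckCommands()
    EndIf
EndFunc   ;==>SpRecEvent_FalseRecognition
```

Clearing `$g_sLastHypothesis` inside `SpRecEvent_Recognition` as well would probably be wise, so that text from a successfully recognized phrase is not reused later.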

Is the voice recognition engine shared amongst all programs?
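For context on that question: besides the `SpInProcRecoContext` used in the script, SAPI also offers a shared recognizer through `SAPI.SpSharedRecoContext`. As far as I understand it, that instance is the one shared between applications, so it may be the one that benefits from the voice training in the Windows speech settings. An untested sketch of swapping it in, assuming the rest of the script stays the same:

```autoit
;Sketch only: swaps the in-process context for the shared one
Global $h_Context = ObjCreate("SAPI.SpSharedRecoContext")
If Not IsObj($h_Context) Then
    ConsoleWrite("Could not create SAPI.SpSharedRecoContext" & @CRLF)
    Exit
EndIf

Global $h_Grammar = $h_Context.CreateGrammar(1)
$h_Grammar.DictationLoad
$h_Grammar.DictationSetState(1)
;No AudioInput token is set here, on the assumption that the shared recognizer manages its own input
```

If this works, the `SpObjectTokenCategory` and `SpObjectToken` lines from the script would no longer be needed.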

Is there a good application to train the voice recognition?

Side note:

The code is terribly inefficient at the moment. It's a work in progress. After I get it to recognize my voice, I'll work on optimizing the actual commands.

Edited by Sori

  • 1 year later...

I think you can use my UDF, Utter.au3. It lets you add specific words that you want the computer to recognize, and free recognition is also available. Here is the link: https://www.autoitscript.com/forum/topic/175719-utter-utilizing-more-of-sapi/

I hope that helps, Sori. If you need any more help, just ask.
