If you don't have SAPI, you can download 5.1 here. Try the script first, though: SAPI seems to ship with most versions of Vista and 7.
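A quick way to "try the script first" is to attempt to create the same recognition context the script uses and check whether that worked. This is a minimal sketch; the function name _SAPI_IsAvailable is hypothetical and not part of any UDF.

```autoit
; Hypothetical helper: returns True if the SAPI in-process recognition context can be created
Func _SAPI_IsAvailable()
    Local $h_Context = ObjCreate("SAPI.SpInProcRecoContext")
    ; ObjCreate sets @error and returns a non-object when the COM class isn't available
    If @error Or Not IsObj($h_Context) Then Return False
    $h_Context = 0 ; release the test object
    Return True
EndFunc   ;==>_SAPI_IsAvailable

If _SAPI_IsAvailable() Then
    ConsoleWrite("SAPI recognizer found." & @CRLF)
Else
    ConsoleWrite("SAPI recognizer not found, install SAPI 5.1 first." & @CRLF)
EndIf
```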
This listens to what you say into the mic, runs it through the Windows speech recognition engine, and writes the results to the console. Results improve the more you train the system; I'm getting decent accuracy after about an hour of training. It uses an in-process recognizer (SAPI.SpInProcRecoContext) rather than the shared recognizer that the built-in Vista/7 Speech Recognition tool uses. That means you can build your own commands into your app without it also capturing the standard Windows speech commands. There are ways of specifying grammars, using the training tools, and other intricacies I haven't got to yet.
Thanks to ProgAndy, cyberZeroCool, seangriffin, and all the others who've done SAPI work; you guys have blazed the trail for some very fun stuff.
Global $h_Context = ObjCreate("SAPI.SpInProcRecoContext")
Global $h_Recognizer = $h_Context.Recognizer
Global $h_Grammar = $h_Context.CreateGrammar(1)
$h_Grammar.DictationLoad
$h_Grammar.DictationSetState(1)

;Create a token for the default audio input device and set it
Global $h_Category = ObjCreate("SAPI.SpObjectTokenCategory")
$h_Category.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\")
Global $h_Token = ObjCreate("SAPI.SpObjectToken")
$h_Token.SetId("HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\")
$h_Recognizer.AudioInput = $h_Token

Global $i_ObjInitialized = 0
Global $h_ObjectEvents = ObjEvent($h_Context, "SpRecEvent_")
If @error Then
    ConsoleWrite("ObjEvent error: " & @error & @CRLF)
    $i_ObjInitialized = 0
Else
    ConsoleWrite("ObjEvent created successfully!" & @CRLF)
    $i_ObjInitialized = 1
EndIf

While $i_ObjInitialized
    Sleep(5000)
    ;Allow the Audio In to finalize processing on the last 5 second capture
    $h_Context.Pause
    ;Resume audio in processing
    $h_Context.Resume
    ;Re-create the event sink (assigning a new ObjEvent releases the previous one).
    ;This may be redundant, but the script works with it in place.
    $h_ObjectEvents = ObjEvent($h_Context, "SpRecEvent_")
WEnd

Func SpRecEvent_Hypothesis($StreamNumber, $StreamPosition, $Result)
    ConsoleWrite("Hypothesis(): Hypothesized text is: " & $Result.PhraseInfo.GetText & @CRLF)
EndFunc   ;==>SpRecEvent_Hypothesis

Func SpRecEvent_Recognition($StreamNumber, $StreamPosition, $RecognitionType, $Result)
    ConsoleWrite($RecognitionType & "||" & $Result.PhraseInfo.GetText & @CRLF)
EndFunc   ;==>SpRecEvent_Recognition

Func SpRecEvent_SoundStart($StreamNumber, $StreamPosition)
    ConsoleWrite("Sound Started" & @CRLF)
EndFunc   ;==>SpRecEvent_SoundStart

Func SpRecEvent_SoundEnd($StreamNumber, $StreamPosition)
    ConsoleWrite("Sound Ended" & @CRLF)
EndFunc   ;==>SpRecEvent_SoundEnd
The SoundEnd event doesn't appear to fire; everything else works as intended. To use this, you have to parse completed phrases from the input. Don't worry about the Sleep(5000): it doesn't interfere with recognition, it's just there to split the sound input into manageable chunks. It's not in UDF format, but it should be easy to adapt into your projects.
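One way to parse completed phrases is to pass the recognized text through a small dispatcher that normalizes it and maps it to a command. This is a sketch only: _HandlePhrase and the command phrases ("open notepad", "stop listening", "quit") are hypothetical examples, so swap in your own.

```autoit
; Hypothetical dispatcher: maps a recognized phrase to a command name.
; Returns "" when the phrase isn't a known command.
Func _HandlePhrase($s_Text)
    ; Dictation often adds trailing punctuation, so strip it first
    Local $s_Phrase = StringRegExpReplace($s_Text, "[.,!?]+\s*$", "")
    ; Trim leading/trailing spaces, collapse double spaces (flag 7), and lower-case
    $s_Phrase = StringLower(StringStripWS($s_Phrase, 7))
    Switch $s_Phrase
        Case "open notepad"
            Return "notepad"
        Case "stop listening", "quit"
            Return "quit"
        Case Else
            Return ""
    EndSwitch
EndFunc   ;==>_HandlePhrase
```

In the script above you would call _HandlePhrase($Result.PhraseInfo.GetText) from SpRecEvent_Recognition rather than SpRecEvent_Hypothesis, since only recognitions are final, and act on the returned name (for example Run("notepad.exe") for "notepad", or setting $i_ObjInitialized = 0 for "quit" to end the While loop).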
A hypothesis is the engine's best guess at what is being said; a recognition is a finalized hypothesis. After a recognition, any new input is hypothesized until it is discarded or recognized. You can talk for as long as you want and the engine will piece together what is said, until there is a full 1-second gap in the incoming audio.
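To make the hypothesis/recognition distinction concrete, here is a sketch that keeps the engine's latest guess in one global and moves it to a "final" global when a recognition arrives. It assumes, as described above, that each hypothesis carries the whole phrase guessed so far. The names $g_s_Partial, $g_s_Final, _OnHypothesis and _OnRecognition are hypothetical; you would call the two functions from SpRecEvent_Hypothesis and SpRecEvent_Recognition with $Result.PhraseInfo.GetText.

```autoit
Global $g_s_Partial = "" ; latest hypothesis, may still change
Global $g_s_Final = ""   ; last finalized (recognized) phrase

; Each new hypothesis supersedes the previous guess, so just overwrite it
Func _OnHypothesis($s_Text)
    $g_s_Partial = $s_Text
EndFunc   ;==>_OnHypothesis

; A recognition is final: store it and clear the partial text for the next phrase
Func _OnRecognition($s_Text)
    $g_s_Final = $s_Text
    $g_s_Partial = ""
EndFunc   ;==>_OnRecognition
```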
Here is the SAPI recognition documentation if you want to modify this for your own purposes. Have fun!
Edited by JRowe, 01 June 2010 - 07:12 AM.