Feature: Enhanced Speech Recognition
We are pleased to announce that our speech recognition system has been completely rebuilt to provide a better user experience. With this update, users can expect the following enhancements:
- Automatic speech detection: you no longer need to use push to talk to speak with digital humans.
- Enhanced speech recognition accuracy, especially for short utterances, with no decrease in accuracy even in high-latency environments.
- Your microphone audio will not be transmitted from your device until you begin speaking, ensuring privacy.
- You have the option to mute/unmute your microphone for privacy or to simulate push-to-talk functionality in noisy environments.
- Our voice activity detection system has been trained to detect speech rather than noise (background noise, coughing, music, etc.).
- You can interrupt the digital human, and the character will stop speaking. However, background noise alone will not interrupt the character; it requires an interim transcription result.
- Improved stability for a more reliable experience.
There are two ways to access these new features: through our Hosted Experience (a drop-in script that provides an instant user experience) or via our Web SDK (NPM package).
Migration Guide: Hosted Experience
The Hosted Experience provides full UI support for speech recognition mode:
- Buttons to mute/unmute the user's microphone.
- Indicators of the user's microphone status (muted, listening, active speech, blocked).
- (Coming soon: transcription of the user's speech displayed on screen.)
To switch to speech recognition mode, set voiceInputMode to "SPEECH_RECOGNITION" in your uneeqInteractionsOptions configuration.
Example:
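A minimal sketch of the configuration change, assuming your page defines uneeqInteractionsOptions as a global object before the Hosted Experience drop-in script loads; everything other than voiceInputMode stands in for your existing settings.

```typescript
// Minimal sketch: set voiceInputMode in the global uneeqInteractionsOptions
// object before the Hosted Experience drop-in script loads. Only the new
// option is shown; keep the rest of your existing configuration unchanged.
(window as any).uneeqInteractionsOptions = {
  // ...your existing Hosted Experience options...
  voiceInputMode: "SPEECH_RECOGNITION",
};
```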
Method Changes
If you are using Uneeq methods to programmatically control voice recording, be aware of the following changes:
uneeqStartRecording and uneeqStopRecording have no effect in speech recognition mode, because speech is detected automatically without push to talk. These methods are no longer required.
Message Changes
Previously, when using push to talk, you would receive the messages RecordingStarted and RecordingStopped to indicate when push to talk was engaged or disengaged. In speech recognition mode, you will no longer receive these messages.
There are new messages that will be sent when using speech recognition mode:
- UserStartedSpeaking: Voice activity detection has recognized that the user has started speaking.
- UserStoppedSpeaking: Voice activity detection has recognized that the user has stopped speaking.
- SpeechTranscription: A new interim or final transcription result is available. See here for details of the message contents.
Migration Guide: Web SDK (NPM Package)
If you've built your own experience and UI using our NPM package, you will need to set voiceInputMode to "SPEECH_RECOGNITION" from version 2.49.0 onwards.
voiceInputMode: VOICE_ACTIVITY will be merged with SPEECH_RECOGNITION in version 2.50.0. From that version onwards, you will get the SPEECH_RECOGNITION experience when using VOICE_ACTIVITY as your voiceInputMode.
Example:
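A minimal sketch of the same change for an SDK integration, assuming a constructor-style initialisation; the uneeq-js import name and options shape are assumptions, so adapt them to how your app already initialises the SDK.

```typescript
// Sketch only: the import name and constructor shape are assumptions;
// match them to your existing SDK setup.
import { Uneeq } from "uneeq-js";

const uneeq = new Uneeq({
  // ...your existing SDK options...
  voiceInputMode: "SPEECH_RECOGNITION", // from 2.49.0 onwards; VOICE_ACTIVITY maps to this experience from 2.50.0
});
```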
Method Changes
If you are using Uneeq methods to programmatically control voice recording, be aware of the following changes:
uneeqStartRecording and uneeqStopRecording have no effect in speech recognition mode, because speech is detected automatically without push to talk. These methods are no longer required.
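If your UI still wires a push-to-talk control to these methods, the handlers can simply be removed; the sketch below uses declarations as stand-ins for your actual button element and for however the recording methods are exposed in your integration.

```typescript
// Stand-ins for your real button and for the recording methods,
// however they are exposed in your integration.
declare const pttButton: HTMLButtonElement;
declare function uneeqStartRecording(): void;
declare function uneeqStopRecording(): void;

// Before: push-to-talk wiring that manually started and stopped recording.
pttButton.addEventListener("mousedown", () => uneeqStartRecording());
pttButton.addEventListener("mouseup", () => uneeqStopRecording());

// After: remove the handlers above. In speech recognition mode these calls
// have no effect, because listening starts and stops automatically.
```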
Message Changes
Previously, when using push to talk, you would receive the messages RecordingStarted and RecordingStopped to indicate when push to talk was engaged or disengaged. In speech recognition mode, you will no longer receive these messages.
There are new messages that will be sent when using speech recognition mode:
- UserStartedSpeaking: Voice activity detection has recognized that the user has started speaking.
- UserStoppedSpeaking: Voice activity detection has recognized that the user has stopped speaking.
- SpeechTranscription: A new interim or final transcription result is available. See here for details of the message contents.
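How these messages reach your code depends on your integration; the sketch below assumes a handler that receives message objects with a uneeqMessageType field, which is an assumption rather than the documented shape, so adapt it to wherever you already process SDK messages.

```typescript
// Sketch of handling the new speech recognition messages. The message shape
// (uneeqMessageType) is assumed; adapt to however your integration already
// receives messages from the SDK.
interface UneeqMessage {
  uneeqMessageType: string;
  [key: string]: unknown;
}

function handleUneeqMessage(msg: UneeqMessage): void {
  switch (msg.uneeqMessageType) {
    case "UserStartedSpeaking":
      // e.g. switch your microphone indicator to "active speech"
      break;
    case "UserStoppedSpeaking":
      // e.g. switch the indicator back to "listening"
      break;
    case "SpeechTranscription":
      // An interim or final transcription result is available; see the
      // message documentation for its exact contents.
      break;
  }
}
```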