Single utterance behavior

Speech-to-Text provides the latest_short model for recognizing speech that consists of a single utterance. This model is useful for applications where users issue single voice commands, as opposed to long-form monologue or dictation.

When a recognizer with the latest_short model is used for a recognition request, Speech-to-Text stops performing recognition once it detects that the utterance has finished. It then returns a speech activity event response with the type END_OF_SINGLE_UTTERANCE, followed by the transcription results.

Single utterance and StreamingRecognize

When a recognizer that uses the latest_short model is selected for a StreamingRecognize request, Speech-to-Text closes the stream automatically after the utterance has ended.
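
The following is a minimal sketch of how a client might consume the responses of such a stream. The Response and SpeechEventType classes are simplified stand-ins defined for illustration, not the client library's types; only the END_OF_SINGLE_UTTERANCE name comes from the description above. The sketch assumes that the server closing the stream is what ends the iteration, so the loop needs no explicit stop condition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Tuple


class SpeechEventType(Enum):
    # Stand-in enum for this sketch; only END_OF_SINGLE_UTTERANCE
    # is a name taken from the documentation text.
    UNSPECIFIED = 0
    END_OF_SINGLE_UTTERANCE = 1


@dataclass
class Response:
    # Hypothetical, simplified response shape used only in this sketch.
    event: SpeechEventType = SpeechEventType.UNSPECIFIED
    transcripts: List[str] = field(default_factory=list)


def collect_single_utterance(
    responses: Iterable[Response],
) -> Tuple[bool, List[str]]:
    """Read responses until the stream ends.

    Returns whether the end-of-utterance event was seen, together with
    every transcript received on the stream.
    """
    utterance_ended = False
    transcripts: List[str] = []
    for response in responses:
        if response.event is SpeechEventType.END_OF_SINGLE_UTTERANCE:
            utterance_ended = True
        transcripts.extend(response.transcripts)
    # Reaching this point means the iterator is exhausted, which in this
    # sketch models the server closing the stream.
    return utterance_ended, transcripts


# Simulated stream: the event response first, then the transcription.
simulated_stream = [
    Response(event=SpeechEventType.END_OF_SINGLE_UTTERANCE),
    Response(transcripts=["turn on the lights"]),
]
ended, final_transcripts = collect_single_utterance(simulated_stream)
```

In a real client the iterable would be the response stream returned by the StreamingRecognize call, and the event and transcript fields would be read from the library's own response type.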

With voice activity events

When voice activity events are also enabled for a StreamingRecognize request, Speech-to-Text still returns the speech begin and speech end voice activity events. Voice activity timeouts for speech begin still apply. Voice activity timeouts for speech end do not apply, because the stream is closed as soon as the utterance ends.
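
The timeout rule can be expressed as a small sketch. The VoiceActivityTimeouts class, its field names, and the effective_timeouts helper are hypothetical and exist only to illustrate the rule; they are not part of the Speech-to-Text API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceActivityTimeouts:
    # Illustrative container; the field names are chosen for this sketch.
    speech_begin_seconds: Optional[float] = None
    speech_end_seconds: Optional[float] = None


def effective_timeouts(
    requested: VoiceActivityTimeouts, uses_latest_short: bool
) -> VoiceActivityTimeouts:
    """Return the timeouts that can still take effect on the stream."""
    if not uses_latest_short:
        return requested
    # With latest_short the stream closes when the utterance ends, so a
    # speech-end timeout never gets a chance to fire.
    return VoiceActivityTimeouts(
        speech_begin_seconds=requested.speech_begin_seconds,
        speech_end_seconds=None,
    )


requested = VoiceActivityTimeouts(speech_begin_seconds=5.0, speech_end_seconds=2.0)
applied = effective_timeouts(requested, uses_latest_short=True)
```

Here the speech begin timeout is carried over unchanged, while the speech end timeout is dropped because the stream is already closed by the time it could apply.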