StreamingRecognizeResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)
StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponses that might be returned while processing audio:
1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 }
   results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 }
   alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 }
   results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 }
   alternatives { transcript: " that was the question" } is_final: true }
Notes:

- Only two of the above responses (#4 and #7) contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question" (see the sketch after these notes).
- The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
- The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
- In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
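To make the notes concrete, here is a minimal sketch of consuming such a stream: it concatenates final results into a transcript and surfaces only high-stability interim results. The `responses` iterable is assumed to come from `SpeechClient.streaming_recognize()`, and the 0.8 stability threshold is an arbitrary illustrative choice.

```python
# Minimal consumption sketch. Assumes `responses` is the iterable of
# StreamingRecognizeResponse messages returned by
# SpeechClient.streaming_recognize(); the 0.8 threshold is illustrative.
final_pieces = []

for response in responses:
    for result in response.results:
        if not result.alternatives:
            continue
        transcript = result.alternatives[0].transcript
        if result.is_final:
            # Newly settled portion: append it to the running transcript.
            final_pieces.append(transcript)
        elif result.stability >= 0.8:
            # High-stability interim result: reasonable to display,
            # though it may still change.
            print(f"(interim) {transcript}")

# With the example responses above, this prints
# "to be or not to be that is the question".
print("".join(final_pieces))
```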
Attributes
| Name | Type | Description |
| --- | --- | --- |
| results | MutableSequence[google.cloud.speech_v2.types.StreamingRecognitionResult] | This repeated list contains zero or more results that correspond to consecutive portions of the audio currently being processed. It contains zero or one is_final=true result (the newly settled portion), followed by zero or more is_final=false results (the interim results). |
| speech_event_type | google.cloud.speech_v2.types.StreamingRecognizeResponse.SpeechEventType | Indicates the type of speech event. |
| speech_event_offset | google.protobuf.duration_pb2.Duration | Time offset between the beginning of the audio and event emission. |
| metadata | google.cloud.speech_v2.types.RecognitionResponseMetadata | Metadata about the recognition. |
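Because each response carries either a speech event or recognition results (never both), a handler can branch on the event type first. A hedged sketch, where handle_result is a hypothetical application callback:

```python
from google.cloud import speech_v2

SpeechEventType = speech_v2.StreamingRecognizeResponse.SpeechEventType

def dispatch(response: speech_v2.StreamingRecognizeResponse) -> None:
    # Route a response by which field is populated (sketch).
    if response.speech_event_type != SpeechEventType.SPEECH_EVENT_TYPE_UNSPECIFIED:
        # speech_event_offset is measured from the beginning of the audio.
        print(response.speech_event_type.name, "at", response.speech_event_offset)
    else:
        for result in response.results:
            handle_result(result)  # hypothetical application callback
```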
Classes
SpeechEventType
SpeechEventType(value)
Indicates the type of speech event.
Values:
SPEECH_EVENT_TYPE_UNSPECIFIED (0):
No speech event specified.
END_OF_SINGLE_UTTERANCE (1):
This event indicates that the server has detected the end of
the user's speech utterance and expects no additional
speech. Therefore, the server will not process additional
audio and will close the gRPC bidirectional stream. This
event is only sent if there was a force cutoff due to
silence being detected early. This event is only available
through the latest_short model.
SPEECH_ACTIVITY_BEGIN (2):
This event indicates that the server has detected the
beginning of human voice activity in the stream. This event
can be returned multiple times if speech starts and stops
repeatedly throughout the stream. This event is only sent if
voice_activity_events is set to true.
SPEECH_ACTIVITY_END (3):
This event indicates that the server has detected the end of
human voice activity in the stream. This event can be
returned multiple times if speech starts and stops
repeatedly throughout the stream. This event is only sent if
voice_activity_events is set to true.
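The activity events above arrive only when voice-activity reporting is enabled in the streaming features (the flag is named enable_voice_activity_events on StreamingRecognitionFeatures in the v2 API). Below is a hedged end-to-end sketch; the recognizer path, model choice, and audio_chunks iterable are placeholder assumptions.

```python
from google.cloud import speech_v2

# Placeholder for illustration only.
RECOGNIZER = "projects/my-project/locations/global/recognizers/_"

streaming_config = speech_v2.StreamingRecognitionConfig(
    config=speech_v2.RecognitionConfig(
        auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="latest_long",  # assumed; any streaming-capable model works
    ),
    streaming_features=speech_v2.StreamingRecognitionFeatures(
        # Ask the server to emit SPEECH_ACTIVITY_BEGIN / SPEECH_ACTIVITY_END.
        enable_voice_activity_events=True,
    ),
)

def request_stream(audio_chunks):
    # The first request carries the config; later requests carry audio bytes.
    yield speech_v2.StreamingRecognizeRequest(
        recognizer=RECOGNIZER, streaming_config=streaming_config
    )
    for chunk in audio_chunks:
        yield speech_v2.StreamingRecognizeRequest(audio=chunk)

client = speech_v2.SpeechClient()
SpeechEventType = speech_v2.StreamingRecognizeResponse.SpeechEventType

# `audio_chunks` is assumed to be an iterable of raw audio byte strings.
for response in client.streaming_recognize(requests=request_stream(audio_chunks)):
    if response.speech_event_type == SpeechEventType.SPEECH_ACTIVITY_BEGIN:
        print("voice activity began at", response.speech_event_offset)
    elif response.speech_event_type == SpeechEventType.SPEECH_ACTIVITY_END:
        print("voice activity ended at", response.speech_event_offset)
```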