API documentation for speech_v1.types
package.
Classes
CustomClass
A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.
LongRunningRecognizeMetadata
Describes the progress of a long-running LongRunningRecognize
call. It is included in the metadata
field of the Operation
returned by the GetOperation
call of the
google::longrunning::Operations
service.
LongRunningRecognizeRequest
The top-level message sent by the client for the
LongRunningRecognize
method.
LongRunningRecognizeResponse
The only message returned to the client by the
LongRunningRecognize
method. It contains the result as zero or
more sequential SpeechRecognitionResult
messages. It is included
in the result.response
field of the Operation
returned by
the GetOperation
call of the google::longrunning::Operations
service.
PhraseSet
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
RecognitionAudio
Contains audio data in the encoding specified in the
RecognitionConfig
. Either content
or uri
must be
supplied. Supplying both or neither returns
google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]
.
See content
limits <https://cloud.google.com/speech-to-text/quotas#content>
__.
This message has oneof
_ fields (mutually exclusive fields).
For each oneof, at most one member field can be set at the same time.
Setting any member of the oneof automatically clears all other
members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
RecognitionConfig
Provides information to the recognizer that specifies how to process the request.
RecognitionMetadata
Description of audio data to be recognized.
RecognizeRequest
The top-level message sent by the client for the Recognize
method.
RecognizeResponse
The only message returned to the client by the Recognize
method.
It contains the result as zero or more sequential
SpeechRecognitionResult
messages.
SpeakerDiarizationConfig
Config to enable speaker diarization.
SpeechAdaptation
Speech adaptation configuration.
SpeechContext
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
SpeechRecognitionResult
A speech recognition result corresponding to a portion of the audio.
StreamingRecognitionConfig
Provides information to the recognizer that specifies how to process the request.
StreamingRecognitionResult
A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.
StreamingRecognizeRequest
The top-level message sent by the client for the
StreamingRecognize
method. Multiple
StreamingRecognizeRequest
messages are sent. The first message
must contain a streaming_config
message and must not contain
audio_content
. All subsequent messages must contain
audio_content
and must not contain a streaming_config
message.
This message has oneof
_ fields (mutually exclusive fields).
For each oneof, at most one member field can be set at the same time.
Setting any member of the oneof automatically clears all other
members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
StreamingRecognizeResponse
StreamingRecognizeResponse
is the only message returned to the
client by StreamingRecognize
. A series of zero or more
StreamingRecognizeResponse
messages are streamed back to the
client. If there is no recognizable audio, and single_utterance
is set to false, then no messages are streamed back to the client.
Here's an example of a series of StreamingRecognizeResponse
\ s
that might be returned while processing audio:
results { alternatives { transcript: "tube" } stability: 0.01 }
results { alternatives { transcript: "to be a" } stability: 0.01 }
results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
results { alternatives { transcript: " that's" } stability: 0.01 }
results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }
Notes:
Only two of the above responses #4 and #7 contain final results; they are indicated by
is_final: true
. Concatenating these together generates the full transcript: "to be or not to be that is the question".The others contain interim
results
. #3 and #6 contain two interimresults
: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stabilityresults
.The specific
stability
andconfidence
values shown above are only for illustrative purposes. Actual values may vary.In each response, only one of these fields will be set:
error
,speech_event_type
, or one or more (repeated)results
.
TranscriptOutputConfig
Specifies an optional destination for the recognition results.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
WordInfo
Word-specific information for recognized words.