API documentation for the speech_v1.types package.
Classes
LongRunningRecognizeMetadata
Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
LongRunningRecognizeRequest
The top-level message sent by the client for the LongRunningRecognize method.
LongRunningRecognizeResponse
The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
RecognitionAudio
Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See `content limits <https://cloud.google.com/speech-to-text/quotas#content>`__.
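The "exactly one of content or uri" rule can be mirrored on the client before a request is sent. The following is a minimal sketch using plain dictionaries rather than the real message class; the helper name ``make_recognition_audio`` and the use of ``ValueError`` are assumptions for illustration, since the actual check is performed by the service.

```python
def make_recognition_audio(content=None, uri=None):
    """Return a dict stand-in for RecognitionAudio (hypothetical helper).

    Raises ValueError when both or neither of content/uri are given,
    mirroring the INVALID_ARGUMENT rule described above.
    """
    # True when both are None or both are set: exactly the invalid cases.
    if (content is None) == (uri is None):
        raise ValueError("INVALID_ARGUMENT: supply exactly one of content or uri")
    if content is not None:
        return {"content": content}
    return {"uri": uri}


# Inline audio bytes or a storage URI are each accepted on their own.
inline_audio = make_recognition_audio(content=b"\x00\x01")
remote_audio = make_recognition_audio(uri="gs://example-bucket/audio.raw")
```

Validating locally avoids a round trip for a request the service would reject anyway. The bucket path above is a placeholder.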
RecognitionConfig
Provides information to the recognizer that specifies how to process the request.
RecognitionMetadata
Description of audio data to be recognized.

.. attribute:: interaction_type

   The use case most closely describing the audio content to be recognized.

   :type: google.cloud.speech_v1.types.RecognitionMetadata.InteractionType
RecognizeRequest
The top-level message sent by the client for the Recognize method.
RecognizeResponse
The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.
SpeakerDiarizationConfig
Config to enable speaker diarization.

.. attribute:: enable_speaker_diarization

   If 'true', enables speaker detection for each recognized word in the top alternative of the recognition result using a speaker_tag provided in the WordInfo.

   :type: bool
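Once diarization is enabled, each word in the top alternative carries a speaker_tag, so words can be grouped per speaker. The sketch below assumes words are represented as plain dictionaries with ``word`` and ``speaker_tag`` keys; the helper name ``words_by_speaker`` and the sample data are hypothetical.

```python
from collections import defaultdict


def words_by_speaker(words):
    """Group recognized words by their speaker_tag, preserving word order."""
    grouped = defaultdict(list)
    for info in words:
        grouped[info["speaker_tag"]].append(info["word"])
    return dict(grouped)


# Hypothetical word list standing in for WordInfo entries of one alternative.
sample_words = [
    {"word": "hello", "speaker_tag": 1},
    {"word": "hi", "speaker_tag": 2},
    {"word": "there", "speaker_tag": 1},
]
speakers = words_by_speaker(sample_words)
```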
SpeechContext
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).

.. attribute:: transcript

   Transcript text representing the words that the user spoke.

   :type: str
SpeechRecognitionResult
A speech recognition result corresponding to a portion of the audio.
StreamingRecognitionConfig
Provides information to the recognizer that specifies how to process the request.
StreamingRecognitionResult
A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.
StreamingRecognizeRequest
The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.
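The ordering rule above (configuration first, audio afterwards) maps naturally onto a generator. This is a minimal sketch that uses plain dictionaries in place of StreamingRecognizeRequest messages; ``build_requests`` and ``follows_ordering_rule`` are hypothetical helper names, not part of the library.

```python
def build_requests(streaming_config, audio_chunks):
    """Yield dict stand-ins for the request stream in the required order."""
    # First message: configuration only, no audio.
    yield {"streaming_config": streaming_config}
    # Every later message: audio only, no configuration.
    for chunk in audio_chunks:
        yield {"audio_content": chunk}


def follows_ordering_rule(requests):
    """Check that a request sequence obeys the config-then-audio rule."""
    requests = list(requests)
    if not requests or set(requests[0]) != {"streaming_config"}:
        return False
    return all(set(req) == {"audio_content"} for req in requests[1:])


# Placeholder config and audio chunks for illustration only.
stream = list(build_requests({"interim_results": True}, [b"chunk-1", b"chunk-2"]))
```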
StreamingRecognizeResponse
StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages is streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponse messages that might be returned while processing audio:
1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }
Notes:
- Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
- The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
- The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
- In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
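The handling described in the notes can be sketched in a few lines. The example below rebuilds the seven sample responses as plain dictionaries (not real protobuf messages, and with the confidence values left out), joins the final results into the full transcript, and picks out high-stability interim results. The helper names and the 0.8 threshold are assumptions chosen for illustration.

```python
def result(*transcripts, **fields):
    """Build a dict stand-in for one result with the given alternatives."""
    return {"alternatives": [{"transcript": t} for t in transcripts], **fields}


# The seven example responses, in order.
responses = [
    {"results": [result("tube", stability=0.01)]},
    {"results": [result("to be a", stability=0.01)]},
    {"results": [result("to be", stability=0.9),
                 result(" or not to be", stability=0.01)]},
    {"results": [result("to be or not to be", "to bee or not to bee",
                        is_final=True)]},
    {"results": [result(" that's", stability=0.01)]},
    {"results": [result(" that is", stability=0.9),
                 result(" the question", stability=0.01)]},
    {"results": [result(" that is the question", " that was the question",
                        is_final=True)]},
]


def final_transcript(responses):
    """Concatenate the first alternative of every final result."""
    parts = []
    for response in responses:
        for res in response.get("results", []):
            if res.get("is_final"):
                parts.append(res["alternatives"][0]["transcript"])
    return "".join(parts)


def stable_interim(response, threshold=0.8):
    """Return transcripts of interim results at or above the threshold."""
    return [
        res["alternatives"][0]["transcript"]
        for res in response.get("results", [])
        if not res.get("is_final") and res.get("stability", 0.0) >= threshold
    ]


transcript = final_transcript(responses)
```

A UI that shows only stable interim text would call ``stable_interim`` on each incoming response and replace that text once a final result arrives.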
TranscriptOutputConfig
Specifies an optional destination for the recognition results.
WordInfo
Word-specific information for recognized words.

.. attribute:: start_time

   Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if enable_word_time_offsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

   :type: google.protobuf.duration_pb2.Duration
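A Duration value stores whole seconds and a nanosecond remainder in its ``seconds`` and ``nanos`` fields, so a word offset is often converted to a single float for display or alignment. The helper below is a minimal sketch that takes the two fields as plain integers; the name ``duration_to_seconds`` is hypothetical.

```python
def duration_to_seconds(seconds, nanos):
    """Convert Duration-style (seconds, nanos) fields to float seconds."""
    return seconds + nanos / 1_000_000_000


# A word that starts one and a half seconds into the audio.
start_offset = duration_to_seconds(1, 500_000_000)
```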