Package types (2.7.0)

API documentation for speech_v1.types package.

Classes

LongRunningRecognizeMetadata

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

LongRunningRecognizeRequest

The top-level message sent by the client for the LongRunningRecognize method.

LongRunningRecognizeResponse

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

RecognitionAudio

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]. See content limits <https://cloud.google.com/speech-to-text/quotas#content>__.

RecognitionConfig

Provides information to the recognizer that specifies how to process the request.

RecognitionMetadata

Description of audio data to be recognized. .. attribute:: interaction_type

The use case most closely describing the audio content to be recognized.

:type: google.cloud.speech_v1.types.RecognitionMetadata.InteractionType

RecognizeRequest

The top-level message sent by the client for the Recognize method.

RecognizeResponse

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.

SpeakerDiarizationConfig

Config to enable speaker diarization. .. attribute:: enable_speaker_diarization

If 'true', enables speaker detection for each recognized word in the top alternative of the recognition result using a speaker_tag provided in the WordInfo.

:type: bool

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list). .. attribute:: transcript

Transcript text representing the words that the user spoke.

:type: str

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

StreamingRecognitionConfig

Provides information to the recognizer that specifies how to process the request.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponse\ s that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses #4 and #7 contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

WordInfo

Word-specific information for recognized words. .. attribute:: start_time

Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if enable_word_time_offsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

:type: google.protobuf.duration_pb2.Duration