Package types (2.28.0)

API documentation for speech_v2.types package.

Classes

AccessMetadata

The access metadata for a particular region. This can be applied if the org policy for the given project disallows a particular region.

AutoDetectDecodingConfig

Automatically detected decoding parameters. Supported for the following encodings:

- WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.
- WAV_MULAW: 8-bit companded mulaw samples in a WAV container.
- WAV_ALAW: 8-bit companded alaw samples in a WAV container.
- RFC4867_5_AMR: AMR frames with an rfc4867.5 header.
- RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.
- FLAC: FLAC frames in the "native FLAC" container format.
- MP3: MPEG audio frames with optional (ignored) ID3 metadata.
- OGG_OPUS: Opus audio frames in an Ogg container.
- WEBM_OPUS: Opus audio frames in a WebM container.
- MP4_AAC: AAC audio frames in an MP4 container.
- M4A_AAC: AAC audio frames in an M4A container.
- MOV_AAC: AAC audio frames in an MOV container.
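
To give a rough idea of what distinguishes these containers, the following sketch guesses the container family from the leading bytes of a file. It is an illustration only and not how the service detects encodings: the function name ``sniff_container`` and its return labels are hypothetical, and it relies solely on well-known file signatures (for example, ``#!AMR\n`` is the RFC 4867 section 5 storage header).

```python
def sniff_container(header: bytes) -> str:
    """Guess a container family from the first bytes of an audio file.

    Hypothetical helper: the labels returned here are informal and are
    not values of any speech_v2 enum.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"  # sample format (PCM, mulaw, alaw) is in the fmt chunk
    if header.startswith(b"#!AMR-WB\n"):
        return "amr-wb"
    if header.startswith(b"#!AMR\n"):
        return "amr"
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"OggS"):
        return "ogg"
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm"  # EBML signature, shared with Matroska
    if header[4:8] == b"ftyp":
        return "mp4-family"  # MP4, M4A and most MOV files start with an ftyp box
    if header.startswith(b"ID3"):
        return "mp3"  # ID3v2 tag placed before the MPEG frames
    if len(header) >= 2 and header[0] == 0xFF and header[1] & 0xE0 == 0xE0:
        return "mp3"  # 11-bit MPEG frame sync
    return "unknown"
```

A caller would typically read the first dozen bytes of a file and pass them in, for example ``sniff_container(open(path, "rb").read(12))``.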

BatchRecognizeFileMetadata

Metadata about a single file in a batch for BatchRecognize.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
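
The clearing behavior can be illustrated with a small plain-Python stand-in. This is a sketch, not a proto-plus class: ``OneofSketch`` and the member names ``first`` and ``second`` are hypothetical and only mimic the "set one member, clear the rest" rule described above.

```python
class OneofSketch:
    """Hypothetical stand-in for a message with a single oneof group."""

    _MEMBERS = ("first", "second")  # hypothetical member field names

    def __init__(self):
        # Bypass our own __setattr__ while creating the bookkeeping slots.
        object.__setattr__(self, "_active", None)
        object.__setattr__(self, "_value", None)

    def __setattr__(self, name, value):
        if name not in self._MEMBERS:
            raise AttributeError(f"unknown field: {name}")
        # Setting any member replaces whichever member was set before.
        object.__setattr__(self, "_active", name)
        object.__setattr__(self, "_value", value)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for member fields.
        if name in type(self)._MEMBERS:
            return self._value if self._active == name else None
        raise AttributeError(name)

    def which_oneof(self):
        """Return the name of the member currently set, or None."""
        return self._active


msg = OneofSketch()
msg.first = "a"
msg.second = "b"  # clears ``first``
```

After the last assignment only ``second`` holds a value; reading ``first`` yields ``None`` in this sketch.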

BatchRecognizeMetadata

Operation metadata for BatchRecognize.

BatchRecognizeRequest

Request message for the BatchRecognize method.

BatchRecognizeResponse

Response message for BatchRecognize that is packaged into a long-running Operation (google.longrunning.Operation).

BatchRecognizeResults

Output type for Cloud Storage of BatchRecognize transcripts. Though this proto isn't returned in this API anywhere, the Cloud Storage transcripts will be this proto serialized and should be parsed as such.

BatchRecognizeTranscriptionMetadata

Metadata about transcription for a single file (for example, progress percent).

CloudStorageResult

Final results written to Cloud Storage.

Config

Message representing the config for the Speech-to-Text API. This includes an optional `KMS key <https://cloud.google.com/kms/docs/resource-hierarchy#keys>`__ with which incoming data will be encrypted.

CreateCustomClassRequest

Request message for the CreateCustomClass method.

CreatePhraseSetRequest

Request message for the CreatePhraseSet method.

CreateRecognizerRequest

Request message for the CreateRecognizer method.

CustomClass

CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.

DeleteCustomClassRequest

Request message for the DeleteCustomClass method.

DeletePhraseSetRequest

Request message for the DeletePhraseSet method.

DeleteRecognizerRequest

Request message for the DeleteRecognizer method.

ExplicitDecodingConfig

Explicitly specified decoding parameters.

GcsOutputConfig

Output configurations for Cloud Storage.

GetConfigRequest

Request message for the GetConfig method.

GetCustomClassRequest

Request message for the GetCustomClass method.

GetPhraseSetRequest

Request message for the GetPhraseSet method.

GetRecognizerRequest

Request message for the GetRecognizer method.

InlineOutputConfig

Output configurations for inline response.

InlineResult

Final results returned inline in the recognition response.

LanguageMetadata

The metadata about locales available in a given region. Currently, this covers only the models that are available for each locale.

ListCustomClassesRequest

Request message for the ListCustomClasses method.

ListCustomClassesResponse

Response message for the ListCustomClasses method.

ListPhraseSetsRequest

Request message for the ListPhraseSets method.

ListPhraseSetsResponse

Response message for the ListPhraseSets method.

ListRecognizersRequest

Request message for the ListRecognizers method.

ListRecognizersResponse

Response message for the ListRecognizers method.

LocationsMetadata

Main metadata for the Locations API for STT V2. Currently, this covers only the metadata about locales, models, and features.

ModelFeature

Represents a single feature of a model. If the feature is recognizer, the release_state of the feature represents the release_state of the model.

ModelFeatures

Represents the collection of features belonging to a model.

ModelMetadata

The metadata about the models in a given region for a specific locale. Currently, this covers only the features of the model.

NativeOutputFileFormatConfig

Output configurations for serialized BatchRecognizeResults protos.

OperationMetadata

Represents the metadata of a long-running operation.

OutputFormatConfig

Configuration for the format of the results stored to output.

PhraseSet

PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.

RecognitionConfig

Provides information to the Recognizer that specifies how to process the recognition request.

RecognitionFeatures

Available recognition features.

RecognitionOutputConfig

Configuration options for the output(s) of recognition.

RecognitionResponseMetadata

Metadata about the recognition request and response.

RecognizeRequest

Request message for the Recognize method. Either content or uri must be supplied. Supplying both or neither returns INVALID_ARGUMENT (google.rpc.Code.INVALID_ARGUMENT). See `content limits <https://cloud.google.com/speech-to-text/quotas#content>`__.
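
The "exactly one of content or uri" rule can be checked on the client before a request is sent. The helper below is a minimal sketch under that assumption: ``check_audio_source`` is a hypothetical name, not part of the library, and the service remains the authority that reports INVALID_ARGUMENT.

```python
from typing import Optional


def check_audio_source(content: Optional[bytes] = None,
                       uri: Optional[str] = None) -> str:
    """Return which audio source is set, or raise if the rule is violated.

    Hypothetical client-side pre-check mirroring the documented rule:
    supplying both or neither of content and uri is invalid.
    """
    has_content = bool(content)
    has_uri = bool(uri)
    if has_content == has_uri:
        # Covers both "neither supplied" and "both supplied".
        raise ValueError("INVALID_ARGUMENT: supply exactly one of content or uri")
    return "content" if has_content else "uri"
```

For example, ``check_audio_source(uri="gs://bucket/audio.wav")`` (a placeholder path) passes, while calling it with no arguments raises ``ValueError``.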

RecognizeResponse

Response message for the Recognize method.

Recognizer

A Recognizer message. Stores recognition configuration and metadata.

SpeakerDiarizationConfig

Configuration to enable speaker diarization.

SpeechAdaptation

Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

SrtOutputFileFormatConfig

Output configurations for a `SubRip Text <https://www.matroska.org/technical/subtitles.html#srt-subtitles>`__ formatted subtitle file.

StreamingRecognitionConfig

Provides configuration information for the StreamingRecognize request.

StreamingRecognitionFeatures

Available recognition features specific to streaming recognition requests.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

Request message for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent in one call.

If the Recognizer referenced by recognizer contains a fully specified request configuration then the stream may only contain messages with only audio set.

Otherwise the first message must contain a recognizer and a streaming_config message that together fully specify the request configuration and must not contain audio. All subsequent messages must only have audio set.
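
The message ordering described above can be sketched as a generator. Plain dicts stand in for StreamingRecognizeRequest messages here, with keys that mirror the field names in the description; ``build_request_stream`` and the recognizer path shown in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def build_request_stream(
    audio_chunks: Iterable[bytes],
    config_message: Optional[Dict[str, object]] = None,
) -> Iterator[Dict[str, object]]:
    """Yield request stand-ins in the documented order.

    Pass config_message=None when the referenced Recognizer already holds a
    fully specified configuration; the stream is then audio-only.
    """
    if config_message is not None:
        # The first message carries configuration and must not carry audio.
        if "audio" in config_message:
            raise ValueError("the configuration message must not contain audio")
        yield dict(config_message)
    for chunk in audio_chunks:
        # Every subsequent message has only audio set.
        yield {"audio": chunk}
```

A caller might pass ``{"recognizer": "projects/my-project/locations/global/recognizers/my-recognizer", "streaming_config": {...}}`` as the configuration message, where the path is a placeholder.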

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponse messages that might be returned while processing audio:

1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

- Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
- The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
- The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
- In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
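
The handling described in these notes can be sketched with plain-Python stand-ins. The dataclasses below are not the library's types; they only mirror the fields that appear in the examples (results, alternatives, transcript, confidence, stability, is_final). The ``min_stability`` default of 0.5 is an arbitrary assumption for a UI cutoff.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Alternative:
    transcript: str
    confidence: float = 0.0


@dataclass
class Result:
    alternatives: List[Alternative]
    stability: float = 0.0
    is_final: bool = False


@dataclass
class Response:
    results: List[Result] = field(default_factory=list)


def interim(text: str, stability: float) -> Result:
    """Build an interim result with a single alternative."""
    return Result([Alternative(text)], stability=stability)


# The seven example responses listed above, in order.
EXAMPLE_RESPONSES = [
    Response([interim("tube", 0.01)]),
    Response([interim("to be a", 0.01)]),
    Response([interim("to be", 0.9), interim(" or not to be", 0.01)]),
    Response([Result([Alternative("to be or not to be", 0.92),
                      Alternative("to bee or not to bee")], is_final=True)]),
    Response([interim(" that's", 0.01)]),
    Response([interim(" that is", 0.9), interim(" the question", 0.01)]),
    Response([Result([Alternative(" that is the question", 0.98),
                      Alternative(" that was the question")], is_final=True)]),
]


def full_transcript(responses: Iterable[Response]) -> str:
    """Concatenate the first alternative of every final result, in order."""
    return "".join(
        result.alternatives[0].transcript
        for response in responses
        for result in response.results
        if result.is_final and result.alternatives
    )


def stable_interim_text(response: Response, min_stability: float = 0.5) -> str:
    """Return the interim portions a UI might choose to display."""
    return "".join(
        result.alternatives[0].transcript
        for result in response.results
        if not result.is_final
        and result.alternatives
        and result.stability >= min_stability
    )
```

With this data, ``full_transcript(EXAMPLE_RESPONSES)`` returns ``"to be or not to be that is the question"``, and ``stable_interim_text(EXAMPLE_RESPONSES[2])`` returns ``"to be"``.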

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
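
The gating rule for streaming can be illustrated as follows. This is a client-side sketch only: the service applies normalization itself, its exact matching semantics are not described here, and ``normalize_transcript`` with its list of (search, replace) pairs is a hypothetical stand-in rather than the message's actual structure.

```python
from typing import Iterable, Tuple

STABILITY_THRESHOLD = 0.8  # from the description: stability > 0.8


def normalize_transcript(
    transcript: str,
    entries: Iterable[Tuple[str, str]],
    *,
    is_final: bool,
    stability: float = 0.0,
) -> str:
    """Apply (search, replace) pairs only to final or stable transcripts.

    entries is a hypothetical list of plain-string (search, replace) pairs.
    """
    if not (is_final or stability > STABILITY_THRESHOLD):
        # Unstable partial transcripts are left untouched.
        return transcript
    for search, replace in entries:
        transcript = transcript.replace(search, replace)
    return transcript
```

Note that the comparison is strict: in this sketch a partial transcript with stability exactly 0.8 is not normalized.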