Package Classes (2.27.0)

Summary of entries of Classes for speech.

Classes

AdaptationAsyncClient

Service that implements Google Cloud Speech Adaptation API.

AdaptationClient

Service that implements Google Cloud Speech Adaptation API.

ListCustomClassesAsyncPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
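The asynchronous paging behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: SketchAsyncPager, fetch_page, and the PAGES data are hypothetical stand-ins, and responses are modeled as plain dicts.

```python
import asyncio

# Hypothetical stand-in for the service: two pages keyed by page token.
PAGES = {
    "": {"custom_classes": ["ships", "ports"], "next_page_token": "p2"},
    "p2": {"custom_classes": ["captains"], "next_page_token": ""},
}


async def fetch_page(page_token):
    """Pretend to send one ListCustomClasses request."""
    await asyncio.sleep(0)  # yield to the event loop, as a real RPC would
    return PAGES[page_token]


class SketchAsyncPager:
    """Illustrative only; not the library's ListCustomClassesAsyncPager."""

    def __init__(self, first_response):
        self._response = first_response

    def __getattr__(self, name):
        # Attribute lookup is served from the most recent response only.
        try:
            return self._response[name]
        except KeyError:
            raise AttributeError(name) from None

    async def __aiter__(self):
        while True:
            for item in self._response["custom_classes"]:
                yield item
            token = self._response["next_page_token"]
            if not token:
                return
            # Replace, rather than accumulate, the retained response.
            self._response = await fetch_page(token)


async def collect():
    pager = SketchAsyncPager(await fetch_page(""))
    items = [item async for item in pager]
    return items, pager.next_page_token


items, last_token = asyncio.run(collect())
print(items)
```

After iteration finishes, next_page_token reflects the last page fetched, which is why attribute lookup on a pager should be treated as a view of the latest response only.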

ListCustomClassesPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
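A synchronous counterpart can be sketched the same way. The names SketchPager, list_custom_classes_page, and PAGES are hypothetical, and dicts stand in for response messages; the point is only to show that further requests are made lazily during iteration and that attribute lookup follows the latest response.

```python
# Hypothetical stand-in for the service: two pages keyed by page token.
PAGES = {
    "": {"custom_classes": ["ships", "ports"], "next_page_token": "p2"},
    "p2": {"custom_classes": ["captains"], "next_page_token": ""},
}
requests_made = []  # records every simulated request, in order


def list_custom_classes_page(page_token=""):
    """Pretend to send one ListCustomClasses request."""
    requests_made.append(page_token)
    return PAGES[page_token]


class SketchPager:
    """Illustrative only; not the library's ListCustomClassesPager."""

    def __init__(self, method, first_response):
        self._method = method
        self._response = first_response

    def __getattr__(self, name):
        # Only the most recent response is kept for attribute lookup.
        try:
            return self._response[name]
        except KeyError:
            raise AttributeError(name) from None

    def __iter__(self):
        while True:
            yield from self._response["custom_classes"]
            token = self._response["next_page_token"]
            if not token:
                return
            self._response = self._method(token)


pager = SketchPager(list_custom_classes_page, list_custom_classes_page())
token_before = pager.next_page_token  # read from the first page
names = list(pager)                   # triggers the second request
token_after = pager.next_page_token   # now read from the last page
print(names, requests_made)
```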

ListPhraseSetAsyncPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

SpeechAsyncClient

Service that implements Google Cloud Speech API.

SpeechClient

Service that implements Google Cloud Speech API.

CreateCustomClassRequest

Message sent by the client for the CreateCustomClass method.

CreatePhraseSetRequest

Message sent by the client for the CreatePhraseSet method.

CustomClass

A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.

ClassItem

An item of the class.

DeleteCustomClassRequest

Message sent by the client for the DeleteCustomClass method.

DeletePhraseSetRequest

Message sent by the client for the DeletePhraseSet method.

GetCustomClassRequest

Message sent by the client for the GetCustomClass method.

GetPhraseSetRequest

Message sent by the client for the GetPhraseSet method.

ListCustomClassesRequest

Message sent by the client for the ListCustomClasses method.

ListCustomClassesResponse

Message returned to the client by the ListCustomClasses method.

ListPhraseSetRequest

Message sent by the client for the ListPhraseSet method.

ListPhraseSetResponse

Message returned to the client by the ListPhraseSet method.

LongRunningRecognizeMetadata

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

LongRunningRecognizeRequest

The top-level message sent by the client for the LongRunningRecognize method.

LongRunningRecognizeResponse

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

PhraseSet

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Phrase

A phrase containing words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits (https://cloud.google.com/speech-to-text/quotas#content).

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month"). To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes. Then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).
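As a rough illustration of the two reference forms, the sketch below pulls class references out of a phrase string. The regular expression is an assumption about the token shape (uppercase symbols after $, arbitrary ids inside ${...}), and class_references is a hypothetical helper, not part of the client library.

```python
import re

# Assumed token shapes: ${custom-id} for custom classes, $SYMBOL for pre-built ones.
CLASS_TOKEN = re.compile(r"\$\{([^}]+)\}|\$([A-Z][A-Z0-9_]*)")


def class_references(phrase):
    """Return (custom_ids, prebuilt_symbols) referenced by a phrase string."""
    custom, prebuilt = [], []
    for match in CLASS_TOKEN.finditer(phrase):
        if match.group(1) is not None:
            custom.append(match.group(1))
        else:
            prebuilt.append(match.group(2))
    return custom, prebuilt


refs_prebuilt = class_references("i was born in $MONTH")
refs_custom = class_references("sailing on the ${my-ships} in $MONTH")
print(refs_prebuilt, refs_custom)
```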

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint (https://cloud.google.com/speech-to-text/docs/endpoints) with a matching us or eu location value.
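The pairing of location and endpoint can be sketched as a small lookup. The regional host names below are assumptions and should be confirmed against the regional endpoints page; adaptation_target and the project id are hypothetical.

```python
# Assumed host names; confirm them against the regional endpoints page.
ENDPOINTS = {
    "global": "speech.googleapis.com",
    "us": "us-speech.googleapis.com",
    "eu": "eu-speech.googleapis.com",
}


def adaptation_target(project_id, location):
    """Return (api_endpoint, parent) so the two always agree on location."""
    if location not in ENDPOINTS:
        raise ValueError(f"unsupported location: {location!r}")
    parent = f"projects/{project_id}/locations/{location}"
    return ENDPOINTS[location], parent


endpoint, parent = adaptation_target("my-project", "eu")
print(endpoint, parent)
```

Deriving both values from one location argument avoids the mismatch the paragraph above warns about, where a regional resource is requested through the global endpoint or the reverse.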

RecognitionAudio

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits (https://cloud.google.com/speech-to-text/quotas#content).

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.
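The clearing behavior of a oneof can be imitated in plain Python. AudioSourceSketch is a hypothetical class written for illustration; it is not RecognitionAudio and uses None for an unset member, whereas a real message returns the field's default value. The URI is a made-up example.

```python
class AudioSourceSketch:
    """Mimics a two-member oneof: setting one member clears the other."""

    def __init__(self):
        self._member = None  # name of the member currently set, if any
        self._value = None

    def _get(self, name):
        return self._value if self._member == name else None

    def _set(self, name, value):
        # Storing a single (name, value) pair is what makes members exclusive.
        self._member, self._value = name, value

    content = property(lambda self: self._get("content"),
                       lambda self, value: self._set("content", value))
    uri = property(lambda self: self._get("uri"),
                   lambda self, value: self._set("uri", value))

    def which(self):
        """Return the name of the member that is set, or None."""
        return self._member


audio = AudioSourceSketch()
audio.content = b"\x00\x01"
audio.uri = "gs://example-bucket/audio.flac"  # hypothetical URI; clears content
print(audio.which(), audio.content)
```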

RecognitionConfig

Provides information to the recognizer that specifies how to process the request.

AudioEncoding

The encoding of the audio data sent in the request.

All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio file format in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.
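To see what a WAV header carries, the sketch below writes a short 16-bit PCM file in memory with the standard wave module and reads the header fields back. It only illustrates the header contents; it makes no claim about how the service parses files, and the dictionary keys are names chosen for this example.

```python
import io
import wave

# Write one second of silent 16-bit mono PCM at 16 kHz into an in-memory WAV.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit PCM
    writer.setframerate(16000)
    writer.writeframes(b"\x00\x00" * 16000)

# Read the header back; these values describe the audio that follows it.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    header = {
        "channels": reader.getnchannels(),
        "sample_width_bytes": reader.getsampwidth(),
        "sample_rate_hertz": reader.getframerate(),
        "frames": reader.getnframes(),
    }

raw = buffer.getvalue()
print(header, raw[:4], raw[8:12])
```

Checking values such as the sample rate locally before sending a request is one way to catch a mismatch between an explicit configuration and the file header.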

RecognitionMetadata

Description of audio data to be recognized.

InteractionType

Use case categories that the audio recognition request can be described by.

MicrophoneDistance

Enumerates the types of capture settings describing an audio file.

OriginalMediaType

The original media the speech was recorded on.

RecordingDeviceType

The type of device the speech was recorded with.

RecognizeRequest

The top-level message sent by the client for the Recognize method.

RecognizeResponse

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.

SpeakerDiarizationConfig

Config to enable speaker diarization.

SpeechAdaptation

Speech adaptation configuration.

ABNFGrammar

SpeechAdaptationInfo

Information on speech adaptation use in results.

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

StreamingRecognitionConfig

Provides information to the recognizer that specifies how to process the request.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.
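The required message order can be sketched with a generator. Plain dicts stand in for StreamingRecognizeRequest messages here, and request_stream is a hypothetical helper; the config contents are only an example.

```python
def request_stream(streaming_config, audio_chunks):
    """Yield request dicts in the order described above."""
    # First message: configuration only, no audio.
    yield {"streaming_config": streaming_config}
    # Every later message: audio only, no configuration.
    for chunk in audio_chunks:
        yield {"audio_content": chunk}


requests = list(request_stream({"interim_results": True}, [b"ab", b"cd"]))
print([sorted(request) for request in requests])
```

Using a generator keeps the ordering rule in one place, so callers cannot accidentally send audio before the configuration.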

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponse messages that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
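The notes above can be checked with a small sketch that models the seven example responses as plain dicts. The helper names and the 0.5 display threshold are assumptions made for illustration, and confidence values are omitted for brevity.

```python
def result(transcripts, **fields):
    """Build one result dict; transcripts are listed best-first."""
    return {"alternatives": [{"transcript": t} for t in transcripts], **fields}


responses = [
    {"results": [result(["tube"], stability=0.01)]},
    {"results": [result(["to be a"], stability=0.01)]},
    {"results": [result(["to be"], stability=0.9),
                 result([" or not to be"], stability=0.01)]},
    {"results": [result(["to be or not to be", "to bee or not to bee"],
                        is_final=True)]},
    {"results": [result([" that's"], stability=0.01)]},
    {"results": [result([" that is"], stability=0.9),
                 result([" the question"], stability=0.01)]},
    {"results": [result([" that is the question", " that was the question"],
                        is_final=True)]},
]


def final_transcript(responses):
    """Concatenate the top alternative of every final result."""
    parts = []
    for response in responses:
        for item in response["results"]:
            if item.get("is_final"):
                parts.append(item["alternatives"][0]["transcript"])
    return "".join(parts)


def stable_text(response, threshold=0.5):
    """Return only the interim text whose stability meets the threshold."""
    return "".join(
        item["alternatives"][0]["transcript"]
        for item in response["results"]
        if item.get("stability", 0.0) >= threshold
    )


transcript = final_transcript(responses)
print(transcript)
print(repr(stable_text(responses[2])))
```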

SpeechEventType

Indicates the type of speech event.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
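The effect of replacement entries can be approximated locally. This sketch assumes each entry has search, replace, and case_sensitive keys and that matching is a plain substring match; the service's actual matching rules may differ. The helper names are hypothetical.

```python
import re


def should_normalize(result):
    """Mirror the streaming rule above: final, or stability above 0.8."""
    return bool(result.get("is_final")) or result.get("stability", 0.0) > 0.8


def apply_entries(transcript, entries):
    """Apply each replacement entry in order (assumed substring matching)."""
    for entry in entries:
        flags = 0 if entry.get("case_sensitive", False) else re.IGNORECASE
        pattern = re.compile(re.escape(entry["search"]), flags)
        replacement = entry["replace"]
        # A function replacement avoids backslash interpretation in the text.
        transcript = pattern.sub(lambda _match: replacement, transcript)
    return transcript


entries = [{"search": "cat street", "replace": "Katt St.", "case_sensitive": False}]
normalized = apply_entries("Meet me at Cat Street", entries)
print(normalized)
```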

Entry

A single replacement configuration.

TranscriptOutputConfig

Specifies an optional destination for the recognition results.


UpdateCustomClassRequest

Message sent by the client for the UpdateCustomClass method.

UpdatePhraseSetRequest

Message sent by the client for the UpdatePhraseSet method.

WordInfo

Word-specific information for recognized words.

AdaptationAsyncClient

Service that implements Google Cloud Speech Adaptation API.

AdaptationClient

Service that implements Google Cloud Speech Adaptation API.

ListCustomClassesAsyncPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListCustomClassesPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetAsyncPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

SpeechAsyncClient

Service that implements Google Cloud Speech API.

SpeechClient

Service that implements Google Cloud Speech API.

CreateCustomClassRequest

Message sent by the client for the CreateCustomClass method.

CreatePhraseSetRequest

Message sent by the client for the CreatePhraseSet method.

CustomClass

A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.

ClassItem

An item of the class.

DeleteCustomClassRequest

Message sent by the client for the DeleteCustomClass method.

DeletePhraseSetRequest

Message sent by the client for the DeletePhraseSet method.

GetCustomClassRequest

Message sent by the client for the GetCustomClass method.

GetPhraseSetRequest

Message sent by the client for the GetPhraseSet method.

ListCustomClassesRequest

Message sent by the client for the ListCustomClasses method.

ListCustomClassesResponse

Message returned to the client by the ListCustomClasses method.

ListPhraseSetRequest

Message sent by the client for the ListPhraseSet method.

ListPhraseSetResponse

Message returned to the client by the ListPhraseSet method.

LongRunningRecognizeMetadata

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

LongRunningRecognizeRequest

The top-level message sent by the client for the LongRunningRecognize method.

LongRunningRecognizeResponse

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

PhraseSet

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Phrase

A phrase containing words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits (https://cloud.google.com/speech-to-text/quotas#content).

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month"). To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes. Then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint (https://cloud.google.com/speech-to-text/docs/endpoints) with a matching us or eu location value.

RecognitionAudio

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits (https://cloud.google.com/speech-to-text/quotas#content).

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

RecognitionConfig

Provides information to the recognizer that specifies how to process the request.

AudioEncoding

The encoding of the audio data sent in the request.

All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio file format in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.

RecognitionMetadata

Description of audio data to be recognized.

InteractionType

Use case categories that the audio recognition request can be described by.

MicrophoneDistance

Enumerates the types of capture settings describing an audio file.

OriginalMediaType

The original media the speech was recorded on.

RecordingDeviceType

The type of device the speech was recorded with.

RecognizeRequest

The top-level message sent by the client for the Recognize method.

RecognizeResponse

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.

SpeakerDiarizationConfig

Config to enable speaker diarization.

SpeechAdaptation

Speech adaptation configuration.

ABNFGrammar

SpeechAdaptationInfo

Information on speech adaptation use in results.

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

StreamingRecognitionConfig

Provides information to the recognizer that specifies how to process the request.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponse messages that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

SpeechEventType

Indicates the type of speech event.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Entry

A single replacement configuration.

TranscriptOutputConfig

Specifies an optional destination for the recognition results.


UpdateCustomClassRequest

Message sent by the client for the UpdateCustomClass method.

UpdatePhraseSetRequest

Message sent by the client for the UpdatePhraseSet method.

WordInfo

Word-specific information for recognized words.

SpeechAsyncClient

Enables speech transcription and resource management.

SpeechClient

Enables speech transcription and resource management.

ListCustomClassesAsyncPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListCustomClassesPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetsAsyncPager

A pager for iterating through list_phrase_sets requests.

This class thinly wraps an initial ListPhraseSetsResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSets requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetsPager

A pager for iterating through list_phrase_sets requests.

This class thinly wraps an initial ListPhraseSetsResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSets requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListRecognizersAsyncPager

A pager for iterating through list_recognizers requests.

This class thinly wraps an initial ListRecognizersResponse object, and provides an __aiter__ method to iterate through its recognizers field.

If there are more pages, the __aiter__ method will make additional ListRecognizers requests and continue to iterate through the recognizers field on the corresponding responses.

All the usual ListRecognizersResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListRecognizersPager

A pager for iterating through list_recognizers requests.

This class thinly wraps an initial ListRecognizersResponse object, and provides an __iter__ method to iterate through its recognizers field.

If there are more pages, the __iter__ method will make additional ListRecognizers requests and continue to iterate through the recognizers field on the corresponding responses.

All the usual ListRecognizersResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

AccessMetadata

The access metadata for a particular region. This can be applied if the org policy for the given project disallows a particular region.

ConstraintType

Describes the different types of constraints that can be applied on a region.

AutoDetectDecodingConfig

Automatically detected decoding parameters. Supported for the following encodings:

  • WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.

  • WAV_MULAW: 8-bit companded mulaw samples in a WAV container.

  • WAV_ALAW: 8-bit companded alaw samples in a WAV container.

  • RFC4867_5_AMR: AMR frames with an rfc4867.5 header.

  • RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.

  • FLAC: FLAC frames in the "native FLAC" container format.

  • MP3: MPEG audio frames with optional (ignored) ID3 metadata.

  • OGG_OPUS: Opus audio frames in an Ogg container.

  • WEBM_OPUS: Opus audio frames in a WebM container.

  • MP4_AAC: AAC audio frames in an MP4 container.

  • M4A_AAC: AAC audio frames in an M4A container.

  • MOV_AAC: AAC audio frames in an MOV container.
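Several of the containers listed above can be told apart by their leading bytes. The sketch below is a hypothetical client-side check based on well-known file signatures; it is not a description of how the service detects encodings, and it does not cover MP3 data that lacks an ID3 tag.

```python
def sniff_container(data):
    """Guess the container from well-known leading byte signatures."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV"
    if data.startswith(b"fLaC"):
        return "FLAC"
    if data.startswith(b"OggS"):
        return "OGG"
    if data.startswith(b"#!AMR-WB\n"):
        return "AMR-WB"
    if data.startswith(b"#!AMR\n"):
        return "AMR"
    if data.startswith(b"\x1a\x45\xdf\xa3"):
        return "WEBM"  # EBML header, shared with Matroska
    if data[4:8] == b"ftyp":
        return "MP4-family"  # MP4, M4A, and MOV share this box layout
    if data.startswith(b"ID3"):
        return "MP3"
    return "unknown"


samples = {
    "wav": sniff_container(b"RIFF\x24\x00\x00\x00WAVEfmt "),
    "flac": sniff_container(b"fLaC\x00\x00\x00\x22"),
    "m4a": sniff_container(b"\x00\x00\x00\x18ftypM4A "),
    "amr_wb": sniff_container(b"#!AMR-WB\n"),
    "other": sniff_container(b"\x00\x01\x02\x03"),
}
print(samples)
```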

BatchRecognizeFileMetadata

Metadata about a single file in a batch for BatchRecognize.


BatchRecognizeFileResult

Final results for a single file.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members. See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

BatchRecognizeMetadata

Operation metadata for BatchRecognize.

TranscriptionMetadataEntry

The abstract base class for a message.

BatchRecognizeRequest

Request message for the BatchRecognize method.

ProcessingStrategy

Possible processing strategies for batch requests.

BatchRecognizeResponse

Response message for BatchRecognize that is packaged into a long-running Operation (google.longrunning.Operation).

ResultsEntry

The abstract base class for a message.

BatchRecognizeResults

Output type for Cloud Storage of BatchRecognize transcripts. Though this proto isn't returned in this API anywhere, the Cloud Storage transcripts will be this proto serialized and should be parsed as such.

BatchRecognizeTranscriptionMetadata

Metadata about transcription for a single file (for example, progress percent).

CloudStorageResult

Final results written to Cloud Storage.

Config

Message representing the config for the Speech-to-Text API. This includes an optional KMS key (https://cloud.google.com/kms/docs/resource-hierarchy#keys) with which incoming data will be encrypted.

CreateCustomClassRequest

Request message for the CreateCustomClass method.

CreatePhraseSetRequest

Request message for the CreatePhraseSet method.

CreateRecognizerRequest

Request message for the CreateRecognizer method.

CustomClass

CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.

AnnotationsEntry

The abstract base class for a message.

ClassItem

An item of the class.

State

Set of states that define the lifecycle of a CustomClass.

DeleteCustomClassRequest

Request message for the DeleteCustomClass method.

DeletePhraseSetRequest

Request message for the DeletePhraseSet method.

DeleteRecognizerRequest

Request message for the DeleteRecognizer method.

ExplicitDecodingConfig

Explicitly specified decoding parameters.

AudioEncoding

Supported audio data encodings.

GcsOutputConfig

Output configurations for Cloud Storage.

GetConfigRequest

Request message for the GetConfig method.

GetCustomClassRequest

Request message for the GetCustomClass method.

GetPhraseSetRequest

Request message for the GetPhraseSet method.

GetRecognizerRequest

Request message for the GetRecognizer method.

InlineOutputConfig

Output configurations for inline response.

InlineResult

Final results returned inline in the recognition response.

LanguageMetadata

The metadata about locales available in a given region. Currently this is just the models that are available for each locale.

ModelsEntry

The abstract base class for a message.

ListCustomClassesRequest

Request message for the ListCustomClasses method.

ListCustomClassesResponse

Response message for the ListCustomClasses method.

ListPhraseSetsRequest

Request message for the ListPhraseSets method.

ListPhraseSetsResponse

Response message for the ListPhraseSets method.

ListRecognizersRequest

Request message for the ListRecognizers method.

ListRecognizersResponse

Response message for the ListRecognizers method.

LocationsMetadata

Main metadata for the Locations API for STT V2. Currently this is just the metadata about locales, models, and features.

ModelFeature

Represents a single feature of a model. If the feature is recognizer, the release_state of the feature represents the release_state of the model.

ModelFeatures

Represents the collection of features belonging to a model.

ModelMetadata

The metadata about the models in a given region for a specific locale. Currently this is just the features of the model.

ModelFeaturesEntry

The abstract base class for a message.

NativeOutputFileFormatConfig

Output configurations for serialized BatchRecognizeResults protos.

OperationMetadata

Represents the metadata of a long-running operation.

This message has oneof fields (mutually exclusive fields; see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

OutputFormatConfig

Configuration for the format of the results stored to output.

PhraseSet

PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.

AnnotationsEntry

The abstract base class for a message.

Phrase

A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.

List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.

State

Set of states that define the lifecycle of a PhraseSet.

RecognitionConfig

Provides information to the Recognizer that specifies how to process the recognition request.

This message has oneof fields (mutually exclusive fields; see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

RecognitionFeatures

Available recognition features.

MultiChannelMode

Options for how to recognize multi-channel audio.

RecognitionOutputConfig

Configuration options for the output(s) of recognition.

This message has oneof fields (mutually exclusive fields; see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

RecognitionResponseMetadata

Metadata about the recognition request and response.

RecognizeRequest

Request message for the Recognize method. Either content or uri must be supplied. Supplying both or neither returns INVALID_ARGUMENT (google.rpc.Code.INVALID_ARGUMENT). See content limits (https://cloud.google.com/speech-to-text/quotas#content).

This message has oneof fields (mutually exclusive fields; see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

RecognizeResponse

Response message for the Recognize method.

Recognizer

A Recognizer message. Stores recognition configuration and metadata.

AnnotationsEntry

The abstract base class for a message.

State

Set of states that define the lifecycle of a Recognizer.

SpeakerDiarizationConfig

Configuration to enable speaker diarization.

SpeechAdaptation

Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.

AdaptationPhraseSet

A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.

This message has oneof fields (mutually exclusive fields; see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

SrtOutputFileFormatConfig

Output configurations for SubRip Text (https://www.matroska.org/technical/subtitles.html#srt-subtitles) formatted subtitle file.

StreamingRecognitionConfig

Provides configuration information for the StreamingRecognize request.

StreamingRecognitionFeatures

Available recognition features specific to streaming recognition requests.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

Request message for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent in one call.

If the Recognizer referenced by recognizer contains a fully specified request configuration, then the stream may contain only messages that have only audio set.

Otherwise, the first message must contain a recognizer and a streaming_config message that together fully specify the request configuration, and it must not contain audio. All subsequent messages must have only audio set.

This message has oneof fields (mutually exclusive fields; see https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
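The message ordering described for StreamingRecognizeRequest (configuration first, then audio only) can be sketched with a generator. Plain dictionaries stand in for the request messages here; the function names, the chunk size, and the recognizer string are hypothetical and chosen only for illustration.

```python
def chunk_audio(audio, chunk_size):
    """Split raw audio bytes into fixed-size chunks (the last may be shorter)."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def request_stream(recognizer, streaming_config, audio, chunk_size=4096):
    """Yield one configuration message, then audio-only messages."""
    # First message: recognizer and streaming_config, and no audio.
    yield {"recognizer": recognizer, "streaming_config": streaming_config}
    # Every later message carries audio and nothing else.
    for chunk in chunk_audio(audio, chunk_size):
        yield {"audio": chunk}


messages = list(
    request_stream("hypothetical-recognizer", {}, b"\x00" * 10000, chunk_size=4096)
)
print(len(messages))           # 4
print(sorted(messages[0]))     # ['recognizer', 'streaming_config']
print(len(messages[-1]["audio"]))  # 1808
```

A generator fits this pattern because a streaming call consumes requests lazily, so audio can be produced while earlier chunks are already being processed.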

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponse messages that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
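The notes above can be checked with a short sketch that models the seven example responses as plain dictionaries. This is not client-library code: the helper names are hypothetical, the 0.8 stability threshold is an assumed value for illustration, and the sketch assumes the first alternative of each result is the one to keep.

```python
RESPONSES = [
    {"results": [{"alternatives": [{"transcript": "tube"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": "to be a"}], "stability": 0.01}]},
    {"results": [
        {"alternatives": [{"transcript": "to be"}], "stability": 0.9},
        {"alternatives": [{"transcript": " or not to be"}], "stability": 0.01},
    ]},
    {"results": [{
        "alternatives": [
            {"transcript": "to be or not to be", "confidence": 0.92},
            {"transcript": "to bee or not to bee"},
        ],
        "is_final": True,
    }]},
    {"results": [{"alternatives": [{"transcript": " that's"}], "stability": 0.01}]},
    {"results": [
        {"alternatives": [{"transcript": " that is"}], "stability": 0.9},
        {"alternatives": [{"transcript": " the question"}], "stability": 0.01},
    ]},
    {"results": [{
        "alternatives": [
            {"transcript": " that is the question", "confidence": 0.98},
            {"transcript": " that was the question"},
        ],
        "is_final": True,
    }]},
]


def final_transcript(responses):
    """Concatenate the first alternative of every final result."""
    parts = []
    for response in responses:
        for result in response.get("results", []):
            if result.get("is_final"):
                parts.append(result["alternatives"][0]["transcript"])
    return "".join(parts)


def stable_interim(response, threshold=0.8):
    """Return only the interim text whose stability meets the threshold."""
    return "".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if not result.get("is_final") and result.get("stability", 0.0) >= threshold
    )


print(final_transcript(RESPONSES))   # to be or not to be that is the question
print(stable_interim(RESPONSES[2]))  # to be
```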

SpeechEventType

Indicates the type of speech event.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Entry

A single replacement configuration.
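To make the replacement behavior concrete, the sketch below applies a list of replacement entries to a transcript and gates streaming results on the stability condition stated above. It is a plain-Python approximation, not the service's implementation; the dictionary keys search, replace, and case_sensitive and both function names are assumptions made for illustration.

```python
import re


def normalize(transcript, entries):
    """Apply each replacement entry to the transcript, in order."""
    for entry in entries:
        flags = 0 if entry.get("case_sensitive", False) else re.IGNORECASE
        pattern = re.compile(re.escape(entry["search"]), flags)
        replacement = entry["replace"]
        # A function replacement keeps backslashes in the text literal.
        transcript = pattern.sub(lambda _match: replacement, transcript)
    return transcript


def should_normalize(result):
    """Streaming gate: final results, or partials with stability > 0.8."""
    return bool(result.get("is_final")) or result.get("stability", 0.0) > 0.8


entries = [{"search": "Doctor", "replace": "Dr."}]
print(normalize("call doctor smith", entries))     # call Dr. smith
print(should_normalize({"stability": 0.01}))       # False
print(should_normalize({"is_final": True}))        # True
```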

TranslationConfig

Translation configuration. Use to translate the given audio into text for the desired language.

UndeleteCustomClassRequest

Request message for the UndeleteCustomClass method.

UndeletePhraseSetRequest

Request message for the UndeletePhraseSet method.

UndeleteRecognizerRequest

Request message for the UndeleteRecognizer method.

UpdateConfigRequest

Request message for the UpdateConfig method.

UpdateCustomClassRequest

Request message for the UpdateCustomClass method.

UpdatePhraseSetRequest

Request message for the UpdatePhraseSet method.

UpdateRecognizerRequest

Request message for the UpdateRecognizer method.

VttOutputFileFormatConfig

Output configurations for WebVTT (https://www.w3.org/TR/webvtt1/) formatted subtitle file.

WordInfo

Word-specific information for recognized words.

Modules

pagers

API documentation for speech_v1.services.adaptation.pagers module.

pagers

API documentation for speech_v1p1beta1.services.adaptation.pagers module.

pagers

API documentation for speech_v2.services.speech.pagers module.