Package Classes (2.29.0)

Summary of entries of Classes for speech.



Service that implements Google Cloud Speech Adaptation API.


Service that implements Google Cloud Speech Adaptation API.


A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
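The async pager's behavior can be pictured with a small self-contained sketch. It is not the library's implementation: FakeResponse, fetch_page and AsyncCustomClassesPagerSketch are hypothetical stand-ins. The sketch assumes each page carries a next_page_token that is empty on the last page.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List


@dataclass
class FakeResponse:
    # Hypothetical stand-in for one ListCustomClassesResponse page.
    custom_classes: List[str] = field(default_factory=list)
    next_page_token: str = ""


class AsyncCustomClassesPagerSketch:
    """Hypothetical async pager: iterates items across pages via __aiter__."""

    def __init__(self, fetch_page: Callable[[str], Awaitable[FakeResponse]],
                 first_response: FakeResponse) -> None:
        self._fetch_page = fetch_page
        self._response = first_response  # only the most recent page is kept

    def __getattr__(self, name):
        # Attribute lookups fall through to the most recent response.
        return getattr(self._response, name)

    async def __aiter__(self):
        while True:
            for item in self._response.custom_classes:
                yield item
            if not self._response.next_page_token:
                break
            # Request the next page; it replaces the retained response.
            self._response = await self._fetch_page(self._response.next_page_token)


async def fake_fetch(token: str) -> FakeResponse:
    # Hypothetical second (and last) page keyed by its page token.
    return {"page-2": FakeResponse(["c"], "")}[token]


async def main():
    pager = AsyncCustomClassesPagerSketch(fake_fetch, FakeResponse(["a", "b"], "page-2"))
    items = [item async for item in pager]
    return items, pager.next_page_token


items, last_token = asyncio.run(main())
```

After iteration, attribute lookups such as next_page_token reflect the last page fetched, matching the "most recent response" rule described above.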


A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
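The synchronous variant follows the same pattern with __iter__. The sketch below is hypothetical (FakeListResponse, fetch_page and CustomClassesPagerSketch are illustrative names, not library classes); it shows that extra pages are requested lazily, only as iteration reaches them.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class FakeListResponse:
    # Hypothetical stand-in for one ListCustomClassesResponse page.
    custom_classes: List[str] = field(default_factory=list)
    next_page_token: str = ""


class CustomClassesPagerSketch:
    def __init__(self, fetch_page: Callable[[str], FakeListResponse],
                 first_response: FakeListResponse) -> None:
        self._fetch_page = fetch_page
        self._response = first_response

    def __getattr__(self, name):
        # Unknown attributes are read from the most recent response.
        return getattr(self._response, name)

    @property
    def pages(self) -> Iterator[FakeListResponse]:
        yield self._response
        while self._response.next_page_token:
            self._response = self._fetch_page(self._response.next_page_token)
            yield self._response

    def __iter__(self) -> Iterator[str]:
        for page in self.pages:
            yield from page.custom_classes


fetched = []  # records which page tokens were requested


def fake_fetch(token: str) -> FakeListResponse:
    fetched.append(token)
    later_pages = {"t2": FakeListResponse(["c"], "t3"), "t3": FakeListResponse(["d"], "")}
    return later_pages[token]


pager = CustomClassesPagerSketch(fake_fetch, FakeListResponse(["a", "b"], "t2"))
token_before = pager.next_page_token  # read from the first response, no request made
fetched_before = list(fetched)
items = list(pager)                   # the two extra pages are fetched here
```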


A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


Service that implements Google Cloud Speech API.


Service that implements Google Cloud Speech API.


Message sent by the client for the CreateCustomClass method.


Message sent by the client for the CreatePhraseSet method.


A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.


An item of the class.


Message sent by the client for the DeleteCustomClass method.


Message sent by the client for the DeletePhraseSet method.


Message sent by the client for the GetCustomClass method.


Message sent by the client for the GetPhraseSet method.


Message sent by the client for the ListCustomClasses method.


Message returned to the client by the ListCustomClasses method.


Message sent by the client for the ListPhraseSet method.


Message returned to the client by the ListPhraseSet method.


Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
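Progress reported in the operation metadata can be followed by polling until the operation is done. This is a minimal sketch under stated assumptions: get_operation is a hypothetical callable returning plain dicts with done, metadata.progress_percent and response keys. It is not the GetOperation RPC or the client library's own operation wrapper, which handles waiting for you.

```python
import time
from typing import Callable, Dict, List, Tuple


def wait_for_operation(get_operation: Callable[[], Dict],
                       poll_interval_s: float = 5.0,
                       max_polls: int = 720,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[Dict, List[int]]:
    """Poll a hypothetical operation dict until done=True, logging progress."""
    progress: List[int] = []
    for _ in range(max_polls):
        op = get_operation()
        # progress_percent is read from the operation's metadata.
        progress.append(op.get("metadata", {}).get("progress_percent", 0))
        if op.get("done"):
            return op, progress
        sleep(poll_interval_s)
    raise TimeoutError("operation did not finish within max_polls")


# Illustrative snapshots only; real values come from the service.
snapshots = iter([
    {"done": False, "metadata": {"progress_percent": 0}},
    {"done": False, "metadata": {"progress_percent": 50}},
    {"done": True, "metadata": {"progress_percent": 100}, "response": {"results": []}},
])
op, progress = wait_for_operation(lambda: next(snapshots), sleep=lambda _s: None)
```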


The top-level message sent by the client for the LongRunningRecognize method.


The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.


Provides "hints" to the speech recognizer to favor specific words and phrases in the results.


A phrase containing words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits.

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month"). To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).
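To make the two reference forms concrete, the sketch below classifies the class placeholders in a phrase string. It only illustrates the syntax described above: class_references is a hypothetical helper, and the pattern for pre-built symbols (letters, digits, underscores) is an assumption rather than a documented grammar.

```python
import re
from typing import List, Tuple

# ${name} marks a custom class reference; $NAME marks a pre-built class symbol.
_CLASS_REF = re.compile(r"\$\{([^}]+)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def class_references(phrase: str) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for each class placeholder in the phrase."""
    refs = []
    for match in _CLASS_REF.finditer(phrase):
        custom, prebuilt = match.groups()
        if custom is not None:
            refs.append(("custom", custom))
        else:
            refs.append(("prebuilt", prebuilt))
    return refs


example = class_references("i was born in ${my-months} on a $DAY")
```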

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the global endpoint, use the global location. To specify a region, use a regional endpoint with a matching us or eu location value.


Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
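Oneof semantics can be sketched in plain Python. AudioSourceSketch is a hypothetical class, not the generated message type: it mimics how setting content clears uri (and vice versa), and validate mimics the service rejecting a request where neither member is set.

```python
from typing import Optional


class AudioSourceSketch:
    """Hypothetical stand-in for a message with a content/uri oneof."""

    def __init__(self) -> None:
        self._content: Optional[bytes] = None
        self._uri: Optional[str] = None

    @property
    def content(self) -> Optional[bytes]:
        return self._content

    @content.setter
    def content(self, value: bytes) -> None:
        self._content, self._uri = value, None  # setting one member clears the other

    @property
    def uri(self) -> Optional[str]:
        return self._uri

    @uri.setter
    def uri(self, value: str) -> None:
        self._uri, self._content = value, None

    def which_oneof(self) -> Optional[str]:
        # Hypothetical helper: name of the member that is set, if any.
        if self._content is not None:
            return "content"
        if self._uri is not None:
            return "uri"
        return None


def validate(audio: AudioSourceSketch) -> None:
    # Models the "neither" case, which the service answers with INVALID_ARGUMENT.
    if audio.which_oneof() is None:
        raise ValueError("INVALID_ARGUMENT: set either content or uri")


audio = AudioSourceSketch()
audio.content = b"\x00\x01"                    # inline audio bytes
audio.uri = "gs://example-bucket/audio.flac"   # hypothetical URI; clears content
```

Because assignment to one member clears the other, the "both" case cannot arise on a client-side message object built this way; only "neither" needs an explicit check.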



Provides information to the recognizer that specifies how to process the request.


The encoding of the audio data sent in the request.

All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.
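The header rule above can be illustrated with a sketch that reads the encoding from a FLAC or WAV header and compares it with a declared encoding. Assumptions: the WAV header is in canonical layout with the fmt chunk immediately after WAVE; format tag 1 (integer PCM) is treated as LINEAR16 without checking bits per sample; the function names are hypothetical, and the service's own header parsing is not shown here.

```python
import struct
from typing import Optional

# WAV format tags: 1 = integer PCM (assumed 16-bit here), 7 = mu-law.
_WAV_FORMAT_TAGS = {1: "LINEAR16", 7: "MULAW"}


def encoding_from_header(header: bytes) -> Optional[str]:
    """Guess the encoding from a FLAC or canonical WAV header, else None."""
    if header[:4] == b"fLaC":
        return "FLAC"
    if (len(header) >= 22 and header[:4] == b"RIFF"
            and header[8:12] == b"WAVE" and header[12:16] == b"fmt "):
        (format_tag,) = struct.unpack_from("<H", header, 20)
        return _WAV_FORMAT_TAGS.get(format_tag)
    return None


def check_declared_encoding(header: bytes, declared: Optional[str]) -> str:
    """Return the effective encoding, or raise if declared and header disagree."""
    detected = encoding_from_header(header)
    if declared is None:
        if detected is None:
            raise ValueError("encoding is required for audio without a FLAC/WAV header")
        return detected
    if detected is not None and detected != declared:
        raise ValueError(f"INVALID_ARGUMENT: declared {declared}, header says {detected}")
    return declared
```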


Description of audio data to be recognized.


Use case categories that the audio recognition request can be described by.


Enumerates the types of capture settings describing an audio file.


The original media the speech was recorded on.


The type of device the speech was recorded with.


The top-level message sent by the client for the Recognize method.


The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.


Config to enable speaker diarization.


Speech adaptation configuration.



Information on speech adaptation use in results.


Provides "hints" to the speech recognizer to favor specific words and phrases in the results.


Alternative hypotheses (a.k.a. n-best list).


A speech recognition result corresponding to a portion of the audio.


Provides information to the recognizer that specifies how to process the request.


Events that a timeout can be set on for voice activity.


A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.


The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
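The required message order can be sketched as a generator. Plain dicts stand in for StreamingRecognizeRequest messages, and the interim_results key in the example config is illustrative only; streaming_requests and check_order are hypothetical helpers, not library functions.

```python
from typing import Dict, Iterable, Iterator


def streaming_requests(config: Dict, audio_chunks: Iterable[bytes]) -> Iterator[Dict]:
    """Yield request dicts in the order StreamingRecognize expects."""
    # First message: configuration only, no audio.
    yield {"streaming_config": config}
    # Every later message: audio only, no configuration.
    for chunk in audio_chunks:
        yield {"audio_content": chunk}


def check_order(requests: Iterable[Dict]) -> None:
    """Raise ValueError if a request violates the config-first rule."""
    for index, request in enumerate(requests):
        if index == 0:
            ok = "streaming_config" in request and "audio_content" not in request
        else:
            ok = "audio_content" in request and "streaming_config" not in request
        if not ok:
            raise ValueError(f"request #{index} violates the streaming order")


requests = list(streaming_requests({"interim_results": True}, [b"chunk-1", b"chunk-2"]))
check_order(requests)
```

Using a generator keeps audio chunks flowing lazily, so the config message can be sent before the rest of the audio has been captured.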



StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }


  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high-stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
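The example responses above can be replayed in a short sketch: final results are concatenated using the top (first) alternative, and a UI view keeps only interim portions above a stability threshold. The responses are reduced to plain dicts, the helper names are hypothetical, and the 0.8 threshold is an illustrative choice.

```python
from typing import Dict, List

# The seven example responses above, reduced to plain dicts (illustrative data only).
RESPONSES: List[Dict] = [
    {"results": [{"alternatives": [{"transcript": "tube"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": "to be a"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": "to be"}], "stability": 0.9},
                 {"alternatives": [{"transcript": " or not to be"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": "to be or not to be", "confidence": 0.92},
                                   {"transcript": "to bee or not to bee"}], "is_final": True}]},
    {"results": [{"alternatives": [{"transcript": " that's"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": " that is"}], "stability": 0.9},
                 {"alternatives": [{"transcript": " the question"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": " that is the question", "confidence": 0.98},
                                   {"transcript": " that was the question"}], "is_final": True}]},
]


def final_transcript(responses: List[Dict]) -> str:
    # Keep only final results and take the top (first) alternative of each.
    return "".join(
        result["alternatives"][0]["transcript"]
        for response in responses
        for result in response.get("results", [])
        if result.get("is_final", False)
    )


def stable_interim_text(response: Dict, min_stability: float = 0.8) -> str:
    # What a UI might show: interim portions whose stability exceeds the threshold.
    return "".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if not result.get("is_final", False) and result.get("stability", 0.0) > min_stability
    )
```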


Indicates the type of speech event.


Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
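A minimal sketch of the replacement behavior, assuming each entry carries search, replace and case_sensitive values (ReplacementEntry and normalize are hypothetical names). It models the streaming rule: partial transcripts with stability at or below 0.8 are left untouched, so a non-streaming caller would pass is_final=True.

```python
import re
from typing import List, NamedTuple


class ReplacementEntry(NamedTuple):
    # Assumed fields for one replacement entry.
    search: str
    replace: str
    case_sensitive: bool = False


def normalize(transcript: str, entries: List[ReplacementEntry],
              is_final: bool = False, stability: float = 0.0) -> str:
    """Apply replacements only to final or stable (> 0.8) transcripts."""
    if not is_final and stability <= 0.8:
        return transcript  # unstable partial transcripts are not normalized
    for entry in entries:
        flags = 0 if entry.case_sensitive else re.IGNORECASE
        # A function replacement avoids treating backslashes in `replace` as escapes.
        transcript = re.sub(re.escape(entry.search),
                            lambda _m, r=entry.replace: r,
                            transcript, flags=flags)
    return transcript


entries = [ReplacementEntry("new york", "New York")]
final_text = normalize("i live in new york", entries, is_final=True)
```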


A single replacement configuration.


Specifies an optional destination for the recognition results.



Message sent by the client for the UpdateCustomClass method.


Message sent by the client for the UpdatePhraseSet method.


Word-specific information for recognized words.


Service that implements Google Cloud Speech Adaptation API.


Service that implements Google Cloud Speech Adaptation API.


A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


Service that implements Google Cloud Speech API.


Service that implements Google Cloud Speech API.


Message sent by the client for the CreateCustomClass method.


Message sent by the client for the CreatePhraseSet method.


A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.


An item of the class.


Message sent by the client for the DeleteCustomClass method.


Message sent by the client for the DeletePhraseSet method.


Message sent by the client for the GetCustomClass method.


Message sent by the client for the GetPhraseSet method.


Message sent by the client for the ListCustomClasses method.


Message returned to the client by the ListCustomClasses method.


Message sent by the client for the ListPhraseSet method.


Message returned to the client by the ListPhraseSet method.


Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.


The top-level message sent by the client for the LongRunningRecognize method.


The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.


Provides "hints" to the speech recognizer to favor specific words and phrases in the results.


A phrase containing words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits.

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month"). To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the global endpoint, use the global location. To specify a region, use a regional endpoint with a matching us or eu location value.


Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.



Provides information to the recognizer that specifies how to process the request.


The encoding of the audio data sent in the request.

All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.


Description of audio data to be recognized.


Use case categories that the audio recognition request can be described by.


Enumerates the types of capture settings describing an audio file.


The original media the speech was recorded on.


The type of device the speech was recorded with.


The top-level message sent by the client for the Recognize method.


The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.


Config to enable speaker diarization.


Speech adaptation configuration.



Information on speech adaptation use in results.


Provides "hints" to the speech recognizer to favor specific words and phrases in the results.


Alternative hypotheses (a.k.a. n-best list).


A speech recognition result corresponding to a portion of the audio.


Provides information to the recognizer that specifies how to process the request.


Events that a timeout can be set on for voice activity.


A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.


The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.



StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }


  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high-stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.


Indicates the type of speech event.


Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.


A single replacement configuration.


Specifies an optional destination for the recognition results.



Message sent by the client for the UpdateCustomClass method.


Message sent by the client for the UpdatePhraseSet method.


Word-specific information for recognized words.


Enables speech transcription and resource management.


Enables speech transcription and resource management.


A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_phrase_sets requests.

This class thinly wraps an initial ListPhraseSetsResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSets requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_phrase_sets requests.

This class thinly wraps an initial ListPhraseSetsResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSets requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_recognizers requests.

This class thinly wraps an initial ListRecognizersResponse object, and provides an __aiter__ method to iterate through its recognizers field.

If there are more pages, the __aiter__ method will make additional ListRecognizers requests and continue to iterate through the recognizers field on the corresponding responses.

All the usual ListRecognizersResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


A pager for iterating through list_recognizers requests.

This class thinly wraps an initial ListRecognizersResponse object, and provides an __iter__ method to iterate through its recognizers field.

If there are more pages, the __iter__ method will make additional ListRecognizers requests and continue to iterate through the recognizers field on the corresponding responses.

All the usual ListRecognizersResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.


The access metadata for a particular region. This can be applied if the org policy for the given project disallows a particular region.


Describes the different types of constraints that can be applied on a region.


Automatically detected decoding parameters. Supported for the following encodings:

  • WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.

  • WAV_MULAW: 8-bit companded mulaw samples in a WAV container.

  • WAV_ALAW: 8-bit companded alaw samples in a WAV container.

  • RFC4867_5_AMR: AMR frames with an rfc4867.5 header.

  • RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.

  • FLAC: FLAC frames in the "native FLAC" container format.

  • MP3: MPEG audio frames with optional (ignored) ID3 metadata.

  • OGG_OPUS: Opus audio frames in an Ogg container.

  • WEBM_OPUS: Opus audio frames in a WebM container.

  • MP4_AAC: AAC audio frames in an MP4 container.

  • M4A_AAC: AAC audio frames in an M4A container.

  • MOV_AAC: AAC audio frames in an MOV container.
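Automatic detection relies on the container or stream header. The sketch below shows the idea with well-known magic bytes for several of the formats listed above; the service's actual detection logic is not described here, guess_container is a hypothetical helper, and the returned labels for WAV and the MP4 family are illustrative strings rather than enum values.

```python
from typing import Optional

# Magic-byte prefixes for some of the formats listed above.
_SIGNATURES = [
    (b"#!AMR-WB\n", "RFC4867_5_AMRWB"),  # RFC 4867 section 5 AMR-WB storage header
    (b"#!AMR\n", "RFC4867_5_AMR"),       # RFC 4867 section 5 AMR storage header
    (b"fLaC", "FLAC"),
    (b"OggS", "OGG_OPUS"),               # assumes the Ogg stream carries Opus
    (b"\x1a\x45\xdf\xa3", "WEBM_OPUS"),  # EBML header; assumes WebM with Opus
    (b"ID3", "MP3"),                     # ID3 tag only; bare frame-sync MP3 not handled
]


def guess_container(header: bytes) -> Optional[str]:
    """Return a rough format label from the first bytes of a file, else None."""
    for prefix, label in _SIGNATURES:
        if header.startswith(prefix):
            return label
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV_*"  # the fmt chunk's format tag separates LINEAR16, MULAW, ALAW
    if header[4:8] == b"ftyp":
        return "MP4/M4A/MOV (AAC)"  # ISO base media file; the brand would disambiguate
    return None
```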


Metadata about a single file in a batch for BatchRecognize.



Final results for a single file.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.



Operation metadata for BatchRecognize.


The abstract base class for a message.


Request message for the BatchRecognize method.


Possible processing strategies for batch requests.


Response message for BatchRecognize that is packaged into a long-running Operation.


The abstract base class for a message.


Output type for Cloud Storage of BatchRecognize transcripts. Although this proto is not returned anywhere in this API, the transcripts written to Cloud Storage are serialized instances of it and should be parsed as such.


Metadata about transcription for a single file (for example, progress percent).


Final results written to Cloud Storage.


Message representing the config for the Speech-to-Text API. This includes an optional KMS key with which incoming data will be encrypted.


Request message for the CreateCustomClass method.


Request message for the CreatePhraseSet method.


Request message for the CreateRecognizer method.


CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.


The abstract base class for a message.


An item of the class.


Set of states that define the lifecycle of a CustomClass.


Request message for the DeleteCustomClass method.


Request message for the DeletePhraseSet method.


Request message for the DeleteRecognizer method.


Explicitly specified decoding parameters.


Supported audio data encodings.


Output configurations for Cloud Storage.


Request message for the GetConfig method.


Request message for the GetCustomClass method.


Request message for the GetPhraseSet method.


Request message for the GetRecognizer method.


Output configurations for inline response.


Final results returned inline in the recognition response.


The metadata about locales available in a given region. Currently, this is just the models that are available for each locale.


The abstract base class for a message.


Request message for the ListCustomClasses method.


Response message for the ListCustomClasses method.


Request message for the ListPhraseSets method.


Response message for the ListPhraseSets method.


Request message for the ListRecognizers method.


Response message for the ListRecognizers method.


Main metadata for the Locations API for STT V2. Currently, this is just the metadata about locales, models, and features.


Represents a single feature of a model. If the feature is recognizer, the release_state of the feature represents the release_state of the model.


Represents the collection of features belonging to a model.


The metadata about the models in a given region for a specific locale. Currently, this is just the features of the model.


The abstract base class for a message.


Output configurations for serialized BatchRecognizeResults protos.


Represents the metadata of a long-running operation.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.



Configuration for the format of the results stored to output.


PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.


The abstract base class for a message.


A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.

List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.


Set of states that define the lifecycle of a PhraseSet.


Provides information to the Recognizer that specifies how to process the recognition request.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.


Available recognition features.


Options for how to recognize multi-channel audio.


Configuration options for the output(s) of recognition.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.


Metadata about the recognition request and response.


Request message for the Recognize method. Either content or uri must be supplied. Supplying both or neither returns INVALID_ARGUMENT. See content limits.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
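The content/uri rule above can be illustrated with a small standard-library sketch. AudioSource and validate_sources are hypothetical names, not part of google-cloud-speech: the class only mimics the documented oneof behavior (assigning one member clears the other), and the function mirrors the exactly-one-of requirement on the client side, so a malformed request fails before it is ever sent.

```python
from typing import Any, Dict, Optional


class AudioSource:
    """Hypothetical stand-in for the content/uri oneof on a Recognize request.

    Not the generated google-cloud-speech class: it only mimics the documented
    rule that setting one member of a oneof clears all other members.
    """

    MEMBERS = ("content", "uri")

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def set(self, member: str, value: Any) -> None:
        if member not in self.MEMBERS:
            raise ValueError(f"not a member of this oneof: {member}")
        self._values.clear()  # assigning one member clears the others
        self._values[member] = value

    def get(self, member: str) -> Any:
        return self._values.get(member)

    def which(self) -> Optional[str]:
        # Name of the member currently set, or None if nothing is set.
        return next(iter(self._values), None)


def validate_sources(content: Optional[bytes], uri: Optional[str]) -> str:
    """Client-side mirror of the rule that exactly one source is supplied."""
    if (content is None) == (uri is None):
        # Both or neither: the service would reject this with INVALID_ARGUMENT.
        raise ValueError("supply exactly one of content or uri")
    return "content" if content is not None else "uri"
```

For example, calling set("content", ...) and then set("uri", ...) leaves which() returning "uri" and get("content") returning None, which is the clearing behavior described for every oneof in this package.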


Response message for the Recognize method.


A Recognizer message. Stores recognition configuration and metadata.


The abstract base class for a message.


Set of states that define the lifecycle of a Recognizer.


Configuration to enable speaker diarization.


Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.


A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource or an inline definition of a PhraseSet.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.


Alternative hypotheses (a.k.a. n-best list).


A speech recognition result corresponding to a portion of the audio.


Output configurations for a SubRip Text formatted subtitle file.


Provides configuration information for the StreamingRecognize request.


Available recognition features specific to streaming recognition requests.


Events that a timeout can be set on for voice activity.


A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.


Request message for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent in one call.

If the Recognizer referenced by recognizer contains a fully specified request configuration, then the stream may contain only messages that have just audio set.

Otherwise, the first message must contain a recognizer and a streaming_config message that together fully specify the request configuration, and it must not contain audio. All subsequent messages must have only audio set.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
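A minimal sketch of the second case above (the Recognizer does not hold a fully specified configuration), assuming plain dicts stand in for StreamingRecognizeRequest messages. The helpers streaming_requests, check_stream, and read_chunks are hypothetical; only the field names recognizer, streaming_config, and audio come from the description above.

```python
from typing import Iterable, Iterator, List


def streaming_requests(
    recognizer: str,
    streaming_config: dict,
    audio_chunks: Iterable[bytes],
) -> Iterator[dict]:
    """Yield dict stand-ins for StreamingRecognizeRequest messages in order."""
    # First message: recognizer plus streaming_config, and no audio.
    yield {"recognizer": recognizer, "streaming_config": streaming_config}
    for chunk in audio_chunks:
        if chunk:  # skip empty reads rather than sending empty audio
            # Every later message carries audio only.
            yield {"audio": chunk}


def check_stream(requests: List[dict]) -> None:
    """Raise ValueError if a request sequence breaks the ordering rule."""
    if not requests:
        raise ValueError("stream is empty")
    first, rest = requests[0], requests[1:]
    if "audio" in first or "streaming_config" not in first:
        raise ValueError("first message must carry configuration, not audio")
    for request in rest:
        if set(request) != {"audio"}:
            raise ValueError("subsequent messages must contain only audio")


def read_chunks(data: bytes, size: int) -> Iterator[bytes]:
    """Split an in-memory audio buffer into fixed-size chunks."""
    for start in range(0, len(data), size):
        yield data[start:start + size]
```

Keeping request construction in a generator means audio can be read lazily (from a file or microphone) while the configuration message is guaranteed to go out first.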


StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }


  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating them generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 each contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high-stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
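The two rules above (concatenate final results; optionally show only high-stability interim text) can be sketched with plain dicts shaped like the example responses. final_transcript, stable_interim_text, and the 0.8 display threshold are illustrative assumptions, not library API; the sketch takes the first alternative of each result as the one to display, as in the examples.

```python
from typing import Iterable, List

# Dict stand-ins for the seven example responses listed above.
EXAMPLE_RESPONSES = [
    {"results": [{"alternatives": [{"transcript": "tube"}], "stability": 0.01}]},
    {"results": [{"alternatives": [{"transcript": "to be a"}], "stability": 0.01}]},
    {"results": [
        {"alternatives": [{"transcript": "to be"}], "stability": 0.9},
        {"alternatives": [{"transcript": " or not to be"}], "stability": 0.01},
    ]},
    {"results": [{"alternatives": [
        {"transcript": "to be or not to be", "confidence": 0.92},
        {"transcript": "to bee or not to bee"},
    ], "is_final": True}]},
    {"results": [{"alternatives": [{"transcript": " that's"}], "stability": 0.01}]},
    {"results": [
        {"alternatives": [{"transcript": " that is"}], "stability": 0.9},
        {"alternatives": [{"transcript": " the question"}], "stability": 0.01},
    ]},
    {"results": [{"alternatives": [
        {"transcript": " that is the question", "confidence": 0.98},
        {"transcript": " that was the question"},
    ], "is_final": True}]},
]


def final_transcript(responses: Iterable[dict]) -> str:
    """Concatenate the first alternative of every final result, in order."""
    parts: List[str] = []
    for response in responses:
        # Responses carrying only error or speech_event_type have no results.
        for result in response.get("results", []):
            if result.get("is_final") and result.get("alternatives"):
                parts.append(result["alternatives"][0]["transcript"])
    return "".join(parts)


def stable_interim_text(response: dict, min_stability: float = 0.8) -> str:
    """Join interim portions whose stability meets a (hypothetical) UI threshold."""
    return "".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if not result.get("is_final")
        and result.get("alternatives")
        and result.get("stability", 0.0) >= min_stability
    )
```

Applied to the examples, final_transcript reproduces "to be or not to be that is the question", and stable_interim_text on response #3 keeps only the high-stability portion "to be".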


Indicates the type of speech event.


Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.


A single replacement configuration.
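The normalization rule above can be sketched as a sequence of find-and-replace passes. Entry mirrors the search, replace, and case_sensitive fields of a replacement configuration as assumed here; treating each entry as a literal substring match applied in list order is an assumption about the service's behavior, as are the helper names normalize and maybe_normalize. Only the stability > 0.8 gate comes from the description above.

```python
import re
from dataclasses import dataclass
from typing import List

# Documented gate: streaming partials are normalized only above this stability.
STABILITY_THRESHOLD = 0.8


@dataclass
class Entry:
    """Hypothetical mirror of a single replacement configuration."""

    search: str
    replace: str
    case_sensitive: bool = False


def normalize(transcript: str, entries: List[Entry]) -> str:
    """Apply each entry in order as a literal (non-regex) substring replacement."""
    for entry in entries:
        flags = 0 if entry.case_sensitive else re.IGNORECASE
        pattern = re.compile(re.escape(entry.search), flags)
        # A function replacement keeps backslashes in `replace` literal.
        transcript = pattern.sub(lambda _match, text=entry.replace: text, transcript)
    return transcript


def maybe_normalize(
    transcript: str, entries: List[Entry], *, is_final: bool, stability: float = 0.0
) -> str:
    """Normalize only final transcripts or stable partials (stability > 0.8)."""
    if is_final or stability > STABILITY_THRESHOLD:
        return normalize(transcript, entries)
    return transcript
```

With a case-insensitive entry mapping "cloud speech" to "Cloud Speech", a low-stability partial passes through unchanged, while a final transcript or a partial above 0.8 is rewritten.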


Translation configuration. Use to translate the given audio into text for the desired language.


Request message for the UndeleteCustomClass method.


Request message for the UndeletePhraseSet method.


Request message for the UndeleteRecognizer method.


Request message for the UpdateConfig method.


Request message for the UpdateCustomClass method.


Request message for the UpdatePhraseSet method.


Request message for the UpdateRecognizer method.


Output configurations for a WebVTT formatted subtitle file.
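The service writes these subtitle files itself; the sketch below only illustrates how the two output formats differ, assuming cue start and end times in seconds (for example, derived from word-level timing). SubRip numbers each cue and separates milliseconds with a comma (HH:MM:SS,mmm), while WebVTT starts with a WEBVTT header line and uses a period (HH:MM:SS.mmm). All function names here are hypothetical.

```python
from typing import Iterable, Tuple


def _split_ms(seconds: float) -> Tuple[int, int, int, int]:
    """Break a non-negative offset in seconds into hours, minutes, seconds, ms."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return hours, minutes, secs, millis


def srt_timestamp(seconds: float) -> str:
    h, m, s, ms = _split_ms(seconds)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SubRip: comma before ms


def vtt_timestamp(seconds: float) -> str:
    h, m, s, ms = _split_ms(seconds)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"  # WebVTT: period before ms


def srt_document(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Numbered cues separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


def vtt_document(cues: Iterable[Tuple[float, float, str]]) -> str:
    """WEBVTT header line, then unnumbered cues separated by blank lines."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines += [f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}", text, ""]
    return "\n".join(lines)
```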


Word-specific information for recognized words.



API documentation for module.


API documentation for module.


API documentation for module.