Package Classes (2.26.0)

Summary of the classes in the speech package.

Classes

speech_v1

AdaptationAsyncClient

Service that implements Google Cloud Speech Adaptation API.

AdaptationClient

Service that implements Google Cloud Speech Adaptation API.

ListCustomClassesAsyncPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListCustomClassesPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
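
In practice a pager is consumed like an ordinary iterable. A minimal sketch, assuming default application credentials and a hypothetical project ID:

    from google.cloud import speech_v1

    client = speech_v1.AdaptationClient()

    # list_custom_classes returns a ListCustomClassesPager; iterating it
    # transparently issues follow-up requests as each page is exhausted.
    pager = client.list_custom_classes(
        parent="projects/my-project/locations/global"  # hypothetical project
    )
    for custom_class in pager:
        print(custom_class.name)

The async pagers are consumed the same way with async for on AdaptationAsyncClient.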

ListPhraseSetAsyncPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

SpeechAsyncClient

Service that implements Google Cloud Speech API.

SpeechClient

Service that implements Google Cloud Speech API.

CreateCustomClassRequest

Message sent by the client for the CreateCustomClass method.

CreatePhraseSetRequest

Message sent by the client for the CreatePhraseSet method.

CustomClass

A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.

ClassItem

An item of the class.

DeleteCustomClassRequest

Message sent by the client for the DeleteCustomClass method.

DeletePhraseSetRequest

Message sent by the client for the DeletePhraseSet method.

GetCustomClassRequest

Message sent by the client for the GetCustomClass method.

GetPhraseSetRequest

Message sent by the client for the GetPhraseSet method.

ListCustomClassesRequest

Message sent by the client for the ListCustomClasses method.

ListCustomClassesResponse

Message returned to the client by the ListCustomClasses method.

ListPhraseSetRequest

Message sent by the client for the ListPhraseSet method.

ListPhraseSetResponse

Message returned to the client by the ListPhraseSet method.

LongRunningRecognizeMetadata

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

LongRunningRecognizeRequest

The top-level message sent by the client for the LongRunningRecognize method.

LongRunningRecognizeResponse

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
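
A sketch of how these messages fit together; the Cloud Storage URI and timeout are placeholders:

    from google.cloud import speech_v1

    client = speech_v1.SpeechClient()
    operation = client.long_running_recognize(
        config=speech_v1.RecognitionConfig(
            encoding=speech_v1.RecognitionConfig.AudioEncoding.FLAC,
            language_code="en-US",
        ),
        audio=speech_v1.RecognitionAudio(uri="gs://my-bucket/long-audio.flac"),
    )
    # operation.metadata carries LongRunningRecognizeMetadata (progress percent);
    # the final result is a LongRunningRecognizeResponse.
    response = operation.result(timeout=300)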

PhraseSet

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Phrase

A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits <https://cloud.google.com/speech-to-text/quotas#content>__.

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month"). To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint <https://cloud.google.com/speech-to-text/docs/endpoints>__ with a matching us or eu location value.
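
A sketch of inline adaptation with a phrase hint that references the pre-built $MONTH class; treat the boost value and wiring as illustrative rather than prescriptive:

    from google.cloud import speech_v1

    phrase_set = speech_v1.PhraseSet(
        phrases=[
            # Bias recognition toward utterances like "i was born in january".
            speech_v1.PhraseSet.Phrase(value="i was born in $MONTH", boost=10.0)
        ]
    )
    config = speech_v1.RecognitionConfig(
        language_code="en-US",
        adaptation=speech_v1.SpeechAdaptation(phrase_sets=[phrase_set]),
    )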

RecognitionAudio

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits <https://cloud.google.com/speech-to-text/quotas#content>__.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
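
Because content and uri are members of one oneof, setting one clears the other. A quick illustration of that proto-plus behavior (the URI is hypothetical):

    from google.cloud import speech_v1

    audio = speech_v1.RecognitionAudio(content=b"\x00\x01")  # inline audio bytes
    audio.uri = "gs://my-bucket/audio.flac"  # setting uri clears content
    assert not audio.content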

RecognitionConfig

Provides information to the recognizer that specifies how to process the request.

AudioEncoding

The encoding of the audio data sent in the request.

All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio file format in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.

Values:

  • ENCODING_UNSPECIFIED (0): Not specified.

  • LINEAR16 (1): Uncompressed 16-bit signed little-endian samples (Linear PCM).

  • FLAC (2): FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless (recognition is therefore not compromised) and requires only about half the bandwidth of LINEAR16. FLAC stream encoding supports 16-bit and 24-bit samples; however, not all fields in STREAMINFO are supported.

  • MULAW (3): 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.

  • AMR (4): Adaptive Multi-Rate Narrowband codec. sample_rate_hertz must be 8000.

  • AMR_WB (5): Adaptive Multi-Rate Wideband codec. sample_rate_hertz must be 16000.

  • OGG_OPUS (6): Opus encoded audio frames in an Ogg container (OggOpus <https://wiki.xiph.org/OggOpus>__). sample_rate_hertz must be one of 8000, 12000, 16000, 24000, or 48000.

  • SPEEX_WITH_HEADER_BYTE (7): Although the use of lossy encodings is not recommended, if a very low bitrate encoding is required, OGG_OPUS is highly preferred over Speex encoding. The Speex <https://speex.org/>__ encoding supported by the Cloud Speech API has a header byte in each block, as in MIME type audio/x-speex-with-header-byte. It is a variant of the RTP Speex encoding defined in RFC 5574 <https://tools.ietf.org/html/rfc5574>__. The stream is a sequence of blocks, one block per RTP packet. Each block starts with a byte containing the length of the block, in bytes, followed by one or more frames of Speex data, padded to an integral number of bytes (octets) as specified in RFC 5574. In other words, each RTP header is replaced with a single byte containing the block length. Only Speex wideband is supported. sample_rate_hertz must be 16000.

  • MP3 (8): MP3 audio. MP3 encoding is a Beta feature and only available in v1p1beta1. Supports all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sample_rate_hertz has to match the sample rate of the file being used.

  • WEBM_OPUS (9): Opus encoded audio frames in a WebM container (OggOpus <https://wiki.xiph.org/OggOpus>__). sample_rate_hertz must be one of 8000, 12000, 16000, 24000, or 48000.

RecognitionMetadata

Description of audio data to be recognized.

InteractionType

Use case categories that the audio recognition request can be described by.

Values:

  • INTERACTION_TYPE_UNSPECIFIED (0): Use case is either unknown or is something other than one of the other values below.

  • DISCUSSION (1): Multiple people in a conversation or discussion. For example, in a meeting with two or more people actively participating. Typically all the primary people speaking would be in the same room (if not, see PHONE_CALL).

  • PRESENTATION (2): One or more persons lecturing or presenting to others, mostly uninterrupted.

  • PHONE_CALL (3): A phone call or video conference in which two or more people, who are not in the same room, are actively participating.

  • VOICEMAIL (4): A recorded message intended for another person to listen to.

  • PROFESSIONALLY_PRODUCED (5): Professionally produced audio (e.g. a TV show or podcast).

  • VOICE_SEARCH (6): Transcribe spoken questions and queries into text.

  • VOICE_COMMAND (7): Transcribe voice commands, such as for controlling a device.

  • DICTATION (8): Transcribe speech to text to create a written document, such as a text message, email, or report.

MicrophoneDistance

Enumerates the types of capture settings describing an audio file.

Values:

  • MICROPHONE_DISTANCE_UNSPECIFIED (0): Audio type is not known.

  • NEARFIELD (1): The audio was captured from a closely placed microphone (e.g. phone, dictaphone, or handheld microphone), generally with the speaker within 1 meter of the microphone.

  • MIDFIELD (2): The speaker is within 3 meters of the microphone.

  • FARFIELD (3): The speaker is more than 3 meters away from the microphone.

OriginalMediaType

The original media the speech was recorded on.

Values:

  • ORIGINAL_MEDIA_TYPE_UNSPECIFIED (0): Unknown original media type.

  • AUDIO (1): The speech data is an audio recording.

  • VIDEO (2): The speech data was originally recorded on a video.

RecordingDeviceType

The type of device the speech was recorded with.

Values:

  • RECORDING_DEVICE_TYPE_UNSPECIFIED (0): The recording device is unknown.

  • SMARTPHONE (1): Speech was recorded on a smartphone.

  • PC (2): Speech was recorded using a personal computer or tablet.

  • PHONE_LINE (3): Speech was recorded over a phone line.

  • VEHICLE (4): Speech was recorded in a vehicle.

  • OTHER_OUTDOOR_DEVICE (5): Speech was recorded outdoors.

  • OTHER_INDOOR_DEVICE (6): Speech was recorded indoors.

RecognizeRequest

The top-level message sent by the client for the Recognize method.

RecognizeResponse

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.
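
A minimal synchronous round trip, assuming a hypothetical Cloud Storage object containing 16 kHz LINEAR16 audio:

    from google.cloud import speech_v1

    client = speech_v1.SpeechClient()
    response = client.recognize(
        config=speech_v1.RecognitionConfig(
            encoding=speech_v1.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        ),
        audio=speech_v1.RecognitionAudio(uri="gs://my-bucket/audio.raw"),
    )
    for result in response.results:
        # Each result carries one or more SpeechRecognitionAlternative hypotheses.
        print(result.alternatives[0].transcript)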

SpeakerDiarizationConfig

Config to enable speaker diarization.

SpeechAdaptation

Speech adaptation configuration.

ABNFGrammar

SpeechAdaptationInfo

Information on speech adaptation use in results.

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

StreamingRecognitionConfig

Provides information to the recognizer that specifies how to process the request.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
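
The streaming helper on SpeechClient enforces that ordering: it sends the streaming_config as the first request and then forwards your audio-only requests. A sketch, with the audio chunks left as placeholders:

    from google.cloud import speech_v1

    client = speech_v1.SpeechClient()
    streaming_config = speech_v1.StreamingRecognitionConfig(
        config=speech_v1.RecognitionConfig(
            encoding=speech_v1.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        ),
        interim_results=True,
    )
    audio_chunks = [b"..."]  # placeholder; supply real audio chunks here
    requests = (
        speech_v1.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in audio_chunks
    )
    responses = client.streaming_recognize(streaming_config, requests)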

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponse messages that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses #4 and #7 contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
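
Continuing the streaming sketch above, a consumer typically branches on is_final and stability when deciding what to display (the 0.8 threshold is only an illustration):

    for response in responses:
        for result in response.results:
            alternative = result.alternatives[0]
            if result.is_final:
                print("final:", alternative.transcript, alternative.confidence)
            elif result.stability > 0.8:
                # High-stability interim text is unlikely to change much.
                print("interim:", alternative.transcript)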

SpeechEventType

Indicates the type of speech event.

Values:

  • SPEECH_EVENT_UNSPECIFIED (0): No speech event specified.

  • END_OF_SINGLE_UTTERANCE (1): This event indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio (although it may subsequently return additional results). The client should stop sending additional audio data, half-close the gRPC connection, and wait for any additional results until the server closes the gRPC connection. This event is only sent if single_utterance was set to true, and is not used otherwise.

  • SPEECH_ACTIVITY_BEGIN (2): This event indicates that the server has detected the beginning of human voice activity in the stream. It can be returned multiple times if speech starts and stops repeatedly throughout the stream. It is only sent if voice_activity_events is set to true.

  • SPEECH_ACTIVITY_END (3): This event indicates that the server has detected the end of human voice activity in the stream. It can be returned multiple times if speech starts and stops repeatedly throughout the stream. It is only sent if voice_activity_events is set to true.

  • SPEECH_ACTIVITY_TIMEOUT (4): This event indicates that the user-set timeout for speech activity begin or end has been exceeded. Upon receiving this event, the client is expected to send a half-close. Further audio will not be processed.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Entry

A single replacement configuration.
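
A sketch of a single replacement rule; attach it via the config's transcript_normalization field on API surfaces that expose it:

    from google.cloud import speech_v1

    normalization = speech_v1.TranscriptNormalization(
        entries=[
            # Replace case-insensitive occurrences of "goog" in the transcript.
            speech_v1.TranscriptNormalization.Entry(
                search="goog", replace="Google", case_sensitive=False
            )
        ]
    )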

TranscriptOutputConfig

Specifies an optional destination for the recognition results.

UpdateCustomClassRequest

Message sent by the client for the UpdateCustomClass method.

UpdatePhraseSetRequest

Message sent by the client for the UpdatePhraseSet method.

WordInfo

Word-specific information for recognized words.

speech_v1p1beta1

AdaptationAsyncClient

Service that implements Google Cloud Speech Adaptation API.

AdaptationClient

Service that implements Google Cloud Speech Adaptation API.

ListCustomClassesAsyncPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListCustomClassesPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetAsyncPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetPager

A pager for iterating through list_phrase_set requests.

This class thinly wraps an initial ListPhraseSetResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSet requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

SpeechAsyncClient

Service that implements Google Cloud Speech API.

SpeechClient

Service that implements Google Cloud Speech API.

CreateCustomClassRequest

Message sent by the client for the CreateCustomClass method.

CreatePhraseSetRequest

Message sent by the client for the CreatePhraseSet method.

CustomClass

A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.

ClassItem

An item of the class.

DeleteCustomClassRequest

Message sent by the client for the DeleteCustomClass method.

DeletePhraseSetRequest

Message sent by the client for the DeletePhraseSet method.

GetCustomClassRequest

Message sent by the client for the GetCustomClass method.

GetPhraseSetRequest

Message sent by the client for the GetPhraseSet method.

ListCustomClassesRequest

Message sent by the client for the ListCustomClasses method.

ListCustomClassesResponse

Message returned to the client by the ListCustomClasses method.

ListPhraseSetRequest

Message sent by the client for the ListPhraseSet method.

ListPhraseSetResponse

Message returned to the client by the ListPhraseSet method.

LongRunningRecognizeMetadata

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

LongRunningRecognizeRequest

The top-level message sent by the client for the LongRunningRecognize method.

LongRunningRecognizeResponse

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

PhraseSet

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Phrase

A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits <https://cloud.google.com/speech-to-text/quotas#content>__.

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month"). To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint <https://cloud.google.com/speech-to-text/docs/endpoints>__ with a matching us or eu location value.

RecognitionAudio

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits <https://cloud.google.com/speech-to-text/quotas#content>__.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

RecognitionConfig

Provides information to the recognizer that specifies how to process the request.

AudioEncoding

The encoding of the audio data sent in the request.

All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio file format in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.

Values:

  • ENCODING_UNSPECIFIED (0): Not specified.

  • LINEAR16 (1): Uncompressed 16-bit signed little-endian samples (Linear PCM).

  • FLAC (2): FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless (recognition is therefore not compromised) and requires only about half the bandwidth of LINEAR16. FLAC stream encoding supports 16-bit and 24-bit samples; however, not all fields in STREAMINFO are supported.

  • MULAW (3): 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.

  • AMR (4): Adaptive Multi-Rate Narrowband codec. sample_rate_hertz must be 8000.

  • AMR_WB (5): Adaptive Multi-Rate Wideband codec. sample_rate_hertz must be 16000.

  • OGG_OPUS (6): Opus encoded audio frames in an Ogg container (OggOpus <https://wiki.xiph.org/OggOpus>__). sample_rate_hertz must be one of 8000, 12000, 16000, 24000, or 48000.

  • SPEEX_WITH_HEADER_BYTE (7): Although the use of lossy encodings is not recommended, if a very low bitrate encoding is required, OGG_OPUS is highly preferred over Speex encoding. The Speex <https://speex.org/>__ encoding supported by the Cloud Speech API has a header byte in each block, as in MIME type audio/x-speex-with-header-byte. It is a variant of the RTP Speex encoding defined in RFC 5574 <https://tools.ietf.org/html/rfc5574>__. The stream is a sequence of blocks, one block per RTP packet. Each block starts with a byte containing the length of the block, in bytes, followed by one or more frames of Speex data, padded to an integral number of bytes (octets) as specified in RFC 5574. In other words, each RTP header is replaced with a single byte containing the block length. Only Speex wideband is supported. sample_rate_hertz must be 16000.

  • MP3 (8): MP3 audio. MP3 encoding is a Beta feature and only available in v1p1beta1. Supports all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sample_rate_hertz has to match the sample rate of the file being used.

  • WEBM_OPUS (9): Opus encoded audio frames in a WebM container (OggOpus <https://wiki.xiph.org/OggOpus>__). sample_rate_hertz must be one of 8000, 12000, 16000, 24000, or 48000.

RecognitionMetadata

Description of audio data to be recognized.

InteractionType

Use case categories that the audio recognition request can be described by.

Values:

  • INTERACTION_TYPE_UNSPECIFIED (0): Use case is either unknown or is something other than one of the other values below.

  • DISCUSSION (1): Multiple people in a conversation or discussion. For example, in a meeting with two or more people actively participating. Typically all the primary people speaking would be in the same room (if not, see PHONE_CALL).

  • PRESENTATION (2): One or more persons lecturing or presenting to others, mostly uninterrupted.

  • PHONE_CALL (3): A phone call or video conference in which two or more people, who are not in the same room, are actively participating.

  • VOICEMAIL (4): A recorded message intended for another person to listen to.

  • PROFESSIONALLY_PRODUCED (5): Professionally produced audio (e.g. a TV show or podcast).

  • VOICE_SEARCH (6): Transcribe spoken questions and queries into text.

  • VOICE_COMMAND (7): Transcribe voice commands, such as for controlling a device.

  • DICTATION (8): Transcribe speech to text to create a written document, such as a text message, email, or report.

MicrophoneDistance

Enumerates the types of capture settings describing an audio file.

Values:

  • MICROPHONE_DISTANCE_UNSPECIFIED (0): Audio type is not known.

  • NEARFIELD (1): The audio was captured from a closely placed microphone (e.g. phone, dictaphone, or handheld microphone), generally with the speaker within 1 meter of the microphone.

  • MIDFIELD (2): The speaker is within 3 meters of the microphone.

  • FARFIELD (3): The speaker is more than 3 meters away from the microphone.

OriginalMediaType

The original media the speech was recorded on.

Values:

  • ORIGINAL_MEDIA_TYPE_UNSPECIFIED (0): Unknown original media type.

  • AUDIO (1): The speech data is an audio recording.

  • VIDEO (2): The speech data was originally recorded on a video.

RecordingDeviceType

The type of device the speech was recorded with.

Values:

  • RECORDING_DEVICE_TYPE_UNSPECIFIED (0): The recording device is unknown.

  • SMARTPHONE (1): Speech was recorded on a smartphone.

  • PC (2): Speech was recorded using a personal computer or tablet.

  • PHONE_LINE (3): Speech was recorded over a phone line.

  • VEHICLE (4): Speech was recorded in a vehicle.

  • OTHER_OUTDOOR_DEVICE (5): Speech was recorded outdoors.

  • OTHER_INDOOR_DEVICE (6): Speech was recorded indoors.

RecognizeRequest

The top-level message sent by the client for the Recognize method.

RecognizeResponse

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.

SpeakerDiarizationConfig

Config to enable speaker diarization.

SpeechAdaptation

Speech adaptation configuration.

ABNFGrammar

SpeechAdaptationInfo

Information on speech adaptation use in results.

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

StreamingRecognitionConfig

Provides information to the recognizer that specifies how to process the request.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.

Here's an example of a series of StreamingRecognizeResponse messages that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses #4 and #7 contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

SpeechEventType

Indicates the type of speech event.

Values:

  • SPEECH_EVENT_UNSPECIFIED (0): No speech event specified.

  • END_OF_SINGLE_UTTERANCE (1): This event indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio (although it may subsequently return additional results). The client should stop sending additional audio data, half-close the gRPC connection, and wait for any additional results until the server closes the gRPC connection. This event is only sent if single_utterance was set to true, and is not used otherwise.

  • SPEECH_ACTIVITY_BEGIN (2): This event indicates that the server has detected the beginning of human voice activity in the stream. It can be returned multiple times if speech starts and stops repeatedly throughout the stream. It is only sent if voice_activity_events is set to true.

  • SPEECH_ACTIVITY_END (3): This event indicates that the server has detected the end of human voice activity in the stream. It can be returned multiple times if speech starts and stops repeatedly throughout the stream. It is only sent if voice_activity_events is set to true.

  • SPEECH_ACTIVITY_TIMEOUT (4): This event indicates that the user-set timeout for speech activity begin or end has been exceeded. Upon receiving this event, the client is expected to send a half-close. Further audio will not be processed.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Entry

A single replacement configuration.

TranscriptOutputConfig

Specifies an optional destination for the recognition results.

UpdateCustomClassRequest

Message sent by the client for the UpdateCustomClass method.

UpdatePhraseSetRequest

Message sent by the client for the UpdatePhraseSet method.

WordInfo

Word-specific information for recognized words.

speech_v2

SpeechAsyncClient

Enables speech transcription and resource management.

SpeechClient

Enables speech transcription and resource management.

ListCustomClassesAsyncPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __aiter__ method to iterate through its custom_classes field.

If there are more pages, the __aiter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListCustomClassesPager

A pager for iterating through list_custom_classes requests.

This class thinly wraps an initial ListCustomClassesResponse object, and provides an __iter__ method to iterate through its custom_classes field.

If there are more pages, the __iter__ method will make additional ListCustomClasses requests and continue to iterate through the custom_classes field on the corresponding responses.

All the usual ListCustomClassesResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetsAsyncPager

A pager for iterating through list_phrase_sets requests.

This class thinly wraps an initial ListPhraseSetsResponse object, and provides an __aiter__ method to iterate through its phrase_sets field.

If there are more pages, the __aiter__ method will make additional ListPhraseSets requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListPhraseSetsPager

A pager for iterating through list_phrase_sets requests.

This class thinly wraps an initial ListPhraseSetsResponse object, and provides an __iter__ method to iterate through its phrase_sets field.

If there are more pages, the __iter__ method will make additional ListPhraseSets requests and continue to iterate through the phrase_sets field on the corresponding responses.

All the usual ListPhraseSetsResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListRecognizersAsyncPager

A pager for iterating through list_recognizers requests.

This class thinly wraps an initial ListRecognizersResponse object, and provides an __aiter__ method to iterate through its recognizers field.

If there are more pages, the __aiter__ method will make additional ListRecognizers requests and continue to iterate through the recognizers field on the corresponding responses.

All the usual ListRecognizersResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.

ListRecognizersPager

A pager for iterating through list_recognizers requests.

This class thinly wraps an initial ListRecognizersResponse object, and provides an __iter__ method to iterate through its recognizers field.

If there are more pages, the __iter__ method will make additional ListRecognizers requests and continue to iterate through the recognizers field on the corresponding responses.

All the usual ListRecognizersResponse attributes are available on the pager. If multiple requests are made, only the most recent response is retained, and thus used for attribute lookup.
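
As with the earlier pagers, iteration drives the paging. A minimal sketch with a hypothetical project:

    from google.cloud import speech_v2

    client = speech_v2.SpeechClient()
    for recognizer in client.list_recognizers(
        parent="projects/my-project/locations/global"  # hypothetical project
    ):
        print(recognizer.name)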

AutoDetectDecodingConfig

Automatically detected decoding parameters. Supported for the following encodings (a configuration sketch follows the list):

  • WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.

  • WAV_MULAW: 8-bit companded mulaw samples in a WAV container.

  • WAV_ALAW: 8-bit companded alaw samples in a WAV container.

  • RFC4867_5_AMR: AMR frames with an rfc4867.5 header.

  • RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.

  • FLAC: FLAC frames in the "native FLAC" container format.

  • MP3: MPEG audio frames with optional (ignored) ID3 metadata.

  • OGG_OPUS: Opus audio frames in an Ogg container.

  • WEBM_OPUS: Opus audio frames in a WebM container.

  • M4A: M4A audio format.
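
The sketch referenced above: auto-detection is selected by setting the empty AutoDetectDecodingConfig message on the decoding_config oneof of the v2 RecognitionConfig:

    from google.cloud import speech_v2

    config = speech_v2.RecognitionConfig(
        # Empty message: the service infers the encoding from the file headers.
        auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )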

BatchRecognizeFileMetadata

Metadata about a single file in a batch for BatchRecognize.

BatchRecognizeFileResult

Final results for a single file.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

BatchRecognizeMetadata

Operation metadata for BatchRecognize.

TranscriptionMetadataEntry

The abstract base class for a message.

BatchRecognizeRequest

Request message for the BatchRecognize method.

ProcessingStrategy

Possible processing strategies for batch requests.

Values:

  • PROCESSING_STRATEGY_UNSPECIFIED (0): Default value for the processing strategy. The request is processed as soon as it is received.

  • DYNAMIC_BATCHING (1): If selected, processes the request during lower utilization periods for a price discount. The request is fulfilled within 24 hours.
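
A sketch of a batch request that returns results inline; the recognizer path and bucket are hypothetical:

    from google.cloud import speech_v2

    client = speech_v2.SpeechClient()
    operation = client.batch_recognize(
        request=speech_v2.BatchRecognizeRequest(
            recognizer="projects/my-project/locations/global/recognizers/my-recognizer",
            files=[speech_v2.BatchRecognizeFileMetadata(uri="gs://my-bucket/a.wav")],
            recognition_output_config=speech_v2.RecognitionOutputConfig(
                inline_response_config=speech_v2.InlineOutputConfig()
            ),
            processing_strategy=(
                speech_v2.BatchRecognizeRequest.ProcessingStrategy.DYNAMIC_BATCHING
            ),
        )
    )
    response = operation.result()  # BatchRecognizeResponse, keyed by file URI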

BatchRecognizeResponse

Response message for BatchRecognize that is packaged into a long-running Operation (google.longrunning.Operation).

ResultsEntry

The abstract base class for a message.

BatchRecognizeResults

Output type for Cloud Storage of BatchRecognize transcripts. Although this proto is not returned anywhere in this API, the transcripts written to Cloud Storage are serialized instances of this proto and should be parsed as such.

BatchRecognizeTranscriptionMetadata

Metadata about transcription for a single file (for example, progress percent).

CloudStorageResult

Final results written to Cloud Storage.

Config

Message representing the config for the Speech-to-Text API. This includes an optional KMS key <https://cloud.google.com/kms/docs/resource-hierarchy#keys>__ with which incoming data will be encrypted.

CreateCustomClassRequest

Request message for the CreateCustomClass method.

CreatePhraseSetRequest

Request message for the CreatePhraseSet method.

CreateRecognizerRequest

Request message for the CreateRecognizer method.
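
A sketch of the long-running CreateRecognizer flow (project and recognizer IDs are hypothetical):

    from google.cloud import speech_v2

    client = speech_v2.SpeechClient()
    operation = client.create_recognizer(
        parent="projects/my-project/locations/global",
        recognizer_id="my-recognizer",
        recognizer=speech_v2.Recognizer(
            default_recognition_config=speech_v2.RecognitionConfig(
                auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
                language_codes=["en-US"],
                model="long",
            )
        ),
    )
    recognizer = operation.result()  # blocks until the Recognizer is ready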

CustomClass

CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.

AnnotationsEntry

The abstract base class for a message.

ClassItem

An item of the class.

State

Set of states that define the lifecycle of a CustomClass.

Values:

  • STATE_UNSPECIFIED (0): Unspecified state. This is only used/useful for distinguishing unset values.

  • ACTIVE (2): The normal and active state.

  • DELETED (4): This CustomClass has been deleted.

DeleteCustomClassRequest

Request message for the DeleteCustomClass method.

DeletePhraseSetRequest

Request message for the DeletePhraseSet method.

DeleteRecognizerRequest

Request message for the DeleteRecognizer method.

ExplicitDecodingConfig

Explicitly specified decoding parameters.

AudioEncoding

Supported audio data encodings.

Values:

  • AUDIO_ENCODING_UNSPECIFIED (0): Default value. This value is unused.

  • LINEAR16 (1): Headerless 16-bit signed little-endian PCM samples.

  • MULAW (2): Headerless 8-bit companded mulaw samples.

  • ALAW (3): Headerless 8-bit companded alaw samples.

GcsOutputConfig

Output configurations for Cloud Storage.

GetConfigRequest

Request message for the GetConfig method.

GetCustomClassRequest

Request message for the GetCustomClass method.

GetPhraseSetRequest

Request message for the GetPhraseSet method.

GetRecognizerRequest

Request message for the GetRecognizer method.

InlineOutputConfig

Output configurations for inline response.

InlineResult

Final results returned inline in the recognition response.

ListCustomClassesRequest

Request message for the ListCustomClasses method.

ListCustomClassesResponse

Response message for the ListCustomClasses method.

ListPhraseSetsRequest

Request message for the ListPhraseSets method.

ListPhraseSetsResponse

Response message for the ListPhraseSets method.

ListRecognizersRequest

Request message for the ListRecognizers method.

ListRecognizersResponse

Response message for the ListRecognizers method.

NativeOutputFileFormatConfig

Output configurations for serialized BatchRecognizeResults protos.

OperationMetadata

Represents the metadata of a long-running operation.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

OutputFormatConfig

Configuration for the format of the results stored to output.

PhraseSet

PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.

AnnotationsEntry

The abstract base class for a message.

Phrase

A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.

List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.

State

Set of states that define the lifecycle of a PhraseSet.

Values:

  • STATE_UNSPECIFIED (0): Unspecified state. This is only used/useful for distinguishing unset values.

  • ACTIVE (2): The normal and active state.

  • DELETED (4): This PhraseSet has been deleted.

RecognitionConfig

Provides information to the Recognizer that specifies how to process the recognition request.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

RecognitionFeatures

Available recognition features.

MultiChannelMode

Options for how to recognize multi-channel audio.

Values:

  • MULTI_CHANNEL_MODE_UNSPECIFIED (0): Default value for the multi-channel mode. If the audio contains multiple channels, only the first channel will be transcribed; other channels will be ignored.

  • SEPARATE_RECOGNITION_PER_CHANNEL (1): If selected, each channel in the provided audio is transcribed independently. This cannot be selected if the selected model is latest_short.

RecognitionOutputConfig

Configuration options for the output(s) of recognition.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

RecognitionResponseMetadata

Metadata about the recognition request and response.

RecognizeRequest

Request message for the Recognize method. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits <https://cloud.google.com/speech-to-text/quotas#content>__.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
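
A sketch of inline recognition against the ad-hoc "_" recognizer, supplying the full config in the request (the local file name is a placeholder):

    from google.cloud import speech_v2

    client = speech_v2.SpeechClient()
    with open("audio.wav", "rb") as f:  # placeholder local file
        content = f.read()
    response = client.recognize(
        request=speech_v2.RecognizeRequest(
            recognizer="projects/my-project/locations/global/recognizers/_",
            config=speech_v2.RecognitionConfig(
                auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
                language_codes=["en-US"],
                model="short",
            ),
            content=content,  # oneof with uri; set exactly one
        )
    )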

RecognizeResponse

Response message for the Recognize method.

Recognizer

A Recognizer message. Stores recognition configuration and metadata.

AnnotationsEntry

The abstract base class for a message.

State

Set of states that define the lifecycle of a Recognizer.

Values:

  • STATE_UNSPECIFIED (0): The default value. This value is used if the state is omitted.

  • ACTIVE (2): The Recognizer is active and ready for use.

  • DELETED (4): This Recognizer has been deleted.

SpeakerDiarizationConfig

Configuration to enable speaker diarization.

SpeechAdaptation

Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.

AdaptationPhraseSet

A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
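
A sketch showing both forms side by side (the PhraseSet resource name is hypothetical):

    from google.cloud import speech_v2

    adaptation = speech_v2.SpeechAdaptation(
        phrase_sets=[
            # Inline definition of a PhraseSet...
            speech_v2.SpeechAdaptation.AdaptationPhraseSet(
                inline_phrase_set=speech_v2.PhraseSet(
                    phrases=[
                        speech_v2.PhraseSet.Phrase(value="fidget spinner", boost=10.0)
                    ]
                )
            ),
            # ...or a reference to an existing resource; never both in one entry.
            speech_v2.SpeechAdaptation.AdaptationPhraseSet(
                phrase_set="projects/my-project/locations/global/phraseSets/my-phrase-set"
            ),
        ]
    )
    config = speech_v2.RecognitionConfig(
        auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
        adaptation=adaptation,
    )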

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

SrtOutputFileFormatConfig

Output configurations for SubRip Text <https://www.matroska.org/technical/subtitles.html#srt-subtitles>__ formatted subtitle files.

StreamingRecognitionConfig

Provides configuration information for the StreamingRecognize request.

StreamingRecognitionFeatures

Available recognition features specific to streaming recognition requests.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

Request message for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent in one call.

If the Recognizer referenced by recognizer contains a fully specified request configuration then the stream may only contain messages with only audio set.

Otherwise the first message must contain a recognizer and a streaming_config message that together fully specify the request configuration and must not contain audio. All subsequent messages must only have audio set.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
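
A sketch of that message ordering for v2 streaming (the recognizer path and audio chunks are placeholders):

    from google.cloud import speech_v2

    client = speech_v2.SpeechClient()
    streaming_config = speech_v2.StreamingRecognitionConfig(
        config=speech_v2.RecognitionConfig(
            auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
            language_codes=["en-US"],
            model="long",
        ),
    )

    def request_stream(chunks):
        # First message: recognizer plus streaming_config, no audio.
        yield speech_v2.StreamingRecognizeRequest(
            recognizer="projects/my-project/locations/global/recognizers/_",
            streaming_config=streaming_config,
        )
        # All subsequent messages: audio only.
        for chunk in chunks:
            yield speech_v2.StreamingRecognizeRequest(audio=chunk)

    for response in client.streaming_recognize(requests=request_stream([b"..."])):
        for result in response.results:
            print(result.alternatives[0].transcript)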

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponse messages that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses #4 and #7 contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

SpeechEventType

Indicates the type of speech event.

Values:

  • SPEECH_EVENT_TYPE_UNSPECIFIED (0): No speech event specified.

  • END_OF_SINGLE_UTTERANCE (1): This event indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio and will close the gRPC bidirectional stream. This event is only sent if there was a force cutoff due to silence being detected early. This event is only available through the latest_short model.

  • SPEECH_ACTIVITY_BEGIN (2): This event indicates that the server has detected the beginning of human voice activity in the stream. It can be returned multiple times if speech starts and stops repeatedly throughout the stream. It is only sent if voice_activity_events is set to true.

  • SPEECH_ACTIVITY_END (3): This event indicates that the server has detected the end of human voice activity in the stream. It can be returned multiple times if speech starts and stops repeatedly throughout the stream. It is only sent if voice_activity_events is set to true.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Entry

A single replacement configuration.

TranslationConfig

Translation configuration. Use to translate the given audio into text for the desired language.

UndeleteCustomClassRequest

Request message for the UndeleteCustomClass method.

UndeletePhraseSetRequest

Request message for the UndeletePhraseSet method.

UndeleteRecognizerRequest

Request message for the UndeleteRecognizer method.

UpdateConfigRequest

Request message for the UpdateConfig method.

UpdateCustomClassRequest

Request message for the UpdateCustomClass method.

UpdatePhraseSetRequest

Request message for the UpdatePhraseSet method.

UpdateRecognizerRequest

Request message for the UpdateRecognizer method.

VttOutputFileFormatConfig

Output configurations for WebVTT <https://www.w3.org/TR/webvtt1/>__ formatted subtitle files.

WordInfo

Word-specific information for recognized words.

Modules

pagers

API documentation for speech_v1.services.adaptation.pagers module.

pagers

API documentation for speech_v1p1beta1.services.adaptation.pagers module.

pagers

API documentation for speech_v2.services.speech.pagers module.