Package google.cloud.speech.v2

Speech

Enables speech transcription and resource management.

BatchRecognize

rpc BatchRecognize(BatchRecognizeRequest) returns (Operation)

Performs batch asynchronous speech recognition: send a request with N audio files and receive a long-running operation that can be polled to see when the transcriptions are finished.
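
For orientation, here is a minimal Python sketch of a batch request using the google-cloud-speech client library; the project, recognizer, and Cloud Storage paths (my-project, gs://my-bucket/...) are hypothetical placeholders, not values from this reference.

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()

    request = cloud_speech.BatchRecognizeRequest(
        # "_" selects an empty implicit Recognizer (see BatchRecognizeRequest below).
        recognizer="projects/my-project/locations/global/recognizers/_",
        config=cloud_speech.RecognitionConfig(
            auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
            language_codes=["en-US"],
            model="long",  # example model name; see the models documentation
        ),
        files=[
            cloud_speech.BatchRecognizeFileMetadata(uri="gs://my-bucket/audio-1.wav"),
            cloud_speech.BatchRecognizeFileMetadata(uri="gs://my-bucket/audio-2.wav"),
        ],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(uri="gs://my-bucket/results/"),
        ),
    )

    # batch_recognize returns a long-running Operation; result() polls until done.
    operation = client.batch_recognize(request=request)
    response = operation.result(timeout=600)
    for filename, file_result in response.results.items():
        print(filename, file_result.cloud_storage_result.uri)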

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the recognizer resource:

  • speech.recognizers.recognize

For more information, see the IAM documentation.

CreateCustomClass

rpc CreateCustomClass(CreateCustomClassRequest) returns (Operation)

Creates a CustomClass.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • speech.customClasses.create

For more information, see the IAM documentation.

CreatePhraseSet

rpc CreatePhraseSet(CreatePhraseSetRequest) returns (Operation)

Creates a PhraseSet.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • speech.phraseSets.create

For more information, see the IAM documentation.

CreateRecognizer

rpc CreateRecognizer(CreateRecognizerRequest) returns (Operation)

Creates a Recognizer.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • speech.recognizers.create

For more information, see the IAM documentation.

DeleteCustomClass

rpc DeleteCustomClass(DeleteCustomClassRequest) returns (Operation)

Deletes the CustomClass.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.customClasses.delete

For more information, see the IAM documentation.

DeletePhraseSet

rpc DeletePhraseSet(DeletePhraseSetRequest) returns (Operation)

Deletes the PhraseSet.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.phraseSets.delete

For more information, see the IAM documentation.

DeleteRecognizer

rpc DeleteRecognizer(DeleteRecognizerRequest) returns (Operation)

Deletes the Recognizer.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.recognizers.delete

For more information, see the IAM documentation.

GetConfig

rpc GetConfig(GetConfigRequest) returns (Config)

Returns the requested Config.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.config.get

For more information, see the IAM documentation.

GetCustomClass

rpc GetCustomClass(GetCustomClassRequest) returns (CustomClass)

Returns the requested CustomClass.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.customClasses.get

For more information, see the IAM documentation.

GetPhraseSet

rpc GetPhraseSet(GetPhraseSetRequest) returns (PhraseSet)

Returns the requested PhraseSet.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.phraseSets.get

For more information, see the IAM documentation.

GetRecognizer

rpc GetRecognizer(GetRecognizerRequest) returns (Recognizer)

Returns the requested Recognizer. Fails with NOT_FOUND if the requested Recognizer doesn't exist.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.recognizers.get

For more information, see the IAM documentation.

ListCustomClasses

rpc ListCustomClasses(ListCustomClassesRequest) returns (ListCustomClassesResponse)

Lists CustomClasses.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • speech.customClasses.list

For more information, see the IAM documentation.

ListPhraseSets

rpc ListPhraseSets(ListPhraseSetsRequest) returns (ListPhraseSetsResponse)

Lists PhraseSets.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • speech.phraseSets.list

For more information, see the IAM documentation.

ListRecognizers

rpc ListRecognizers(ListRecognizersRequest) returns (ListRecognizersResponse)

Lists Recognizers.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • speech.recognizers.list

For more information, see the IAM documentation.

Recognize

rpc Recognize(RecognizeRequest) returns (RecognizeResponse)

Performs synchronous Speech recognition: receive results after all audio has been sent and processed.
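
A minimal synchronous request could look like the following Python sketch (the local file path and resource names are hypothetical); the "_" recognizer segment selects an empty implicit Recognizer, as described under RecognizeRequest below.

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    with open("audio.wav", "rb") as f:  # hypothetical local file
        content = f.read()

    response = client.recognize(
        request=cloud_speech.RecognizeRequest(
            recognizer="projects/my-project/locations/global/recognizers/_",
            config=cloud_speech.RecognitionConfig(
                auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                language_codes=["en-US"],
                model="long",  # example model name
            ),
            content=content,
        )
    )
    for result in response.results:
        print(result.alternatives[0].transcript)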

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the recognizer resource:

  • speech.recognizers.recognize

For more information, see the IAM documentation.

StreamingRecognize

rpc StreamingRecognize(StreamingRecognizeRequest) returns (StreamingRecognizeResponse)

Performs bidirectional streaming speech recognition: receive results while sending audio. This method is only available via the gRPC API (not REST).

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the recognizer resource:

  • speech.recognizers.recognize

For more information, see the IAM documentation.

UndeleteCustomClass

rpc UndeleteCustomClass(UndeleteCustomClassRequest) returns (Operation)

Undeletes the CustomClass.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.customClasses.undelete

For more information, see the IAM documentation.

UndeletePhraseSet

rpc UndeletePhraseSet(UndeletePhraseSetRequest) returns (Operation)

Undeletes the PhraseSet.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.phraseSets.undelete

For more information, see the IAM documentation.

UndeleteRecognizer

rpc UndeleteRecognizer(UndeleteRecognizerRequest) returns (Operation)

Undeletes the Recognizer.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.recognizers.undelete

For more information, see the IAM documentation.

UpdateConfig

rpc UpdateConfig(UpdateConfigRequest) returns (Config)

Updates the Config.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.config.update

For more information, see the IAM documentation.

UpdateCustomClass

rpc UpdateCustomClass(UpdateCustomClassRequest) returns (Operation)

Updates the CustomClass.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.customClasses.update

For more information, see the IAM documentation.

UpdatePhraseSet

rpc UpdatePhraseSet(UpdatePhraseSetRequest) returns (Operation)

Updates the PhraseSet.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.phraseSets.update

For more information, see the IAM documentation.

UpdateRecognizer

rpc UpdateRecognizer(UpdateRecognizerRequest) returns (Operation)

Updates the Recognizer.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • speech.recognizers.update

For more information, see the IAM documentation.

AccessMetadata

The access metadata for a particular region. This can be applied if the org policy for the given project disallows a particular region.

Fields
constraint_type

ConstraintType

Describes the different types of constraints that are applied.

ConstraintType

Describes the different types of constraints that can be applied on a region.

Enums
CONSTRAINT_TYPE_UNSPECIFIED: Unspecified constraint applied.
RESOURCE_LOCATIONS_ORG_POLICY_CREATE_CONSTRAINT: The project's org policy disallows the given region.

AutoDetectDecodingConfig

This type has no fields.

Automatically detected decoding parameters. Supported for the following encodings:

  • WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.

  • WAV_MULAW: 8-bit companded mulaw samples in a WAV container.

  • WAV_ALAW: 8-bit companded alaw samples in a WAV container.

  • RFC4867_5_AMR: AMR frames with an rfc4867.5 header.

  • RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.

  • FLAC: FLAC frames in the "native FLAC" container format.

  • MP3: MPEG audio frames with optional (ignored) ID3 metadata.

  • OGG_OPUS: Opus audio frames in an Ogg container.

  • WEBM_OPUS: Opus audio frames in a WebM container.

  • M4A: M4A audio format.

BatchRecognizeFileMetadata

Metadata about a single file in a batch for BatchRecognize.

Fields
config

RecognitionConfig

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the config_mask field can be used to override parts of the default_recognition_config of the Recognizer resource as well as the config at the request level.

config_mask

FieldMask

The list of fields in config that override the values in the default_recognition_config of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in config override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, config completely overrides and replaces the config in the recognizer for this recognition request.

Union field audio_source. The audio source, which is a Google Cloud Storage URI. audio_source can be only one of the following:
uri

string

Cloud Storage URI for the audio file.

BatchRecognizeFileResult

Final results for a single file.

Fields
error

Status

Error if one was encountered.

metadata

RecognitionResponseMetadata

Metadata about the recognition.

uri
(deprecated)

string

Deprecated. Use cloud_storage_result.native_format_uri instead.

transcript
(deprecated)

BatchRecognizeResults

Deprecated. Use inline_result.transcript instead.

Union field result.

result can be only one of the following:

cloud_storage_result

CloudStorageResult

Recognition results written to Cloud Storage. This is populated only when GcsOutputConfig is set in the RecognitionOutputConfig.

inline_result

InlineResult

Recognition results. This is populated only when InlineOutputConfig is set in the RecognitionOutputConfig.

BatchRecognizeMetadata

Operation metadata for BatchRecognize.

Fields
transcription_metadata

map<string, BatchRecognizeTranscriptionMetadata>

Map from provided filename to the transcription metadata for that file.

BatchRecognizeRequest

Request message for the BatchRecognize method.

Fields
recognizer

string

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

config

RecognitionConfig

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the config_mask field can be used to override parts of the default_recognition_config of the Recognizer resource.

config_mask

FieldMask

The list of fields in config that override the values in the default_recognition_config of the recognizer during this recognition request. If no mask is provided, all given fields in config override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, config completely overrides and replaces the config in the recognizer for this recognition request.

files[]

BatchRecognizeFileMetadata

Audio files with file metadata for ASR. The maximum number of files allowed to be specified is 5.

recognition_output_config

RecognitionOutputConfig

Configuration options for where to output the transcripts of each file.

processing_strategy

ProcessingStrategy

Processing strategy to use for this request.
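
To make the config_mask semantics concrete, here is a hedged Python sketch, assuming a recognizer with a stored default_recognition_config (resource names are hypothetical; files and recognition_output_config are omitted for brevity). The mask limits the override to a single field, whereas paths=["*"] would replace the stored config entirely.

    from google.protobuf import field_mask_pb2
    from google.cloud.speech_v2.types import cloud_speech

    request = cloud_speech.BatchRecognizeRequest(
        recognizer="projects/my-project/locations/global/recognizers/my-recognizer",
        config=cloud_speech.RecognitionConfig(
            features=cloud_speech.RecognitionFeatures(enable_automatic_punctuation=True),
        ),
        # Only this field overrides the recognizer's default_recognition_config;
        # all of its other stored defaults remain in effect.
        config_mask=field_mask_pb2.FieldMask(
            paths=["features.enable_automatic_punctuation"]
        ),
    )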

ProcessingStrategy

Possible processing strategies for batch requests.

Enums
PROCESSING_STRATEGY_UNSPECIFIED: Default value for the processing strategy. The request is processed as soon as it's received.
DYNAMIC_BATCHING: If selected, processes the request during lower utilization periods for a price discount. The request is fulfilled within 24 hours.

BatchRecognizeResponse

Response message for BatchRecognize that is packaged into a long-running Operation.

Fields
results

map<string, BatchRecognizeFileResult>

Map from filename to the final result for that file.

total_billed_duration

Duration

When available, billed audio seconds for the corresponding request.

BatchRecognizeResults

Output type for Cloud Storage of BatchRecognize transcripts. Although this proto isn't returned anywhere in this API directly, transcripts written to Cloud Storage are this proto serialized, and should be parsed as such.

Fields
results[]

SpeechRecognitionResult

Sequential list of transcription results corresponding to sequential portions of audio.

metadata

RecognitionResponseMetadata

Metadata about the recognition.

BatchRecognizeTranscriptionMetadata

Metadata about transcription for a single file (for example, progress percent).

Fields
progress_percent

int32

How much of the file has been transcribed so far.

error

Status

Error if one was encountered.

uri

string

The Cloud Storage URI to which recognition results will be written.

CloudStorageResult

Final results written to Cloud Storage.

Fields
uri

string

The Cloud Storage URI to which recognition results were written.

vtt_format_uri

string

The Cloud Storage URI to which recognition results were written as VTT formatted captions. This is populated only when VTT output is requested.

srt_format_uri

string

The Cloud Storage URI to which recognition results were written as SRT formatted captions. This is populated only when SRT output is requested.

Config

Message representing the config for the Speech-to-Text API. This includes an optional KMS key with which incoming data will be encrypted.

Fields
name

string

Output only. Identifier. The name of the config resource. There is exactly one config resource per project per location. The expected format is projects/{project}/locations/{location}/config.

kms_key_name

string

Optional. An optional KMS key name that, if present, will be used to encrypt Speech-to-Text resources at rest. Updating this key will not encrypt existing resources using this key; only new resources will be encrypted using this key. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

update_time

Timestamp

Output only. The most recent time this resource was modified.
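
As a sketch of how the KMS key might be set (the key ring and key names are hypothetical, and it is assumed here that UpdateConfigRequest carries a config and an update_mask, as the other Update requests do), an UpdateConfig call restricted to kms_key_name could look like:

    from google.protobuf import field_mask_pb2
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    config = client.update_config(
        request=cloud_speech.UpdateConfigRequest(
            config=cloud_speech.Config(
                name="projects/my-project/locations/global/config",
                kms_key_name=(
                    "projects/my-project/locations/global/"
                    "keyRings/my-key-ring/cryptoKeys/my-key"
                ),
            ),
            # Restrict the update to the KMS key field only.
            update_mask=field_mask_pb2.FieldMask(paths=["kms_key_name"]),
        )
    )
    print(config.kms_key_name)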

CreateCustomClassRequest

Request message for the CreateCustomClass method.

Fields
custom_class

CustomClass

Required. The CustomClass to create.

validate_only

bool

If set, validate the request and preview the CustomClass, but do not actually create it.

custom_class_id

string

The ID to use for the CustomClass, which will become the final component of the CustomClass's resource name.

This value should be 4-63 characters, and valid characters are /[a-z][0-9]-/.

parent

string

Required. The project and location where this CustomClass will be created. The expected format is projects/{project}/locations/{location}.

CreatePhraseSetRequest

Request message for the CreatePhraseSet method.

Fields
phrase_set

PhraseSet

Required. The PhraseSet to create.

validate_only

bool

If set, validate the request and preview the PhraseSet, but do not actually create it.

phrase_set_id

string

The ID to use for the PhraseSet, which will become the final component of the PhraseSet's resource name.

This value should be 4-63 characters, and valid characters are /[a-z][0-9]-/.

parent

string

Required. The project and location where this PhraseSet will be created. The expected format is projects/{project}/locations/{location}.

CreateRecognizerRequest

Request message for the CreateRecognizer method.

Fields
recognizer

Recognizer

Required. The Recognizer to create.

validate_only

bool

If set, validate the request and preview the Recognizer, but do not actually create it.

recognizer_id

string

The ID to use for the Recognizer, which will become the final component of the Recognizer's resource name.

This value should be 4-63 characters, and valid characters are /[a-z][0-9]-/.

parent

string

Required. The project and location where this Recognizer will be created. The expected format is projects/{project}/locations/{location}.
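
Putting these fields together, a hedged Python sketch of creating a Recognizer (IDs and paths are placeholders) might be:

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    operation = client.create_recognizer(
        request=cloud_speech.CreateRecognizerRequest(
            parent="projects/my-project/locations/global",
            recognizer_id="my-recognizer",  # becomes the final name component
            recognizer=cloud_speech.Recognizer(
                default_recognition_config=cloud_speech.RecognitionConfig(
                    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                    language_codes=["en-US"],
                    model="long",  # example model name
                ),
            ),
        )
    )
    # CreateRecognizer returns a long-running Operation; result() yields the Recognizer.
    recognizer = operation.result()
    print(recognizer.name)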

CustomClass

CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.

Fields
name

string

Output only. Identifier. The resource name of the CustomClass. Format: projects/{project}/locations/{location}/customClasses/{custom_class}.

uid

string

Output only. System-assigned unique identifier for the CustomClass.

display_name

string

Optional. User-settable, human-readable name for the CustomClass. Must be 63 characters or less.

items[]

ClassItem

A collection of class items.

state

State

Output only. The CustomClass lifecycle state.

create_time

Timestamp

Output only. Creation time.

update_time

Timestamp

Output only. The most recent time this resource was modified.

delete_time

Timestamp

Output only. The time at which this resource was requested for deletion.

expire_time

Timestamp

Output only. The time at which this resource will be purged.

annotations

map<string, string>

Optional. Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations.

etag

string

Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

reconciling

bool

Output only. Whether or not this CustomClass is in the process of being updated.

kms_key_name

string

Output only. The KMS key name with which the CustomClass is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

kms_key_version_name

string

Output only. The KMS key version name with which the CustomClass is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

ClassItem

An item of the class.

Fields
value

string

The class item's value.

State

Set of states that define the lifecycle of a CustomClass.

Enums
STATE_UNSPECIFIED: Unspecified state. This is only used/useful for distinguishing unset values.
ACTIVE: The normal and active state.
DELETED: This CustomClass has been deleted.

DeleteCustomClassRequest

Request message for the DeleteCustomClass method.

Fields
name

string

Required. The name of the CustomClass to delete. Format: projects/{project}/locations/{location}/customClasses/{custom_class}

validate_only

bool

If set, validate the request and preview the deleted CustomClass, but do not actually delete it.

allow_missing

bool

If set to true, and the CustomClass is not found, the request will succeed and be a no-op (no Operation is recorded in this case).

etag

string

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

DeletePhraseSetRequest

Request message for the DeletePhraseSet method.

Fields
name

string

Required. The name of the PhraseSet to delete. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}

validate_only

bool

If set, validate the request and preview the deleted PhraseSet, but do not actually delete it.

allow_missing

bool

If set to true, and the PhraseSet is not found, the request will succeed and be a no-op (no Operation is recorded in this case).

etag

string

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

DeleteRecognizerRequest

Request message for the DeleteRecognizer method.

Fields
name

string

Required. The name of the Recognizer to delete. Format: projects/{project}/locations/{location}/recognizers/{recognizer}

validate_only

bool

If set, validate the request and preview the deleted Recognizer, but do not actually delete it.

allow_missing

bool

If set to true, and the Recognizer is not found, the request will succeed and be a no-op (no Operation is recorded in this case).

etag

string

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

ExplicitDecodingConfig

Explicitly specified decoding parameters.

Fields
encoding

AudioEncoding

Required. Encoding of the audio data sent for recognition.

sample_rate_hertz

int32

Sample rate in Hertz of the audio data sent for recognition. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling). Supported for the following encodings:

  • LINEAR16: Headerless 16-bit signed little-endian PCM samples.

  • MULAW: Headerless 8-bit companded mulaw samples.

  • ALAW: Headerless 8-bit companded alaw samples.

audio_channel_count

int32

Number of channels present in the audio data sent for recognition. Supported for the following encodings:

  • LINEAR16: Headerless 16-bit signed little-endian PCM samples.

  • MULAW: Headerless 8-bit companded mulaw samples.

  • ALAW: Headerless 8-bit companded alaw samples.

The maximum allowed value is 8.
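
For headerless PCM audio, an explicit decoding configuration along these lines would apply (values are illustrative; 16000 Hz is the optimum noted above):

    from google.cloud.speech_v2.types import cloud_speech

    decoding = cloud_speech.ExplicitDecodingConfig(
        encoding=cloud_speech.ExplicitDecodingConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,   # optimal; otherwise use the source's native rate
        audio_channel_count=1,     # up to 8 channels are allowed
    )
    config = cloud_speech.RecognitionConfig(
        explicit_decoding_config=decoding,
        language_codes=["en-US"],
    )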

AudioEncoding

Supported audio data encodings.

Enums
AUDIO_ENCODING_UNSPECIFIED: Default value. This value is unused.
LINEAR16: Headerless 16-bit signed little-endian PCM samples.
MULAW: Headerless 8-bit companded mulaw samples.
ALAW: Headerless 8-bit companded alaw samples.

GcsOutputConfig

Output configurations for Cloud Storage.

Fields
uri

string

The Cloud Storage URI prefix to which recognition results will be written.

GetConfigRequest

Request message for the GetConfig method.

Fields
name

string

Required. The name of the config to retrieve. There is exactly one config resource per project per location. The expected format is projects/{project}/locations/{location}/config.

GetCustomClassRequest

Request message for the GetCustomClass method.

Fields
name

string

Required. The name of the CustomClass to retrieve. The expected format is projects/{project}/locations/{location}/customClasses/{custom_class}.

GetPhraseSetRequest

Request message for the GetPhraseSet method.

Fields
name

string

Required. The name of the PhraseSet to retrieve. The expected format is projects/{project}/locations/{location}/phraseSets/{phrase_set}.

GetRecognizerRequest

Request message for the GetRecognizer method.

Fields
name

string

Required. The name of the Recognizer to retrieve. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}.

InlineOutputConfig

This type has no fields.

Output configurations for inline response.

InlineResult

Final results returned inline in the recognition response.

Fields
transcript

BatchRecognizeResults

The transcript for the audio file.

vtt_captions

string

The transcript for the audio file as VTT formatted captions. This is populated only when VTT output is requested.

srt_captions

string

The transcript for the audio file as SRT formatted captions. This is populated only when SRT output is requested.

LanguageMetadata

The metadata about locales available in a given region. Currently this is just the models that are available for each locale.

Fields
models

map<string, ModelMetadata>

Map of locale (language code) -> models.

ListCustomClassesRequest

Request message for the ListCustomClasses method.

Fields
parent

string

Required. The project and location of CustomClass resources to list. The expected format is projects/{project}/locations/{location}.

page_size

int32

Number of results per request. A valid page_size ranges from 0 to 100, inclusive. If the page_size is zero or unspecified, a page size of 5 will be chosen. If the page size exceeds 100, it will be coerced down to 100. Note that a call might return fewer results than the requested page size.

page_token

string

A page token, received from a previous ListCustomClasses call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to ListCustomClasses must match the call that provided the page token.

show_deleted

bool

Whether or not to show resources that have been deleted.
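
In practice, the generated client libraries hide the page_token bookkeeping; a hedged Python sketch (the parent path is hypothetical) that walks all pages might look like:

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    # The returned pager fetches subsequent pages via page_token automatically.
    for custom_class in client.list_custom_classes(
        request=cloud_speech.ListCustomClassesRequest(
            parent="projects/my-project/locations/global",
            page_size=100,  # values above 100 are coerced down to 100
        )
    ):
        print(custom_class.name, custom_class.display_name)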

ListCustomClassesResponse

Response message for the ListCustomClasses method.

Fields
custom_classes[]

CustomClass

The list of requested CustomClasses.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages. This token expires after 72 hours.

ListPhraseSetsRequest

Request message for the ListPhraseSets method.

Fields
parent

string

Required. The project and location of PhraseSet resources to list. The expected format is projects/{project}/locations/{location}.

page_size

int32

The maximum number of PhraseSets to return. The service may return fewer than this value. If unspecified, at most 5 PhraseSets will be returned. The maximum value is 100; values above 100 will be coerced to 100.

page_token

string

A page token, received from a previous ListPhraseSets call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to ListPhraseSets must match the call that provided the page token.

show_deleted

bool

Whether or not to show resources that have been deleted.

ListPhraseSetsResponse

Response message for the ListPhraseSets method.

Fields
phrase_sets[]

PhraseSet

The list of requested PhraseSets.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages. This token expires after 72 hours.

ListRecognizersRequest

Request message for the ListRecognizers method.

Fields
parent

string

Required. The project and location of Recognizers to list. The expected format is projects/{project}/locations/{location}.

page_size

int32

The maximum number of Recognizers to return. The service may return fewer than this value. If unspecified, at most 5 Recognizers will be returned. The maximum value is 100; values above 100 will be coerced to 100.

page_token

string

A page token, received from a previous ListRecognizers call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to ListRecognizers must match the call that provided the page token.

show_deleted

bool

Whether or not to show resources that have been deleted.

ListRecognizersResponse

Response message for the ListRecognizers method.

Fields
recognizers[]

Recognizer

The list of requested Recognizers.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages. This token expires after 72 hours.

LocationsMetadata

Main metadata for the Locations API for STT V2. Currently this is just the metadata about locales, models, and features.

Fields
languages

LanguageMetadata

Information about available locales, models, and features, represented in the hierarchical structure of locales -> models -> features.

access_metadata

AccessMetadata

Information about access metadata for the region and given project.

ModelFeature

Represents a singular feature of a model. If the feature is recognizer, the release_state of the feature represents the release_state of the model.

Fields
feature

string

The name of the feature (note: the feature can be recognizer).

release_state

string

The release state of the feature.

ModelFeatures

Represents the collection of features belonging to a model.

Fields
model_feature[]

ModelFeature

Repeated field that contains all features of the model.

ModelMetadata

The metadata about the models in a given region for a specific locale. Currently this is just the features of the model.

Fields
model_features

map<string, ModelFeatures>

Map of the model name -> features of that model.

NativeOutputFileFormatConfig

This type has no fields.

Output configurations for serialized BatchRecognizeResults protos.

OperationMetadata

Represents the metadata of a long-running operation.

Fields
create_time

Timestamp

The time the operation was created.

update_time

Timestamp

The time the operation was last updated.

resource

string

The resource path for the target of the operation.

method

string

The method that triggered the operation.

kms_key_name

string

The KMS key name with which the content of the Operation is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

kms_key_version_name

string

The KMS key version name with which content of the Operation is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

progress_percent

int32

The percent progress of the Operation. Values can range from 0-100. If the value is 100, then the operation is finished.

Union field request. The request that spawned the Operation. request can be only one of the following:
batch_recognize_request

BatchRecognizeRequest

The BatchRecognizeRequest that spawned the Operation.

create_recognizer_request

CreateRecognizerRequest

The CreateRecognizerRequest that spawned the Operation.

update_recognizer_request

UpdateRecognizerRequest

The UpdateRecognizerRequest that spawned the Operation.

delete_recognizer_request

DeleteRecognizerRequest

The DeleteRecognizerRequest that spawned the Operation.

undelete_recognizer_request

UndeleteRecognizerRequest

The UndeleteRecognizerRequest that spawned the Operation.

create_custom_class_request

CreateCustomClassRequest

The CreateCustomClassRequest that spawned the Operation.

update_custom_class_request

UpdateCustomClassRequest

The UpdateCustomClassRequest that spawned the Operation.

delete_custom_class_request

DeleteCustomClassRequest

The DeleteCustomClassRequest that spawned the Operation.

undelete_custom_class_request

UndeleteCustomClassRequest

The UndeleteCustomClassRequest that spawned the Operation.

create_phrase_set_request

CreatePhraseSetRequest

The CreatePhraseSetRequest that spawned the Operation.

update_phrase_set_request

UpdatePhraseSetRequest

The UpdatePhraseSetRequest that spawned the Operation.

delete_phrase_set_request

DeletePhraseSetRequest

The DeletePhraseSetRequest that spawned the Operation.

undelete_phrase_set_request

UndeletePhraseSetRequest

The UndeletePhraseSetRequest that spawned the Operation.

update_config_request
(deprecated)

UpdateConfigRequest

The UpdateConfigRequest that spawned the Operation.

Union field metadata. Specific metadata per RPC. metadata can be only one of the following:
batch_recognize_metadata

BatchRecognizeMetadata

Metadata specific to the BatchRecognize method.

OutputFormatConfig

Configuration for the format of the results stored to output.

Fields
native

NativeOutputFileFormatConfig

Configuration for the native output format. If this field is set, or if no other output format field is set, then transcripts will be written to the sink in the native format.

vtt

VttOutputFileFormatConfig

Configuration for the VTT output format. If this field is set, then transcripts will be written to the sink in the VTT format.

srt

SrtOutputFileFormatConfig

Configuration for the SRT output format. If this field is set, then transcripts will be written to the sink in the SRT format.

PhraseSet

PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.

Fields
name

string

Output only. Identifier. The resource name of the PhraseSet. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}.

uid

string

Output only. System-assigned unique identifier for the PhraseSet.

phrases[]

Phrase

A list of word and phrases.

boost

float

Hint Boost. A positive value will increase the probability that a specific phrase will be recognized over other similar-sounding phrases. The higher the boost, the higher the chance of false positive recognition as well. Valid boost values are between 0 (exclusive) and 20. We recommend using a binary search approach to finding the optimal value for your use case, as well as adding phrases both with and without boost to your requests.

display_name

string

User-settable, human-readable name for the PhraseSet. Must be 63 characters or less.

state

State

Output only. The PhraseSet lifecycle state.

create_time

Timestamp

Output only. Creation time.

update_time

Timestamp

Output only. The most recent time this resource was modified.

delete_time

Timestamp

Output only. The time at which this resource was requested for deletion.

expire_time

Timestamp

Output only. The time at which this resource will be purged.

annotations

map<string, string>

Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations.

etag

string

Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

reconciling

bool

Output only. Whether or not this PhraseSet is in the process of being updated.

kms_key_name

string

Output only. The KMS key name with which the PhraseSet is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

kms_key_version_name

string

Output only. The KMS key version name with which the PhraseSet is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

Phrase

A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.

List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.

Fields
value

string

The phrase itself.

boost

float

Hint Boost. Overrides the boost set at the phrase set level. A positive value will increase the probability that a specific phrase will be recognized over other similar-sounding phrases. The higher the boost, the higher the chance of false positive recognition as well. Negative boost values would correspond to anti-biasing, but anti-biasing is not enabled, so negative boost values will return an error. Boost values must be between 0 and 20; any values outside that range will return an error. We recommend using a binary search approach to finding the optimal value for your use case, as well as adding phrases both with and without boost to your requests.
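
As an illustration of set-level and phrase-level boost (the ID and phrase values are made up), a PhraseSet could be created with a Python sketch like this:

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    operation = client.create_phrase_set(
        request=cloud_speech.CreatePhraseSetRequest(
            parent="projects/my-project/locations/global",
            phrase_set_id="my-phrase-set",
            phrase_set=cloud_speech.PhraseSet(
                boost=10.0,  # default boost for phrases in this set
                phrases=[
                    cloud_speech.PhraseSet.Phrase(value="Queen Mary 2"),
                    # Phrase-level boost overrides the set-level value.
                    cloud_speech.PhraseSet.Phrase(value="fare", boost=15.0),
                ],
            ),
        )
    )
    # CreatePhraseSet returns a long-running Operation.
    phrase_set = operation.result()
    print(phrase_set.name)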

State

Set of states that define the lifecycle of a PhraseSet.

Enums
STATE_UNSPECIFIED: Unspecified state. This is only used/useful for distinguishing unset values.
ACTIVE: The normal and active state.
DELETED: This PhraseSet has been deleted.

RecognitionConfig

Provides information to the Recognizer that specifies how to process the recognition request.

Fields
model

string

Optional. Which model to use for recognition requests. Select the model best suited to your domain to get best results.

Guidance for choosing which model to use can be found in the Transcription Models Documentation and the models supported in each region can be found in the Table Of Supported Models.

language_codes[]

string

Optional. The language of the supplied audio as a BCP-47 language tag. Language tags are normalized to BCP-47 before they are used, for example, "en-us" becomes "en-US".

Supported languages for each model are listed in the Table of Supported Models.

If additional languages are provided, the recognition result will contain recognition in the most likely language detected. The recognition result will include the language tag of the language detected in the audio.

features

RecognitionFeatures

Speech recognition features to enable.

adaptation

SpeechAdaptation

Speech adaptation context that weights recognizer predictions for specific words and phrases.

transcript_normalization

TranscriptNormalization

Optional. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Union field decoding_config. Decoding parameters for audio being sent for recognition. decoding_config can be only one of the following:
auto_decoding_config

AutoDetectDecodingConfig

Automatically detect decoding parameters. Preferred for supported formats.

explicit_decoding_config

ExplicitDecodingConfig

Explicitly specified decoding parameters. Required if using headerless PCM audio (linear16, mulaw, alaw).

RecognitionFeatures

Available recognition features.

Fields
profanity_filter

bool

If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, for instance, "f***". If set to false or omitted, profanities won't be filtered out.

enable_word_time_offsets

bool

If true, the top result includes a list of words and the start and end time offsets (timestamps) for those words. If false, no word-level time offset information is returned. The default is false.

enable_word_confidence

bool

If true, the top result includes a list of words and the confidence for those words. If false, no word-level confidence information is returned. The default is false.

enable_automatic_punctuation

bool

If true, adds punctuation to recognition result hypotheses. This feature is only available in select languages. The default false value does not add punctuation to result hypotheses.

enable_spoken_punctuation

bool

The spoken punctuation behavior for the call. If true, replaces spoken punctuation with the corresponding symbols in the request. For example, "how are you question mark" becomes "how are you?". See https://cloud.google.com/speech-to-text/docs/spoken-punctuation for support. If false, spoken punctuation is not replaced.

enable_spoken_emojis

bool

The spoken emoji behavior for the call. If true, adds spoken emoji formatting for the request. This will replace spoken emojis with the corresponding Unicode symbols in the final transcript. If false, spoken emojis are not replaced.

multi_channel_mode

MultiChannelMode

Mode for recognizing multi-channel audio.

diarization_config

SpeakerDiarizationConfig

Configuration to enable speaker diarization and set additional parameters to make diarization better suited for your application. When this is enabled, we send all the words from the beginning of the audio for the top alternative in every consecutive STREAMING response. This is done in order to improve our speaker tags as our models learn to identify the speakers in the conversation over time. For non-streaming requests, the diarization results will be provided only in the top alternative of the FINAL SpeechRecognitionResult.

max_alternatives

int32

Maximum number of recognition hypotheses to be returned. The server may return fewer than max_alternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of one. If omitted, a maximum of one will be returned.

MultiChannelMode

Options for how to recognize multi-channel audio.

Enums
MULTI_CHANNEL_MODE_UNSPECIFIED: Default value for the multi-channel mode. If the audio contains multiple channels, only the first channel will be transcribed; other channels will be ignored.
SEPARATE_RECOGNITION_PER_CHANNEL: If selected, each channel in the provided audio is transcribed independently. This cannot be selected if the selected model is latest_short.

RecognitionOutputConfig

Configuration options for the output(s) of recognition.

Fields
output_format_config

OutputFormatConfig

Optional. Configuration for the format of the results stored to output. If unspecified, transcripts will be written in the NATIVE format only.

Union field output.

output can be only one of the following:

gcs_output_config

GcsOutputConfig

If this message is populated, recognition results are written to the provided Google Cloud Storage URI.

inline_response_config

InlineOutputConfig

If this message is populated, recognition results are provided in the BatchRecognizeResponse message of the Operation when completed. This is only supported when calling BatchRecognize with just one audio file.

RecognitionResponseMetadata

Metadata about the recognition request and response.

Fields
total_billed_duration

Duration

When available, billed audio seconds for the corresponding request.

RecognizeRequest

Request message for the Recognize method. Either content or uri must be supplied. Supplying both or neither returns INVALID_ARGUMENT. See content limits.

Fields
recognizer

string

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

config

RecognitionConfig

Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the config_mask field can be used to override parts of the default_recognition_config of the Recognizer resource.

config_mask

FieldMask

The list of fields in config that override the values in the default_recognition_config of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in config override the values in the recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the recognizer for this recognition request. If a wildcard (*) is provided, config completely overrides and replaces the config in the recognizer for this recognition request.

Union field audio_source. The audio source, which is either inline content or a Google Cloud Storage URI. audio_source can be only one of the following:
content

bytes

The audio data bytes encoded as specified in RecognitionConfig. As with all bytes fields, proto buffers use a pure binary representation, whereas JSON representations use base64.

uri

string

URI that points to a file that contains audio data bytes as specified in RecognitionConfig. The file must not be compressed (for example, gzip). Currently, only Google Cloud Storage URIs are supported, which must be specified in the following format: gs://bucket_name/object_name (other URI formats return INVALID_ARGUMENT). For more information, see Request URIs.

RecognizeResponse

Response message for the Recognize method.

Fields
results[]

SpeechRecognitionResult

Sequential list of transcription results corresponding to sequential portions of audio.

metadata

RecognitionResponseMetadata

Metadata about the recognition.

Recognizer

A Recognizer message. Stores recognition configuration and metadata.

Fields
name

string

Output only. Identifier. The resource name of the Recognizer. Format: projects/{project}/locations/{location}/recognizers/{recognizer}.

uid

string

Output only. System-assigned unique identifier for the Recognizer.

display_name

string

User-settable, human-readable name for the Recognizer. Must be 63 characters or less.

model
(deprecated)

string

Optional. This field is now deprecated. Prefer the model field in the RecognitionConfig message.

Which model to use for recognition requests. Select the model best suited to your domain to get best results.

Guidance for choosing which model to use can be found in the Transcription Models Documentation and the models supported in each region can be found in the Table Of Supported Models.

language_codes[]
(deprecated)

string

Optional. This field is now deprecated. Prefer the language_codes field in the RecognitionConfig message.

The language of the supplied audio as a BCP-47 language tag.

Supported languages for each model are listed in the Table of Supported Models.

If additional languages are provided, the recognition result will contain recognition in the most likely language detected. The recognition result will include the language tag of the language detected in the audio. When you create or update a Recognizer, these values are stored in normalized BCP-47 form. For example, "en-us" is stored as "en-US".

default_recognition_config

RecognitionConfig

Default configuration to use for requests with this Recognizer. This can be overwritten by inline configuration in the RecognizeRequest.config field.

annotations

map<string, string>

Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations.

state

State

Output only. The Recognizer lifecycle state.

create_time

Timestamp

Output only. Creation time.

update_time

Timestamp

Output only. The most recent time this Recognizer was modified.

delete_time

Timestamp

Output only. The time at which this Recognizer was requested for deletion.

expire_time

Timestamp

Output only. The time at which this Recognizer will be purged.

etag

string

Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

reconciling

bool

Output only. Whether or not this Recognizer is in the process of being updated.

kms_key_name

string

Output only. The KMS key name with which the Recognizer is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

kms_key_version_name

string

Output only. The KMS key version name with which the Recognizer is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}.

State

Set of states that define the lifecycle of a Recognizer.

Enums
STATE_UNSPECIFIED: The default value. This value is used if the state is omitted.
ACTIVE: The Recognizer is active and ready for use.
DELETED: This Recognizer has been deleted.

SpeakerDiarizationConfig

Configuration to enable speaker diarization.

Fields
min_speaker_count

int32

Required. Minimum number of speakers in the conversation. This range gives you more flexibility by allowing the system to automatically determine the correct number of speakers.

To fix the number of speakers detected in the audio, set min_speaker_count = max_speaker_count.

max_speaker_count

int32

Required. Maximum number of speakers in the conversation. Valid values are: 1-6. Must be >= min_speaker_count. This range gives you more flexibility by allowing the system to automatically determine the correct number of speakers.
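
As a sketch (the speaker counts are chosen arbitrarily), diarization is enabled through RecognitionFeatures; setting min and max equal pins the speaker count:

    from google.cloud.speech_v2.types import cloud_speech

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        features=cloud_speech.RecognitionFeatures(
            diarization_config=cloud_speech.SpeakerDiarizationConfig(
                min_speaker_count=2,  # equal min and max fixes the count at 2
                max_speaker_count=2,
            ),
        ),
    )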

SpeechAdaptation

Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.

Fields
phrase_sets[]

AdaptationPhraseSet

A list of inline or referenced PhraseSets.

custom_classes[]

CustomClass

A list of inline CustomClasses. Existing CustomClass resources can be referenced directly in a PhraseSet.

AdaptationPhraseSet

A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.

Fields

Union field value.

value can be only one of the following:

phrase_set

string

The name of an existing PhraseSet resource. The user must have read access to the resource and it must not be deleted.

inline_phrase_set

PhraseSet

An inline defined PhraseSet.
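
Both forms can be mixed in one request; in this hedged Python sketch, the referenced resource name is hypothetical:

    from google.cloud.speech_v2.types import cloud_speech

    adaptation = cloud_speech.SpeechAdaptation(
        phrase_sets=[
            # Reference to an existing PhraseSet resource (read access required).
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                phrase_set="projects/my-project/locations/global/phraseSets/my-phrase-set",
            ),
            # Inline PhraseSet defined directly in the request.
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                inline_phrase_set=cloud_speech.PhraseSet(
                    phrases=[
                        cloud_speech.PhraseSet.Phrase(value="Queen Mary 2", boost=10.0),
                    ],
                ),
            ),
        ],
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        adaptation=adaptation,
    )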

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

Fields
transcript

string

Transcript text representing the words that the user spoke.

confidence

float

The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result, or of a streaming result where is_final is set to true. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set.

words[]

WordInfo

A list of word-specific information for each recognized word. When the SpeakerDiarizationConfig is set, you will see all the words from the beginning of the audio.

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

Fields
alternatives[]

SpeechRecognitionAlternative

May contain one or more recognition hypotheses. These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

channel_tag

int32

For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from 1 to N.

result_end_offset

Duration

Time offset of the end of this result relative to the beginning of the audio.

language_code

string

Output only. The BCP-47 language tag of the language in this result. This language code was detected to have the most likelihood of being spoken in the audio.

SrtOutputFileFormatConfig

This type has no fields.

Output configurations for SubRip Text (SRT) formatted subtitle files.

StreamingRecognitionConfig

Provides configuration information for the StreamingRecognize request.

Fields
config

RecognitionConfig

Required. Features and audio metadata to use for the Automatic Speech Recognition. This field in combination with the config_mask field can be used to override parts of the default_recognition_config of the Recognizer resource.

config_mask

FieldMask

The list of fields in config that override the values in the default_recognition_config of the recognizer during this recognition request. If no mask is provided, all non-default valued fields in config override the values in the Recognizer for this recognition request. If a mask is provided, only the fields listed in the mask override the config in the Recognizer for this recognition request. If a wildcard (*) is provided, config completely overrides and replaces the config in the recognizer for this recognition request.

streaming_features

StreamingRecognitionFeatures

Speech recognition features to enable specific to streaming audio recognition requests.

StreamingRecognitionFeatures

Available recognition features specific to streaming recognition requests.

Fields
enable_voice_activity_events

bool

If true, responses with voice activity speech events will be returned as they are detected.

interim_results

bool

Whether or not to stream interim results to the client. If set to true, interim results will be streamed to the client. Otherwise, only the final response will be streamed back.

voice_activity_timeout

VoiceActivityTimeout

If set, the server will automatically close the stream after the specified duration has elapsed after the last VOICE_ACTIVITY speech event has been sent. The field enable_voice_activity_events must also be set to true.

VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

Fields
speech_start_timeout

Duration

Duration to timeout the stream if no speech begins. If this is set and no speech is detected in this duration at the start of the stream, the server will close the stream.

speech_end_timeout

Duration

Duration to timeout the stream after speech ends. If this is set and no speech is detected in this duration after speech was detected, the server will close the stream.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

Fields
alternatives[]

SpeechRecognitionAlternative

May contain one or more recognition hypotheses. These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

is_final

bool

If false, this StreamingRecognitionResult represents an interim result that may change. If true, this is the final time the speech service will return this particular StreamingRecognitionResult; the recognizer will not return any further hypotheses for this portion of the transcript and corresponding audio.

stability

float

An estimate of the likelihood that the recognizer will not change its guess about this interim result. Values range from 0.0 (completely unstable) to 1.0 (completely stable). This field is only provided for interim results (is_final=false). The default of 0.0 is a sentinel value indicating stability was not set.

result_end_offset

Duration

Time offset of the end of this result relative to the beginning of the audio.

channel_tag

int32

For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from 1 to N.

language_code

string

Output only. The BCP-47 language tag of the language in this result. This is the language that was detected as most likely to be spoken in the audio.
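
A small Python sketch of consuming such a result; the 0.8 stability threshold is an illustrative choice, not an API requirement:

    def handle_result(result) -> None:
        """Handle one StreamingRecognitionResult from a streaming response."""
        if not result.alternatives:
            return
        top = result.alternatives[0]  # first alternative is the most probable
        if result.is_final:
            print(f"[final] {top.transcript}")
        elif result.stability > 0.8:
            # High-stability interim text is unlikely to change,
            # so a UI might choose to display it immediately.
            print(f"[interim] {top.transcript}")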

StreamingRecognizeRequest

Request message for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent in one call.

If the Recognizer referenced by recognizer contains a fully specified request configuration, then the stream may contain only messages with the audio field set.

Otherwise, the first message must contain a recognizer and a streaming_config message that together fully specify the request configuration, and must not contain audio. All subsequent messages must have only audio set.

Fields
recognizer

string

Required. The name of the Recognizer to use during recognition. The expected format is projects/{project}/locations/{location}/recognizers/{recognizer}. The {recognizer} segment may be set to _ to use an empty implicit Recognizer.

Union field streaming_request.

streaming_request can be only one of the following:

streaming_config

StreamingRecognitionConfig

StreamingRecognitionConfig to be used in this recognition attempt. If provided, it will override the default RecognitionConfig stored in the Recognizer.

audio

bytes

Inline audio bytes to be recognized. The maximum size for this field is 15 KB per request.
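
The config-first-then-audio contract maps naturally onto a Python generator. This sketch assumes streaming_config is a StreamingRecognitionConfig built as shown earlier, audio_bytes holds the raw audio, and my-project stands in for a real project ID:

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    CHUNK_BYTES = 15 * 1024  # stay within the 15 KB per-request limit

    def request_stream(recognizer, streaming_config, audio):
        # First message: recognizer and streaming_config, no audio.
        yield cloud_speech.StreamingRecognizeRequest(
            recognizer=recognizer,
            streaming_config=streaming_config,
        )
        # All subsequent messages: audio only.
        for i in range(0, len(audio), CHUNK_BYTES):
            yield cloud_speech.StreamingRecognizeRequest(
                audio=audio[i : i + CHUNK_BYTES],
            )

    client = SpeechClient()
    responses = client.streaming_recognize(
        requests=request_stream(
            "projects/my-project/locations/global/recognizers/_",
            streaming_config,
            audio_bytes,
        )
    )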

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.

Here are some examples of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses (#4 and #7) contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

Fields
results[]

StreamingRecognitionResult

This repeated list contains zero or more results that correspond to consecutive portions of the audio currently being processed. It contains zero or one is_final=true result (the newly settled portion), followed by zero or more is_final=false results (the interim results).

speech_event_type

SpeechEventType

Indicates the type of speech event.

speech_event_offset

Duration

Time offset between the beginning of the audio and the emission of this event.

metadata

RecognitionResponseMetadata

Metadata about the recognition.

SpeechEventType

Indicates the type of speech event.

Enums
SPEECH_EVENT_TYPE_UNSPECIFIED No speech event specified.
END_OF_SINGLE_UTTERANCE This event indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio and will close the gRPC bidirectional stream. This event is only sent if there was a forced cutoff due to silence being detected early. This event is only available through the latest_short model.
SPEECH_ACTIVITY_BEGIN This event indicates that the server has detected the beginning of human voice activity in the stream. This event can be returned multiple times if speech starts and stops repeatedly throughout the stream. This event is only sent if enable_voice_activity_events is set to true.
SPEECH_ACTIVITY_END This event indicates that the server has detected the end of human voice activity in the stream. This event can be returned multiple times if speech starts and stops repeatedly throughout the stream. This event is only sent if enable_voice_activity_events is set to true.
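
A sketch of reacting to these events while draining a stream in Python; responses and handle_result refer to the earlier sketches:

    from google.cloud.speech_v2.types import cloud_speech

    EventType = cloud_speech.StreamingRecognizeResponse.SpeechEventType

    for response in responses:
        if response.speech_event_type == EventType.SPEECH_ACTIVITY_BEGIN:
            print(f"speech began at {response.speech_event_offset}")
        elif response.speech_event_type == EventType.SPEECH_ACTIVITY_END:
            print(f"speech ended at {response.speech_event_offset}")
        for result in response.results:
            handle_result(result)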

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.

Fields
entries[]

Entry

A list of replacement entries. We will perform replacement with one entry at a time. For example, the second entry in ["cat" => "dog", "mountain cat" => "mountain dog"] will never be applied because we will always process the first entry before it. At most 100 entries.

Entry

A single replacement configuration.

Fields
search

string

What to replace. Max length is 100 characters.

replace

string

What to replace with. Max length is 100 characters.

case_sensitive

bool

Whether the search is case sensitive.
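
A hedged Python sketch of attaching normalization to a recognition config; the replacement pair and model choice are illustrative only:

    from google.cloud.speech_v2.types import cloud_speech

    normalization = cloud_speech.TranscriptNormalization(
        entries=[
            # Entries are applied one at a time, in list order.
            cloud_speech.TranscriptNormalization.Entry(
                search="gonna",
                replace="going to",
                case_sensitive=False,
            ),
        ]
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",  # illustrative
        transcript_normalization=normalization,
    )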

UndeleteCustomClassRequest

Request message for the UndeleteCustomClass method.

Fields
name

string

Required. The name of the CustomClass to undelete. Format: projects/{project}/locations/{location}/customClasses/{custom_class}

validate_only

bool

If set, validate the request and preview the undeleted CustomClass, but do not actually undelete it.

etag

string

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

UndeletePhraseSetRequest

Request message for the UndeletePhraseSet method.

Fields
name

string

Required. The name of the PhraseSet to undelete. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}

validate_only

bool

If set, validate the request and preview the undeleted PhraseSet, but do not actually undelete it.

etag

string

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.

UndeleteRecognizerRequest

Request message for the UndeleteRecognizer method.

Fields
name

string

Required. The name of the Recognizer to undelete. Format: projects/{project}/locations/{location}/recognizers/{recognizer}

validate_only

bool

If set, validate the request and preview the undeleted Recognizer, but do not actually undelete it.

etag

string

This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding.
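
The three undelete requests follow the same shape; here is a sketch for a Recognizer using the Python client (the resource name is illustrative, and undelete_phrase_set / undelete_custom_class work analogously):

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    operation = client.undelete_recognizer(
        request=cloud_speech.UndeleteRecognizerRequest(
            name="projects/my-project/locations/global/recognizers/my-recognizer",
            # etag is optional; when supplied, it must match the server's
            # current value or the request is rejected.
        )
    )
    recognizer = operation.result()  # block until the long-running operation finishes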

UpdateConfigRequest

Request message for the UpdateConfig method.

Fields
config

Config

Required. The config to update.

The config's name field is used to identify the config to be updated. The expected format is projects/{project}/locations/{location}/config.

update_mask

FieldMask

The list of fields to be updated.
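
A sketch of a masked config update in Python; the field being updated (kms_key_name) and its value are illustrative assumptions:

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech
    from google.protobuf import field_mask_pb2

    client = SpeechClient()
    updated = client.update_config(
        config=cloud_speech.Config(
            name="projects/my-project/locations/global/config",
            kms_key_name="",  # illustrative: clear the configured KMS key
        ),
        update_mask=field_mask_pb2.FieldMask(paths=["kms_key_name"]),
    )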

UpdateCustomClassRequest

Request message for the UpdateCustomClass method.

Fields
custom_class

CustomClass

Required. The CustomClass to update.

The CustomClass's name field is used to identify the CustomClass to update. Format: projects/{project}/locations/{location}/customClasses/{custom_class}.

update_mask

FieldMask

The list of fields to be updated. If empty, all fields are considered for update.

validate_only

bool

If set, validate the request and preview the updated CustomClass, but do not actually update it.

UpdatePhraseSetRequest

Request message for the UpdatePhraseSet method.

Fields
phrase_set

PhraseSet

Required. The PhraseSet to update.

The PhraseSet's name field is used to identify the PhraseSet to update. Format: projects/{project}/locations/{location}/phraseSets/{phrase_set}.

update_mask

FieldMask

The list of fields to update. If empty, all non-default valued fields are considered for update. Use * to update the entire PhraseSet resource.

validate_only

bool

If set, validate the request and preview the updated PhraseSet, but do not actually update it.

UpdateRecognizerRequest

Request message for the UpdateRecognizer method.

Fields
recognizer

Recognizer

Required. The Recognizer to update.

The Recognizer's name field is used to identify the Recognizer to update. Format: projects/{project}/locations/{location}/recognizers/{recognizer}.

update_mask

FieldMask

The list of fields to update. If empty, all non-default valued fields are considered for update. Use * to update the entire Recognizer resource.

validate_only

bool

If set, validate the request and preview the updated Recognizer, but do not actually update it.
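
The three update requests share one pattern: send the resource with its name field set, plus a mask naming the fields to change. A Python sketch for a Recognizer (the display name and resource path are illustrative):

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech
    from google.protobuf import field_mask_pb2

    client = SpeechClient()
    operation = client.update_recognizer(
        recognizer=cloud_speech.Recognizer(
            name="projects/my-project/locations/global/recognizers/my-recognizer",
            display_name="My updated recognizer",  # illustrative
        ),
        update_mask=field_mask_pb2.FieldMask(paths=["display_name"]),
    )
    recognizer = operation.result()  # resolves to the updated Recognizer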

VttOutputFileFormatConfig

This type has no fields.

Output configuration for a WebVTT formatted subtitle file.

WordInfo

Word-specific information for recognized words.

Fields
start_offset

Duration

Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if enable_word_time_offsets is true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

end_offset

Duration

Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if enable_word_time_offsets is true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

word

string

The word corresponding to this set of information.

confidence

float

The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result, or of a streaming result where is_final is set to true. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating confidence was not set.

speaker_label

string

A distinct label is assigned to every speaker within the audio. This field specifies which of those speakers was detected to have spoken this word. speaker_label is set only if SpeakerDiarizationConfig is given, and only in the top alternative.
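
A short Python sketch of reading word-level details from a result; it assumes recognition ran with enable_word_time_offsets (and, for speaker_label, speaker diarization) enabled:

    # `result` is assumed to be a recognition result with word info populated.
    top = result.alternatives[0]  # word info appears only in the top hypothesis
    for word in top.words:
        line = f"{word.word}: {word.start_offset} -> {word.end_offset}"
        if word.speaker_label:
            line += f" (speaker {word.speaker_label})"
        print(line)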