Classes
AccessMetadata
The access metadata for a particular region. This can be applied if the org policy for the given project disallows a particular region.
AccessMetadata.Types
Container for nested types declared in the AccessMetadata message type.
AutoDetectDecodingConfig
Automatically detected decoding parameters. Supported for the following encodings:
WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.
WAV_MULAW: 8-bit companded mulaw samples in a WAV container.
WAV_ALAW: 8-bit companded alaw samples in a WAV container.
RFC4867_5_AMR: AMR frames with an rfc4867.5 header.
RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.
FLAC: FLAC frames in the "native FLAC" container format.
MP3: MPEG audio frames with optional (ignored) ID3 metadata.
OGG_OPUS: Opus audio frames in an Ogg container.
WEBM_OPUS: Opus audio frames in a WebM container.
MP4_AAC: AAC audio frames in an MP4 container.
M4A_AAC: AAC audio frames in an M4A container.
MOV_AAC: AAC audio frames in an MOV container.
BatchRecognizeFileMetadata
Metadata about a single file in a batch for BatchRecognize.
BatchRecognizeFileResult
Final results for a single file.
BatchRecognizeMetadata
Operation metadata for [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize].
BatchRecognizeRequest
Request message for the [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] method.
BatchRecognizeRequest.Types
Container for nested types declared in the BatchRecognizeRequest message type.
BatchRecognizeResponse
Response message for [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] that is packaged into a long-running [Operation][google.longrunning.Operation].
BatchRecognizeResults
Output type for Cloud Storage of BatchRecognize transcripts. Though this proto isn't returned in this API anywhere, the Cloud Storage transcripts will be this proto serialized and should be parsed as such.
BatchRecognizeTranscriptionMetadata
Metadata about transcription for a single file (for example, progress percent).
CloudStorageResult
Final results written to Cloud Storage.
Config
Message representing the config for the Speech-to-Text API. This includes an optional KMS key with which incoming data will be encrypted.
ConfigName
Resource name for the Config resource.
CreateCustomClassRequest
Request message for the [CreateCustomClass][google.cloud.speech.v2.Speech.CreateCustomClass] method.
CreatePhraseSetRequest
Request message for the [CreatePhraseSet][google.cloud.speech.v2.Speech.CreatePhraseSet] method.
CreateRecognizerRequest
Request message for the [CreateRecognizer][google.cloud.speech.v2.Speech.CreateRecognizer] method.
CryptoKeyName
Resource name for the CryptoKey resource.
CryptoKeyVersionName
Resource name for the CryptoKeyVersion resource.
CustomClass
CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example a list of passenger ship names.
CustomClass.Types
Container for nested types declared in the CustomClass message type.
CustomClass.Types.ClassItem
An item of the class.
CustomClassName
Resource name for the CustomClass resource.
DeleteCustomClassRequest
Request message for the [DeleteCustomClass][google.cloud.speech.v2.Speech.DeleteCustomClass] method.
DeletePhraseSetRequest
Request message for the [DeletePhraseSet][google.cloud.speech.v2.Speech.DeletePhraseSet] method.
DeleteRecognizerRequest
Request message for the [DeleteRecognizer][google.cloud.speech.v2.Speech.DeleteRecognizer] method.
ExplicitDecodingConfig
Explicitly specified decoding parameters.
ExplicitDecodingConfig.Types
Container for nested types declared in the ExplicitDecodingConfig message type.
GcsOutputConfig
Output configurations for Cloud Storage.
GetConfigRequest
Request message for the [GetConfig][google.cloud.speech.v2.Speech.GetConfig] method.
GetCustomClassRequest
Request message for the [GetCustomClass][google.cloud.speech.v2.Speech.GetCustomClass] method.
GetPhraseSetRequest
Request message for the [GetPhraseSet][google.cloud.speech.v2.Speech.GetPhraseSet] method.
GetRecognizerRequest
Request message for the [GetRecognizer][google.cloud.speech.v2.Speech.GetRecognizer] method.
InlineOutputConfig
Output configurations for inline response.
InlineResult
Final results returned inline in the recognition response.
LanguageMetadata
The metadata about locales available in a given region. Currently this is just the models that are available for each locale.
ListCustomClassesRequest
Request message for the [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] method.
ListCustomClassesResponse
Response message for the [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] method.
ListPhraseSetsRequest
Request message for the [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] method.
ListPhraseSetsResponse
Response message for the [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] method.
ListRecognizersRequest
Request message for the [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] method.
ListRecognizersResponse
Response message for the [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] method.
LocationsMetadata
Main metadata for the Locations API for STT V2. Currently this is just the metadata about locales, models, and features.
ModelFeature
Represents a single feature of a model. If the feature is recognizer, the release_state of the feature represents the release_state of the model.
ModelFeatures
Represents the collection of features belonging to a model.
ModelMetadata
The metadata about the models in a given region for a specific locale. Currently this is just the features of the model.
NativeOutputFileFormatConfig
Output configurations for serialized BatchRecognizeResults protos.
OperationMetadata
Represents the metadata of a long-running operation.
OutputFormatConfig
Configuration for the format of the results stored to output.
PhraseSet
PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.
PhraseSet.Types
Container for nested types declared in the PhraseSet message type.
PhraseSet.Types.Phrase
A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.
List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.
PhraseSetName
Resource name for the PhraseSet resource.
RecognitionConfig
Provides information to the Recognizer that specifies how to process the recognition request.
RecognitionFeatures
Available recognition features.
RecognitionFeatures.Types
Container for nested types declared in the RecognitionFeatures message type.
RecognitionOutputConfig
Configuration options for the output(s) of recognition.
RecognitionResponseMetadata
Metadata about the recognition request and response.
RecognizeRequest
Request message for the [Recognize][google.cloud.speech.v2.Speech.Recognize] method. Either content or uri must be supplied. Supplying both or neither returns [INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]. See content limits.
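The "exactly one of content or uri" rule can be illustrated with a minimal sketch. The `RecognizeRequestSketch` dataclass and `validate_audio_source` helper below are hypothetical stand-ins written for this example, not types or methods from the library, and "unset" is modelled as `None` for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognizeRequestSketch:
    """Hypothetical stand-in for RecognizeRequest; not the library type."""
    recognizer: str
    content: Optional[bytes] = None  # inline audio bytes
    uri: Optional[str] = None        # location of the audio, for example in Cloud Storage


def validate_audio_source(request: RecognizeRequestSketch) -> str:
    """Return 'OK' when exactly one of content/uri is set, else 'INVALID_ARGUMENT'."""
    has_content = request.content is not None
    has_uri = request.uri is not None
    # Exactly one source must be present: both or neither is rejected.
    return "OK" if has_content != has_uri else "INVALID_ARGUMENT"
```

In the generated client, the RecognizeRequest.AudioSourceOneofCase enum listed under Enums reports which of the two sources is set.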
RecognizeResponse
Response message for the [Recognize][google.cloud.speech.v2.Speech.Recognize] method.
Recognizer
A Recognizer message. Stores recognition configuration and metadata.
Recognizer.Types
Container for nested types declared in the Recognizer message type.
RecognizerName
Resource name for the Recognizer resource.
SpeakerDiarizationConfig
Configuration to enable speaker diarization.
Speech
Enables speech transcription and resource management.
Speech.SpeechBase
Base class for server-side implementations of Speech.
Speech.SpeechClient
Client for Speech.
SpeechAdaptation
Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.
SpeechAdaptation.Types
Container for nested types declared in the SpeechAdaptation message type.
SpeechAdaptation.Types.AdaptationPhraseSet
A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.
SpeechClient
Speech client wrapper, for convenient use.
SpeechClient.StreamingRecognizeStream
Bidirectional streaming methods for StreamingRecognize(CallSettings, BidirectionalStreamingSettings).
SpeechClientBuilder
Builder class for SpeechClient to provide simple configuration of credentials, endpoint, etc.
SpeechClientImpl
Speech client wrapper implementation, for convenient use.
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
SpeechRecognitionResult
A speech recognition result corresponding to a portion of the audio.
SpeechSettings
Settings for SpeechClient instances.
SrtOutputFileFormatConfig
Output configurations for SubRip Text formatted subtitle file.
StreamingRecognitionConfig
Provides configuration information for the StreamingRecognize request.
StreamingRecognitionFeatures
Available recognition features specific to streaming recognition requests.
StreamingRecognitionFeatures.Types
Container for nested types declared in the StreamingRecognitionFeatures message type.
StreamingRecognitionFeatures.Types.VoiceActivityTimeout
Events that a timeout can be set on for voice activity.
StreamingRecognitionResult
A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.
StreamingRecognizeRequest
Request message for the [StreamingRecognize][google.cloud.speech.v2.Speech.StreamingRecognize] method. Multiple [StreamingRecognizeRequest][google.cloud.speech.v2.StreamingRecognizeRequest] messages are sent in one call.
If the [Recognizer][google.cloud.speech.v2.Recognizer] referenced by [recognizer][google.cloud.speech.v2.StreamingRecognizeRequest.recognizer] contains a fully specified request configuration then the stream may only contain messages with only [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio] set.
Otherwise the first message must contain a [recognizer][google.cloud.speech.v2.StreamingRecognizeRequest.recognizer] and a [streaming_config][google.cloud.speech.v2.StreamingRecognizeRequest.streaming_config] message that together fully specify the request configuration and must not contain [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio]. All subsequent messages must only have [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio] set.
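The message-ordering rules above can be sketched as a small validator. This is one reading of the rules, not the service's own check: `StreamMessage` and `is_valid_stream` are hypothetical names, "unset" is modelled as `None`, an empty stream is treated as invalid, and a configuration message is still accepted first when the Recognizer is already fully configured (both of the latter are assumptions).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamMessage:
    """Hypothetical stand-in for StreamingRecognizeRequest; not the library type."""
    recognizer: Optional[str] = None
    streaming_config: Optional[dict] = None
    audio: Optional[bytes] = None


def is_audio_only(msg: StreamMessage) -> bool:
    """True when only the audio field is set."""
    return msg.audio is not None and msg.recognizer is None and msg.streaming_config is None


def is_config_message(msg: StreamMessage) -> bool:
    """True when recognizer and streaming_config are set and audio is not."""
    return msg.recognizer is not None and msg.streaming_config is not None and msg.audio is None


def is_valid_stream(messages: List[StreamMessage], recognizer_fully_configured: bool) -> bool:
    """Check the ordering rules for one StreamingRecognize call."""
    if not messages:
        return False  # assumption: an empty stream is not a valid call
    first, rest = messages[0], messages[1:]
    if recognizer_fully_configured:
        # The stream may consist of audio-only messages from the start.
        first_ok = is_audio_only(first) or is_config_message(first)
    else:
        # The first message must carry the configuration and no audio.
        first_ok = is_config_message(first)
    # Every subsequent message must carry audio only.
    return first_ok and all(is_audio_only(m) for m in rest)
```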
StreamingRecognizeResponse
StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio then no messages are streamed back to the client.
Here are some examples of StreamingRecognizeResponses that might be returned while processing audio:
1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }
Notes:
Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
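The notes above can be illustrated with a minimal sketch that replays the seven example responses. Plain dicts stand in for the response messages (a hypothetical shape chosen for this example, not the library's types), and the `min_stability` cutoff of 0.5 is an arbitrary assumption for what a UI might treat as "high stability".

```python
# Plain dicts stand in for StreamingRecognizeResponse messages (hypothetical shape).
def _interim(*parts):
    """Build a response holding one interim result per (transcript, stability) pair."""
    return {"results": [{"alternatives": [{"transcript": t}], "stability": s} for t, s in parts]}


def _final(best, confidence, runner_up):
    """Build a response holding one final result with two alternatives."""
    return {"results": [{
        "alternatives": [{"transcript": best, "confidence": confidence},
                         {"transcript": runner_up}],
        "is_final": True,
    }]}


RESPONSES = [
    _interim(("tube", 0.01)),
    _interim(("to be a", 0.01)),
    _interim(("to be", 0.9), (" or not to be", 0.01)),
    _final("to be or not to be", 0.92, "to bee or not to bee"),
    _interim((" that's", 0.01)),
    _interim((" that is", 0.9), (" the question", 0.01)),
    _final(" that is the question", 0.98, " that was the question"),
]


def final_transcript(responses):
    """Concatenate the top alternative of every final result, in arrival order."""
    parts = []
    for response in responses:
        for result in response.get("results", []):
            if result.get("is_final") and result["alternatives"]:
                parts.append(result["alternatives"][0]["transcript"])
    return "".join(parts)


def stable_interim_text(response, min_stability=0.5):
    """Return only the interim portions whose stability meets the cutoff."""
    return "".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if not result.get("is_final") and result.get("stability", 0.0) >= min_stability
    )


print(final_transcript(RESPONSES))  # to be or not to be that is the question
```

Calling `stable_interim_text(RESPONSES[2])` keeps only the high-stability portion of the third response ("to be") and drops the low-stability tail.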
StreamingRecognizeResponse.Types
Container for nested types declared in the StreamingRecognizeResponse message type.
TranscriptNormalization
Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
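The streaming rule above (normalization applies only to final transcripts and to interim transcripts with stability > 0.8) can be sketched as follows. Everything here is hypothetical: the `(search, replace)` pairs stand in for replacement entries, and plain case-sensitive substring replacement is an assumption about matching, not the service's documented behavior.

```python
from typing import List, Tuple


def normalization_applies(is_final: bool, stability: float) -> bool:
    """Final transcripts always qualify; interim ones need stability above 0.8."""
    return is_final or stability > 0.8


def normalize(transcript: str,
              entries: List[Tuple[str, str]],
              is_final: bool = False,
              stability: float = 0.0) -> str:
    """Apply each (search, replace) pair in order when the transcript qualifies."""
    if not normalization_applies(is_final, stability):
        return transcript  # low-stability interim text is left untouched
    for search, replacement in entries:
        # Assumption: simple case-sensitive substring replacement.
        transcript = transcript.replace(search, replacement)
    return transcript
```

For example, `normalize("call doctor smith", [("doctor", "Dr.")], stability=0.9)` rewrites the transcript, while the same call with `stability=0.5` returns it unchanged.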
TranscriptNormalization.Types
Container for nested types declared in the TranscriptNormalization message type.
TranscriptNormalization.Types.Entry
A single replacement configuration.
TranslationConfig
Translation configuration. Use to translate the given audio into text for the desired language.
UndeleteCustomClassRequest
Request message for the [UndeleteCustomClass][google.cloud.speech.v2.Speech.UndeleteCustomClass] method.
UndeletePhraseSetRequest
Request message for the [UndeletePhraseSet][google.cloud.speech.v2.Speech.UndeletePhraseSet] method.
UndeleteRecognizerRequest
Request message for the [UndeleteRecognizer][google.cloud.speech.v2.Speech.UndeleteRecognizer] method.
UpdateConfigRequest
Request message for the [UpdateConfig][google.cloud.speech.v2.Speech.UpdateConfig] method.
UpdateCustomClassRequest
Request message for the [UpdateCustomClass][google.cloud.speech.v2.Speech.UpdateCustomClass] method.
UpdatePhraseSetRequest
Request message for the [UpdatePhraseSet][google.cloud.speech.v2.Speech.UpdatePhraseSet] method.
UpdateRecognizerRequest
Request message for the [UpdateRecognizer][google.cloud.speech.v2.Speech.UpdateRecognizer] method.
VttOutputFileFormatConfig
Output configurations for WebVTT formatted subtitle file.
WordInfo
Word-specific information for recognized words.
Enums
AccessMetadata.Types.ConstraintType
Describes the different types of constraints that can be applied on a region.
BatchRecognizeFileMetadata.AudioSourceOneofCase
Enum of possible cases for the "audio_source" oneof.
BatchRecognizeFileResult.ResultOneofCase
Enum of possible cases for the "result" oneof.
BatchRecognizeRequest.Types.ProcessingStrategy
Possible processing strategies for batch requests.
ConfigName.ResourceNameType
The possible contents of ConfigName.
CryptoKeyName.ResourceNameType
The possible contents of CryptoKeyName.
CryptoKeyVersionName.ResourceNameType
The possible contents of CryptoKeyVersionName.
CustomClass.Types.State
Set of states that define the lifecycle of a CustomClass.
CustomClassName.ResourceNameType
The possible contents of CustomClassName.
ExplicitDecodingConfig.Types.AudioEncoding
Supported audio data encodings.
OperationMetadata.MetadataOneofCase
Enum of possible cases for the "metadata" oneof.
OperationMetadata.RequestOneofCase
Enum of possible cases for the "request" oneof.
PhraseSet.Types.State
Set of states that define the lifecycle of a PhraseSet.
PhraseSetName.ResourceNameType
The possible contents of PhraseSetName.
RecognitionConfig.DecodingConfigOneofCase
Enum of possible cases for the "decoding_config" oneof.
RecognitionFeatures.Types.MultiChannelMode
Options for how to recognize multi-channel audio.
RecognitionOutputConfig.OutputOneofCase
Enum of possible cases for the "output" oneof.
RecognizeRequest.AudioSourceOneofCase
Enum of possible cases for the "audio_source" oneof.
Recognizer.Types.State
Set of states that define the lifecycle of a Recognizer.
RecognizerName.ResourceNameType
The possible contents of RecognizerName.
SpeechAdaptation.Types.AdaptationPhraseSet.ValueOneofCase
Enum of possible cases for the "value" oneof.
StreamingRecognizeRequest.StreamingRequestOneofCase
Enum of possible cases for the "streaming_request" oneof.
StreamingRecognizeResponse.Types.SpeechEventType
Indicates the type of speech event.