Cloud Speech-to-Text v2 API - Namespace Google.Cloud.Speech.V2 (1.1.0)

Classes

AutoDetectDecodingConfig

Automatically detected decoding parameters. Supported for the following encodings:

  • WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.

  • WAV_MULAW: 8-bit companded mulaw samples in a WAV container.

  • WAV_ALAW: 8-bit companded alaw samples in a WAV container.

  • RFC4867_5_AMR: AMR frames with an rfc4867.5 header.

  • RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.

  • FLAC: FLAC frames in the "native FLAC" container format.

  • MP3: MPEG audio frames with optional (ignored) ID3 metadata.

  • OGG_OPUS: Opus audio frames in an Ogg container.

  • WEBM_OPUS: Opus audio frames in a WebM container.

  • M4A: M4A audio format.
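
To illustrate what detecting a container from the audio bytes can look like, here is a minimal sketch that inspects well-known file signatures. It is not the service's detection logic, which is not described here. The function name sniff_container and the coarse labels it returns are hypothetical. It identifies only the container family: the sample format inside a WAV file, or the codec inside Ogg or WebM, needs further parsing.

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Guess the container family from the first bytes of an audio file.

    Hypothetical helper for illustration only; returns None when no
    known signature matches.
    """
    # RIFF container with a WAVE form type (LINEAR16, MULAW, ALAW live in the fmt chunk).
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "WAV"
    # RFC 4867 section 5 storage format magic numbers.
    if header.startswith(b"#!AMR-WB\n"):
        return "RFC4867_5_AMRWB"
    if header.startswith(b"#!AMR\n"):
        return "RFC4867_5_AMR"
    # Native FLAC stream marker.
    if header.startswith(b"fLaC"):
        return "FLAC"
    # Ogg page capture pattern (the codec, such as Opus, is in the first packet).
    if header.startswith(b"OggS"):
        return "OGG"
    # EBML header, used by WebM (and Matroska).
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "WEBM"
    # ISO base media file: a 4-byte box size followed by "ftyp".
    if header[4:8] == b"ftyp":
        return "M4A"
    # MP3: optional ID3v2 tag, or an MPEG audio frame sync (11 set bits).
    if header.startswith(b"ID3"):
        return "MP3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MP3"
    return None
```

A caller would typically read the first dozen or so bytes of the file and pass them in. A None result suggests the audio may need explicitly specified decoding parameters instead.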

BatchRecognizeFileMetadata

Metadata about a single file in a batch for BatchRecognize.

BatchRecognizeFileResult

Final results for a single file.

BatchRecognizeMetadata

Operation metadata for [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize].

BatchRecognizeRequest

Request message for the [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] method.

BatchRecognizeRequest.Types

Container for nested types declared in the BatchRecognizeRequest message type.

BatchRecognizeResponse

Response message for [BatchRecognize][google.cloud.speech.v2.Speech.BatchRecognize] that is packaged into a long-running [Operation][google.longrunning.Operation].

BatchRecognizeResults

Output type for Cloud Storage of BatchRecognize transcripts. Though this proto isn't returned in this API anywhere, the Cloud Storage transcripts will be this proto serialized and should be parsed as such.

BatchRecognizeTranscriptionMetadata

Metadata about transcription for a single file (for example, progress percent).

CloudStorageResult

Final results written to Cloud Storage.

Config

Message representing the config for the Speech-to-Text API. This includes an optional KMS key with which incoming data will be encrypted.

ConfigName

Resource name for the Config resource.

CreateCustomClassRequest

Request message for the [CreateCustomClass][google.cloud.speech.v2.Speech.CreateCustomClass] method.

CreatePhraseSetRequest

Request message for the [CreatePhraseSet][google.cloud.speech.v2.Speech.CreatePhraseSet] method.

CreateRecognizerRequest

Request message for the [CreateRecognizer][google.cloud.speech.v2.Speech.CreateRecognizer] method.

CryptoKeyName

Resource name for the CryptoKey resource.

CryptoKeyVersionName

Resource name for the CryptoKeyVersion resource.

CustomClass

CustomClass for biasing in speech recognition. Used to define a set of words or phrases that represents a common concept or theme likely to appear in your audio, for example, a list of passenger ship names.

CustomClass.Types

Container for nested types declared in the CustomClass message type.

CustomClass.Types.ClassItem

An item of the class.

CustomClassName

Resource name for the CustomClass resource.

DeleteCustomClassRequest

Request message for the [DeleteCustomClass][google.cloud.speech.v2.Speech.DeleteCustomClass] method.

DeletePhraseSetRequest

Request message for the [DeletePhraseSet][google.cloud.speech.v2.Speech.DeletePhraseSet] method.

DeleteRecognizerRequest

Request message for the [DeleteRecognizer][google.cloud.speech.v2.Speech.DeleteRecognizer] method.

ExplicitDecodingConfig

Explicitly specified decoding parameters.

ExplicitDecodingConfig.Types

Container for nested types declared in the ExplicitDecodingConfig message type.

GcsOutputConfig

Output configurations for Cloud Storage.

GetConfigRequest

Request message for the [GetConfig][google.cloud.speech.v2.Speech.GetConfig] method.

GetCustomClassRequest

Request message for the [GetCustomClass][google.cloud.speech.v2.Speech.GetCustomClass] method.

GetPhraseSetRequest

Request message for the [GetPhraseSet][google.cloud.speech.v2.Speech.GetPhraseSet] method.

GetRecognizerRequest

Request message for the [GetRecognizer][google.cloud.speech.v2.Speech.GetRecognizer] method.

InlineOutputConfig

Output configurations for inline response.

InlineResult

Final results returned inline in the recognition response.

ListCustomClassesRequest

Request message for the [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] method.

ListCustomClassesResponse

Response message for the [ListCustomClasses][google.cloud.speech.v2.Speech.ListCustomClasses] method.

ListPhraseSetsRequest

Request message for the [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] method.

ListPhraseSetsResponse

Response message for the [ListPhraseSets][google.cloud.speech.v2.Speech.ListPhraseSets] method.

ListRecognizersRequest

Request message for the [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] method.

ListRecognizersResponse

Response message for the [ListRecognizers][google.cloud.speech.v2.Speech.ListRecognizers] method.

NativeOutputFileFormatConfig

Output configurations for serialized BatchRecognizeResults protos.

OperationMetadata

Represents the metadata of a long-running operation.

OutputFormatConfig

Configuration for the format of the results stored to output.

PhraseSet

PhraseSet for biasing in speech recognition. A PhraseSet is used to provide "hints" to the speech recognizer to favor specific words and phrases in the results.

PhraseSet.Types

Container for nested types declared in the PhraseSet message type.

PhraseSet.Types.Phrase

A Phrase contains words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer.

List items can also include CustomClass references containing groups of words that represent common concepts that occur in natural language.

PhraseSetName

Resource name for the PhraseSet resource.

RecognitionConfig

Provides information to the Recognizer that specifies how to process the recognition request.

RecognitionFeatures

Available recognition features.

RecognitionFeatures.Types

Container for nested types declared in the RecognitionFeatures message type.

RecognitionOutputConfig

Configuration options for the output(s) of recognition.

RecognitionResponseMetadata

Metadata about the recognition request and response.

RecognizeRequest

Request message for the [Recognize][google.cloud.speech.v2.Speech.Recognize] method. Either content or uri must be supplied. Supplying both or neither returns [INVALID_ARGUMENT][google.rpc.Code.INVALID_ARGUMENT]. See content limits.
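
The "exactly one of content or uri" rule can be checked on the client before a request is sent. The sketch below models the request with a plain dataclass rather than the generated message type. RecognizeRequestSketch and validate_audio_source are hypothetical names, and the error text only mimics the INVALID_ARGUMENT status described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognizeRequestSketch:
    """Hypothetical stand-in for the request; not the generated class."""
    recognizer: str
    content: Optional[bytes] = None  # inline audio bytes
    uri: Optional[str] = None        # location of the audio, e.g. in Cloud Storage


def validate_audio_source(request: RecognizeRequestSketch) -> None:
    """Raise ValueError unless exactly one of content or uri is set."""
    has_content = request.content is not None
    has_uri = request.uri is not None
    # Both set or both unset are the two rejected combinations.
    if has_content == has_uri:
        raise ValueError(
            "INVALID_ARGUMENT: exactly one of content or uri must be supplied"
        )
```

In generated protobuf classes, fields in a oneof clear each other when set, so the "both" case mostly arises when requests are built by hand, for example as JSON.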

RecognizeResponse

Response message for the [Recognize][google.cloud.speech.v2.Speech.Recognize] method.

Recognizer

A Recognizer message. Stores recognition configuration and metadata.

Recognizer.Types

Container for nested types declared in the Recognizer message type.

RecognizerName

Resource name for the Recognizer resource.
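
As a rough illustration of what a resource name class encapsulates, the sketch below formats and parses a recognizer name as a string. It assumes the pattern projects/{project}/locations/{location}/recognizers/{recognizer}, which is not stated on this page. The helper names are hypothetical and are not the library's API.

```python
import re
from typing import Dict

# Assumed pattern for a Recognizer resource name (not stated in this listing).
_RECOGNIZER_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/recognizers/(?P<recognizer>[^/]+)$"
)


def format_recognizer_name(project: str, location: str, recognizer: str) -> str:
    """Build the full resource name from its three components."""
    return f"projects/{project}/locations/{location}/recognizers/{recognizer}"


def parse_recognizer_name(name: str) -> Dict[str, str]:
    """Split a resource name into components, or raise ValueError."""
    match = _RECOGNIZER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognizer resource name: {name!r}")
    return match.groupdict()
```

A typed resource name keeps this formatting in one place, so callers do not concatenate path segments by hand.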

SpeakerDiarizationConfig

Configuration to enable speaker diarization.

Speech

Enables speech transcription and resource management.

Speech.SpeechBase

Base class for server-side implementations of Speech.

Speech.SpeechClient

Client for Speech.

SpeechAdaptation

Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.

SpeechAdaptation.Types

Container for nested types declared in the SpeechAdaptation message type.

SpeechAdaptation.Types.AdaptationPhraseSet

A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.

SpeechClient

Speech client wrapper, for convenient use.

SpeechClient.StreamingRecognizeStream

Bidirectional streaming methods for StreamingRecognize(CallSettings, BidirectionalStreamingSettings).

SpeechClientBuilder

Builder class for SpeechClient to provide simple configuration of credentials, endpoint, etc.

SpeechClientImpl

Speech client wrapper implementation, for convenient use.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

SpeechSettings

Settings for SpeechClient instances.

SrtOutputFileFormatConfig

Output configurations for a SubRip Text formatted subtitle file.

StreamingRecognitionConfig

Provides configuration information for the StreamingRecognize request.

StreamingRecognitionFeatures

Available recognition features specific to streaming recognition requests.

StreamingRecognitionFeatures.Types

Container for nested types declared in the StreamingRecognitionFeatures message type.

StreamingRecognitionFeatures.Types.VoiceActivityTimeout

Events that a timeout can be set on for voice activity.

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

StreamingRecognizeRequest

Request message for the [StreamingRecognize][google.cloud.speech.v2.Speech.StreamingRecognize] method. Multiple [StreamingRecognizeRequest][google.cloud.speech.v2.StreamingRecognizeRequest] messages are sent in one call.

If the [Recognizer][google.cloud.speech.v2.Recognizer] referenced by [recognizer][google.cloud.speech.v2.StreamingRecognizeRequest.recognizer] contains a fully specified request configuration then the stream may only contain messages with only [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio] set.

Otherwise the first message must contain a [recognizer][google.cloud.speech.v2.StreamingRecognizeRequest.recognizer] and a [streaming_config][google.cloud.speech.v2.StreamingRecognizeRequest.streaming_config] message that together fully specify the request configuration and must not contain [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio]. All subsequent messages must only have [audio][google.cloud.speech.v2.StreamingRecognizeRequest.audio] set.
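
The ordering rules above can be expressed as a small check over a sequence of messages. The sketch below models each message as a dict whose keys are the set fields (recognizer, streaming_config, audio). The function name and the recognizer_fully_configured flag are hypothetical, and the flag has to be supplied by the caller because the messages alone do not reveal what the stored Recognizer contains.

```python
from typing import Iterable, Mapping, Optional


def check_stream_order(
    messages: Iterable[Mapping[str, object]],
    recognizer_fully_configured: bool = False,
) -> Optional[str]:
    """Return None if the sequence follows the rules, else a description
    of the first offending message."""
    for index, message in enumerate(messages):
        fields = {name for name, value in message.items() if value is not None}
        audio_only = fields == {"audio"}
        config_only = (
            {"recognizer", "streaming_config"} <= fields and "audio" not in fields
        )
        if index == 0:
            # A configuration message is always acceptable first; an
            # audio-only first message is acceptable only when the
            # Recognizer already carries the full configuration.
            ok = config_only or (recognizer_fully_configured and audio_only)
        else:
            # Every later message must carry audio and nothing else.
            ok = audio_only
        if not ok:
            return f"message {index} has unexpected fields: {sorted(fields)}"
    return None
```

For example, a configuration message followed by audio chunks passes, while a stream that starts with audio passes only when recognizer_fully_configured is True.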

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages is streamed back to the client. If there is no recognizable audio, no messages are streamed back.

Here are some examples of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }

  2. results { alternatives { transcript: "to be a" } stability: 0.01 }

  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }

  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }

  5. results { alternatives { transcript: " that's" } stability: 0.01 }

  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }

  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.

  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.

  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
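
The notes above can be traced in code. The sketch below re-encodes the seven example responses as plain dicts (not the generated message types), joins the final results into the full transcript, and extracts the high-stability part of an interim response. The helper names and the 0.8 display threshold are assumptions made for this example.

```python
def interim(text, stability):
    """Build an interim result shaped like the examples above."""
    return {"alternatives": [{"transcript": text}], "stability": stability}


def final(text, confidence, runner_up):
    """Build a final result with a top alternative and one runner-up."""
    return {
        "alternatives": [
            {"transcript": text, "confidence": confidence},
            {"transcript": runner_up},
        ],
        "is_final": True,
    }


RESPONSES = [
    {"results": [interim("tube", 0.01)]},
    {"results": [interim("to be a", 0.01)]},
    {"results": [interim("to be", 0.9), interim(" or not to be", 0.01)]},
    {"results": [final("to be or not to be", 0.92, "to bee or not to bee")]},
    {"results": [interim(" that's", 0.01)]},
    {"results": [interim(" that is", 0.9), interim(" the question", 0.01)]},
    {"results": [final(" that is the question", 0.98, " that was the question")]},
]


def final_transcript(responses):
    """Concatenate the top alternative of every final result."""
    parts = []
    for response in responses:
        for result in response.get("results", []):
            if result.get("is_final"):
                parts.append(result["alternatives"][0]["transcript"])
    return "".join(parts)


def stable_interim(response, threshold=0.8):
    """Return only the interim text whose stability meets the threshold."""
    return "".join(
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if not result.get("is_final") and result.get("stability", 0.0) >= threshold
    )


print(final_transcript(RESPONSES))   # to be or not to be that is the question
print(stable_interim(RESPONSES[2]))  # to be
```

Only responses #4 and #7 contribute to the final transcript. A UI that shows only stable interim text would display "to be" for response #3 and nothing for response #1.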

StreamingRecognizeResponse.Types

Container for nested types declared in the StreamingRecognizeResponse message type.

TranscriptNormalization

Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
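
A minimal sketch of the replacement behavior described above, using plain Python rather than the service. The Entry fields used here (search, replace, case_sensitive) are assumptions about the shape of a single replacement configuration, and the literal, in-order matching is one possible reading rather than documented behavior. The stability gate mirrors the "stability > 0.8 or final" rule for streaming.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Entry:
    """Hypothetical model of one replacement rule."""
    search: str
    replace: str
    case_sensitive: bool = False


def normalize(transcript: str, entries: Iterable[Entry]) -> str:
    """Apply each entry in order, replacing literal occurrences of search."""
    for entry in entries:
        flags = 0 if entry.case_sensitive else re.IGNORECASE
        # re.escape treats the search text literally; the lambda keeps
        # backslashes in the replacement from being interpreted.
        transcript = re.sub(
            re.escape(entry.search),
            lambda _match, text=entry.replace: text,
            transcript,
            flags=flags,
        )
    return transcript


def normalize_streaming(
    transcript: str,
    entries: Iterable[Entry],
    stability: float = 0.0,
    is_final: bool = False,
) -> str:
    """Normalize only final or stable (stability > 0.8) transcripts."""
    if is_final or stability > 0.8:
        return normalize(transcript, entries)
    return transcript
```

With an entry that maps "acme corp" to "ACME Corporation", a low-stability interim transcript passes through unchanged, while the final transcript has the phrase replaced.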

TranscriptNormalization.Types

Container for nested types declared in the TranscriptNormalization message type.

TranscriptNormalization.Types.Entry

A single replacement configuration.

TranslationConfig

Translation configuration. Use to translate the given audio into text for the desired language.

UndeleteCustomClassRequest

Request message for the [UndeleteCustomClass][google.cloud.speech.v2.Speech.UndeleteCustomClass] method.

UndeletePhraseSetRequest

Request message for the [UndeletePhraseSet][google.cloud.speech.v2.Speech.UndeletePhraseSet] method.

UndeleteRecognizerRequest

Request message for the [UndeleteRecognizer][google.cloud.speech.v2.Speech.UndeleteRecognizer] method.

UpdateConfigRequest

Request message for the [UpdateConfig][google.cloud.speech.v2.Speech.UpdateConfig] method.

UpdateCustomClassRequest

Request message for the [UpdateCustomClass][google.cloud.speech.v2.Speech.UpdateCustomClass] method.

UpdatePhraseSetRequest

Request message for the [UpdatePhraseSet][google.cloud.speech.v2.Speech.UpdatePhraseSet] method.

UpdateRecognizerRequest

Request message for the [UpdateRecognizer][google.cloud.speech.v2.Speech.UpdateRecognizer] method.

VttOutputFileFormatConfig

Output configurations for a WebVTT formatted subtitle file.

WordInfo

Word-specific information for recognized words.

Enums

BatchRecognizeFileMetadata.AudioSourceOneofCase

Enum of possible cases for the "audio_source" oneof.

BatchRecognizeFileResult.ResultOneofCase

Enum of possible cases for the "result" oneof.

BatchRecognizeRequest.Types.ProcessingStrategy

Possible processing strategies for batch requests.

ConfigName.ResourceNameType

The possible contents of ConfigName.

CryptoKeyName.ResourceNameType

The possible contents of CryptoKeyName.

CryptoKeyVersionName.ResourceNameType

The possible contents of CryptoKeyVersionName.

CustomClass.Types.State

Set of states that define the lifecycle of a CustomClass.

CustomClassName.ResourceNameType

The possible contents of CustomClassName.

ExplicitDecodingConfig.Types.AudioEncoding

Supported audio data encodings.

OperationMetadata.MetadataOneofCase

Enum of possible cases for the "metadata" oneof.

OperationMetadata.RequestOneofCase

Enum of possible cases for the "request" oneof.

PhraseSet.Types.State

Set of states that define the lifecycle of a PhraseSet.

PhraseSetName.ResourceNameType

The possible contents of PhraseSetName.

RecognitionConfig.DecodingConfigOneofCase

Enum of possible cases for the "decoding_config" oneof.

RecognitionFeatures.Types.MultiChannelMode

Options for how to recognize multi-channel audio.

RecognitionOutputConfig.OutputOneofCase

Enum of possible cases for the "output" oneof.

RecognizeRequest.AudioSourceOneofCase

Enum of possible cases for the "audio_source" oneof.

Recognizer.Types.State

Set of states that define the lifecycle of a Recognizer.

RecognizerName.ResourceNameType

The possible contents of RecognizerName.

SpeechAdaptation.Types.AdaptationPhraseSet.ValueOneofCase

Enum of possible cases for the "value" oneof.

StreamingRecognizeRequest.StreamingRequestOneofCase

Enum of possible cases for the "streaming_request" oneof.

StreamingRecognizeResponse.Types.SpeechEventType

Indicates the type of speech event.