Media Translation API is deprecated and will no longer be available on Google Cloud after July 1, 2024. You can replicate the functionality of Media Translation API through a combination of other Google Cloud services such as Cloud Speech-to-Text and Cloud Translation API.

Package google.cloud.mediatranslation.v1beta1

Index

SpeechTranslationService (interface)
StreamingTranslateSpeechConfig (message)
StreamingTranslateSpeechRequest (message)
StreamingTranslateSpeechResponse (message)
StreamingTranslateSpeechResponse.SpeechEventType (enum)
StreamingTranslateSpeechResult (message)
StreamingTranslateSpeechResult.TextTranslationResult (message)
TranslateSpeechConfig (message)

SpeechTranslationService

Provides translation from/to media types.

StreamingTranslateSpeech
`rpc StreamingTranslateSpeech(StreamingTranslateSpeechRequest) returns (StreamingTranslateSpeechResponse)`

Performs bidirectional streaming speech translation: receive results while sending audio. This method is only available via the gRPC API (not REST).

Authorization Scopes

Requires the following OAuth scope:

`https://www.googleapis.com/auth/cloud-platform`

For more information, see the Authentication Overview.

StreamingTranslateSpeechConfig

Config used for streaming translation.

Fields
`audio_config`	`TranslateSpeechConfig` Required. The common config for all the following audio contents.
`single_utterance`	`bool` Optional. If `false` or omitted, the system performs continuous translation (continuing to wait for and process audio even if the user pauses speaking) until the client closes the input stream (gRPC API) or until the maximum time limit has been reached. It may return multiple `StreamingTranslateSpeechResult`s with the `is_final` flag set to `true`. If `true`, the speech translator detects a single spoken utterance. When it detects that the user has paused or stopped speaking, it returns an `END_OF_SINGLE_UTTERANCE` event and ceases translation. When the client receives the `END_OF_SINGLE_UTTERANCE` event, it should stop sending requests but keep receiving the remaining responses until the stream is terminated. To construct the complete sentence from the stream, overwrite the previous result with the new one if the previous response had `is_final` set to `false`, or append the new result if the previous response had `is_final` set to `true`.
`stability`	`string` Optional. Stability control for the media translation text. Stability and speed trade off against each other. The value should be one of "LOW", "MEDIUM", or "HIGH"; the default empty string is treated as "LOW". (1) "LOW": The translation service starts translating right after it gets a recognition response, so results arrive faster. (2) "MEDIUM": The translation service checks whether the recognition response is stable enough, and only translates recognition responses that are unlikely to change later. (3) "HIGH": The translation service waits for more stable recognition responses before it starts translating. In addition, later recognition responses cannot modify earlier ones, which may reduce quality in some situations. "HIGH" stability generates "final" responses more frequently.
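
The default rule for `stability` can be illustrated with a small helper. This is only a sketch: `normalize_stability` and `VALID_STABILITY` are hypothetical names, not part of the API, and the strict, case-sensitive check is an assumption about how a client might validate input before sending it.

```python
VALID_STABILITY = ("LOW", "MEDIUM", "HIGH")


def normalize_stability(value: str = "") -> str:
    """Return the effective stability level for a requested value.

    Hypothetical client-side helper: the empty string maps to "LOW",
    as the field description states. Rejecting other spellings is an
    assumption, not documented service behavior.
    """
    if value == "":
        return "LOW"  # documented default for the empty string
    if value not in VALID_STABILITY:
        raise ValueError(
            f"stability must be one of {VALID_STABILITY} or empty, got {value!r}"
        )
    return value
```

For example, `normalize_stability("")` resolves to `"LOW"`, the fastest and least stable mode.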

StreamingTranslateSpeechRequest

The top-level message sent by the client for the StreamingTranslateSpeech method. Multiple StreamingTranslateSpeechRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content data. All subsequent messages must contain audio_content data and must not contain a streaming_config message.

Fields
Union field `streaming_request`. The streaming request, which is either a streaming config or content. `streaming_request` can be only one of the following:
`streaming_config`	`StreamingTranslateSpeechConfig` Provides information to the recognizer that specifies how to process the request. The first `StreamingTranslateSpeechRequest` message must contain a `streaming_config` message.
`audio_content`	`bytes` The audio data to be translated. Sequential chunks of audio data are sent in sequential `StreamingTranslateSpeechRequest` messages. The first `StreamingTranslateSpeechRequest` message must not contain `audio_content` data and all subsequent `StreamingTranslateSpeechRequest` messages must contain `audio_content` data. The audio bytes must be encoded as specified in `StreamingTranslateSpeechConfig`. Note: as with all bytes fields, protocol buffers use a pure binary representation (not base64).
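
The message ordering described for `StreamingTranslateSpeechRequest` (one config-only message, then audio-only messages) can be sketched as a generator. This sketch uses plain dictionaries as stand-ins for the protobuf messages; `build_requests`, `chunk_audio`, and the 4096-byte chunk size are illustrative assumptions, and a real client would use the generated message classes instead.

```python
from typing import Dict, Iterable, Iterator


def chunk_audio(data: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Split raw audio bytes into sequential chunks (chunk size is arbitrary)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def build_requests(
    streaming_config: Dict, audio_chunks: Iterable[bytes]
) -> Iterator[Dict]:
    """Yield request stand-ins in the order the method requires.

    Each dict sets exactly one member of the `streaming_request` union.
    """
    # First message: streaming_config only, no audio_content.
    yield {"streaming_config": streaming_config}
    # All subsequent messages: audio_content only, as raw bytes (not base64).
    for chunk in audio_chunks:
        yield {"audio_content": chunk}
```

Because each dictionary carries a single key, the sketch also mirrors the rule that `streaming_request` can hold only one of its members at a time.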

StreamingTranslateSpeechResponse

A streaming speech translation response corresponding to a portion of the audio currently processed.

Fields
`error`	`Status` Output only. If set, returns a `google.rpc.Status` message that specifies the error for the operation.
`result`	`StreamingTranslateSpeechResult` Output only. The translation result that is currently being processed (`is_final` could be `true` or `false`).
`speech_event_type`	`SpeechEventType` Output only. Indicates the type of speech event.

SpeechEventType

Indicates the type of speech event.

Enums
`SPEECH_EVENT_TYPE_UNSPECIFIED`	No speech event specified.
`END_OF_SINGLE_UTTERANCE`	This event indicates that the server has detected the end of the user's speech utterance and expects no additional speech. Therefore, the server will not process additional audio (although it may subsequently return additional results). When the client receives the `END_OF_SINGLE_UTTERANCE` event, it should stop sending requests but keep receiving the remaining responses until the stream is terminated. To construct the complete sentence from the stream, overwrite the previous result with the new one if the previous response had `is_final` set to `false`, or append the new result if the previous response had `is_final` set to `true`. This event is only sent if `single_utterance` was set to `true`, and is not used otherwise.
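
The client behavior described for `END_OF_SINGLE_UTTERANCE` (stop sending, keep receiving) might look like the following sketch. Responses are modeled as plain dictionaries rather than protobuf messages, and `drain_responses` and the `stop_sending` event are hypothetical names; a real client would typically check the event in the code that feeds audio into the request stream.

```python
import threading
from typing import Dict, Iterable, List

END_OF_SINGLE_UTTERANCE = "END_OF_SINGLE_UTTERANCE"


def drain_responses(
    responses: Iterable[Dict], stop_sending: threading.Event
) -> List[Dict]:
    """Read every response until the stream ends.

    Sets `stop_sending` when the end-of-utterance event arrives, but keeps
    reading, because results may still follow the event.
    """
    results: List[Dict] = []
    for response in responses:
        if response.get("speech_event_type") == END_OF_SINGLE_UTTERANCE:
            stop_sending.set()  # signal the sending side to stop
        if "result" in response:
            results.append(response["result"])
    return results
```

The loop never breaks early on the event, which is the point of the sketch: only the sending side stops.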

StreamingTranslateSpeechResult

A streaming speech translation result corresponding to a portion of the audio that is currently being processed.

Fields
`recognition_result`	`string` Output only. The recognition result in the original language, provided for debugging only. It is set to an empty string if not available. This field is an implementation detail and is not guaranteed to be backward compatible.
`text_translation_result`	`TextTranslationResult` Text translation result.

TextTranslationResult

Text translation result.

Fields
`translation`	`string` Output only. The translated sentence.
`is_final`	`bool` Output only. If `false`, this `StreamingTranslateSpeechResult` represents an interim result that may change. If `true`, this is the final time the translation service will return this particular `StreamingTranslateSpeechResult`; the streaming translator will not return any further hypotheses for this portion of the transcript and corresponding audio.
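
One way to apply the `is_final` semantics when assembling a running translation is sketched below: an interim result replaces the previous interim result, while a final result is kept permanently. The function name `assemble_translation`, the `(translation, is_final)` tuple input, and joining segments with a single space are all assumptions made for illustration; the right separator depends on the target language.

```python
from typing import Iterable, List, Tuple


def assemble_translation(results: Iterable[Tuple[str, bool]]) -> str:
    """Combine streamed (translation, is_final) pairs into one string."""
    finalized: List[str] = []  # segments that will not change again
    interim = ""               # latest non-final hypothesis, if any
    for translation, is_final in results:
        if is_final:
            # A final result supersedes any pending interim text.
            finalized.append(translation)
            interim = ""
        else:
            # An interim result overrides the previous interim result.
            interim = translation
    parts = finalized + ([interim] if interim else [])
    return " ".join(parts)  # space separator is an assumption
```

A display that updates live could call the same logic after each response, showing the finalized segments followed by the current interim text.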

TranslateSpeechConfig

Provides information to the speech translator that specifies how to process the request.

Fields
`audio_encoding`	`string` Required. Encoding of audio data. Supported formats:
  `linear16`: Uncompressed 16-bit signed little-endian samples (Linear PCM).
  `flac`: FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless, so recognition is not compromised, and it requires only about half the bandwidth of `linear16`.
  `mulaw`: 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
  `amr`: Adaptive Multi-Rate Narrowband codec. `sample_rate_hertz` must be 8000.
  `amr-wb`: Adaptive Multi-Rate Wideband codec. `sample_rate_hertz` must be 16000.
  `ogg-opus`: Opus encoded audio frames in Ogg container. `sample_rate_hertz` must be one of 8000, 12000, 16000, 24000, or 48000.
  `mp3`: MP3 audio. Supports all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, `sample_rate_hertz` must match the sample rate of the file being used.
`source_language_code`	`string` Required. Source language code (BCP-47) of the input audio.
`target_language_code`	`string` Required. Target language code (BCP-47) of the output.
`sample_rate_hertz`	`int32` Optional. Sample rate in Hertz of the audio data. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).
`model`	`string` Optional. `google-provided-model/video` and `google-provided-model/enhanced-phone-call` are premium models. `google-provided-model/phone-call` is not a premium model.
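
The constraints listed for `TranslateSpeechConfig` lend themselves to a client-side check before opening a stream. The sketch below models the config as a plain dictionary; `validate_audio_config` and the two constant names are hypothetical, and checking the sample rate only when it is provided is an assumption, since the field is optional.

```python
from typing import Dict, List

SUPPORTED_ENCODINGS = {
    "linear16", "flac", "mulaw", "amr", "amr-wb", "ogg-opus", "mp3",
}
# Encodings whose description restricts sample_rate_hertz to specific values.
RESTRICTED_RATES = {
    "amr": {8000},
    "amr-wb": {16000},
    "ogg-opus": {8000, 12000, 16000, 24000, 48000},
}


def validate_audio_config(config: Dict) -> List[str]:
    """Return a list of problems found in a config dict (empty if none)."""
    problems: List[str] = []
    for field in ("audio_encoding", "source_language_code", "target_language_code"):
        if not config.get(field):
            problems.append(f"{field} is required")
    encoding = config.get("audio_encoding")
    if encoding and encoding not in SUPPORTED_ENCODINGS:
        problems.append(f"unsupported audio_encoding: {encoding!r}")
    rate = config.get("sample_rate_hertz")
    if rate is not None:  # optional field: only checked when present
        if not 8000 <= rate <= 48000:
            problems.append("sample_rate_hertz must be between 8000 and 48000")
        allowed = RESTRICTED_RATES.get(encoding)
        if allowed and rate not in allowed:
            problems.append(
                f"sample_rate_hertz for {encoding} must be one of {sorted(allowed)}"
            )
    return problems
```

For instance, a config with `audio_encoding` set to `amr` and `sample_rate_hertz` set to 16000 would be reported, because the `amr` description requires 8000.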