Index

- SpeechTranslationService (interface)
- StreamingTranslateSpeechConfig (message)
- StreamingTranslateSpeechRequest (message)
- StreamingTranslateSpeechResponse (message)
- StreamingTranslateSpeechResponse.SpeechEventType (enum)
- StreamingTranslateSpeechResult (message)
- StreamingTranslateSpeechResult.TextTranslationResult (message)
- TranslateSpeechConfig (message)
SpeechTranslationService
Provides translation from/to media types.
| Methods | |
|---|---|
| `StreamingTranslateSpeech` | Performs bidirectional streaming speech translation: receive results while sending audio. This method is only available via the gRPC API (not REST). |
StreamingTranslateSpeechConfig
Config used for streaming translation.
| Fields | |
|---|---|
| `audio_config` | Required. The common config for all the following audio contents. |
| `single_utterance` | Optional. If `true`, the recognizer detects a single spoken utterance: when it detects that the user has paused or stopped speaking, it returns an `END_OF_SINGLE_UTTERANCE` event and ceases translation. If `false` or omitted, translation continues until the client closes the input stream. |
| `stability` | Optional. Stability control for the media translation text. Note that stability and speed trade off against each other. The value should be `"LOW"`, `"MEDIUM"`, or `"HIGH"`; the default empty string is treated as `"LOW"`. (1) `"LOW"`: the translation service starts translating right after receiving a recognition response, so results arrive fastest. (2) `"MEDIUM"`: the translation service checks whether a recognition response is stable enough, and translates only recognition responses that are unlikely to change later. (3) `"HIGH"`: the translation service waits for more stable recognition responses before translating, and subsequent recognition responses cannot modify previous ones; this may impact quality in some situations. `"HIGH"` stability generates "final" responses more frequently. |
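The streaming config above can be sketched as a plain dictionary with a small validity check. This is an illustrative sketch, not the client-library types: the field names follow this reference, while the `"linear16"` encoding string is an assumed placeholder.

```python
# Minimal sketch of a StreamingTranslateSpeechConfig-shaped dict.
# Field names follow this reference; values are illustrative.
ALLOWED_STABILITY = {"", "LOW", "MEDIUM", "HIGH"}  # "" is treated as "LOW"


def make_streaming_config(source, target, stability=""):
    """Build a streaming-translation config dict, validating stability."""
    if stability not in ALLOWED_STABILITY:
        raise ValueError(f"stability must be one of {sorted(ALLOWED_STABILITY)}")
    return {
        "audio_config": {
            "audio_encoding": "linear16",  # assumed encoding identifier
            "sample_rate_hertz": 16000,    # optimal per this reference
            "source_language_code": source,
            "target_language_code": target,
        },
        "single_utterance": False,
        "stability": stability or "LOW",   # default empty string -> "LOW"
    }


cfg = make_streaming_config("en-US", "fr-FR", stability="MEDIUM")
```

Validating the stability string up front mirrors the constraint in the table above, since the service only distinguishes the three documented levels.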
StreamingTranslateSpeechRequest
The top-level message sent by the client for the `StreamingTranslateSpeech` method. Multiple `StreamingTranslateSpeechRequest` messages are sent. The first message must contain a `streaming_config` message and must not contain `audio_content` data. All subsequent messages must contain `audio_content` data and must not contain a `streaming_config` message.
| Fields | |
|---|---|
| Union field `streaming_request`. The streaming request, which is either a streaming config or audio content. `streaming_request` can be only one of the following: | |
| `streaming_config` | Provides information to the recognizer that specifies how to process the request. The first `StreamingTranslateSpeechRequest` message must contain a `streaming_config` message. |
| `audio_content` | The audio data to be translated. Sequential chunks of audio data are sent in sequential `StreamingTranslateSpeechRequest` messages. |
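The ordering rule above (config first, audio in every later message) can be sketched as a request generator. The dict-shaped requests are an illustrative stand-in for the real `StreamingTranslateSpeechRequest` messages:

```python
def streaming_requests(streaming_config, audio_chunks):
    """Yield request-shaped dicts in the required order: the first message
    carries only streaming_config, and every subsequent message carries
    only audio_content."""
    yield {"streaming_config": streaming_config}
    for chunk in audio_chunks:
        yield {"audio_content": chunk}


requests = list(
    streaming_requests({"stability": "LOW"}, [b"\x00\x01", b"\x02\x03"])
)
```

With a real gRPC client, an iterator like this would typically be passed to the bidirectional streaming call, which consumes requests while yielding responses.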
StreamingTranslateSpeechResponse
A streaming speech translation response corresponding to a portion of the audio currently processed.
| Fields | |
|---|---|
| `error` | Output only. If set, returns a `google.rpc.Status` message that specifies the error for the operation. |
| `result` | Output only. The translation result that is currently being processed (`is_final` could be `true` or `false`). |
| `speech_event_type` | Output only. Indicates the type of speech event. |
SpeechEventType
Indicates the type of speech event.
| Enums | |
|---|---|
| `SPEECH_EVENT_TYPE_UNSPECIFIED` | No speech event specified. |
| `END_OF_SINGLE_UTTERANCE` | This event indicates that the server has detected the end of the user's speech utterance and expects no additional speech; the server will not process additional audio (although it may subsequently return additional results). When the client receives the `END_OF_SINGLE_UTTERANCE` event, it should stop sending requests, but should keep receiving the remaining responses until the stream is terminated. To construct the complete sentence in a streaming way, override the previous result (if its `is_final` was `false`) or append to it (if its `is_final` was `true`). This event is only sent if `single_utterance` was set to `true`, and is not used otherwise. |
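The override-or-append rule described above can be sketched as a small accumulator. The dict-shaped responses are a simplified stand-in for `StreamingTranslateSpeechResponse` results:

```python
def assemble(responses):
    """Build the full translated sentence from streamed results.

    Per the END_OF_SINGLE_UTTERANCE notes: if the previous result had
    is_final == False, the next result overrides it; if it had
    is_final == True, the next result is appended after it.
    """
    finalized = []  # segments whose is_final was True (stable, kept)
    interim = ""    # latest segment that may still change
    for resp in responses:
        interim = resp["translation"]        # override the interim segment
        if resp["is_final"]:
            finalized.append(interim)        # freeze it; later text appends
            interim = ""
    return " ".join(finalized + ([interim] if interim else []))


text = assemble([
    {"translation": "Hello", "is_final": False},
    {"translation": "Hello world", "is_final": True},   # overrides "Hello"
    {"translation": "how", "is_final": False},
    {"translation": "how are you", "is_final": False},  # overrides "how"
])
```

Here `text` ends up as the finalized segment followed by the latest interim one, matching the override/append behavior the event description prescribes.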
StreamingTranslateSpeechResult
A streaming speech translation result corresponding to a portion of the audio that is currently being processed.
| Fields | |
|---|---|
| `recognition_result` | Output only. The recognition result in the original language, for debugging only; it is set to an empty string when not available. This is an implementation detail and will not be backward compatible. |
| `text_translation_result` | Text translation result. |
TextTranslationResult
Text translation result.
| Fields | |
|---|---|
| `translation` | Output only. The translated sentence. |
| `is_final` | Output only. If `false`, this is an interim result that may change; if `true`, this is the final, stable result that later responses will not modify. |
TranslateSpeechConfig
Provides information to the speech translation that specifies how to process the request.
| Fields | |
|---|---|
| `audio_encoding` | Required. Encoding of audio data. Supported formats: uncompressed 16-bit signed little-endian samples (Linear PCM); 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law; Adaptive Multi-Rate Narrowband codec; Adaptive Multi-Rate Wideband codec; Opus encoded audio frames in an Ogg container; MP3 audio. All standard MP3 bitrates (which range from 32-320 kbps) are supported; when using this encoding, `sample_rate_hertz` has to match the sample rate of the file being used. |
| `source_language_code` | Required. Source language code (BCP-47) of the input audio. |
| `target_language_code` | Required. Target language code (BCP-47) of the output. |
| `sample_rate_hertz` | Optional. Sample rate in Hertz of the audio data. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling). |
| `model` | Optional. |
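The constraints in the table above can be sketched as a small validator. The field names come from this reference; the `"linear16"` encoding string is an assumed placeholder, and this is an illustrative check rather than the service's actual validation:

```python
REQUIRED_FIELDS = ("audio_encoding", "source_language_code",
                   "target_language_code")


def validate_translate_speech_config(cfg):
    """Check the documented TranslateSpeechConfig constraints: the three
    required fields are present, and sample_rate_hertz (if given) lies
    within the valid 8000-48000 range."""
    for field in REQUIRED_FIELDS:
        if not cfg.get(field):
            raise ValueError(f"missing required field: {field}")
    rate = cfg.get("sample_rate_hertz")
    if rate is not None and not 8000 <= rate <= 48000:
        raise ValueError("sample_rate_hertz must be within 8000-48000")
    return cfg


cfg = validate_translate_speech_config({
    "audio_encoding": "linear16",   # assumed encoding identifier
    "source_language_code": "en-US",
    "target_language_code": "es-ES",
    "sample_rate_hertz": 16000,     # optimal per this reference
})
```

Catching an out-of-range sample rate client-side avoids a round trip to the service for a request the documentation already says is invalid.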