Synthesize speech with bidirectional streaming
This document walks you through the process of synthesizing audio using bidirectional streaming.
Bidirectional streaming lets you send text input and receive audio data simultaneously. This means that you can start synthesizing speech before the complete input text is sent, which reduces latency and enables real-time interactions. Voice assistants and interactive games use bidirectional streaming to create more dynamic and responsive applications.
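As a rough illustration of sending text before the full input exists, the following sketch groups incoming text fragments (for example, tokens streamed from a language model) into sentence-sized chunks that can be sent one at a time. The helper name sentence_chunks and the punctuation-based boundary rule are assumptions made for this example. They are not part of the Text-to-Speech API.

```python
def sentence_chunks(fragments):
    """Group an iterable of text fragments into sentence-sized chunks.

    A chunk is yielded as soon as a fragment completes a sentence, so a
    consumer can start on it while later fragments are still arriving.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        # Assumed boundary rule: ".", "!" or "?" at the end of the buffer.
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer
            buffer = ""
    if buffer.strip():
        # Flush trailing text that never reached a sentence boundary.
        yield buffer


# Hypothetical fragments arriving over time.
fragments = ["Hello", " there.", " How", " are you?", " Fine"]
print(list(sentence_chunks(fragments)))
# → ['Hello there.', ' How are you?', ' Fine']
```

Because sentence_chunks is a generator, each chunk becomes available as soon as its sentence ends, rather than after the whole input has been read.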
To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech Basics.
Before you begin
Before you can send a request to the Text-to-Speech API, you must have completed the following actions. See the before you begin page for details.
- Enable Text-to-Speech on a Google Cloud project.
- Make sure billing is enabled for Text-to-Speech.
- Install the Google Cloud CLI. After installing it, configure the gcloud CLI to use your federated identity, and then initialize it by running the following command:
gcloud init
Synthesize speech with bidirectional streaming
Install the client library
Python
Before installing the library, make sure you've prepared your environment for Python development.
pip install --upgrade google-cloud-texttospeech
Send a stream of text and receive a stream of audio
The API accepts a stream of requests of type StreamingSynthesizeRequest, each of which contains either a StreamingSynthesisInput or a StreamingSynthesizeConfig.
Before you send any StreamingSynthesizeRequest that carries a StreamingSynthesisInput (the text input), send exactly one StreamingSynthesizeRequest with a StreamingSynthesizeConfig.
Streaming Text-to-Speech is only compatible with Journey voices.
Python
Before running the example, make sure you've prepared your environment for Python development.
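The following is a minimal sketch of the request flow described above, not an official sample. It assumes the google-cloud-texttospeech package installed earlier and valid application credentials. The helper names build_requests and stream_synthesize are hypothetical, and the voice name en-US-Journey-D with language code en-US is an assumed Journey voice, so check the list of available voices before relying on it.

```python
import itertools


def build_requests(tts, text_chunks,
                   voice_name="en-US-Journey-D", language_code="en-US"):
    """Return an iterator with one config request, then one request per chunk.

    `tts` is the google.cloud.texttospeech module. It is passed in so that
    the request ordering can be inspected without credentials.
    The default voice name and language code are assumptions.
    """
    config = tts.StreamingSynthesizeConfig(
        voice=tts.VoiceSelectionParams(
            name=voice_name, language_code=language_code
        )
    )
    # The first request on the stream carries only the configuration.
    config_request = tts.StreamingSynthesizeRequest(streaming_config=config)
    # Every later request carries one piece of text input.
    input_requests = (
        tts.StreamingSynthesizeRequest(
            input=tts.StreamingSynthesisInput(text=chunk)
        )
        for chunk in text_chunks
    )
    return itertools.chain([config_request], input_requests)


def stream_synthesize(text_chunks):
    """Send the request stream and yield audio bytes as responses arrive."""
    # Imported lazily so this file loads even where the library is absent.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    responses = client.streaming_synthesize(
        build_requests(texttospeech, text_chunks)
    )
    for response in responses:
        yield response.audio_content
```

To try it, iterate over stream_synthesize(["Hello there. ", "How are you today? "]) and write or play each returned bytes object as it arrives. Because text_chunks can be any iterable, including a generator that produces text over time, audio can start coming back before the last chunk has been produced.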
Clean up
To avoid unnecessary Google Cloud Platform charges, use the Google Cloud console to delete your project if you do not need it.
What's next
- Learn more about Cloud Text-to-Speech by reading the basics.
- Review the list of available voices you can use for synthetic speech.