Gemini-TTS

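Gemini-TTS is the latest evolution of our Text-to-Speech technology. Beyond generating natural-sounding speech, it gives you fine-grained control over the generated audio through text-based prompts. With Gemini-TTS, you can synthesize speech from anything between short snippets and long-form narratives, and use natural-language prompts to precisely control style, accent, pace, tone, and even emotional expression.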

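The following models support Gemini-TTS:

gemini-2.5-flash-preview-tts: Gemini 2.5 Flash Preview, well suited to cost-efficient everyday applications.
gemini-2.5-pro-preview-tts: Gemini 2.5 Pro Preview, well suited to controllable speech generation (TTS), handling complex prompts with high quality.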

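Model	Optimized for	Input modality	Output modality	Single speaker
Gemini 2.5 Flash Preview TTS	Low-latency, controllable, single-speaker and multi-speaker text-to-speech audio generation for cost-efficient everyday applications	Text	Audio	✔️
Gemini 2.5 Pro Preview TTS	High control for structured workflows such as podcast generation, audiobooks, and customer support	Text	Audio	✔️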

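Additional controls and capabilities include:

Natural conversation: Voice interactions of high quality, with more appropriate expressivity and prosody (patterns of rhythm), are delivered at very low latency so that conversations flow smoothly.
Style control: Using natural-language prompts, you can steer delivery toward a specific accent and produce a range of tones and expressions, including a whisper.
Dynamic performance: The models can bring text to life with expressive readings of poetry, newscasts, and engaging stories. On request, they can also perform with a specific emotion or accent.
Enhanced pace and pronunciation control: Controlling the delivery speed helps ensure more accurate pronunciation, including of specific words.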

Examples

model: "gemini-2.5-pro-preview-tts"
prompt: "You are having a casual conversation with a friend. Say the following in a friendly and amused way."
text: "hahah I did NOT expect that. Can you believe it!"
speaker: "Callirrhoe"

model: "gemini-2.5-flash-preview-tts"
prompt: "Say the following in a curious way"
text: "OK, so... tell me about this [uhm] AI thing."
speaker: "Orus"

model: "gemini-2.5-flash-preview-tts"
prompt: "Say the following"
text: "[extremely fast] Availability and terms may vary. Check our website or your local store for complete details and restrictions."
speaker: "Kore"

To learn how to use these voices programmatically, see the Use Gemini-TTS section.
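The example fields above (model, prompt, text, speaker) correspond to fields in a synthesis request. The following is a minimal sketch of that mapping, assuming the JSON shape used by the curl request in the Use Gemini-TTS section. The helper name build_request is hypothetical and is not part of any client library.

```python
import json


def build_request(model: str, prompt: str, text: str, speaker: str,
                  language_code: str = "en-US",
                  audio_encoding: str = "LINEAR16") -> dict:
    """Maps the example fields onto a text:synthesize request body.

    Assumption: field names mirror the curl example on this page.
    """
    return {
        "input": {"prompt": prompt, "text": text},
        "voice": {
            "languageCode": language_code,
            "name": speaker,        # the "speaker" field in the examples
            "model_name": model,    # the "model" field in the examples
        },
        "audioConfig": {"audioEncoding": audio_encoding},
    }


body = build_request(
    model="gemini-2.5-flash-preview-tts",
    prompt="Say the following in a curious way",
    text="OK, so... tell me about this [uhm] AI thing.",
    speaker="Orus",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body of a text:synthesize call. The style instruction stays in prompt, while inline cues such as [uhm] stay in text.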

Voice options

Gemini-TTS offers a wide range of voice options, similar to our existing Chirp 3: HD voices, each with its own distinct characteristics:

Name	Gender
Achernar	Female
Achird	Male
Algenib	Male
Algieba	Male
Alnilam	Male
Aoede	Female
Autonoe	Female
Callirrhoe	Female
Charon	Male
Despina	Female
Enceladus	Male
Erinome	Female
Fenrir	Male
Gacrux	Female
Iapetus	Male
Kore	Female
Laomedeia	Female
Leda	Female
Orus	Male
Pulcherrima	Female
Puck	Male
Rasalgethi	Male
Sadachbia	Male
Sadaltager	Male
Schedar	Male
Sulafat	Female
Umbriel	Male
Vindemiatrix	Female
Zephyr	Female
Zubenelgenubi	Male
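Voice names are easy to misspell, so it can help to check a name locally before sending a request. The following is a minimal sketch, assuming the voice list in the table above is complete. VOICES and resolve_voice are hypothetical names introduced for illustration, and the suggestion logic uses Python's standard difflib module rather than anything provided by the API.

```python
import difflib

# Voice name -> gender, transcribed from the table above.
VOICES = {
    "Achernar": "Female", "Achird": "Male", "Algenib": "Male",
    "Algieba": "Male", "Alnilam": "Male", "Aoede": "Female",
    "Autonoe": "Female", "Callirrhoe": "Female", "Charon": "Male",
    "Despina": "Female", "Enceladus": "Male", "Erinome": "Female",
    "Fenrir": "Male", "Gacrux": "Female", "Iapetus": "Male",
    "Kore": "Female", "Laomedeia": "Female", "Leda": "Female",
    "Orus": "Male", "Pulcherrima": "Female", "Puck": "Male",
    "Rasalgethi": "Male", "Sadachbia": "Male", "Sadaltager": "Male",
    "Schedar": "Male", "Sulafat": "Female", "Umbriel": "Male",
    "Vindemiatrix": "Female", "Zephyr": "Female", "Zubenelgenubi": "Male",
}


def resolve_voice(name: str) -> str:
    """Returns the name if it is a known voice, else raises ValueError.

    The error message includes the closest known name, if any.
    """
    if name in VOICES:
        return name
    close = difflib.get_close_matches(name, list(VOICES), n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"Unknown voice {name!r}.{hint}")


print(resolve_voice("Kore"))
```

Calling resolve_voice with a misspelling such as "Callirhoe" raises a ValueError that suggests "Callirrhoe".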

Supported languages

Gemini-TTS supports the following languages:

Language	BCP-47 code
English (United States)	en-US

Regional availability

Gemini-TTS models are available in the following Google Cloud regions:

Google Cloud region	Launch readiness
`global`	Public preview

Supported output formats

The default response format is LINEAR16. Other supported formats include:

API method	Format
`batch`	ALAW, MULAW, MP3, OGG_OPUS, and PCM

Use Gemini-TTS

Learn how to use Gemini-TTS models to synthesize single-speaker speech.

Perform a synchronous speech synthesis request

Python

# google-cloud-texttospeech minimum version 2.29.0 is required.

import os
from google.cloud import texttospeech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def synthesize(prompt: str, text: str, model_name: str, output_filepath: str = "output.mp3"):
   """Synthesizes speech from the input text and saves it to an MP3 file.

   Args:
       prompt: Styling instructions on how to synthesize the content in
         the text field.
       text: The text to synthesize.
       model_name: Gemini model to use. Currently, the available models are
         gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts
       output_filepath: The path to save the generated audio file.
         Defaults to "output.mp3".
   """
   client = texttospeech.TextToSpeechClient()

   synthesis_input = texttospeech.SynthesisInput(text=text, prompt=prompt)

   # Select the voice you want to use.
   voice = texttospeech.VoiceSelectionParams(
       language_code="en-US",
       name="Charon",  # Example voice, adjust as needed
       model_name=model_name
   )

   audio_config = texttospeech.AudioConfig(
       audio_encoding=texttospeech.AudioEncoding.MP3
   )

   # Perform the text-to-speech request on the text input with the selected
   # voice parameters and audio file type.
   response = client.synthesize_speech(
       input=synthesis_input, voice=voice, audio_config=audio_config
   )

   # The response's audio_content is binary.
   with open(output_filepath, "wb") as out:
       out.write(response.audio_content)
   # Report success only after the file has been closed.
   print(f"Audio content written to file: {output_filepath}")

CURL

# Install the gcloud CLI and sign in to your project.
# Replace YOUR_PROJECT_ID with your own project ID.
# Currently, the available models are gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts.
# The last line of the command parses the JSON output and plays the audio directly.
# Requires jq and ffplay to be installed.
PROJECT_ID=YOUR_PROJECT_ID
curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "x-goog-user-project: $PROJECT_ID" \
-H "Content-Type: application/json" \
-d '{
"input": {
  "prompt": "Say the following in a curious way",
  "text": "OK, so... tell me about this [uhm] AI thing."
},
"voice": {
  "languageCode": "en-us",
  "name": "Kore",
  "model_name": "gemini-2.5-flash-preview-tts"
},
"audioConfig": {
  "audioEncoding": "LINEAR16"
}
}' \
"https://texttospeech.googleapis.com/v1/text:synthesize" \
| jq -r '.audioContent' | base64 -d | ffplay - -autoexit
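The last line of the curl command extracts audioContent with jq, decodes it with base64, and plays it. The following is a minimal Python sketch of the same decoding step, saving the audio to a file instead of playing it. It assumes the response is JSON with a base64-encoded audioContent field, as the jq filter implies. The helper name save_audio is hypothetical, and the sample response is built locally for illustration and is not real API output.

```python
import base64
import json
import os
import tempfile


def save_audio(response_json: str, output_path: str) -> int:
    """Decodes audioContent from a synthesize response and writes it to disk.

    Returns the number of bytes written.
    """
    payload = json.loads(response_json)
    if "audioContent" not in payload:
        # Error responses carry no audio, so surface whatever came back.
        raise ValueError(f"No audioContent in response: {payload}")
    audio = base64.b64decode(payload["audioContent"])
    with open(output_path, "wb") as out:
        out.write(audio)
    return len(audio)


# Stand-in response for illustration only; a real one comes from the API.
fake_audio = b"RIFF-not-real-audio"
fake_response = json.dumps(
    {"audioContent": base64.b64encode(fake_audio).decode("ascii")}
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.wav")
    written = save_audio(fake_response, path)
    print(f"Wrote {written} bytes to {path}")
```

To use it with the curl command, redirect the response to a file instead of piping it to jq, then pass the file contents to save_audio.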