使用 SSML 读出地址

本教程演示如何使用语音合成标记语言 (SSML) 读出地址的文本文件。您可以使用 SSML 标记来标记文本字符串,以对 Text-to-Speech 合成音频进行个性化。

明文 明文的 SSML 渲染

123 Street Ln

<speak>123 Street Ln</speak>

1 Number St

<speak>1 Number St</speak>

1 Piazza del Fibonacci

<speak>1 Piazza del Fibonacci</speak>

目标

使用 SSML 和 Text-to-Speech 客户端库向 Text-Speech 发送合成语音请求。

费用

如需了解费用信息,请参阅 Text-to-Speech 价格页面

准备工作

下载代码示例

如需下载代码示例,请克隆要使用的编程语言的 Google Cloud GitHub 示例。

Java

本教程使用 Google Cloud Platform Java 示例代码库texttospeech/cloud-client/src/main/java/com/example/texttospeech/ 目录中的代码。

如需下载并导航到本教程的代码,请从终端运行以下命令。

git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
cd java-docs-samples/texttospeech/cloud-client/src/main/java/com/example/texttospeech/

Node.js

本教程使用 Google Cloud Platform Node.js 示例代码库texttospeech 目录中的代码。

如需下载并导航到本教程的代码,请从终端运行以下命令。

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git
cd texttospeech/

Python

本教程使用 Google Cloud Platform Python 示例代码库texttospeech/snippets 目录中的代码。

如需下载并导航到本教程的代码,请从终端运行以下命令。

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd samples/snippets

安装客户端库

本教程使用 Text-to-Speech 客户端库

Java

本教程使用以下依赖项。

<!--  Using libraries-bom to manage versions.
See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.32.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-texttospeech</artifactId>
  </dependency>
</dependencies>

Node.js

从终端运行以下命令。

npm install @google-cloud/text-to-speech

Python

从终端运行以下命令。

pip install --upgrade google-cloud-texttospeech

设置您的 Google Cloud Platform 凭据

通过设置环境变量 GOOGLE_APPLICATION_CREDENTIALS 向应用代码提供身份验证凭据。此变量仅适用于当前的 Shell 会话。如果您希望变量应用于未来的 Shell 会话,请在 shell 启动文件中设置变量,例如在 ~/.bashrc~/.profile 文件中。

Linux 或 macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

KEY_PATH 替换为包含凭据的 JSON 文件的路径。

例如:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Windows

对于 PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

KEY_PATH 替换为包含凭据的 JSON 文件的路径。

例如:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

对于命令提示符:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

KEY_PATH 替换为包含凭据的 JSON 文件的路径。

导入库

本教程使用以下系统和客户端库。

Java

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Java API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Imports the Google Cloud client library
import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import com.google.common.html.HtmlEscapers;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

Node.js

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Node.js API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');

// Import other required libraries
const fs = require('fs');
//const escape = require('escape-html');
const util = require('util');

Python

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Python API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import html

from google.cloud import texttospeech

使用 Text-to-Speech API

以下函数接受以 SSML 标记的文本字符串以及 MP3 文件的名称。函数使用以 SSML 标记的文本生成合成音频。函数将合成音频保存为指定为参数的 MP3 文件名。

只能通过单个语音读取整个 SSML 输入。您可以在 VoiceSelectionParams 对象中设置语音。

Java

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Java API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates synthetic audio from a String of SSML text.
 *
 * <p>Given a string of SSML text and an output file name, this function calls the Text-to-Speech
 * API. The API returns a synthetic audio version of the text, formatted according to the SSML
 * commands. This function saves the synthetic audio to the designated output file.
 *
 * @param ssmlText String of tagged SSML text
 * @param outFile String name of file under which to save audio output
 * @throws Exception on errors while closing the client
 */
public static void ssmlToAudio(String ssmlText, String outFile) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the ssml text input to synthesize
    SynthesisInput input = SynthesisInput.newBuilder().setSsml(ssmlText).build();

    // Build the voice request, select the language code ("en-US") and
    // the ssml voice gender ("male")
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US")
            .setSsmlGender(SsmlVoiceGender.MALE)
            .build();

    // Select the audio file type
    AudioConfig audioConfig =
        AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();

    // Perform the text-to-speech request on the text input with the selected voice parameters and
    // audio file type
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file
    try (OutputStream out = new FileOutputStream(outFile)) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file " + outFile);
    }
  }
}

Node.js

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Node.js API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates synthetic audio from a String of SSML text.
 *
 * Given a string of SSML text and an output file name, this function
 * calls the Text-to-Speech API. The API returns a synthetic audio
 * version of the text, formatted according to the SSML commands. This
 * function saves the synthetic audio to the designated output file.
 *
 * ARGS
 * ssmlText: String of tagged SSML text
 * outfile: String name of file under which to save audio output
 * RETURNS
 * nothing
 *
 */
async function ssmlToAudio(ssmlText, outFile) {
  // Creates a client
  const client = new textToSpeech.TextToSpeechClient();

  // Constructs the request
  const request = {
    // Select the text to synthesize
    input: {ssml: ssmlText},
    // Select the language and SSML Voice Gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'MALE'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the Text-to-Speech request
  const [response] = await client.synthesizeSpeech(request);
  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile(outFile, response.audioContent, 'binary');
  console.log('Audio content written to file ' + outFile);
}

Python

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Python API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

def ssml_to_audio(ssml_text, outfile):
    # Generates SSML text from plaintext.
    #
    # Given a string of SSML text and an output file name, this function
    # calls the Text-to-Speech API. The API returns a synthetic audio
    # version of the text, formatted according to the SSML commands. This
    # function saves the synthetic audio to the designated output file.
    #
    # Args:
    # ssml_text: string of SSML text
    # outfile: string name of file under which to save audio output
    #
    # Returns:
    # nothing

    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )

    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + outfile)

对合成音频进行个性化设置

以下函数接受一个文本文件的名称并将文件的内容转换为以 SSML 标记的文本字符串。

Java

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Java API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates SSML text from plaintext.
 *
 * <p>Given an input filename, this function converts the contents of the input text file into a
 * String of tagged SSML text. This function formats the SSML String so that, when synthesized,
 * the synthetic audio will pause for two seconds between each line of the text file. This
 * function also handles special text characters which might interfere with SSML commands.
 *
 * @param inputFile String name of plaintext file
 * @return a String of SSML text based on plaintext input.
 * @throws IOException on files that don't exist
 */
public static String textToSsml(String inputFile) throws Exception {

  // Read lines of input file
  String rawLines = new String(Files.readAllBytes(Paths.get(inputFile)));

  // Replace special characters with HTML Ampersand Character Codes
  // These codes prevent the API from confusing text with SSML tags
  // For example, '<' --> '&lt;' and '&' --> '&amp;'
  String escapedLines = HtmlEscapers.htmlEscaper().escape(rawLines);

  // Convert plaintext to SSML
  // Tag SSML so that there is a 2 second pause between each address
  String expandedNewline = escapedLines.replaceAll("\\n", "\n<break time='2s'/>");
  String ssml = "<speak>" + expandedNewline + "</speak>";

  // Return the concatenated String of SSML
  return ssml;
}

Node.js

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Node.js API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates SSML text from plaintext.
 *
 * Given an input filename, this function converts the contents of the input text file
 * into a String of tagged SSML text. This function formats the SSML String so that,
 * when synthesized, the synthetic audio will pause for two seconds between each line
 * of the text file. This function also handles special text characters which might
 * interfere with SSML commands.
 *
 * ARGS
 * inputfile: String name of plaintext file
 * RETURNS
 * a String of SSML text based on plaintext input
 *
 */
function textToSsml(inputFile) {
  let rawLines = '';
  // Read input file
  try {
    rawLines = fs.readFileSync(inputFile, 'utf8');
  } catch (e) {
    console.log('Error:', e.stack);
    return;
  }

  // Replace special characters with HTML Ampersand Character Codes
  // These codes prevent the API from confusing text with SSML tags
  // For example, '<' --> '&lt;' and '&' --> '&amp;'
  let escapedLines = rawLines;
  escapedLines = escapedLines.replace(/&/g, '&amp;');
  escapedLines = escapedLines.replace(/"/g, '&quot;');
  escapedLines = escapedLines.replace(/</g, '&lt;');
  escapedLines = escapedLines.replace(/>/g, '&gt;');

  // Convert plaintext to SSML
  // Tag SSML so that there is a 2 second pause between each address
  const expandedNewline = escapedLines.replace(/\n/g, '\n<break time="2s"/>');
  const ssml = '<speak>' + expandedNewline + '</speak>';

  // Return the concatenated String of SSML
  return ssml;
}

Python

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。如需了解详情,请参阅 Text-to-Speech Python API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

def text_to_ssml(inputfile):
    # Generates SSML text from plaintext.
    # Given an input filename, this function converts the contents of the text
    # file into a string of formatted SSML text. This function formats the SSML
    # string so that, when synthesized, the synthetic audio will pause for two
    # seconds between each line of the text file. This function also handles
    # special text characters which might interfere with SSML commands.
    #
    # Args:
    # inputfile: string name of plaintext file
    #
    # Returns:
    # A string of SSML text based on plaintext input

    # Parses lines of input file
    with open(inputfile) as f:
        raw_lines = f.read()

    # Replace special characters with HTML Ampersand Character Codes
    # These Codes prevent the API from confusing text with
    # SSML commands
    # For example, '<' --> '&lt;' and '&' --> '&amp;'

    escaped_lines = html.escape(raw_lines)

    # Convert plaintext to SSML
    # Wait two seconds between each address
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )

    # Return the concatenated string of ssml script
    return ssml

总结

此程序使用以下输入。

123 Street Ln, Small Town, IL 12345 USA
1 Jenny St & Number St, Tutone City, CA 86753
1 Piazza del Fibonacci, 12358 Pisa, Italy

将上述文本传递给 text_to_ssml() 会生成以下标记文本。

<speak>123 Street Ln, Small Town, IL 12345 USA
<break time="2s"/>1 Jenny St &amp; Number St, Tutone City, CA 86753
<break time="2s"/>1 Piazza del Fibonacci, 12358 Pisa, Italy
<break time="2s"/></speak>

运行代码

要生成合成语音的音频文件,请从命令行运行以下代码。

Java

Linux 或 MacOS

java-docs-samples/texttospeech/cloud-client/ 目录中,在命令行上执行以下命令。

$ mvn clean package

Windows

java-docs-samples/texttospeech/cloud-client/ 目录中,在命令行上执行以下命令。

$ mvn clean package

Node.js

Linux 或 MacOS

hybridGlossaries.js 文件中,对被 TODO (developer) 注释掉的变量取消备注。

在以下命令中,将 projectId 替换为您的 Google Cloud 项目 ID。从 nodejs-docs-samples/texttospeech 目录中,在命令行上执行以下命令。

$ node ssmlAddresses.js projectId

Windows

hybridGlossaries.js 文件中,对被 TODO (developer) 注释掉的变量取消备注。

在以下命令中,将 projectId 替换为您的 Google Cloud 项目 ID。从 nodejs-docs-samples/texttospeech 目录中,在命令行上执行以下命令。

$env: C:/Node.js/node.exe C: ssmlAddresses.js projectId

Python

Linux 或 MacOS

python-docs-samples/texttospeech/snippets 目录中,在命令行上执行以下命令。

$ python ssml_addresses.py

Windows

python-docs-samples/texttospeech/snippets 目录中,在命令行上执行以下命令。

$env: C:/Python3/python.exe C: ssml_addresses.py

检查输出

此程序会输出合成语音的 example.mp3 音频文件。

Java

导航到 java-docs-samples/texttospeech/cloud-client/resources/ 目录。

resources 目录中查找 example.mp3 文件。

Node.js

导航到 nodejs-docs-samples/texttospeech/resources/ 目录。

resources 目录中查找 example.mp3 文件。

Python

导航到 python-docs-samples/texttospeech/snippets/resources

resources 目录中查找 example.mp3 文件。

聆听以下音频剪辑,以确认 example.mp3 文件的语音正是您所期望的语音。


问题排查

  • 忘记在命令行上设置 GOOGLE_APPLICATION_CREDENTIALS 环境变量会生成以下错误消息:

    The Application Default Credentials are not available.

  • text_to_ssml() 传递不存在的文件的名称会生成以下错误消息:

    IOError: [Errno 2] No such file or directory
    

  • ssml_to_audio() 传递包含 Nonessml_text 参数会生成以下错误消息:

    InvalidArgument: 400 Invalid input type. Type has to be text or SSML
    

  • 确保您运行的是来自正确目录的代码。

后续步骤

清理

为避免您的 Google Cloud Platform 账号因本教程中使用的资源而产生费用,如果您不需要该项目,请使用 Google Cloud 控制台删除该项目。

删除项目

  1. Google Cloud 控制台中,转到“项目”页面。
  2. 在项目列表中,选择要删除的项目,然后点击删除
  3. 在对话框中输入项目 ID,然后点击关停以删除项目。