使用 SSML 读出地址

本教程演示如何使用语音合成标记语言 (SSML) 读出地址的文本文件。您可以使用 SSML 标记来标记文本字符串,以对 Text-to-Speech 合成音频进行个性化。

明文 明文的 SSML 渲染
123 Street Ln
<speak>123 Street Ln</speak>
1 Number St
<speak>1 Number St</speak>
1 Piazza del Fibonacci
<speak>1 Piazza del Fibonacci</speak>

目标

使用 SSML 和 Text-to-Speech 客户端库向 Text-Speech 发送合成语音请求。

费用

如需了解费用信息,请参阅 Text-to-Speech 价格页面

准备工作

下载代码示例

如需下载代码示例,请克隆要使用的编程语言的 Google Cloud GitHub 示例。

Java

本教程使用 Google Cloud Platform Java 示例代码库texttospeech/cloud-client/src/main/java/com/example/texttospeech/ 目录中的代码。

如需下载并导航到本教程的代码,请从终端运行以下命令。

git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
cd java-docs-samples/texttospeech/cloud-client/src/main/java/com/example/texttospeech/

Node.js

本教程使用 Google Cloud Platform Node.js 示例代码库texttospeech 目录中的代码。

如需下载并导航到本教程的代码,请从终端运行以下命令。

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git
cd texttospeech/

Python

本教程使用 Google Cloud Platform Python 示例代码库texttospeech/snippets 目录中的代码。

如需下载并导航到本教程的代码,请从终端运行以下命令。

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd samples/snippets

安装客户端库

本教程使用 Text-to-Speech 客户端库

Java

本教程使用以下依赖项。

<!--  Using libraries-bom to manage versions.
See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.32.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-texttospeech</artifactId>
  </dependency>
</dependencies>

Node.js

从终端运行以下命令。

npm install @google-cloud/text-to-speech

Python

从终端运行以下命令。

pip install --upgrade google-cloud-texttospeech

设置您的 Google Cloud Platform 凭据

Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. This variable applies only to your current shell session. If you want the variable to apply to future shell sessions, set the variable in your shell startup file, for example in the ~/.bashrc or ~/.profile file.

Linux 或 macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your credentials.

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Windows

For PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your credentials.

For example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

For command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Replace KEY_PATH with the path of the JSON file that contains your credentials.

导入库

本教程使用以下系统和客户端库。

Java

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Java API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Imports the Google Cloud client library
import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import com.google.common.html.HtmlEscapers;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

Node.js

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Node.js API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');

// Import other required libraries
const fs = require('fs');
//const escape = require('escape-html');
const util = require('util');

Python

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Python API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import html

from google.cloud import texttospeech

使用 Text-to-Speech API

以下函数接受以 SSML 标记的文本字符串以及 MP3 文件的名称。函数使用以 SSML 标记的文本生成合成音频。函数将合成音频保存为指定为参数的 MP3 文件名。

只能通过单个语音读取整个 SSML 输入。您可以在 VoiceSelectionParams 对象中设置语音。

Java

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Java API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates synthetic audio from a String of SSML text.
 *
 * <p>Given a string of SSML text and an output file name, this function calls the Text-to-Speech
 * API. The API returns a synthetic audio version of the text, formatted according to the SSML
 * commands. This function saves the synthetic audio to the designated output file.
 *
 * @param ssmlText String of tagged SSML text
 * @param outFile String name of file under which to save audio output
 * @throws Exception on errors while closing the client
 */
public static void ssmlToAudio(String ssmlText, String outFile) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the ssml text input to synthesize
    SynthesisInput input = SynthesisInput.newBuilder().setSsml(ssmlText).build();

    // Build the voice request, select the language code ("en-US") and
    // the ssml voice gender ("male")
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US")
            .setSsmlGender(SsmlVoiceGender.MALE)
            .build();

    // Select the audio file type
    AudioConfig audioConfig =
        AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();

    // Perform the text-to-speech request on the text input with the selected voice parameters and
    // audio file type
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file
    try (OutputStream out = new FileOutputStream(outFile)) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file " + outFile);
    }
  }
}

Node.js

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Node.js API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates synthetic audio from a String of SSML text.
 *
 * Given a string of SSML text and an output file name, this function
 * calls the Text-to-Speech API. The API returns a synthetic audio
 * version of the text, formatted according to the SSML commands. This
 * function saves the synthetic audio to the designated output file.
 *
 * ARGS
 * ssmlText: String of tagged SSML text
 * outfile: String name of file under which to save audio output
 * RETURNS
 * nothing
 *
 */
async function ssmlToAudio(ssmlText, outFile) {
  // Creates a client
  const client = new textToSpeech.TextToSpeechClient();

  // Constructs the request
  const request = {
    // Select the text to synthesize
    input: {ssml: ssmlText},
    // Select the language and SSML Voice Gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'MALE'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the Text-to-Speech request
  const [response] = await client.synthesizeSpeech(request);
  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile(outFile, response.audioContent, 'binary');
  console.log('Audio content written to file ' + outFile);
}

Python

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Python API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

def ssml_to_audio(ssml_text: str) -> None:
    """
    Generates SSML text from plaintext.
    Given a string of SSML text and an output file name, this function
    calls the Text-to-Speech API. The API returns a synthetic audio
    version of the text, formatted according to the SSML commands. This
    function saves the synthetic audio to the designated output file.

    Args:
        ssml_text: string of SSML text
    """

    # Instantiates a client
    client = texttospeech.TextToSpeechClient()

    # Sets the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(ssml=ssml_text)

    # Builds the voice request, selects the language code ("en-US") and
    # the SSML voice gender ("MALE")
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
    )

    # Selects the type of audio file to return
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Performs the text-to-speech request on the text input with the selected
    # voice parameters and audio file type
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Writes the synthetic audio to the output file.
    with open("test_example.mp3", "wb") as out:
        out.write(response.audio_content)
        print("Audio content written to file " + "test_example.mp3")

对合成音频进行个性化设置

以下函数接受一个文本文件的名称并将文件的内容转换为以 SSML 标记的文本字符串。

Java

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Java API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates SSML text from plaintext.
 *
 * <p>Given an input filename, this function converts the contents of the input text file into a
 * String of tagged SSML text. This function formats the SSML String so that, when synthesized,
 * the synthetic audio will pause for two seconds between each line of the text file. This
 * function also handles special text characters which might interfere with SSML commands.
 *
 * @param inputFile String name of plaintext file
 * @return a String of SSML text based on plaintext input.
 * @throws IOException on files that don't exist
 */
public static String textToSsml(String inputFile) throws Exception {

  // Read lines of input file
  String rawLines = new String(Files.readAllBytes(Paths.get(inputFile)));

  // Replace special characters with HTML Ampersand Character Codes
  // These codes prevent the API from confusing text with SSML tags
  // For example, '<' --> '&lt;' and '&' --> '&amp;'
  String escapedLines = HtmlEscapers.htmlEscaper().escape(rawLines);

  // Convert plaintext to SSML
  // Tag SSML so that there is a 2 second pause between each address
  String expandedNewline = escapedLines.replaceAll("\\n", "\n<break time='2s'/>");
  String ssml = "<speak>" + expandedNewline + "</speak>";

  // Return the concatenated String of SSML
  return ssml;
}

Node.js

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Node.js API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

/**
 * Generates SSML text from plaintext.
 *
 * Given an input filename, this function converts the contents of the input text file
 * into a String of tagged SSML text. This function formats the SSML String so that,
 * when synthesized, the synthetic audio will pause for two seconds between each line
 * of the text file. This function also handles special text characters which might
 * interfere with SSML commands.
 *
 * ARGS
 * inputfile: String name of plaintext file
 * RETURNS
 * a String of SSML text based on plaintext input
 *
 */
function textToSsml(inputFile) {
  let rawLines = '';
  // Read input file
  try {
    rawLines = fs.readFileSync(inputFile, 'utf8');
  } catch (e) {
    console.log('Error:', e.stack);
    return;
  }

  // Replace special characters with HTML Ampersand Character Codes
  // These codes prevent the API from confusing text with SSML tags
  // For example, '<' --> '&lt;' and '&' --> '&amp;'
  let escapedLines = rawLines;
  escapedLines = escapedLines.replace(/&/g, '&amp;');
  escapedLines = escapedLines.replace(/"/g, '&quot;');
  escapedLines = escapedLines.replace(/</g, '&lt;');
  escapedLines = escapedLines.replace(/>/g, '&gt;');

  // Convert plaintext to SSML
  // Tag SSML so that there is a 2 second pause between each address
  const expandedNewline = escapedLines.replace(/\n/g, '\n<break time="2s"/>');
  const ssml = '<speak>' + expandedNewline + '</speak>';

  // Return the concatenated String of SSML
  return ssml;
}

Python

如需了解如何安装和使用 Text-to-Speech 客户端库,请参阅 Text-to-Speech 客户端库。 如需了解详情,请参阅 Text-to-Speech Python API 参考文档

如需向 Text-to-Speech 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

def text_to_ssml(inputfile: str) -> str:
    """
    Generates SSML text from plaintext.
    Given an input filename, this function converts the contents of the text
    file into a string of formatted SSML text. This function formats the SSML
    string so that, when synthesized, the synthetic audio will pause for two
    seconds between each line of the text file. This function also handles
    special text characters which might interfere with SSML commands.

    Args:
        inputfile: name of plaintext file
    Returns: SSML text based on plaintext input
    """

    # Parses lines of input file
    with open(inputfile) as f:
        raw_lines = f.read()

    # Replace special characters with HTML Ampersand Character Codes
    # These Codes prevent the API from confusing text with
    # SSML commands
    # For example, '<' --> '&lt;' and '&' --> '&amp;'

    escaped_lines = html.escape(raw_lines)

    # Convert plaintext to SSML
    # Wait two seconds between each address
    ssml = "<speak>{}</speak>".format(
        escaped_lines.replace("\n", '\n<break time="2s"/>')
    )

    # Return the concatenated string of ssml script
    return ssml

总结

此程序使用以下输入。

123 Street Ln, Small Town, IL 12345 USA
1 Jenny St & Number St, Tutone City, CA 86753
1 Piazza del Fibonacci, 12358 Pisa, Italy

将上述文本传递给 text_to_ssml() 会生成以下标记文本。

<speak>123 Street Ln, Small Town, IL 12345 USA
<break time="2s"/>1 Jenny St &amp; Number St, Tutone City, CA 86753
<break time="2s"/>1 Piazza del Fibonacci, 12358 Pisa, Italy
<break time="2s"/></speak>

运行代码

要生成合成语音的音频文件,请从命令行运行以下代码。

Java

Linux 或 MacOS

java-docs-samples/texttospeech/cloud-client/ 目录中,在命令行上执行以下命令。

$ mvn clean package

Windows

java-docs-samples/texttospeech/cloud-client/ 目录中,在命令行上执行以下命令。

$ mvn clean package

Node.js

Linux 或 MacOS

hybridGlossaries.js 文件中,对被 TODO (developer) 注释掉的变量取消备注。

在以下命令中,将 projectId 替换为您的 Google Cloud 项目 ID。从 nodejs-docs-samples/texttospeech 目录中,在命令行上执行以下命令。

$ node ssmlAddresses.js projectId

Windows

hybridGlossaries.js 文件中,对被 TODO (developer) 注释掉的变量取消备注。

在以下命令中,将 projectId 替换为您的 Google Cloud 项目 ID。从 nodejs-docs-samples/texttospeech 目录中,在命令行上执行以下命令。

$env: C:/Node.js/node.exe C: ssmlAddresses.js projectId

Python

Linux 或 MacOS

python-docs-samples/texttospeech/snippets 目录中,在命令行上执行以下命令。

$ python ssml_addresses.py

Windows

python-docs-samples/texttospeech/snippets 目录中,在命令行上执行以下命令。

$env: C:/Python3/python.exe C: ssml_addresses.py

检查输出

此程序会输出合成语音的 example.mp3 音频文件。

Java

导航到 java-docs-samples/texttospeech/cloud-client/resources/ 目录。

resources 目录中查找 example.mp3 文件。

Node.js

导航到 nodejs-docs-samples/texttospeech/resources/ 目录。

resources 目录中查找 example.mp3 文件。

Python

导航到 python-docs-samples/texttospeech/snippets/resources

resources 目录中查找 example.mp3 文件。

聆听以下音频剪辑,以确认 example.mp3 文件的语音正是您所期望的语音。


问题排查

  • 忘记在命令行上设置 GOOGLE_APPLICATION_CREDENTIALS 环境变量会生成以下错误消息:

    The Application Default Credentials are not available.

  • text_to_ssml() 传递不存在的文件的名称会生成以下错误消息:

    IOError: [Errno 2] No such file or directory
    

  • ssml_to_audio() 传递包含 Nonessml_text 参数会生成以下错误消息:

    InvalidArgument: 400 Invalid input type. Type has to be text or SSML
    

  • 确保您运行的是来自正确目录的代码。

后续步骤

清理

为避免您的 Google Cloud Platform 账号因本教程中使用的资源而产生费用,如果您不需要该项目,请使用 Google Cloud 控制台删除该项目。

删除项目

  1. Google Cloud 控制台中,转到“项目”页面。
  2. 在项目列表中,选择要删除的项目,然后点击删除
  3. 在对话框中输入项目 ID,然后点击关停以删除项目。