Create long audio

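This document walks you through synthesizing long audio. Long audio synthesis asynchronously synthesizes input of up to 1 million bytes. To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech basics.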

Before you begin

You must complete the following steps before you can send a request to the Text-to-Speech API. For details, see the Before you begin page.

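1. Enable Text-to-Speech on a Google Cloud project.
2. Make sure billing is enabled for Text-to-Speech.
3. Make sure you have the following Identity and Access Management (IAM) roles on the output Cloud Storage bucket:
  - Storage Object Creator
  - Storage Object Viewer
4. Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command: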
```
gcloud init
```
If you use an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

Synthesize long audio from text using the command line

You can convert long text to audio by making an HTTP POST request to the https://texttospeech.googleapis.com/v1beta1/projects/{$project_number}/locations/global:synthesizeLongAudio endpoint. In the body of the POST request, specify the following fields.

• voice: The type of voice to synthesize.

• input.text: The text to synthesize.

• audioConfig: The type of audio to create.

• output_gcs_uri: The Cloud Storage output path, in the format "gs://bucket_name/file_name.wav".

• parent: The parent path, in the format "projects/{YOUR_PROJECT_NUMBER}/locations/{YOUR_PROJECT_LOCATION}".

The input can contain up to 1 MB of characters; the exact limit varies with the input.
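
The field list and size limit above can be sketched in Python. This is a minimal sketch rather than an official client: `build_request_body` and `MAX_INPUT_BYTES` are hypothetical names, the voice and encoding values are sample values, and the 1,000,000-byte constant only approximates the limit, because the exact limit varies with the input.

```python
import json

# Assumed approximate limit; the service's exact limit varies with the input.
MAX_INPUT_BYTES = 1_000_000


def build_request_body(project_number: str, text: str, output_gcs_uri: str,
                       location: str = "global") -> dict:
    """Builds a synthesizeLongAudio request body from the fields listed above."""
    size = len(text.encode("utf-8"))  # the limit is counted in bytes, not characters
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input is {size} bytes; expected at most {MAX_INPUT_BYTES}")
    return {
        "parent": f"projects/{project_number}/locations/{location}",
        "audio_config": {"audio_encoding": "LINEAR16"},
        "input": {"text": text},
        "voice": {"language_code": "en-US", "name": "en-US-Standard-A"},
        "output_gcs_uri": output_gcs_uri,
    }


if __name__ == "__main__":
    body = build_request_body("12345", "hello", "gs://bucket_name/file_name.wav")
    print(json.dumps(body, indent=2))
```

Checking the encoded length locally catches clearly oversized input before a request is sent; the service remains the authority on what it accepts.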

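Create a Cloud Storage bucket under the project used to run the synthesis. Make sure the service account used to run the synthesis has read and write access to the output Cloud Storage bucket.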

Execute the REST request at the command line to synthesize audio from text using Text-to-Speech. The command uses the gcloud auth print-access-token command to retrieve an authorization token for the request.

HTTP method and URL:

POST https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio

Request JSON body:

{
  "parent": "projects/12345/locations/global",
  "audio_config":{
      "audio_encoding":"LINEAR16"
  },
  "input":{
      "text":"hello"
  },
  "voice":{
      "language_code":"en-US",
      "name":"en-US-Standard-A"
  },
  "output_gcs_uri": "gs://bucket_name/file_name.wav"
}

To send your request, choose one of the following options:

curl (Linux, macOS, or Cloud Shell)

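Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which signs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.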

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio"

PowerShell (Windows)

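Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.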

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}
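
The POST request above can also be prepared from Python's standard library. The following is a sketch under stated assumptions, not an official client: `build_synthesize_request` is a hypothetical helper, and the access token is assumed to come from running gcloud auth print-access-token. The function only builds the request; it does not send it.

```python
import json
import urllib.request

ENDPOINT = "https://texttospeech.googleapis.com/v1beta1"


def build_synthesize_request(project_number: str, body: dict, access_token: str,
                             location: str = "global") -> urllib.request.Request:
    """Prepares (but does not send) the synthesizeLongAudio POST request."""
    url = f"{ENDPOINT}/projects/{project_number}/locations/{location}:synthesizeLongAudio"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # same payload as request.json
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate here so the URL and headers can be inspected first.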

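The JSON output of the REST command contains the name of the long-running operation in the name field. Execute a REST request at the command line to query the status of the long-running operation.

Make sure the service account running the GET operation is in the same project as the one used for synthesis.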
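
Turning the returned name into a status URL can be sketched as follows. `operation_status_url` is a hypothetical helper. It accepts both a bare ID and a full resource name, because the sample responses in this document show both forms.

```python
ENDPOINT = "https://texttospeech.googleapis.com/v1beta1"


def operation_status_url(name: str, project_number: str, location: str = "global") -> str:
    """Returns the GET URL for a long-running operation.

    `name` is the value of the "name" field in the synthesis response. It may be
    a bare ID ("23456") or a full resource name
    ("projects/12345/locations/global/operations/23456").
    """
    if name.startswith("projects/"):
        # Already a full resource name: append it to the API root as-is.
        return f"{ENDPOINT}/{name}"
    return f"{ENDPOINT}/projects/{project_number}/locations/{location}/operations/{name}"
```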

HTTP method and URL:
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456
```
To send your request, choose one of the following options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456"
```
PowerShell (Windows)

Execute the following command:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "name": "projects/12345/locations/global/operations/23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 100
  },
  "done": true
}
```
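
Repeating the status request until done is true can be sketched as a small polling loop. This is illustrative only: `wait_for_operation` and its `fetch` parameter are hypothetical names, and the 300-second timeout and 5-second interval are assumed defaults that you would adjust to the length of your input.

```python
import time
from typing import Callable


def wait_for_operation(fetch: Callable[[], dict], timeout_s: float = 300.0,
                       interval_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Calls `fetch` until the returned operation reports "done": true.

    `fetch` is a hypothetical callable that performs the GET request shown
    above and returns the decoded JSON response as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        operation = fetch()
        if operation.get("done"):
            return operation
        # Stop if waiting another interval would pass the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("operation did not finish before the timeout")
        sleep(interval_s)
```

The `sleep` parameter exists so the loop can be exercised without real delays.
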
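To list all operations running under a given project, execute the following REST request.

Make sure the service account running the LIST operation is in the same project as the one used for synthesis.

HTTP method and URL: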
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations
```
To send your request, choose one of the following options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations"
```
PowerShell (Windows)

Execute the following command:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "operations": [
    {
      "name": "12345",
      "done": false
    },
    {
      "name": "23456",
      "done": false
    }
  ],
  "nextPageToken": ""
}
```
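
When the list spans several pages, the nextPageToken field in the response can be followed until it comes back empty. The sketch below assumes the token is sent back as a `pageToken` query parameter, which is the usual convention for long-running operation lists; `iter_operations` and `fetch_page` are hypothetical names.

```python
from typing import Callable, Iterator, Optional


def iter_operations(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yields every operation, following nextPageToken until it is empty.

    `fetch_page` is a hypothetical callable: it performs the list request shown
    above, adds the given token as the `pageToken` query parameter when the
    token is not None, and returns the decoded JSON response as a dict.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("operations", [])
        # An empty or missing token means there are no more pages.
        token = page.get("nextPageToken") or None
        if token is None:
            return
```
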
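After the long-running operation completes successfully, find the output audio file at the bucket URI given in the output_gcs_uri field. If the operation did not complete successfully, use the GET REST command to find the error, correct it, and issue the RPC again.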
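
Telling a successful operation from a failed one can be sketched as follows. This assumes the usual long-running operation shape, in which a finished operation carries either a response or an error object with code and message fields; `check_operation` is a hypothetical helper.

```python
def check_operation(operation: dict) -> str:
    """Summarizes the state of an operation returned by the GET request.

    Returns "running" or "succeeded", and raises RuntimeError when the
    finished operation carries an error.
    """
    if not operation.get("done"):
        return "running"
    error = operation.get("error")
    if error:
        # Assumed shape: a status object with a numeric code and a message.
        raise RuntimeError(
            f"synthesis failed (code {error.get('code')}): {error.get('message')}"
        )
    return "succeeded"
```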

Synthesize long audio from text using client libraries

Follow these instructions to synthesize long audio.

Install the client library

Python

Before installing the library, make sure that you have prepared your environment for Python development.

pip install --upgrade google-cloud-texttospeech

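Create audio data

You can use Text-to-Speech to create a long audio file of synthetic human speech. Use the following code to create a long audio file in a Cloud Storage bucket.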

Python

Before running the sample, make sure that you have prepared your environment for Python development.

# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.cloud import texttospeech


def synthesize_long_audio(project_id: str, output_gcs_uri: str) -> None:
    """
    Synthesizes long input, writing the resulting audio to `output_gcs_uri`.

    Args:
        project_id: ID or number of the Google Cloud project you want to use.
        output_gcs_uri: Specifies a Cloud Storage URI for the synthesis results.
            Must be specified in the format:
            ``gs://bucket_name/object_name``, and the bucket must
            already exist.
    """

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    input = texttospeech.SynthesisInput(
        text="Test input. Replace this with any text you want to synthesize, up to 1 million bytes long!"
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Standard-A"
    )

    parent = f"projects/{project_id}/locations/us-central1"

    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=input,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )

    operation = client.synthesize_long_audio(request=request)
    # Set a deadline for your LRO to finish. 300 seconds is reasonable, but can be adjusted depending on the length of the input.
    # If the operation times out, that likely means there was an error. In that case, inspect the error, and try again.
    result = operation.result(timeout=300)
    print(
        "\nFinished processing, check your GCS bucket to find your audio file! Printing what should be an empty result: ",
        result,
    )
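
Because the sample expects output_gcs_uri in the form gs://bucket_name/object_name, it can help to check the value before starting a long-running operation. The following is a minimal sketch: `split_gcs_uri` is a hypothetical helper, and the pattern only checks the overall shape of the URI, not whether the bucket exists.

```python
import re
from typing import Tuple

# Assumed pattern: gs://<bucket>/<object>, following the docstring format above.
_GCS_URI = re.compile(r"^gs://([^/]+)/(.+)$")


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Splits a Cloud Storage URI into (bucket_name, object_name)."""
    match = _GCS_URI.match(uri)
    if match is None:
        raise ValueError(f"expected gs://bucket_name/object_name, got {uri!r}")
    return match.group(1), match.group(2)
```

For example, calling `split_gcs_uri(output_gcs_uri)` before `synthesize_long_audio(project_id, output_gcs_uri)` rejects a malformed URI before any request is made.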

Clean up

If you no longer need the project, delete it using the Google Cloud console to avoid unnecessary Google Cloud charges.

What's next

To learn more about Cloud Text-to-Speech, read the basics.
Review the list of voices available for speech synthesis.