Create long-form audio

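This document walks you through the process of synthesizing long-form audio. Long audio synthesis asynchronously synthesizes up to 1 million bytes of input. To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech basics.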

Before you begin

Before you can send a request to the Text-to-Speech API, you must have completed the following actions. See the before you begin page for details.

Enable Text-to-Speech on a GCP project.
1. Make sure that billing is enabled for Text-to-Speech.
2. Make sure that you have the following Identity and Access Management (IAM) roles on the output GCS bucket:
  - Storage Object Creator
  - Storage Object Viewer
After installing the Google Cloud CLI, configure the gcloud CLI to use your federated identity and then initialize it by running the following command:
```
gcloud init
```

Synthesize long audio from text using the command line

To convert long text to audio, make an HTTP POST request to the https://texttospeech.googleapis.com/v1beta1/projects/{$project_number}/locations/global:synthesizeLongAudio endpoint. In the body of your POST request, specify the following fields:

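• voice: The type of voice to synthesize.

• input.text: The text to synthesize.

• audioConfig: The type of audio to create.

• output_gcs_uri: The GCS output file path, in the format "gs://bucket_name/file_name.wav".

• parent: The parent, in the format "projects/{your project number}/locations/{your project location}".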

The input can be up to 1 MB of characters. The exact limit varies depending on the input.
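The fields listed above can be assembled programmatically before the request is sent. The following is a minimal sketch, not part of the official samples: the helper name `build_request_body` and the constant `MAX_INPUT_BYTES` are hypothetical, and the 1,000,000-byte check only approximates the limit described in this guide, since the exact limit varies by input.

```python
import json

# Approximate limit taken from this guide; the service's exact limit may differ.
MAX_INPUT_BYTES = 1_000_000


def build_request_body(project_number, text, output_gcs_uri,
                       language_code="en-us", voice_name="en-us-Standard-A",
                       location="global"):
    """Return a dict shaped like the synthesizeLongAudio request body."""
    # Measure the encoded size, because the limit is expressed in bytes.
    size = len(text.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input is {size} bytes; limit is about {MAX_INPUT_BYTES}")
    if not output_gcs_uri.startswith("gs://"):
        raise ValueError("output_gcs_uri must start with gs://")
    return {
        "parent": f"projects/{project_number}/locations/{location}",
        "audio_config": {"audio_encoding": "LINEAR16"},
        "input": {"text": text},
        "voice": {"language_code": language_code, "name": voice_name},
        "output_gcs_uri": output_gcs_uri,
    }


if __name__ == "__main__":
    body = build_request_body("12345", "hello", "gs://bucket_name/file_name.wav")
    # Redirect this output to request.json to use it with the commands in this guide.
    print(json.dumps(body, indent=2))
```

Checking the size locally is only a convenience; the service remains the authority on whether a given input is accepted.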

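Create a Google Cloud Storage bucket in the project used for synthesis. Make sure that the service account used for synthesis has read/write access to the output GCS bucket.
Execute the following REST request at the command line to synthesize audio from text using Text-to-Speech. The command uses the gcloud auth print-access-token command to retrieve an authorization token for the request.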

Make sure that the service account running the GET operation has the Text-to-Speech Editor role.

HTTP method and URL:
```
POST https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio
```
Request JSON body:
```
{
  "parent": "projects/12345/locations/global",
  "audio_config":{
      "audio_encoding":"LINEAR16"
  },
  "input":{
      "text":"hello"
  },
  "voice":{
      "language_code":"en-us",
      "name":"en-us-Standard-A"
  },
  "output_gcs_uri": "gs://bucket_name/file_name.wav"
}
```
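To send your request, choose one of these options:
curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: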
```
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio"
```
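PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: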
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}
```
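The name field in the JSON output of the REST command contains the name of the long-running operation. Execute the following REST request at the command line to query the state of the long-running operation.

Make sure that the service account running the GET operation belongs to the same project as the one used for synthesis.

HTTP method and URL: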
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456
```
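To send your request, choose one of these options:
curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command: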
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456"
```
PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "name": "projects/12345/locations/global/operations/23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 100
  },
  "done": true
}
```
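Because synthesis is asynchronous, a client usually repeats the GET request until done is true. The sketch below shows one way to interpret the responses and poll. It makes several assumptions: `describe_operation` and `wait_for_operation` are hypothetical helper names, `fetch_operation` stands for any callable of your own that performs the GET request and returns the parsed JSON, and the `error` field is not shown in this guide but follows the standard long-running operation shape, where a failed operation carries an error object with a message.

```python
import time


def describe_operation(operation):
    """Summarize an operation dict shaped like the JSON responses above."""
    if operation.get("done"):
        # Assumption: a failed operation carries an "error" object with a "message".
        if "error" in operation:
            return "failed: " + str(operation["error"].get("message", "unknown error"))
        return "succeeded"
    progress = operation.get("metadata", {}).get("progressPercentage", 0)
    return f"running ({progress}%)"


def wait_for_operation(fetch_operation, poll_seconds=10.0, max_polls=30):
    """Call fetch_operation() until the response reports done, or give up."""
    for _ in range(max_polls):
        operation = fetch_operation()
        if operation.get("done"):
            return operation
        time.sleep(poll_seconds)
    raise TimeoutError("operation did not finish within the polling budget")
```

Passing the fetch step in as a callable keeps the polling logic independent of how the HTTP request is made and authenticated.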
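To query a list of all operations running under a given project, execute the following REST request.

Make sure that the service account running the LIST operation belongs to the same project as the one used for synthesis.

HTTP method and URL: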
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations
```
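To send your request, choose one of these options:
curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command: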
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations"
```
PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "operations": [
    {
      "name": "12345",
      "done": false
    },
    {
      "name": "23456",
      "done": false
    }
  ],
  "nextPageToken": ""
}
```
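A list response like the one above can be reduced to the operations that are still in progress. This is a small sketch under stated assumptions: the helper names `pending_operation_names` and `has_more_pages` are hypothetical, and a non-empty nextPageToken is taken to mean that more results are available.

```python
def pending_operation_names(list_response):
    """Return the names of operations that have not finished yet."""
    return [
        op["name"]
        for op in list_response.get("operations", [])
        if not op.get("done", False)
    ]


def has_more_pages(list_response):
    """Assume a non-empty nextPageToken means further pages exist."""
    return bool(list_response.get("nextPageToken"))


if __name__ == "__main__":
    sample = {
        "operations": [
            {"name": "12345", "done": False},
            {"name": "23456", "done": False},
        ],
        "nextPageToken": "",
    }
    print(pending_operation_names(sample))
    print(has_more_pages(sample))
```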
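Once the long-running operation completes successfully, find the output audio file at the bucket URI that you specified in the output_gcs_uri field. If the operation does not complete successfully, find the error by querying with the GET REST command, correct the error, and issue the RPC again.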

Synthesize long audio from text using client libraries

Install the client library

Python

Before installing the library, make sure that you've prepared your environment for Python development.

```
pip install --upgrade google-cloud-texttospeech
```

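Create audio data

You can use Text-to-Speech to create a long audio file of synthetic human speech. Use the following code to create a long audio file in a GCS bucket.

Python

Before running the example, make sure that you've prepared your environment for Python development.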

```
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.cloud import texttospeech


def synthesize_long_audio(project_id: str, output_gcs_uri: str) -> None:
    """
    Synthesizes long input, writing the resulting audio to `output_gcs_uri`.

    Args:
        project_id: ID or number of the Google Cloud project you want to use.
        output_gcs_uri: Specifies a Cloud Storage URI for the synthesis results.
            Must be specified in the format:
            ``gs://bucket_name/object_name``, and the bucket must
            already exist.
    """

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    # Named synthesis_input to avoid shadowing the built-in input().
    synthesis_input = texttospeech.SynthesisInput(
        text="Test input. Replace this with any text you want to synthesize, up to 1 million bytes long!"
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Standard-A"
    )

    parent = f"projects/{project_id}/locations/us-central1"

    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=synthesis_input,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )

    operation = client.synthesize_long_audio(request=request)
    # Set a deadline for your LRO to finish. 300 seconds is reasonable, but can be adjusted depending on the length of the input.
    # If the operation times out, that likely means there was an error. In that case, inspect the error, and try again.
    result = operation.result(timeout=300)
    print(
        "\nFinished processing, check your GCS bucket to find your audio file! Printing what should be an empty result: ",
        result,
    )
```
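The sample's docstring requires `output_gcs_uri` to have the form gs://bucket_name/object_name. A malformed URI can be caught locally before the long-running operation is started. The following is a minimal sketch; `split_gcs_uri` is a hypothetical helper, not part of the client library, and it checks only the shape of the URI, not whether the bucket exists.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket_name/object_name' into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}, got {uri!r}")
    # Everything before the first slash is the bucket; the rest is the object name.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"expected gs://bucket_name/object_name, got {uri!r}")
    return bucket, object_name


if __name__ == "__main__":
    print(split_gcs_uri("gs://bucket_name/file_name.wav"))
```

Calling this helper before `synthesize_long_audio` turns a formatting mistake into an immediate local error rather than a failed operation.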

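Clean up

To avoid unnecessary Google Cloud Platform charges, use the Google Cloud console to delete your project if you no longer need it.

What's next

Learn more about Cloud Text-to-Speech by reading the basics.
Review the list of available voices that you can use for synthetic speech.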
