Use a deployed custom voice model

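Before you begin

To access the Custom Voice feature, you must have already provided your Google Cloud project ID to Google. After that, confirm that billing is enabled for the project, that the Text-to-Speech API and the AutoML API are enabled, and that the Google Cloud CLI is installed and initialized.
Grant the AutoML Predictor role to the Google account used with the project. For instructions, see Grant a single role.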

Use the command line

HTTP method and URL:

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

Request JSON body:

{
  "input":{
    "text":"Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
  },
  "voice":{
    "custom_voice":{
      "reportedUsage":"REALTIME",
      "model":"projects/{project_id}/locations/us-central1/models/{model_id}"
    }
  },
  "audioConfig":{
    "audioEncoding":"LINEAR16"
  }
}
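
As an alternative to editing the JSON by hand, the request body above can be generated with a short script. The following is a minimal sketch, not part of the official sample: the helper names `build_request` and `write_request` are hypothetical, and `my-project` and `my-model` are placeholder values that you would replace with your own project ID and model ID.

```python
import json


def build_request(project_id: str, model_id: str, text: str) -> dict:
    """Return a text:synthesize request body for a custom voice model."""
    # The model path mirrors the format used in the request body above.
    model = f"projects/{project_id}/locations/us-central1/models/{model_id}"
    return {
        "input": {"text": text},
        "voice": {
            "custom_voice": {
                "reportedUsage": "REALTIME",
                "model": model,
            }
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def write_request(path: str, body: dict) -> None:
    """Serialize the request body to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)


if __name__ == "__main__":
    # "my-project" and "my-model" are placeholders, not real resources.
    write_request("request.json",
                  build_request("my-project", "my-model", "Hello, World!"))
```

Serializing with `json.dump` avoids hand-editing mistakes such as a trailing comma, which strict JSON parsers reject.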

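Save the request body in a file named request.json, replacing {project_id} and {model_id} with your own values, and then run the following command. Replace PROJECT_ID with your project ID.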

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://texttospeech.googleapis.com/v1beta1/text:synthesize
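
The REST response is a JSON object whose `audioContent` field holds the synthesized audio as a base64-encoded string, so it has to be decoded before it can be played. The sketch below assumes the curl output was redirected to a file named `response.json` (for example by appending `> response.json` to the command); that file name and the helper `save_audio` are illustrative assumptions, not part of the original instructions.

```python
import base64
import json


def save_audio(response_json: str, out_path: str) -> int:
    """Decode audioContent from a text:synthesize response and save it.

    Returns the number of audio bytes written.
    """
    payload = json.loads(response_json)
    # audioContent is base64 text; decode it back to raw audio bytes.
    audio = base64.b64decode(payload["audioContent"])
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


if __name__ == "__main__":
    # Assumes the curl output was saved to response.json.
    with open("response.json", "r", encoding="utf-8") as f:
        written = save_audio(f.read(), "output.wav")
    print(f"Wrote {written} bytes to output.wav")
```

If the request fails, the response contains an `error` object instead of `audioContent`, and this sketch raises a `KeyError`; inspect the response file in that case.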

Use the Python client library

Download the client library, and then run the following commands.

pip install texttospeech-custom-voice-beta-v1beta1-py.tar.gz
pip install --upgrade pip protobuf

Python sample code

"""Synthesize custom voice from the input string of text or ssml.
"""
from google.cloud import texttospeech_v1beta1

# Instantiate a client
client = texttospeech_v1beta1.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech_v1beta1.types.SynthesisInput(text="Hello, World!")

# Build the voice request: select the language code ("en-US") and specify
# the custom voice model. Replace {project_id} and {model_id} with your values.
custom_voice = texttospeech_v1beta1.types.CustomVoiceParams(
    reported_usage=texttospeech_v1beta1.enums.CustomVoiceParams.ReportedUsage.REALTIME,
    model='projects/{project_id}/locations/us-central1/models/{model_id}')
voice = texttospeech_v1beta1.types.VoiceSelectionParams(
    language_code='en-US',
    custom_voice=custom_voice)

# Select the type of audio file you want returned
audio_config = texttospeech_v1beta1.types.AudioConfig(
    audio_encoding=texttospeech_v1beta1.enums.AudioEncoding.LINEAR16)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.wav', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')
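
To check the result without playing it, the saved file can be inspected with Python's standard `wave` module. This is a minimal sketch that assumes the LINEAR16 output carries a well-formed WAV header, as the `.wav` file name in the sample suggests; the helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    print(describe_wav("output.wav"))
```

For LINEAR16 audio, the reported sample width should be 2 bytes (16-bit samples).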
