Use a deployed custom voice model

Before you begin

  1. Give Google your Google Cloud project ID so that the project can access the Custom Voice feature. Make sure that billing, the Text-to-Speech API, and the AutoML API are enabled for this project, and that the Google Cloud CLI is installed and initialized.

  2. Grant the AutoML Predictor role to your Google Account on your project. For instructions, see Grant a single role.

Using the command line

HTTP method and URL:

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

Request JSON body:

{
  "input":{
    "text":"Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
  },
  "voice":{
    "custom_voice":{
      "reportedUsage":"REALTIME",
      "model":"projects/{project_id}/locations/us-central1/models/{model_id}"
    }
  },
  "audioConfig":{
    "audioEncoding":"LINEAR16"
  }
}
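
The request body can also be generated rather than edited by hand, which avoids JSON syntax slips such as a stray trailing comma. The following is a minimal sketch using only the Python standard library. The helper names `build_request` and `write_request`, and the example values `my-project` and `my-model-id`, are hypothetical and are not part of the Text-to-Speech API.

```python
import json


def build_request(project_id, model_id, text, location="us-central1"):
    """Return the synthesize request body as a dict.

    The field names mirror the JSON body shown above; the function itself
    is only an illustrative helper.
    """
    model = f"projects/{project_id}/locations/{location}/models/{model_id}"
    return {
        "input": {"text": text},
        "voice": {
            "custom_voice": {
                "reportedUsage": "REALTIME",
                "model": model,
            }
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def write_request(path, body):
    """Serialize the request body to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)


if __name__ == "__main__":
    # Placeholder IDs: substitute your own project and model IDs.
    body = build_request("my-project", "my-model-id", "Hello, World!")
    write_request("request.json", body)
```

Because `json.dump` always emits valid JSON, the resulting file can be passed to `curl` as-is.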

Save the request body in a file named request.json, replacing {project_id} and {model_id} with your own values. Then run the following command, replacing PROJECT_ID with your project ID:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://texttospeech.googleapis.com/v1beta1/text:synthesize
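
The REST response is a JSON object whose `audioContent` field holds the synthesized audio as a base64-encoded string, so it has to be decoded before it can be played. The sketch below assumes that the `curl` output was redirected to a file (the name `response.json` is an assumption) and that a failed call returns a JSON object with an `error` field instead. The helper name `save_audio` is hypothetical.

```python
import base64
import json


def save_audio(response_json, out_path):
    """Decode the base64 audioContent field and write it to out_path.

    Returns the number of audio bytes written. Raises ValueError when the
    response carries no audioContent (for example, an error response).
    """
    response = json.loads(response_json)
    if "audioContent" not in response:
        raise ValueError(
            f"no audioContent in response: {response.get('error', response)}"
        )
    audio = base64.b64decode(response["audioContent"])
    with open(out_path, "wb") as out:
        out.write(audio)
    return len(audio)
```

For example, after running the `curl` command with `> response.json` appended, calling `save_audio(open("response.json").read(), "output.wav")` would write the decoded audio to `output.wav`.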

Using the Python client library

Download the client library and run the following commands:

pip install texttospeech-custom-voice-beta-v1beta1-py.tar.gz
pip install --upgrade pip protobuf

Python code sample

"""Synthesize speech with a custom voice from an input string of text or SSML."""
from google.cloud import texttospeech_v1beta1

# Instantiate a client
client = texttospeech_v1beta1.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech_v1beta1.types.SynthesisInput(text="Hello, World!")

# Build the voice request: select the language code ("en-US") and specify
# the custom voice model. Replace {project_id} and {model_id} in the model
# name with your own values.
custom_voice = texttospeech_v1beta1.types.CustomVoiceParams(
    reported_usage=texttospeech_v1beta1.enums.CustomVoiceParams.ReportedUsage.REALTIME,
    model='projects/{project_id}/locations/us-central1/models/{model_id}')
voice = texttospeech_v1beta1.types.VoiceSelectionParams(
    language_code='en-US',
    custom_voice=custom_voice)

# Select the type of audio file you want returned
audio_config = texttospeech_v1beta1.types.AudioConfig(
    audio_encoding=texttospeech_v1beta1.enums.AudioEncoding.LINEAR16)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.wav', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')
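
As a quick sanity check on the result, the output file can be inspected with the standard-library `wave` module. This is a sketch that assumes the LINEAR16 audio was written with a WAV header, as in the sample above. The helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file: channels, sample width, rate, duration."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("output.wav")` should report a sample width of 2 bytes for 16-bit audio; a zero duration or a `wave.Error` suggests the file does not contain the expected audio.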