Kurzanleitung für Custom Voice

Hinweis

  1. Sie haben Google eine GCP-Projekt-ID zugesandt, um auf das Feature Custom Voice zuzugreifen. Sorgen Sie dafür, dass die Abrechnung, die Text-to-Speech API und die AutoML API aktiviert und die Authentifizierung für dieses Projekt eingerichtet sind.

  2. Weisen Sie dem Dienstkonto, mit dem Sie Custom Voice synthetisieren möchten, die Rolle AutoML Predictor zu. Weitere Informationen finden Sie in der Google Cloud-Dokumentation zu IAM-Rollen und Dienstkonten.

Befehlszeile verwenden

HTTP-Methode und URL:

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

JSON-Text anfordern:

{
  "input":{
    "text":"Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
  },
  "voice":{
    "custom_voice":{
      "reportedUsage":"REALTIME",
      "model":"projects/{project_id}/locations/us-central1/models/{model_id}",
     }
  },
  "audioConfig":{
    "audioEncoding":"LINEAR16"
  }
}

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://texttospeech.googleapis.com/v1beta1/text:synthesize

Python-Clientbibliothek verwenden

Laden Sie die Clientbibliothek herunter und führen Sie den folgenden Befehl aus:

pip install texttospeech-custom-voice-beta-v1beta1-py.tar.gz
pip install protobuf --upgrade pip

Python-Beispielcode

"""Synthesize custom voice from the input string of text or ssml.
"""
from google.cloud import texttospeech_v1beta1

# Instantiate a client
client = texttospeech_v1beta1.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech_v1beta1.types.SynthesisInput(text="Hello, World!")

# Build the voice request, select the language code ("en-US") and specify
# custom voice model and speaker_id.
custom_voice = texttospeech_v1beta1.types.CustomVoiceParams(
    reported_usage=texttospeech_v1beta1.enums.CustomVoiceParams.ReportedUsage.OFFLINE,
    model='projects/{project_id}/locations/us-central1/models/{model_id}')
voice = texttospeech_v1beta1.types.VoiceSelectionParams(
    language_code='en-US',
    custom_voice=custom_voice)

# Select the type of audio file you want returned
audio_config = texttospeech_v1beta1.types.AudioConfig(
    audio_encoding=texttospeech_v1beta1.enums.AudioEncoding.LINEAR16)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.wav', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')