Using your deployed custom voice model

Before you begin

  1. You have provided Google with a Google Cloud project ID to gain access to the custom voice feature. For this project, make sure that billing is enabled, that the Text-to-Speech API and the AutoML API are enabled, and that you have installed and initialized the Google Cloud CLI.

  2. Grant the AutoML Predictor role to your Google Account on the project. For instructions, see Grant a single role.

Using the command line

HTTP method and URL:

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

Request JSON body:

{
  "input":{
    "text":"Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
  },
  "voice":{
    "custom_voice":{
      "reportedUsage":"REALTIME",
      "model":"projects/{project_id}/locations/us-central1/models/{model_id}"
    }
  },
  "audioConfig":{
    "audioEncoding":"LINEAR16"
  }
}
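
If you would rather generate the body than edit the placeholders by hand, the following is a minimal sketch that fills in the project and model IDs and serializes the result as JSON. The function name build_request_body and the sample IDs are hypothetical, chosen only for illustration; the field names and the us-central1 location are copied from the body above.

```python
import json


def build_request_body(project_id: str, model_id: str, text: str) -> str:
    """Return the synthesize request body as a JSON string.

    Hypothetical helper: it mirrors the request body shown in this
    section and is not part of any Google library.
    """
    body = {
        "input": {"text": text},
        "voice": {
            "custom_voice": {
                "reportedUsage": "REALTIME",
                "model": (
                    f"projects/{project_id}/locations/us-central1"
                    f"/models/{model_id}"
                ),
            }
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }
    # json.dumps always emits strictly valid JSON (no trailing commas).
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # "my-project" and "my-model" are placeholder values, not real IDs.
    print(build_request_body("my-project", "my-model", "Hello, World!"))
```

The printed text can be used as the request body in the next step.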

Save the request body in a file named request.json, then run the following command, replacing PROJECT_ID with your project ID:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://texttospeech.googleapis.com/v1beta1/text:synthesize
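
A successful text:synthesize call returns a JSON object whose audioContent field holds the audio as a base64-encoded string, so it must be decoded before it can be played. The following is a minimal sketch of that step; the function name extract_audio is hypothetical, and the sample payload is a stand-in built locally rather than a real API response.

```python
import base64
import json


def extract_audio(response_text: str) -> bytes:
    """Decode the base64 audioContent field of a synthesize response."""
    response = json.loads(response_text)
    if "audioContent" not in response:
        # Failed requests return an "error" object instead of audio.
        detail = response.get("error", response)
        raise ValueError(f"no audioContent in response: {detail}")
    return base64.b64decode(response["audioContent"])


if __name__ == "__main__":
    # Stand-in payload for illustration only, not real synthesized audio.
    sample = json.dumps(
        {"audioContent": base64.b64encode(b"RIFF").decode("ascii")}
    )
    print(extract_audio(sample))
```

To use it with the command above, you could redirect the curl output to a file, read that file as text, and write the bytes returned by extract_audio to a .wav file opened in binary mode.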

Using the Python client library

Download the client library and run the following commands:

pip install texttospeech-custom-voice-beta-v1beta1-py.tar.gz
pip install --upgrade pip protobuf

Sample Python code

"""Synthesize custom voice audio from an input string of text or SSML."""
from google.cloud import texttospeech_v1beta1

# Instantiate a client
client = texttospeech_v1beta1.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech_v1beta1.types.SynthesisInput(text="Hello, World!")

# Build the voice request: select the language code ("en-US") and specify
# the custom voice model.
custom_voice = texttospeech_v1beta1.types.CustomVoiceParams(
    reported_usage=texttospeech_v1beta1.enums.CustomVoiceParams.ReportedUsage.REALTIME,
    model='projects/{project_id}/locations/us-central1/models/{model_id}')
voice = texttospeech_v1beta1.types.VoiceSelectionParams(
    language_code='en-US',
    custom_voice=custom_voice)

# Select the type of audio file you want returned
audio_config = texttospeech_v1beta1.types.AudioConfig(
    audio_encoding=texttospeech_v1beta1.enums.AudioEncoding.LINEAR16)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.wav', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')
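
As a quick sanity check on the result, the audio bytes can be inspected with Python's standard wave module, since LINEAR16 output is returned with a WAV header. The following is a minimal sketch under that assumption; the function name describe_wav is hypothetical, and it has not been verified against every header variant the service might produce.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of WAV-encoded audio bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from frame count divided by sample rate.
            "duration_seconds": frames / rate,
        }
```

For example, calling describe_wav(response.audio_content) after the request above should report a 2-byte sample width for LINEAR16 audio, along with the sample rate and duration of the clip.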