Use device profiles for generated audio

This page describes how to select a device profile for audio created with Text-to-Speech.

You can optimize the synthetic speech that Text-to-Speech generates for playback on different types of hardware. For example, if your application runs mainly on smaller, wearable devices, you can use the Text-to-Speech API to create synthetic speech that is optimized specifically for smaller speakers.

You can also apply multiple device profiles to the same synthetic speech. The Text-to-Speech API applies device profiles to the audio in the order in which they are listed in the request to the text:synthesize endpoint. Specify each profile only once. Applying the same profile more than once can lead to undesired results.
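Because profiles are applied in request order and a repeated profile can produce undesired results, it can help to normalize the list before sending it. The following is a minimal Python sketch of that idea. The helper name `unique_profiles` is hypothetical and is not part of the Text-to-Speech client library.

```python
def unique_profiles(profile_ids):
    """Return the profile IDs in their original order with repeats dropped."""
    seen = set()
    ordered = []
    for profile_id in profile_ids:
        if profile_id not in seen:
            seen.add(profile_id)
            ordered.append(profile_id)
    return ordered


if __name__ == "__main__":
    requested = [
        "headphone-class-device",
        "telephony-class-application",
        "headphone-class-device",  # accidental repeat
    ]
    print(unique_profiles(requested))
```

The first occurrence of each ID is kept, so the order in which the remaining profiles are applied does not change.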

Using audio profiles is optional. If you use one or more, Text-to-Speech applies the profiles to the results after speech synthesis. If you don't use an audio profile, you receive your speech results without any post-synthesis modifications.

To hear the difference between audio generated with different profiles, compare the following two clips.

Example 1. Audio generated with the handset-class-device profile

Example 2. Audio generated with the telephony-class-application profile

Note: Each audio profile has been optimized for a specific device by adjusting a range of audio effects. However, the make and model of the device used to tune the profile might not exactly match your users' playback devices. You might need to experiment with different profiles to find the best sound output for your application.

Available audio profiles

The following table lists the IDs and examples of the device profiles that you can use with the Text-to-Speech API.

Audio profile ID	Optimized for
`wearable-class-device`	Smartwatches and other wearables, such as Apple Watch or a Wear OS watch
`handset-class-device`	Smartphones, such as Google Pixel, Samsung Galaxy, or Apple iPhone
`headphone-class-device`	Earbuds or headphones for audio playback, such as Sennheiser headphones
`small-bluetooth-speaker-class-device`	Small home speakers, such as Google Home Mini
`medium-bluetooth-speaker-class-device`	Smart home speakers, such as Google Home
`large-home-entertainment-class-device`	Home entertainment systems or smart TVs, such as Google Home Max or LG TV
`large-automotive-class-device`	Car speakers
`telephony-class-application`	Interactive Voice Response (IVR) systems
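A typo in a profile ID is easy to make, so a client might check IDs locally before sending a request. The sketch below copies the IDs from the table above into a set. The names `KNOWN_AUDIO_PROFILES` and `unknown_profiles` are hypothetical, and the API itself remains the authority on which IDs it accepts.

```python
# Profile IDs copied from the table above (an assumption that the list is current).
KNOWN_AUDIO_PROFILES = frozenset(
    {
        "wearable-class-device",
        "handset-class-device",
        "headphone-class-device",
        "small-bluetooth-speaker-class-device",
        "medium-bluetooth-speaker-class-device",
        "large-home-entertainment-class-device",
        "large-automotive-class-device",
        "telephony-class-application",
    }
)


def unknown_profiles(profile_ids):
    """Return the IDs that do not appear in the table, in their original order."""
    return [p for p in profile_ids if p not in KNOWN_AUDIO_PROFILES]
```

An empty result means every requested ID appears in the table. A non-empty result lists the IDs to double-check.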

Specify an audio profile

To specify an audio profile for the speech synthesis request, use the effectsProfileId field.
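As a rough illustration of where the field sits in a request, the following Python sketch builds a request body as a plain dictionary. The field names (`input`, `voice`, `audioConfig`, `effectsProfileId`) are taken from the curl request on this page. The function name `build_synthesize_body` and its default values are assumptions made for the example.

```python
import json


def build_synthesize_body(
    text, language_code="en-US", audio_encoding="MP3", effects_profile_ids=None
):
    """Build a text:synthesize request body; profiles are added only if given."""
    audio_config = {"audioEncoding": audio_encoding}
    if effects_profile_ids:
        # The profiles are applied in the order listed here.
        audio_config["effectsProfileId"] = list(effects_profile_ids)
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": audio_config,
    }


if __name__ == "__main__":
    body = build_synthesize_body(
        "Hello there!", effects_profile_ids=["telephony-class-application"]
    )
    print(json.dumps(body, indent=2))
```

Leaving out `effects_profile_ids` omits the field entirely, which matches the optional nature of audio profiles.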

Protocol

To generate an audio file, make a POST request and provide the appropriate request body. The following example shows a POST request using curl. The example uses the access token for a service account that was set up for the project with the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstarts.

The following example shows how to send a request to the text:synthesize endpoint.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "input": {
      "text": "This is a sentence that helps test how audio profiles can change the way Cloud Text-to-Speech sounds."
    },
    "voice": {
      "languageCode": "en-US"
    },
    "audioConfig": {
      "audioEncoding": "LINEAR16",
      "effectsProfileId": ["telephony-class-application"]
    }
  }' "https://texttospeech.googleapis.com/v1beta1/text:synthesize" > audio-profile.txt

If the request succeeds, the Text-to-Speech API returns the synthesized audio as base64-encoded data in the JSON output. The JSON output in the audio-profile.txt file looks like this:

{
  "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
}

To decode the results from the Cloud Text-to-Speech API into a WAV audio file (the request above asks for LINEAR16 encoding), run the following command from the same directory as the audio-profile.txt file.

sed 's|audioContent| |' < audio-profile.txt > tmp-output.txt && \
tr -d '\n ":{}' < tmp-output.txt > tmp-output-2.txt && \
base64 --decode < tmp-output-2.txt > audio-profile.wav && \
rm tmp-output*.txt
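The same decoding step can also be sketched in Python, which parses the JSON instead of stripping characters with sed and tr. This is a minimal sketch that assumes the response has the shape shown above, with a single audioContent field. The function names `decode_audio_content` and `save_audio` are hypothetical.

```python
import base64
import json


def decode_audio_content(response_json):
    """Parse the JSON response text and return the decoded audio bytes."""
    payload = json.loads(response_json)
    return base64.b64decode(payload["audioContent"])


def save_audio(response_path, audio_path):
    """Read a saved JSON response and write the decoded audio to a file."""
    with open(response_path, encoding="utf-8") as response_file:
        audio = decode_audio_content(response_file.read())
    with open(audio_path, "wb") as audio_file:
        audio_file.write(audio)
```

For the files used in this example, the call would be `save_audio("audio-profile.txt", "audio-profile.wav")`.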

Go



import (
	"context"
	"fmt"
	"io"
	"os"

	texttospeech "cloud.google.com/go/texttospeech/apiv1"
	"cloud.google.com/go/texttospeech/apiv1/texttospeechpb"
)

// audioProfile synthesizes speech from text and applies the
// telephony-class-application audio profile to the result.
func audioProfile(w io.Writer, text string, outputFile string) error {
	// text := "hello"
	// outputFile := "out.mp3"

	ctx := context.Background()

	client, err := texttospeech.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &texttospeechpb.SynthesizeSpeechRequest{
		Input: &texttospeechpb.SynthesisInput{
			InputSource: &texttospeechpb.SynthesisInput_Text{Text: text},
		},
		Voice: &texttospeechpb.VoiceSelectionParams{LanguageCode: "en-US"},
		AudioConfig: &texttospeechpb.AudioConfig{
			AudioEncoding: texttospeechpb.AudioEncoding_MP3,
			// Profiles are applied in the order listed.
			EffectsProfileId: []string{"telephony-class-application"},
		},
	}

	resp, err := client.SynthesizeSpeech(ctx, req)
	if err != nil {
		return fmt.Errorf("SynthesizeSpeech: %w", err)
	}

	// The response's AudioContent is binary.
	if err = os.WriteFile(outputFile, resp.AudioContent, 0644); err != nil {
		return err
	}

	fmt.Fprintf(w, "Audio content written to file: %v\n", outputFile)

	return nil
}

Java


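import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.OutputStream;

/**
 * Demonstrates using the Text-to-Speech client with an audio profile to synthesize text.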
 *
 * @param text the raw text to be synthesized. (e.g., "Hello there!")
 * @param effectsProfile audio profile to be used for synthesis. (e.g.,
 *     "telephony-class-application")
 * @throws Exception on TextToSpeechClient Errors.
 */
public static void synthesizeTextWithAudioProfile(String text, String effectsProfile)
    throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the text input to be synthesized
    SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();

    // Build the voice request
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US") // BCP-47 language tag
            .setSsmlGender(SsmlVoiceGender.FEMALE) // ssmlVoiceGender = SsmlVoiceGender.FEMALE
            .build();

    // Select the type of audio file you want returned and the audio profile
    AudioConfig audioConfig =
        AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3) // MP3 audio.
            .addEffectsProfileId(effectsProfile) // audio profile
            .build();

    // Perform the text-to-speech request
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file.
    try (OutputStream out = new FileOutputStream("output.mp3")) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file \"output.mp3\"");
    }
  }
}

Node.js



/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const text = 'Text you want to vocalize';
// const outputFile = 'YOUR_OUTPUT_FILE_LOCATION';
// const languageCode = 'LANGUAGE_CODE_FOR_OUTPUT';
// const ssmlGender = 'SSML_GENDER_OF_SPEAKER';

// Imports the Google Cloud client library
const speech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

// Creates a client
const client = new speech.TextToSpeechClient();

async function synthesizeWithEffectsProfile() {
  // Add one or more effects profiles to array.
  // Refer to documentation for more details:
  // https://cloud.google.com/text-to-speech/docs/audio-profiles
  const effectsProfileId = ['telephony-class-application'];

  const request = {
    input: {text: text},
    voice: {languageCode: languageCode, ssmlGender: ssmlGender},
    audioConfig: {audioEncoding: 'MP3', effectsProfileId: effectsProfileId},
  };

  const [response] = await client.synthesizeSpeech(request);
  const writeFile = util.promisify(fs.writeFile);
  await writeFile(outputFile, response.audioContent, 'binary');
  console.log(`Audio content written to file: ${outputFile}`);
}

Python


def synthesize_text_with_audio_profile(text, output, effects_profile_id):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(language_code="en-US")

    # Note: you can pass in multiple effects_profile_id. They will be applied
    # in the same order they are provided.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        effects_profile_id=[effects_profile_id],
    )

    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open(output, "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "%s"' % output)

Additional languages

C#: Follow the C# setup instructions on the "Client libraries" page, and then see the Text-to-Speech reference documentation for .NET.

PHP: Follow the PHP setup instructions on the "Client libraries" page, and then see the Text-to-Speech reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the "Client libraries" page, and then see the Text-to-Speech reference documentation for Ruby.