Using device profiles for generated audio

This page describes how to select a device profile for audio created by Text-to-Speech.

You can optimize the synthetic speech produced by Text-to-Speech for playback on different types of hardware. For example, if your app runs primarily on smaller, wearable types of devices, you can create synthetic speech from the Text-to-Speech API that is optimized specifically for smaller speakers.

You can also apply multiple device profiles to the same synthetic speech. The Text-to-Speech API applies device profiles to the audio in the order provided in the request to the text:synthesize endpoint. Avoid specifying the same profile more than once, because applying a profile multiple times can produce undesirable results.
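
Because profile order matters and repeated profiles are discouraged, it can help to check a profile list on the client before sending it. The following is a minimal sketch; the helper name `ordered_unique_profiles` is hypothetical and is not part of any Text-to-Speech client library.

```python
def ordered_unique_profiles(profile_ids):
    """Return the profile IDs as a list in their original order.

    Raises ValueError if the same ID appears more than once, since
    applying one profile repeatedly can give undesirable results.
    (Hypothetical helper, not part of the Text-to-Speech API.)
    """
    result = []
    for profile_id in profile_ids:
        if profile_id in result:
            raise ValueError(f"Profile listed more than once: {profile_id}")
        result.append(profile_id)
    return result
```

The returned list keeps the caller's order, which is the order in which the API applies the profiles.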

Using audio profiles is optional. If you choose to use one or more audio profiles, Text-to-Speech applies them to your speech results after synthesis. If you choose not to use an audio profile, you receive your speech results without any post-synthesis modifications.

To hear the difference between audio generated from different profiles, compare the two clips below.

Example 1. Audio generated with the handset-class-device profile

Example 2. Audio generated with the telephony-class-application profile

Note: Each audio profile has been optimized for a specific device by adjusting a range of audio effects. However, the make and model of the device used to tune the profile might not match your users' playback devices exactly. You might need to experiment with different profiles to find the best sound output for your application.

Available audio profiles

The following table lists the IDs and examples of the device profiles available for use by the Text-to-Speech API.

Audio profile ID	Optimized for
`wearable-class-device`	Smartwatches and other wearables, such as Apple Watch and Wear OS smartwatches
`handset-class-device`	Smartphones, such as Google Pixel, Samsung Galaxy, Apple iPhone
`headphone-class-device`	Earbuds or headphones for audio playback, such as Sennheiser headphones
`small-bluetooth-speaker-class-device`	Small home speakers, such as Google Home Mini
`medium-bluetooth-speaker-class-device`	Smart home speakers, such as Google Home
`large-home-entertainment-class-device`	Home entertainment systems or smart TVs, such as Google Home Max, LG TV
`large-automotive-class-device`	Car speakers
`telephony-class-application`	Interactive Voice Response (IVR) systems
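
A typo in a profile ID is easy to miss, so a client-side check against the IDs in the table above can be useful. This is only a sketch: the constant `KNOWN_PROFILE_IDS` and the function `unknown_profile_ids` are hypothetical names, and the set simply copies the table on this page, so it may not match what the service accepts at any given time.

```python
# Profile IDs copied from the table on this page (an assumption: the
# service's actual list may change independently of this snippet).
KNOWN_PROFILE_IDS = frozenset({
    "wearable-class-device",
    "handset-class-device",
    "headphone-class-device",
    "small-bluetooth-speaker-class-device",
    "medium-bluetooth-speaker-class-device",
    "large-home-entertainment-class-device",
    "large-automotive-class-device",
    "telephony-class-application",
})


def unknown_profile_ids(profile_ids):
    """Return the IDs that are not in KNOWN_PROFILE_IDS, in input order."""
    return [p for p in profile_ids if p not in KNOWN_PROFILE_IDS]
```

An empty result means every requested ID appears in the table.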

Specify an audio profile to use

To specify the audio profile to use, set the effectsProfileId field of the speech synthesis request.
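
To make the placement of effectsProfileId concrete, the sketch below builds a request body as a Python dictionary. The field names mirror the curl request in the Protocol section of this page; the function name `build_synthesize_body` and its default values are illustrative assumptions.

```python
import json


def build_synthesize_body(text, effects_profile_ids,
                          language_code="en-US", audio_encoding="LINEAR16"):
    """Build a JSON-serializable body for a text:synthesize request.

    effectsProfileId sits inside audioConfig and is a list, so several
    profiles can be passed in the order they should be applied.
    (Hypothetical helper; defaults are illustrative.)
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "effectsProfileId": list(effects_profile_ids),
        },
    }


# Example: serialize the body for use as an HTTP request payload.
# payload = json.dumps(build_synthesize_body("Hello", ["telephony-class-application"]))
```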

Protocol

To generate an audio file, make a POST request and provide the appropriate request body. The following is an example of a POST request that uses curl. The example uses the Google Cloud CLI to retrieve an access token for the request. For instructions on installing the gcloud CLI, see Authenticate to Text-to-Speech.

The following example shows how to send a request to the text:synthesize endpoint.

curl \
  -H "Authorization: Bearer "$(gcloud auth print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'input':{
      'text':'This is a sentence that helps test how audio profiles can change the way Cloud Text-to-Speech sounds.'
    },
    'voice':{
      'languageCode':'en-us'
    },
    'audioConfig':{
      'audioEncoding':'LINEAR16',
      'effectsProfileId': ['telephony-class-application']
    }
  }" "https://texttospeech.googleapis.com/v1beta1/text:synthesize" > audio-profile.txt

If the request succeeds, the Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the audio-profile.txt file looks similar to the following:

{
  "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
}

To decode the results from the Cloud Text-to-Speech API as a WAV audio file (the request above asks for LINEAR16 encoding), run the following commands from the same directory as the audio-profile.txt file.

sed 's|audioContent| |' < audio-profile.txt > tmp-output.txt && \
tr -d '\n ":{}' < tmp-output.txt > tmp-output-2.txt && \
base64 tmp-output-2.txt --decode > audio-profile.wav && \
rm tmp-output*.txt
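
As an alternative to the shell pipeline above, the same decoding can be sketched in Python using only the standard library. The function name `decode_audio_content` is hypothetical; the only assumption about the response is the `audioContent` field shown in the JSON output on this page.

```python
import base64
import json


def decode_audio_content(json_path, audio_path):
    """Decode the base64 audioContent field of a saved JSON response.

    Reads the JSON file at json_path, writes the decoded bytes to
    audio_path, and returns the number of bytes written.
    (Hypothetical helper, not part of any client library.)
    """
    with open(json_path, "r", encoding="utf-8") as f:
        response = json.load(f)
    audio_bytes = base64.b64decode(response["audioContent"])
    with open(audio_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)


# Example, using the file names from the curl request on this page:
# decode_audio_content("audio-profile.txt", "audio-profile.wav")
```

Parsing the JSON avoids relying on the exact layout of the response text, which the sed and tr approach depends on.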

Go

To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Go API reference documentation.

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import (
	"fmt"
	"io"
	"os"

	"context"

	texttospeech "cloud.google.com/go/texttospeech/apiv1"
	"cloud.google.com/go/texttospeech/apiv1/texttospeechpb"
)

// audioProfile synthesizes text to MP3 audio and applies the
// telephony-class-application audio profile to the result.
func audioProfile(w io.Writer, text string, outputFile string) error {
	// text := "hello"
	// outputFile := "out.mp3"

	ctx := context.Background()

	client, err := texttospeech.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &texttospeechpb.SynthesizeSpeechRequest{
		Input: &texttospeechpb.SynthesisInput{
			InputSource: &texttospeechpb.SynthesisInput_Text{Text: text},
		},
		Voice: &texttospeechpb.VoiceSelectionParams{LanguageCode: "en-US"},
		AudioConfig: &texttospeechpb.AudioConfig{
			AudioEncoding:    texttospeechpb.AudioEncoding_MP3,
			EffectsProfileId: []string{"telephony-class-application"},
		},
	}

	resp, err := client.SynthesizeSpeech(ctx, req)
	if err != nil {
		return fmt.Errorf("SynthesizeSpeech: %w", err)
	}

	if err = os.WriteFile(outputFile, resp.AudioContent, 0644); err != nil {
		return err
	}

	fmt.Fprintf(w, "Audio content written to file: %v\n", outputFile)

	return nil
}

Java

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

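// Imports this snippet relies on (assuming the v1 Java client library package).
import com.google.cloud.texttospeech.v1.AudioConfig;
import com.google.cloud.texttospeech.v1.AudioEncoding;
import com.google.cloud.texttospeech.v1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1.SynthesisInput;
import com.google.cloud.texttospeech.v1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.VoiceSelectionParams;
import com.google.protobuf.ByteString;
import java.io.FileOutputStream;
import java.io.OutputStream;

/**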
 * Demonstrates using the Text to Speech client with audio profiles to synthesize text or ssml
 *
 * @param text the raw text to be synthesized. (e.g., "Hello there!")
 * @param effectsProfile audio profile to be used for synthesis. (e.g.,
 *     "telephony-class-application")
 * @throws Exception on TextToSpeechClient Errors.
 */
public static void synthesizeTextWithAudioProfile(String text, String effectsProfile)
    throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the text input to be synthesized
    SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();

    // Build the voice request
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US") // languageCode = "en-US"
            .setSsmlGender(SsmlVoiceGender.FEMALE) // ssmlVoiceGender = SsmlVoiceGender.FEMALE
            .build();

    // Select the type of audio file you want returned and the audio profile
    AudioConfig audioConfig =
        AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3) // MP3 audio.
            .addEffectsProfileId(effectsProfile) // audio profile
            .build();

    // Perform the text-to-speech request
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file.
    try (OutputStream out = new FileOutputStream("output.mp3")) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file \"output.mp3\"");
    }
  }
}

Node.js

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const text = 'Text you want to vocalize';
// const outputFile = 'YOUR_OUTPUT_FILE_LOCATION';
// const languageCode = 'LANGUAGE_CODE_FOR_OUTPUT';
// const ssmlGender = 'SSML_GENDER_OF_SPEAKER';

// Imports the Google Cloud client library
const speech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

// Creates a client
const client = new speech.TextToSpeechClient();

async function synthesizeWithEffectsProfile() {
  // Add one or more effects profiles to array.
  // Refer to documentation for more details:
  // https://cloud.google.com/text-to-speech/docs/audio-profiles
  const effectsProfileId = ['telephony-class-application'];

  const request = {
    input: {text: text},
    voice: {languageCode: languageCode, ssmlGender: ssmlGender},
    audioConfig: {audioEncoding: 'MP3', effectsProfileId: effectsProfileId},
  };

  const [response] = await client.synthesizeSpeech(request);
  const writeFile = util.promisify(fs.writeFile);
  await writeFile(outputFile, response.audioContent, 'binary');
  console.log(`Audio content written to file: ${outputFile}`);
}

Python

To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

def synthesize_text_with_audio_profile():
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech

    text = "hello"
    output = "output.mp3"
    effects_profile_id = "telephony-class-application"
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(language_code="en-US")

    # Note: you can pass in multiple effects_profile_id. They will be applied
    # in the same order they are provided.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        effects_profile_id=[effects_profile_id],
    )

    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open(output, "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "%s"' % output)

Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Text-to-Speech reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Text-to-Speech reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Text-to-Speech reference documentation for Ruby.
