Coba Gemini 1.5 Pro, model multimodal kami yang paling canggih di Vertex AI, dan lihat apa yang dapat Anda bangun dengan jendela konteks token 1 juta. Coba Gemini 1.5 Pro, model multimodal kami yang paling canggih di Vertex AI, dan lihat apa yang dapat Anda bangun dengan jendela konteks token 1 juta.

Membuat file audio suara

Text-to-Speech memungkinkan Anda mengonversi kata dan kalimat menjadi data audio berenkode base64 dari ucapan alami manusia. Kemudian, Anda dapat mengonversi data audio menjadi file audio yang dapat diputar seperti MP3 dengan mendekode data base64. Text-to-Speech API menerima input sebagai teks mentah atau Bahasa Markup Sintesis Ucapan (SSML).

Dokumen ini menjelaskan cara membuat file audio dari input teks atau SSML menggunakan Text-to-Speech. Anda juga dapat meninjau artikel Dasar-dasar Text-to-Speech jika tidak memahami konsep seperti sintesis ucapan atau SSML.

Contoh ini mengharuskan Anda untuk menginstal dan melakukan inisialisasi Google Cloud CLI. Untuk mengetahui informasi cara menyiapkan gcloud CLI, lihat Mengautentikasi ke TTS.

Mengonversi teks ke audio suara sintetis

Contoh kode berikut menunjukkan cara mengonversi string menjadi data audio.

Anda dapat mengonfigurasi output sintesis ucapan dalam berbagai cara, termasuk memilih suara yang unik atau memodulasi output dalam nada, volume, kecepatan ucapan, dan frekuensi sampel.

Protocol

Lihat endpoint text:synthesize API untuk detail selengkapnya.

Untuk melakukan sintesis audio dari teks, buat permintaan POST HTTP ke endpoint text:synthesize. Dalam isi permintaan POST, tentukan jenis suara yang akan disintesis di bagian konfigurasi voice, tentukan teks yang akan disintesis dalam kolom text pada bagian input, dan tentukan jenis audio yang akan dibuat di bagian audioConfig.

Cuplikan kode berikut mengirimkan permintaan sintesis ke endpoint text:synthesize dan menyimpan hasilnya ke file bernama synthesize-text.txt. Ganti PROJECT_ID dengan project ID Anda.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: <var>PROJECT_ID</var>" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'input':{
      'text':'Android is a mobile operating system developed by Google,
         based on the Linux kernel and designed primarily for
         touchscreen mobile devices such as smartphones and tablets.'
    },
    'voice':{
      'languageCode':'en-gb',
      'name':'en-GB-Standard-A',
      'ssmlGender':'FEMALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
  }" "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-text.txt

Text-to-Speech API menampilkan audio yang disintesis sebagai data berenkode base64 yang terdapat dalam output JSON. Output JSON dalam file synthesize-text.txt terlihat mirip dengan cuplikan kode berikut.

{
  "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
}

Untuk mendekode hasil dari Text-to-Speech API sebagai file audio MP3, jalankan perintah berikut dari direktori yang sama dengan file synthesize-text.txt.

cat synthesize-text.txt | grep 'audioContent' | \
sed 's|audioContent| |' | tr -d '\n ":{},' > tmp.txt && \
base64 tmp.txt --decode > synthesize-text-audio.mp3 && \
rm tmp.txt

Go

Untuk mempelajari cara menginstal dan menggunakan library klien untuk Text-to-Speech, lihat library klien Text-to-Speech. Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi Text-to-Speech Go API.

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.


// SynthesizeText synthesizes plain text and saves the output to outputFile.
func SynthesizeText(w io.Writer, text, outputFile string) error {
	ctx := context.Background()

	client, err := texttospeech.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	req := texttospeechpb.SynthesizeSpeechRequest{
		Input: &texttospeechpb.SynthesisInput{
			InputSource: &texttospeechpb.SynthesisInput_Text{Text: text},
		},
		// Note: the voice can also be specified by name.
		// Names of voices can be retrieved with client.ListVoices().
		Voice: &texttospeechpb.VoiceSelectionParams{
			LanguageCode: "en-US",
			SsmlGender:   texttospeechpb.SsmlVoiceGender_FEMALE,
		},
		AudioConfig: &texttospeechpb.AudioConfig{
			AudioEncoding: texttospeechpb.AudioEncoding_MP3,
		},
	}

	resp, err := client.SynthesizeSpeech(ctx, &req)
	if err != nil {
		return err
	}

	err = ioutil.WriteFile(outputFile, resp.AudioContent, 0644)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "Audio content written to file: %v\n", outputFile)
	return nil
}

Java

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

/**
 * Demonstrates using the Text to Speech client to synthesize text or ssml.
 *
 * @param text the raw text to be synthesized. (e.g., "Hello there!")
 * @throws Exception on TextToSpeechClient Errors.
 */
public static ByteString synthesizeText(String text) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the text input to be synthesized
    SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();

    // Build the voice request
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US") // languageCode = "en_us"
            .setSsmlGender(SsmlVoiceGender.FEMALE) // ssmlVoiceGender = SsmlVoiceGender.FEMALE
            .build();

    // Select the type of audio file you want returned
    AudioConfig audioConfig =
        AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3) // MP3 audio.
            .build();

    // Perform the text-to-speech request
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file.
    try (OutputStream out = new FileOutputStream("output.mp3")) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file \"output.mp3\"");
      return audioContents;
    }
  }
}

Node.js

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

const client = new textToSpeech.TextToSpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const text = 'Text to synthesize, eg. hello';
// const outputFile = 'Local path to save audio file to, e.g. output.mp3';

const request = {
  input: {text: text},
  voice: {languageCode: 'en-US', ssmlGender: 'FEMALE'},
  audioConfig: {audioEncoding: 'MP3'},
};
const [response] = await client.synthesizeSpeech(request);
const writeFile = util.promisify(fs.writeFile);
await writeFile(outputFile, response.audioContent, 'binary');
console.log(`Audio content written to file: ${outputFile}`);

Python

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Standard-C",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        request={"input": input_text, "voice": voice, "audio_config": audio_config}
    )

    # The response's audio_content is binary.
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')

Bahasa tambahan

C# : Ikuti Petunjuk penyiapan C# di halaman library klien, lalu kunjungi dokumentasi referensi Text-to-Speech untuk .NET.

PHP : Ikuti Petunjuk penyiapan PHP di halaman library klien, lalu kunjungi dokumentasi referensi Text-to-Speech untuk PHP.

Ruby : Ikuti Petunjuk penyiapan Ruby di halaman library klien, lalu kunjungi dokumentasi referensi Text-to-Speech untuk Ruby.

Mengonversi SSML ke audio suara sintetis

Menggunakan SSML dalam permintaan sintesis audio dapat menghasilkan audio yang lebih mirip dengan ucapan alami manusia. Secara khusus, SSML memberi Anda kontrol yang lebih mendetail terkait cara output audio merepresentasikan jeda dalam ucapan atau cara audio melafalkan tanggal, waktu, akronim, dan singkatan.

Untuk mengetahui detail selengkapnya tentang elemen SSML yang didukung oleh Text-to-Speech API, lihat referensi SSML.

Protokol

Lihat endpoint text:synthesize API untuk detail selengkapnya.

Untuk melakukan sintesis audio dari SSML, buat permintaan POST HTTP ke endpoint text:synthesize. Dalam isi permintaan POST, tentukan jenis suara yang akan disintesis di bagian konfigurasi voice, tentukan SSML yang akan disintesis di kolom ssml dari bagian input, lalu tentukan jenis audio yang akan dibuat di bagian audioConfig.

Cuplikan kode berikut mengirimkan permintaan sintesis ke endpoint text:synthesize dan menyimpan hasilnya ke file bernama synthesize-ssml.txt. Ganti PROJECT_ID dengan project ID Anda.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: <var>PROJECT_ID</var>" \
  -H "Content-Type: application/json; charset=utf-8" --data "{
    'input':{
     'ssml':'<speak>The <say-as interpret-as=\"characters\">SSML</say-as> standard
          is defined by the <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>'
    },
    'voice':{
      'languageCode':'en-us',
      'name':'en-US-Standard-B',
      'ssmlGender':'MALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
  }" "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-ssml.txt

Text-to-Speech API menampilkan audio yang disintesis sebagai data berenkode base64 yang terdapat dalam output JSON. Output JSON dalam file synthesize-ssml.txt terlihat mirip dengan cuplikan kode berikut.

{
  "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
}

Untuk mendekode hasil dari Text-to-Speech API sebagai file audio MP3, jalankan perintah berikut dari direktori yang sama dengan file synthesize-ssml.txt.

cat synthesize-ssml.txt | grep 'audioContent' | \
sed 's|audioContent| |' | tr -d '\n ":{},' > tmp.txt && \
base64 tmp.txt --decode > synthesize-ssml-audio.mp3 && \
rm tmp.txt

Go

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.


// SynthesizeSSML synthesizes ssml and saves the output to outputFile.
//
// ssml must be well-formed according to:
//
//	https://www.w3.org/TR/speech-synthesis/
//
// Example: <speak>Hello there.</speak>
func SynthesizeSSML(w io.Writer, ssml, outputFile string) error {
	ctx := context.Background()

	client, err := texttospeech.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	req := texttospeechpb.SynthesizeSpeechRequest{
		Input: &texttospeechpb.SynthesisInput{
			InputSource: &texttospeechpb.SynthesisInput_Ssml{Ssml: ssml},
		},
		// Note: the voice can also be specified by name.
		// Names of voices can be retrieved with client.ListVoices().
		Voice: &texttospeechpb.VoiceSelectionParams{
			LanguageCode: "en-US",
			SsmlGender:   texttospeechpb.SsmlVoiceGender_FEMALE,
		},
		AudioConfig: &texttospeechpb.AudioConfig{
			AudioEncoding: texttospeechpb.AudioEncoding_MP3,
		},
	}

	resp, err := client.SynthesizeSpeech(ctx, &req)
	if err != nil {
		return err
	}

	err = ioutil.WriteFile(outputFile, resp.AudioContent, 0644)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "Audio content written to file: %v\n", outputFile)
	return nil
}

Java

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

/**
 * Demonstrates using the Text to Speech client to synthesize text or ssml.
 *
 * <p>Note: ssml must be well-formed according to: (https://www.w3.org/TR/speech-synthesis/
 * Example: <speak>Hello there.</speak>
 *
 * @param ssml the ssml document to be synthesized. (e.g., "<?xml...")
 * @throws Exception on TextToSpeechClient Errors.
 */
public static ByteString synthesizeSsml(String ssml) throws Exception {
  // Instantiates a client
  try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
    // Set the ssml input to be synthesized
    SynthesisInput input = SynthesisInput.newBuilder().setSsml(ssml).build();

    // Build the voice request
    VoiceSelectionParams voice =
        VoiceSelectionParams.newBuilder()
            .setLanguageCode("en-US") // languageCode = "en_us"
            .setSsmlGender(SsmlVoiceGender.FEMALE) // ssmlVoiceGender = SsmlVoiceGender.FEMALE
            .build();

    // Select the type of audio file you want returned
    AudioConfig audioConfig =
        AudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.MP3) // MP3 audio.
            .build();

    // Perform the text-to-speech request
    SynthesizeSpeechResponse response =
        textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

    // Get the audio contents from the response
    ByteString audioContents = response.getAudioContent();

    // Write the response to the output file.
    try (OutputStream out = new FileOutputStream("output.mp3")) {
      out.write(audioContents.toByteArray());
      System.out.println("Audio content written to file \"output.mp3\"");
      return audioContents;
    }
  }
}

Node.js

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
const util = require('util');

const client = new textToSpeech.TextToSpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const ssml = '<speak>Hello there.</speak>';
// const outputFile = 'Local path to save audio file to, e.g. output.mp3';

const request = {
  input: {ssml: ssml},
  voice: {languageCode: 'en-US', ssmlGender: 'FEMALE'},
  audioConfig: {audioEncoding: 'MP3'},
};

const [response] = await client.synthesizeSpeech(request);
const writeFile = util.promisify(fs.writeFile);
await writeFile(outputFile, response.audioContent, 'binary');
console.log(`Audio content written to file: ${outputFile}`);

Python

Untuk mengautentikasi ke Text-to-Speech, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

def synthesize_ssml(ssml):
    """Synthesizes speech from the input string of ssml.

    Note: ssml must be well-formed according to:
        https://www.w3.org/TR/speech-synthesis/

    Example: <speak>Hello there.</speak>
    """
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.SynthesisInput(ssml=ssml)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Standard-C",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=input_text, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')

Bahasa tambahan

C# : Ikuti Petunjuk penyiapan C# di halaman library klien, lalu kunjungi dokumentasi referensi Text-to-Speech untuk .NET.

PHP : Ikuti Petunjuk penyiapan PHP di halaman library klien, lalu kunjungi dokumentasi referensi Text-to-Speech untuk PHP.

Ruby : Ikuti Petunjuk penyiapan Ruby di halaman library klien, lalu kunjungi dokumentasi referensi Text-to-Speech untuk Ruby.

Cobalah sendiri

Jika Anda baru menggunakan Google Cloud, buat akun untuk mengevaluasi performa Text-to-Speech dalam skenario dunia nyata. Pelanggan baru mendapatkan kredit gratis senilai $300 untuk menjalankan, menguji, dan men-deploy workload.

Coba Text-to-Speech gratis