Media Translation API tidak digunakan lagi dan tidak akan tersedia lagi di Google Cloud setelah 1 Juli 2024. Anda dapat mereplikasi fungsi Media Translation API melalui kombinasi layanan Google Cloud lainnya seperti Cloud Speech-to-Text dan Cloud Translation API.

Menerjemahkan audio streaming ke teks

Media Translation menerjemahkan file audio atau aliran data ucapan ke dalam teks dalam bahasa lain. Halaman ini menyediakan contoh kode yang menunjukkan cara menerjemahkan audio streaming menjadi teks menggunakan library klien Media Translation.

Menyiapkan project

Sebelum dapat menggunakan Media Translation, Anda perlu menyiapkan project Google Cloud dan mengaktifkan Media Translation API untuk project tersebut.

Login ke akun Google Cloud Anda. Jika Anda baru menggunakan Google Cloud, buat akun untuk mengevaluasi performa produk kami dalam skenario dunia nyata. Pelanggan baru juga mendapatkan kredit gratis senilai $300 untuk menjalankan, menguji, dan men-deploy workload.

Di konsol Google Cloud, pada halaman pemilih project, pilih atau buat project Google Cloud.

Buka pemilih project

Pastikan penagihan telah diaktifkan untuk project Google Cloud Anda.

Enable the Media Translation API.

Enable the API

Buat akun layanan:

Di konsol Google Cloud, buka halaman Buat akun layanan.
Buka Create service account
Pilih project Anda.
Di kolom Nama akun layanan, masukkan nama. Konsol Google Cloud akan mengisi kolom ID akun layanan berdasarkan nama ini.

Di kolom Deskripsi akun layanan, masukkan sebuah deskripsi. Sebagai contoh, Service account for quickstart.
Klik Buat dan lanjutkan.
Berikan peran Project > Owner ke akun layanan.

Untuk memberikan peran, temukan daftar Pilih peran, lalu pilih Project > Owner.

Catatan: Kolom Peran memengaruhi resource mana yang dapat diakses oleh akun layanan di project Anda. Anda dapat mencabut peran ini atau memberikan peran tambahan di lain waktu. Dalam lingkungan produksi, jangan berikan peran Owner, Editor, atau Viewer. Sebagai gantinya, berikan peran standar atau peran khusus yang memenuhi kebutuhan Anda.
Klik Lanjutkan.
Klik Selesai untuk menyelesaikan pembuatan akun layanan.

Jangan tutup jendela browser Anda. Anda akan menggunakannya pada langkah berikutnya.

Membuat kunci akun layanan:

Di konsol Google Cloud, klik alamat email untuk akun layanan yang telah dibuat.
Klik Kunci.
Klik Tambahkan kunci, lalu klik Buat kunci baru.
Klik Create. File kunci JSON akan didownload ke komputer Anda.
Klik Close.

Tetapkan variabel lingkungan GOOGLE_APPLICATION_CREDENTIALS ke jalur file JSON yang berisi kredensial Anda. Variabel ini hanya berlaku untuk sesi shell Anda saat ini. Jadi, jika Anda membuka sesi baru, tetapkan variabel kembali.

Contoh: Linux atau macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Ganti KEY_PATH dengan jalur file JSON yang berisi kredensial Anda.

Contoh:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Contoh: Windows

Untuk PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Ganti KEY_PATH dengan jalur file JSON yang berisi kredensial Anda.

Contoh:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

Untuk command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Ganti KEY_PATH dengan jalur file JSON yang berisi kredensial Anda.

Menginstal Google Cloud CLI.

Untuk initialize gcloud CLI, jalankan perintah berikut:

gcloud init

Di konsol Google Cloud, pada halaman pemilih project, pilih atau buat project Google Cloud.

Buka pemilih project

Pastikan penagihan telah diaktifkan untuk project Google Cloud Anda.

Enable the Media Translation API.

Enable the API

Buat akun layanan:

Di konsol Google Cloud, buka halaman Buat akun layanan.
Buka Create service account
Pilih project Anda.
Di kolom Nama akun layanan, masukkan nama. Konsol Google Cloud akan mengisi kolom ID akun layanan berdasarkan nama ini.

Di kolom Deskripsi akun layanan, masukkan sebuah deskripsi. Sebagai contoh, Service account for quickstart.
Klik Buat dan lanjutkan.
Berikan peran Project > Owner ke akun layanan.

Untuk memberikan peran, temukan daftar Pilih peran, lalu pilih Project > Owner.

Catatan: Kolom Peran memengaruhi resource mana yang dapat diakses oleh akun layanan di project Anda. Anda dapat mencabut peran ini atau memberikan peran tambahan di lain waktu. Dalam lingkungan produksi, jangan berikan peran Owner, Editor, atau Viewer. Sebagai gantinya, berikan peran standar atau peran khusus yang memenuhi kebutuhan Anda.
Klik Lanjutkan.
Klik Selesai untuk menyelesaikan pembuatan akun layanan.

Jangan tutup jendela browser Anda. Anda akan menggunakannya pada langkah berikutnya.

Membuat kunci akun layanan:

Di konsol Google Cloud, klik alamat email untuk akun layanan yang telah dibuat.
Klik Kunci.
Klik Tambahkan kunci, lalu klik Buat kunci baru.
Klik Create. File kunci JSON akan didownload ke komputer Anda.
Klik Close.

Contoh: Linux atau macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Ganti KEY_PATH dengan jalur file JSON yang berisi kredensial Anda.

Contoh:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Contoh: Windows

Untuk PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Ganti KEY_PATH dengan jalur file JSON yang berisi kredensial Anda.

Contoh:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

Untuk command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Ganti KEY_PATH dengan jalur file JSON yang berisi kredensial Anda.

Menginstal Google Cloud CLI.

Untuk initialize gcloud CLI, jalankan perintah berikut:

gcloud init

Instal library klien untuk bahasa pilihan Anda.

Menerjemahkan ucapan

Contoh kode di bawah menunjukkan cara menerjemahkan ucapan dari file yang berisi audio hingga lima menit atau dari mikrofon live. Lihat Praktik terbaik untuk mengetahui rekomendasi cara memberikan data ucapan dengan akurasi terbaik dalam pengenalan.

Langkah-langkah utamanya tetap sama, apa pun sumber audionya:

Lakukan inisialisasi klien SpeechTranslationServiceClient yang akan digunakan untuk mengirim permintaan ke Media Translation.

Anda dapat menggunakan kembali klien yang sama untuk beberapa permintaan.
Buat objek permintaan StreamingTranslateSpeechConfig yang menentukan cara memproses audio.

Objek StreamingTranslateSpeechConfig terdiri dari objek TranslateSpeechConfig yang memberikan informasi tentang file sumber audio dan properti single_utterance yang menentukan apakah Media Translation terus menerjemahkan saat speaker dijeda.

Objek TranslateSpeechConfig memberikan spesifikasi teknis untuk sumber audio (seperti encoding dan frekuensi sampelnya), menetapkan bahasa sumber dan target untuk terjemahannya (menggunakan kode bahasa BCP-47 miliknya), dan menentukan model terjemahan yang digunakan Media Translation untuk transkripsi.

Catatan: Menentukan model terjemahan bersifat opsional. Media Translation menggunakan model default dasar untuk sebagian besar permintaan. Untuk beberapa pasangan bahasa, tersedia model yang ditingkatkan serta dioptimalkan untuk panggilan telepon atau video. Lihat halaman Dukungan bahasa untuk model yang tersedia.
Kirim urutan objek permintaan StreamingTranslateSpeechRequest.

Anda mengirimkan urutan permintaan untuk setiap file audio yang ingin diterjemahkan. Permintaan pertama menyediakan objek StreamingTranslateSpeechConfig untuk permintaan tersebut dan permintaan berikut menyediakan konten audio dalam streaming.
Terima objek respons StreamingTranslateSpeechResult.

Meskipun semua respons dengan nilai text_translation_result.is_final false diterima, hasil yang terakhir diterjemahkan akan menimpa hasil sebelumnya.

Saat Media Translation memiliki hasil akhir, kolom text_translation_result.is_final ditetapkan ke true, dan setiap hasil terjemahan yang diterima kemudian ditambahkan ke hasil sebelumnya. (Dalam hal ini, hasil sebelumnya tidak ditimpa). Anda dapat membuat output terjemahan yang sudah selesai, dan memulai dengan bagian baru untuk bagian berikutnya dari transkripsi dan audio yang sesuai.

Saat speaker berhenti, jika kolom single_utterance disetel ke benar (true) dalam objek permintaan StreamingTranslateSpeechConfig, Media Translation akan menampilkan peristiwa END_OF_SINGLE_UTTERANCE untuk speech_event_type dalam respons. Klien akan berhenti mengirim permintaan tetapi akan terus menerima respons hingga terjemahan selesai.
Streaming memiliki batas 5 menit. Melebihi batas ini akan menampilkan error OUT_OF_RANGE.

Contoh kode

Menerjemahkan ucapan dari file audio

Java

Untuk mempelajari cara menginstal dan menggunakan library klien untuk Media Translation, lihat library klien Media Translation. Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Java Media Translation.

Untuk mengautentikasi ke Media Translation, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Lihat di GitHub


import com.google.api.gax.rpc.BidiStream;
import com.google.cloud.mediatranslation.v1beta1.SpeechTranslationServiceClient;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechConfig;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechRequest;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechResponse;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechResult;
import com.google.cloud.mediatranslation.v1beta1.TranslateSpeechConfig;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TranslateFromFile {

  public static void translateFromFile() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String filePath = "path/to/audio.raw";
    translateFromFile(filePath);
  }

  public static void translateFromFile(String filePath) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (SpeechTranslationServiceClient client = SpeechTranslationServiceClient.create()) {
      Path path = Paths.get(filePath);
      byte[] content = Files.readAllBytes(path);
      ByteString audioContent = ByteString.copyFrom(content);

      TranslateSpeechConfig audioConfig =
          TranslateSpeechConfig.newBuilder()
              .setAudioEncoding("linear16")
              .setSampleRateHertz(16000)
              .setSourceLanguageCode("en-US")
              .setTargetLanguageCode("fr-FR")
              .build();

      StreamingTranslateSpeechConfig config =
          StreamingTranslateSpeechConfig.newBuilder()
              .setAudioConfig(audioConfig)
              .setSingleUtterance(true)
              .build();

      BidiStream<StreamingTranslateSpeechRequest, StreamingTranslateSpeechResponse> bidiStream =
          client.streamingTranslateSpeechCallable().call();

      // The first request contains the configuration.
      StreamingTranslateSpeechRequest requestConfig =
          StreamingTranslateSpeechRequest.newBuilder().setStreamingConfig(config).build();

      // The second request contains the audio
      StreamingTranslateSpeechRequest request =
          StreamingTranslateSpeechRequest.newBuilder().setAudioContent(audioContent).build();

      bidiStream.send(requestConfig);
      bidiStream.send(request);

      for (StreamingTranslateSpeechResponse response : bidiStream) {
        // Once the transcription settles, the response contains the
        // is_final result. The other results will be for subsequent portions of
        // the audio.
        StreamingTranslateSpeechResult res = response.getResult();
        String translation = res.getTextTranslationResult().getTranslation();

        if (res.getTextTranslationResult().getIsFinal()) {
          System.out.println(String.format("\nFinal translation: %s", translation));
          break;
        }
        System.out.println(String.format("\nPartial translation: %s", translation));
      }
    }
  }
}

Node.js

Untuk mengautentikasi ke Media Translation, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Lihat di GitHub

const fs = require('fs');

// Imports the CLoud Media Translation client library
const {
  SpeechTranslationServiceClient,
} = require('@google-cloud/media-translation');

// Creates a client
const client = new SpeechTranslationServiceClient();

async function translate_from_file() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
  // const encoding = 'Encoding of the audio file, e.g. LINEAR16';
  // const sourceLanguage = 'BCP-47 source language code, e.g. en-US';
  // const targetLanguage = 'BCP-47 target language code, e.g. es-ES';

  const config = {
    audioConfig: {
      audioEncoding: encoding,
      sourceLanguageCode: sourceLanguage,
      targetLanguageCode: targetLanguage,
    },
    single_utterance: true,
  };

  // First request needs to have only a streaming config, no data.
  const initialRequest = {
    streamingConfig: config,
    audioContent: null,
  };

  const readStream = fs.createReadStream(filename, {
    highWaterMark: 4096,
    encoding: 'base64',
  });

  const chunks = [];
  readStream
    .on('data', chunk => {
      const request = {
        streamingConfig: config,
        audioContent: chunk.toString(),
      };
      chunks.push(request);
    })
    .on('close', () => {
      // Config-only request should be first in stream of requests
      stream.write(initialRequest);
      for (let i = 0; i < chunks.length; i++) {
        stream.write(chunks[i]);
      }
      stream.end();
    });

  const stream = client.streamingTranslateSpeech().on('data', response => {
    const {result} = response;
    if (result.textTranslationResult.isFinal) {
      console.log(
        `\nFinal translation: ${result.textTranslationResult.translation}`
      );
      console.log(`Final recognition result: ${result.recognitionResult}`);
    } else {
      console.log(
        `\nPartial translation: ${result.textTranslationResult.translation}`
      );
      console.log(`Partial recognition result: ${result.recognitionResult}`);
    }
  });

Python

Untuk mengautentikasi ke Media Translation, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Lihat di GitHub

from google.cloud import mediatranslation

def translate_from_file(file_path="path/to/your/file"):
    client = mediatranslation.SpeechTranslationServiceClient()

    # The `sample_rate_hertz` field is not required for FLAC and WAV (Linear16)
    # encoded data. Other audio encodings must provide the sampling rate.
    audio_config = mediatranslation.TranslateSpeechConfig(
        audio_encoding="linear16",
        source_language_code="en-US",
        target_language_code="fr-FR",
    )

    streaming_config = mediatranslation.StreamingTranslateSpeechConfig(
        audio_config=audio_config, single_utterance=True
    )

    def request_generator(config, audio_file_path):
        # The first request contains the configuration.
        # Note that audio_content is explicitly set to None.
        yield mediatranslation.StreamingTranslateSpeechRequest(streaming_config=config)

        with open(audio_file_path, "rb") as audio:
            while True:
                chunk = audio.read(4096)
                if not chunk:
                    break
                yield mediatranslation.StreamingTranslateSpeechRequest(
                    audio_content=chunk
                )

    requests = request_generator(streaming_config, file_path)
    responses = client.streaming_translate_speech(requests)

    for response in responses:
        # Once the transcription settles, the response contains the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        print(f"Response: {response}")
        result = response.result
        translation = result.text_translation_result.translation

        if result.text_translation_result.is_final:
            print(f"\nFinal translation: {translation}")
            break

        print(f"\nPartial translation: {translation}")

Menerjemahkan ucapan dari mikrofon

Java

Untuk mengautentikasi ke Media Translation, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Lihat di GitHub


import com.google.api.gax.rpc.ClientStream;
import com.google.api.gax.rpc.ResponseObserver;
import com.google.api.gax.rpc.StreamController;
import com.google.cloud.mediatranslation.v1beta1.SpeechTranslationServiceClient;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechConfig;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechRequest;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechResponse;
import com.google.cloud.mediatranslation.v1beta1.StreamingTranslateSpeechResult;
import com.google.cloud.mediatranslation.v1beta1.TranslateSpeechConfig;
import com.google.protobuf.ByteString;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

public class TranslateFromMic {

  public static void main(String[] args) throws IOException, LineUnavailableException {
    translateFromMic();
  }

  public static void translateFromMic() throws IOException, LineUnavailableException {

    ResponseObserver<StreamingTranslateSpeechResponse> responseObserver = null;

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (SpeechTranslationServiceClient client = SpeechTranslationServiceClient.create()) {
      responseObserver =
          new ResponseObserver<StreamingTranslateSpeechResponse>() {

            @Override
            public void onStart(StreamController controller) {}

            @Override
            public void onResponse(StreamingTranslateSpeechResponse response) {
              StreamingTranslateSpeechResult res = response.getResult();
              String translation = res.getTextTranslationResult().getTranslation();

              if (res.getTextTranslationResult().getIsFinal()) {
                System.out.println(String.format("\nFinal translation: %s", translation));
              } else {
                System.out.println(String.format("\nPartial translation: %s", translation));
              }
            }

            @Override
            public void onComplete() {}

            public void onError(Throwable t) {
              System.out.println(t);
            }
          };

      ClientStream<StreamingTranslateSpeechRequest> clientStream =
          client.streamingTranslateSpeechCallable().splitCall(responseObserver);

      TranslateSpeechConfig audioConfig =
          TranslateSpeechConfig.newBuilder()
              .setAudioEncoding("linear16")
              .setSourceLanguageCode("en-US")
              .setTargetLanguageCode("es-ES")
              .setSampleRateHertz(16000)
              .build();

      StreamingTranslateSpeechConfig streamingRecognitionConfig =
          StreamingTranslateSpeechConfig.newBuilder().setAudioConfig(audioConfig).build();

      StreamingTranslateSpeechRequest request =
          StreamingTranslateSpeechRequest.newBuilder()
              .setStreamingConfig(streamingRecognitionConfig)
              .build(); // The first request in a streaming call has to be a config

      clientStream.send(request);
      // SampleRate:16000Hz, SampleSizeInBits: 16, Number of channels: 1, Signed: true,
      // bigEndian: false
      AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
      DataLine.Info targetInfo =
          new DataLine.Info(
              TargetDataLine.class,
              audioFormat); // Set the system information to read from the microphone audio stream

      if (!AudioSystem.isLineSupported(targetInfo)) {
        System.out.println("Microphone not supported");
        System.exit(0);
      }
      // Target data line captures the audio stream the microphone produces.
      TargetDataLine targetDataLine = (TargetDataLine) AudioSystem.getLine(targetInfo);
      targetDataLine.open(audioFormat);
      targetDataLine.start();
      System.out.println("Start speaking... Press Ctrl-C to stop");
      long startTime = System.currentTimeMillis();
      // Audio Input Stream
      AudioInputStream audio = new AudioInputStream(targetDataLine);

      while (true) {
        byte[] data = new byte[6400];
        audio.read(data);
        request =
            StreamingTranslateSpeechRequest.newBuilder()
                .setAudioContent(ByteString.copyFrom(data))
                .build();
        clientStream.send(request);
      }
    }
  }
}

Node.js

Untuk mengautentikasi ke Media Translation, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Lihat di GitHub


// Allow user input from terminal
const readline = require('readline');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

function doTranslationLoop() {
  rl.question("Press any key to translate or 'q' to quit: ", answer => {
    if (answer.toLowerCase() === 'q') {
      rl.close();
    } else {
      translateFromMicrophone();
    }
  });
}

// Node-Record-lpcm16
const recorder = require('node-record-lpcm16');

// Imports the Cloud Media Translation client library
const {
  SpeechTranslationServiceClient,
} = require('@google-cloud/media-translation');

// Creates a client
const client = new SpeechTranslationServiceClient();

function translateFromMicrophone() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  //const encoding = 'linear16';
  //const sampleRateHertz = 16000;
  //const sourceLanguage = 'Language to translate from, as BCP-47 locale';
  //const targetLanguage = 'Language to translate to, as BCP-47 locale';
  console.log('Begin speaking ...');

  const config = {
    audioConfig: {
      audioEncoding: encoding,
      sourceLanguageCode: sourceLanguage,
      targetLanguageCode: targetLanguage,
    },
    singleUtterance: true,
  };

  // First request needs to have only a streaming config, no data.
  const initialRequest = {
    streamingConfig: config,
    audioContent: null,
  };

  let currentTranslation = '';
  let currentRecognition = '';
  // Create a recognize stream
  const stream = client
    .streamingTranslateSpeech()
    .on('error', e => {
      if (e.code && e.code === 4) {
        console.log('Streaming translation reached its deadline.');
      } else {
        console.log(e);
      }
    })
    .on('data', response => {
      const {result, speechEventType} = response;
      if (speechEventType === 'END_OF_SINGLE_UTTERANCE') {
        console.log(`\nFinal translation: ${currentTranslation}`);
        console.log(`Final recognition result: ${currentRecognition}`);

        stream.destroy();
        recording.stop();
      } else {
        currentTranslation = result.textTranslationResult.translation;
        currentRecognition = result.recognitionResult;
        console.log(`\nPartial translation: ${currentTranslation}`);
        console.log(`Partial recognition result: ${currentRecognition}`);
      }
    });

  let isFirst = true;
  // Start recording and send microphone input to the Media Translation API
  const recording = recorder.record({
    sampleRateHertz: sampleRateHertz,
    threshold: 0, //silence threshold
    recordProgram: 'rec',
    silence: '5.0', //seconds of silence before ending
  });
  recording
    .stream()
    .on('data', chunk => {
      if (isFirst) {
        stream.write(initialRequest);
        isFirst = false;
      }
      const request = {
        streamingConfig: config,
        audioContent: chunk.toString('base64'),
      };
      if (!stream.destroyed) {
        stream.write(request);
      }
    })
    .on('close', () => {
      doTranslationLoop();
    });
}

doTranslationLoop();

Python

Untuk mengautentikasi ke Media Translation, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, baca Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Lihat di GitHub


import itertools
import queue

from google.cloud import mediatranslation as media
import pyaudio

# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10)  # 100ms
SpeechEventType = media.StreamingTranslateSpeechResponse.SpeechEventType

class MicrophoneStream:
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )

        self.closed = False

        return self

    def __exit__(self, type=None, value=None, traceback=None):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def exit(self):
        self.__exit__()

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b"".join(data)

def listen_print_loop(responses):
    """Iterates through server responses and prints them.

    The responses passed is a generator that will block until a response
    is provided by the server.
    """
    translation = ""
    for response in responses:
        # Once the transcription settles, the response contains the
        # END_OF_SINGLE_UTTERANCE event.
        if response.speech_event_type == SpeechEventType.END_OF_SINGLE_UTTERANCE:
            print(f"\nFinal translation: {translation}")
            return 0

        result = response.result
        translation = result.text_translation_result.translation

        print(f"\nPartial translation: {translation}")

def do_translation_loop():
    print("Begin speaking...")

    client = media.SpeechTranslationServiceClient()

    speech_config = media.TranslateSpeechConfig(
        audio_encoding="linear16",
        source_language_code="en-US",
        target_language_code="es-ES",
    )

    config = media.StreamingTranslateSpeechConfig(
        audio_config=speech_config, single_utterance=True
    )

    # The first request contains the configuration.
    # Note that audio_content is explicitly set to None.
    first_request = media.StreamingTranslateSpeechRequest(streaming_config=config)

    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        mic_requests = (
            media.StreamingTranslateSpeechRequest(audio_content=content)
            for content in audio_generator
        )

        requests = itertools.chain(iter([first_request]), mic_requests)

        responses = client.streaming_translate_speech(requests)

        # Print the translation responses as they arrive
        result = listen_print_loop(responses)
        if result == 0:
            stream.exit()

def main():
    while True:
        print()
        option = input("Press any key to translate or 'q' to quit: ")

        if option.lower() == "q":
            break

        do_translation_loop()

if __name__ == "__main__":
    main()