Evaluate models

Use the benchmarking functionality of the Cloud Speech-to-Text Console to measure the accuracy of any of the transcription models used in the Speech-to-Text V2 API.

The Cloud Speech-to-Text Console provides visual benchmarking for pre-trained and Custom Speech-to-Text models. You can inspect recognition quality by comparing word error rate (WER) evaluation metrics across multiple transcription models to help you decide which model best fits your application.
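WER counts the word-level substitutions, deletions, and insertions needed to turn a model's transcript into the ground-truth reference, divided by the number of words in the reference. The following minimal sketch is not part of the Speech-to-Text tooling; the function name and sample strings are illustrative only:

```python
# Minimal sketch of word error rate (WER): word-level edit distance
# (substitutions + deletions + insertions) divided by the number of
# words in the ground-truth reference.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One missing word out of four reference words -> WER of 0.25.
print(word_error_rate("play the next song", "play next song"))
```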

Before you begin

Ensure that you have signed up for a Google Cloud account, created a project, trained a custom speech model, and deployed it to an endpoint.

Create a ground-truth dataset

To create a custom benchmarking dataset, gather audio samples that accurately reflect the type of traffic the transcription model will encounter in a production environment. The aggregate duration of these audio files should ideally be at least 30 minutes and not exceed 10 hours. To assemble the dataset, you need to:

  1. Create a directory in a Cloud Storage bucket of your choice to store the audio and text files for the dataset.
  2. For each audio file in the dataset (such as example_audio_1.wav), create a reasonably accurate ground-truth transcription in a corresponding text file (example_audio_1.txt). The service uses these audio-text pairings in the Cloud Storage bucket to assemble the dataset, as shown in the sketch after this list.
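
The following sketch illustrates one way to upload such pairs with the google-cloud-storage client library. The bucket name, dataset directory, and file names are placeholders, and it assumes the bucket and the local files already exist:

```python
# Hypothetical sketch: upload paired audio and ground-truth transcript
# files into a dataset directory of a Cloud Storage bucket.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-benchmark-bucket")  # placeholder bucket name

pairs = [
    ("example_audio_1.wav", "example_audio_1.txt"),
    ("example_audio_2.wav", "example_audio_2.txt"),
]

for audio_name, transcript_name in pairs:
    # Each audio file and its matching .txt transcript share a base name
    # and live under the same dataset directory.
    bucket.blob(f"benchmark-dataset/{audio_name}").upload_from_filename(audio_name)
    bucket.blob(f"benchmark-dataset/{transcript_name}").upload_from_filename(transcript_name)
```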

Benchmark the model

To assess the accuracy of your Custom Speech-to-Text model against your benchmarking dataset, follow the Measure and improve accuracy guide. A simplified end-to-end sketch of what such an evaluation involves is shown below.
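
The following sketch is not the guide's workflow itself; it only illustrates the idea by transcribing one short benchmark audio file with the Speech-to-Text V2 API and scoring the result against its ground-truth text with the word_error_rate() helper from the earlier sketch. The project ID, recognizer name, bucket, and file paths are placeholders.

```python
# Simplified illustration (placeholders throughout): transcribe one short
# benchmark audio file with the Speech-to-Text V2 API and compare the
# hypothesis to its ground-truth transcript.
from google.cloud import speech_v2, storage

speech_client = speech_v2.SpeechClient()
storage_client = storage.Client()

# Placeholder recognizer resource name.
recognizer = "projects/my-project/locations/global/recognizers/my-recognizer"

config = speech_v2.RecognitionConfig(
    auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="latest_long",
)

# Synchronous recognition is suitable only for short audio; longer files
# would go through batch recognition instead.
response = speech_client.recognize(
    request=speech_v2.RecognizeRequest(
        recognizer=recognizer,
        config=config,
        uri="gs://my-benchmark-bucket/benchmark-dataset/example_audio_1.wav",
    )
)
hypothesis = " ".join(
    result.alternatives[0].transcript for result in response.results
)

# Download the matching ground-truth text file from the same dataset
# directory and score the hypothesis against it.
bucket = storage_client.bucket("my-benchmark-bucket")
reference = bucket.blob("benchmark-dataset/example_audio_1.txt").download_as_text()

print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```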