Evaluate models

Use the benchmarking functionality of the Cloud Speech-to-Text Console to measure the accuracy of any of the transcription models used in the Speech-to-Text V2 API.

The Cloud Speech-to-Text Console provides visual benchmarking for pre-trained and Custom Speech-to-Text models. You can inspect recognition quality by comparing word error rate (WER) evaluation metrics across multiple transcription models to help you decide which model best fits your application.
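WER counts the word-level substitutions, deletions, and insertions needed to turn a model's transcript into the ground-truth reference, divided by the number of words in the reference. The following minimal sketch is not part of the Speech-to-Text tooling; the function name and sample strings are illustrative only:

```python
# Minimal sketch of word error rate (WER): word-level edit distance
# (substitutions + deletions + insertions) divided by the number of
# words in the ground-truth reference.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One missing word out of four reference words -> WER of 0.25.
print(word_error_rate("play the next song", "play next song"))
```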

Before you begin

Ensure that you have signed up for a Google Cloud account, created a project, trained a custom speech model, and deployed it to an endpoint.

Create a ground-truth dataset

To create a custom benchmarking dataset, gather audio samples that accurately reflect the type of traffic the transcription model will encounter in a production environment. The aggregate duration of these audio files should ideally be at least 30 minutes and not exceed 10 hours. To assemble the dataset, you need to:

  1. Create a directory in a Cloud Storage bucket of your choice to store the audio and text files for the dataset.
  2. For each audio file in the dataset (such as example_audio_1.wav), create a reasonably accurate ground-truth transcription in a corresponding text file (example_audio_1.txt). The service uses these audio-text pairings in the Cloud Storage bucket to assemble the dataset, as shown in the sketch after this list.
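
The following sketch illustrates one way to upload such pairs with the google-cloud-storage client library. The bucket name, dataset directory, and file names are placeholders, and it assumes the bucket and the local files already exist:

```python
# Hypothetical sketch: upload paired audio and ground-truth transcript
# files into a dataset directory of a Cloud Storage bucket.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-benchmark-bucket")  # placeholder bucket name

pairs = [
    ("example_audio_1.wav", "example_audio_1.txt"),
    ("example_audio_2.wav", "example_audio_2.txt"),
]

for audio_name, transcript_name in pairs:
    # Each audio file and its matching .txt transcript share a base name
    # and live under the same dataset directory.
    bucket.blob(f"benchmark-dataset/{audio_name}").upload_from_filename(audio_name)
    bucket.blob(f"benchmark-dataset/{transcript_name}").upload_from_filename(transcript_name)
```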

Benchmark the model

To assess the accuracy of your Custom Speech-to-Text model against your benchmarking dataset, follow the Measure and improve accuracy guide. A simplified end-to-end sketch of what such an evaluation involves is shown below.
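
The following sketch is not the guide's workflow itself; it only illustrates the idea by transcribing one short benchmark audio file with the Speech-to-Text V2 API and scoring the result against its ground-truth text with the word_error_rate() helper from the earlier sketch. The project ID, recognizer name, bucket, and file paths are placeholders.

```python
# Simplified illustration (placeholders throughout): transcribe one short
# benchmark audio file with the Speech-to-Text V2 API and compare the
# hypothesis to its ground-truth transcript.
from google.cloud import speech_v2, storage

speech_client = speech_v2.SpeechClient()
storage_client = storage.Client()

# Placeholder recognizer resource name.
recognizer = "projects/my-project/locations/global/recognizers/my-recognizer"

config = speech_v2.RecognitionConfig(
    auto_decoding_config=speech_v2.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="latest_long",
)

# Synchronous recognition is suitable only for short audio; longer files
# would go through batch recognition instead.
response = speech_client.recognize(
    request=speech_v2.RecognizeRequest(
        recognizer=recognizer,
        config=config,
        uri="gs://my-benchmark-bucket/benchmark-dataset/example_audio_1.wav",
    )
)
hypothesis = " ".join(
    result.alternatives[0].transcript for result in response.results
)

# Download the matching ground-truth text file from the same dataset
# directory and score the hypothesis against it.
bucket = storage_client.bucket("my-benchmark-bucket")
reference = bucket.blob("benchmark-dataset/example_audio_1.txt").download_as_text()

print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```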