Transcribe speech to text by using the gcloud CLI

This page shows you how to send a speech recognition request to Speech-to-Text by using the gcloud CLI from the command line.

Speech-to-Text lets you integrate Google speech recognition technologies into developer applications. You send audio data to the Speech-to-Text API, which returns a text transcription of that audio. For more information about the service, see Speech-to-Text basics.

Before you begin

Before you send a request to the Speech-to-Text API, complete the following steps. See the before you begin page for details.

  • Enable Speech-to-Text on a Google Cloud project.
    • Make sure billing is enabled for Speech-to-Text.
  • Install the Google Cloud CLI, then initialize it by running the following command:

    gcloud init
  • (Optional) Create a new Google Cloud Storage bucket to store your audio data.
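The setup steps above can also be scripted. The following is a minimal sketch, not an official procedure: the function name setup_speech_to_text and the example project and bucket names are hypothetical, and it assumes a gcloud release recent enough to include the gcloud storage command group. It enables the speech.googleapis.com service and, if you pass a bucket name, creates the optional bucket. It does not check billing.

```shell
# Hypothetical helper (not part of gcloud): enable the API and,
# optionally, create a bucket for audio files.
#   $1: project ID (placeholder; use your own)
#   $2: bucket name (optional placeholder)
setup_speech_to_text() {
  if [ -z "$1" ]; then
    echo "usage: setup_speech_to_text PROJECT_ID [BUCKET_NAME]" >&2
    return 64
  fi

  # Enable the Speech-to-Text API on the project.
  gcloud services enable speech.googleapis.com --project="$1" || return 1

  # Create the optional bucket only when a name was given.
  if [ -n "${2:-}" ]; then
    gcloud storage buckets create "gs://$2" --project="$1" || return 1
  fi
}
```

For example, setup_speech_to_text my-project my-audio-bucket would run both steps against the placeholder project my-project.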

Make an audio transcription request

Now you can use Speech-to-Text to transcribe an audio file to text. The following command sends a recognize request to the Speech-to-Text API.

Open a command-line shell and run the following command:

gcloud ml speech recognize gs://cloud-samples-tests/speech/brooklyn.flac \
    --language-code=en-US

This command requests that Speech-to-Text transcribe the audio contained in a FLAC file hosted at a publicly accessible Cloud Storage location. The --language-code flag identifies the spoken language as US English (en-US).

If the request is successful, the server returns a response in JSON format:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}

Congratulations! You've sent your first request to Speech-to-Text.

If you receive an error or an empty response from Speech-to-Text, see the troubleshooting and error mitigation steps.
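In a script, you might want only the transcript text rather than the full JSON response. The following is a minimal sketch under a few assumptions: the function name transcribe_top_result is hypothetical, the response has the shape shown above, and the standard gcloud --format flag is used to project the first alternative of the first result. It also treats an empty transcript as a failure so that the caller can tell the two cases apart.

```shell
# Hypothetical wrapper (not part of gcloud): print only the top transcript.
#   $1: audio location (a gs:// URI; the gcloud reference also accepts a local path)
#   $2: language code, defaults to en-US
transcribe_top_result() {
  # Ask gcloud to print just the first alternative of the first result.
  transcript="$(gcloud ml speech recognize "$1" \
      --language-code="${2:-en-US}" \
      --format='value(results[0].alternatives[0].transcript)')" || return 1

  # An empty transcript may mean that no speech was recognized.
  if [ -z "$transcript" ]; then
    echo "No transcript returned for $1" >&2
    return 2
  fi

  printf '%s\n' "$transcript"
}
```

Calling transcribe_top_result gs://cloud-samples-tests/speech/brooklyn.flac prints the transcript on success. It returns 1 if gcloud fails and 2 if the transcript is empty.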

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

What's next