Transcribe speech to text by using the Google Cloud console

This quickstart introduces you to the Cloud Speech-to-Text console. You will create and refine a transcription and learn how to use the resulting configuration with the Speech-to-Text API in your own applications.

To learn how to send requests and receive responses using the REST API instead of the console, see the Before you begin page.

Before you begin

Before you can use the Speech-to-Text console, you must enable the API in the Google Cloud console. The steps below walk you through the following actions:

  • Enable Speech-to-Text on a project.
  • Make sure billing is enabled for Speech-to-Text.

Set up your Google Cloud project

  1. Sign in to the Google Cloud console.

  2. Go to the project selector page.

    You can either choose an existing project or create a new one. For more details about creating a project, see Google Cloud Platform documentation.

  3. If you create a new project, you will be prompted to link a billing account to this project. If you are using a pre-existing project, make sure that you have billing enabled.

    Learn how to confirm that billing is enabled for your project.

  4. Once you have selected a project and linked it to a billing account, you can enable the Speech-to-Text API. Go to the Search products and resources bar at the top of the page and type in "speech".

  5. Select the Cloud Speech-to-Text API from the list of results.

  6. To try Speech-to-Text without linking it to your project, choose the TRY THIS API option. To enable the Speech-to-Text API for use with your project, click ENABLE.

Create a transcription

Use the Google Cloud console to create a new transcription:

Audio configuration

  1. Open the Speech-to-Text overview.

    Screenshot of the Speech-to-Text Overview page.

  2. Click Create transcription.

    • If this is your first time using the console, you will be asked to choose where in Cloud Storage to store your configurations and transcriptions.
      Screenshot of the Speech-to-Text Create Transcription page.

  3. On the Create transcription page, upload a source audio file. You can choose a file that is already saved in Cloud Storage or upload a new one to your specified Cloud Storage destination.

  4. Select the uploaded audio file's encoding type.

  5. Specify its sample rate.

  6. Click Continue. You will be taken to Transcription options.
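The audio settings chosen in these steps correspond to fields in a Speech-to-Text API request. The following is a minimal Python sketch, assuming the v1 REST field names (encoding, sampleRateHertz, and audio.uri). The bucket path and the helper name build_audio_request are hypothetical, and the encoding and sample rate must match your actual file.

```python
import json

# Hypothetical Cloud Storage location; replace with your own bucket and object.
AUDIO_URI = "gs://my-bucket/audio/interview.flac"


def build_audio_request(uri, encoding, sample_rate_hertz):
    """Build a request body that mirrors the Audio configuration step."""
    return {
        "config": {
            "encoding": encoding,                   # for example "FLAC" or "LINEAR16"
            "sampleRateHertz": sample_rate_hertz,   # sample rate of the source audio
        },
        "audio": {"uri": uri},                      # file stored in Cloud Storage
    }


request_body = build_audio_request(AUDIO_URI, "FLAC", 16000)
print(json.dumps(request_body, indent=2))
```

This body is not yet complete: the API also expects a language code, which is chosen in the Transcription options step.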

Transcription options

  1. Select the language code of your source audio. This is the language being spoken in the recording.

  2. Choose the transcription model you would like to use on the file. The Default option is pre-selected and, generally, no change is needed, but matching the model to the type of audio may result in higher accuracy. Note that model costs vary.

    Screenshot of the Speech-to-Text Create Transcription page.

  3. Click Continue. You will be taken to Model adaptation.
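In an API request, the language and model selections above become two more configuration fields. This sketch assumes the v1 REST field names languageCode and model; the helper with_transcription_options and the sample values are illustrative only.

```python
def with_transcription_options(config, language_code, model="default"):
    """Return a copy of config with the language and model set."""
    updated = dict(config)                    # leave the caller's dict untouched
    updated["languageCode"] = language_code   # language spoken in the recording
    updated["model"] = model                  # "default" unless the audio type suggests another
    return updated


# Hypothetical starting point carried over from the audio configuration.
base_config = {"encoding": "FLAC", "sampleRateHertz": 16000}
config = with_transcription_options(base_config, "en-US", model="video")
print(config)
```

Returning a copy makes it easy to try several models against the same base settings and compare the transcriptions afterwards.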

Model adaptation (optional)

If your source audio contains rare words, proper names, or proprietary terms that are not recognized reliably, model adaptation can help.

  1. Check Turn on model adaptation.

  2. Choose One-time adaptation resource.

  3. Add relevant phrases and give them a boost value.

    Screenshot of the Speech-to-Text Create Transcription page.

  4. In the left column, click Submit to create your transcription.
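Phrases and boost values can also be supplied directly in an API request. The sketch below assumes that the v1 REST speechContexts field (a list of objects with phrases and boost) is a reasonable counterpart to the one-time adaptation resource described above; that mapping, the helper with_phrase_boost, and the sample phrases are assumptions.

```python
def with_phrase_boost(config, phrases, boost):
    """Return a copy of config with one extra phrase hint group."""
    updated = dict(config)
    contexts = list(updated.get("speechContexts", []))
    # A larger boost biases recognition more strongly toward these phrases.
    contexts.append({"phrases": list(phrases), "boost": boost})
    updated["speechContexts"] = contexts
    return updated


# Hypothetical configuration and domain terms.
base_config = {"encoding": "FLAC", "sampleRateHertz": 16000, "languageCode": "en-US"}
adapted = with_phrase_boost(base_config, ["Kubernetes", "Anthos"], boost=10.0)
print(adapted["speechContexts"])
```

Starting with a moderate boost and raising it only if the terms are still missed helps avoid over-biasing the transcription toward the listed phrases.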

Review your transcription

Depending on the size of your audio file, a transcription may take from minutes to hours to create. Once your transcription has been created, it's ready for review. Sorting the table by timestamp can help you easily locate your recent transcriptions.

  1. Click the Name of the transcription you would like to review.

    Screenshot of the Speech-to-Text Transcription List page.

  2. Compare the Transcription text to the audio file.

    Screenshot of the Speech-to-Text Transcription List page.

  3. To make changes, click Reuse configuration. This returns you to the Create transcription flow with the same options pre-selected, so you can adjust settings, create a new transcription, and compare the results.
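When comparing two transcriptions of the same audio, a word-level diff can highlight what changed. This is a minimal sketch, assuming each result follows the API's results[].alternatives[].transcript shape; the sample responses and the helper names extract_transcript and changed_words are hypothetical.

```python
import difflib


def extract_transcript(response):
    """Join the top alternative of every result into one string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


def changed_words(before, after):
    """List the words removed ("- word") or added ("+ word") between two transcripts."""
    diff = difflib.ndiff(before.split(), after.split())
    return [token for token in diff if token.startswith(("- ", "+ "))]


# Hypothetical results from the original and the adjusted configuration.
first = {"results": [{"alternatives": [{"transcript": "deploy the cooper netties cluster"}]}]}
second = {"results": [{"alternatives": [{"transcript": "deploy the Kubernetes cluster"}]}]}

changes = changed_words(extract_transcript(first), extract_transcript(second))
print(changes)
```

Only the words that differ are listed, which keeps the comparison readable for long recordings.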

What's next