Speech-to-Text is an API that is powered by Google's artificial intelligence (AI) technology. You send your audio data to Speech-to-Text, then receive a text transcription of your audio data in response.
For more information about how to construct a Speech-to-Text request, see the requests page.
Overview
Before you can begin sending requests to Speech-to-Text, you must enable the API in the Google Cloud console. The steps on this page walk you through the following actions:
- Enable Speech-to-Text on a project.
- Make sure billing is enabled for Speech-to-Text.
- Make sure your project has at least one service account.
- Download a service account credential key.
- Set your authentication environment variable.
- (Optional) Create a new Google Cloud Storage bucket to store your audio data.
Before You Begin
There are two ways to access the service: by using the REST API, or by using the Speech-to-Text Console. We provide code samples that show you how to make a request to the REST API and receive a response. You can learn how to use these samples by following the Speech-to-Text quickstarts and how-to guides. If you prefer to use Speech-to-Text with minimal coding, you can use the Cloud Speech-to-Text Console.
This guide walks you through the steps necessary to start sending requests to the REST API. If you are new to coding, we recommend that you start with the step-by-step in-console tutorials in Google Cloud Platform before beginning this quickstart.
Set up your Google Cloud project for Speech-to-Text
Go to the project selector page
You can either choose an existing project or create a new one. For more information about creating a project, see Creating and managing projects.
If you create a new project, you will be prompted to link a billing account to this project. If you are using a pre-existing project, make sure that you have billing enabled.
Learn how to confirm that billing is enabled for your project
Once you have selected a project and linked it to a billing account, you can enable the Speech-to-Text API. Go to the Search products and resources bar at the top of the page and type in "speech". Select the Cloud Speech-to-Text API from the list of results.
To try Speech-to-Text without linking it to your project, choose the TRY THIS API option. To enable the Speech-to-Text API for use with your project, click ENABLE.
(Optional) Enable data logging. By opting in to data logging, you allow Google to record any audio data that you send to Speech-to-Text. This data is used to improve the Speech-to-Text models. Users who opt in to data logging benefit from lower pricing. See the pricing and data logging terms and conditions pages for more information.
You must now link one or more service accounts to the Speech-to-Text API. Click the Credentials menu item on the left side of the Speech-to-Text API main page. If you do not have any service accounts associated with this project, create one by following the instructions in the creating a new service account section.
If you do have previously-created service accounts associated with this project, they will appear on this page. Make sure that you have access to a downloaded JSON key associated with the service account you'd like to use to authenticate with Speech-to-Text. Service account keys are downloadable only once, at the time they are created. If your service account has an existing key but you can't locate the downloaded .json file, you will need to create a new key for that service account and download its .json file. For instructions on how to create a new key on an existing service account, follow the instructions in the creating a JSON key section.
If you already have a service account and its JSON key, you can now set your authentication environment variable.
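The step that enables the API in the console also has a command-line counterpart. The following is a minimal sketch, assuming the gcloud CLI is installed and signed in to an account that can modify the project. The function name enable_speech_api is hypothetical; speech.googleapis.com is the service name of the Speech-to-Text API.

```shell
# Enable the Speech-to-Text API on the given project.
# Assumes the gcloud CLI is installed and you are signed in.
enable_speech_api() {
  project_id="${1:-}"
  if [ -z "$project_id" ]; then
    echo "usage: enable_speech_api PROJECT_ID" >&2
    return 64
  fi
  gcloud services enable speech.googleapis.com --project="$project_id"
}
```

For example, enable_speech_api my-project-id enables the API on a project whose ID is my-project-id (a placeholder). Billing must still be enabled on that project.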
Create a service account
Create a new service account if your project doesn't already have one. You must create a service account in order to use Speech-to-Text.
In the service account name box, type a unique name for the new service account. Your input is automatically populated in the Service account ID box. The Service account description box is optional but recommended if you plan to associate multiple service accounts with your project. Enter a brief description of the service account into this box, then click CREATE AND CONTINUE.
We recommend that you assign one of the basic IAM roles to your service account. You can also assign multiple roles to a single service account if needed. See IAM roles for details on available roles and the permissions allowed to each. Click on the drop-down Select a role menu and scroll down to Basic. You can choose a role for this service account from the options that appear in the right-hand column. Click CONTINUE.
The final step lets you optionally grant other entities (individuals, Google groups, and so on) access to your service account. If you don't need to grant additional access, you can click DONE without entering any information.
The service account is now listed on the Service Accounts page. You can change the service account's permissions, add or generate new keys, and grant access at any time.
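The console flow above can also be approximated from the command line. This is a sketch under stated assumptions, not the only way to do it: it assumes the gcloud CLI is installed and signed in, that the service account email follows the usual NAME@PROJECT_ID.iam.gserviceaccount.com pattern, and that the basic Viewer role (roles/viewer) is an acceptable default. The function name create_speech_service_account is hypothetical.

```shell
# Create a service account and grant it one role on the project.
# Arguments: PROJECT_ID ACCOUNT_NAME [ROLE]
# ROLE defaults to roles/viewer, which is an assumption of this sketch.
create_speech_service_account() {
  project_id="${1:-}"
  account_name="${2:-}"
  role="${3:-roles/viewer}"
  if [ -z "$project_id" ] || [ -z "$account_name" ]; then
    echo "usage: create_speech_service_account PROJECT_ID ACCOUNT_NAME [ROLE]" >&2
    return 64
  fi
  # Step 1: create the account itself.
  gcloud iam service-accounts create "$account_name" \
    --project="$project_id" \
    --display-name="$account_name" || return 1
  # Step 2: bind the chosen role to the new account.
  gcloud projects add-iam-policy-binding "$project_id" \
    --member="serviceAccount:${account_name}@${project_id}.iam.gserviceaccount.com" \
    --role="$role"
}
```

For example, create_speech_service_account my-project-id speech-sa creates an account named speech-sa and grants it the Viewer role. Pass a third argument to choose a different role.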
Create a JSON key for your service account
The newly-created service account appears on the service accounts page. Create a private key that will be associated with that account. You need to use this private key during the authentication process when you send a request to Speech-to-Text. If you choose not to create a key now, you can generate a key and/or change individual user information at any time by accessing the service account through the IAM & Admin -> Service Accounts option in the main navigation menu.
To create a key, click on the service account and select the KEYS tab. Click ADD KEY -> Create new key. We recommend that you create a key in JSON format.
A new key in the format of your choice is automatically downloaded. Store this file in a safe location and make a note of the file path. You will need to point the GOOGLE_APPLICATION_CREDENTIALS environment variable to this file when you go through the authentication process at the beginning of each new Speech-to-Text session. This is an essential step for authenticating requests to Speech-to-Text. The key's unique ID appears next to the name of the service account.
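Key creation can be scripted as well. The sketch below assumes the gcloud CLI is installed and signed in; the function name create_service_account_key and the paths in the usage note are hypothetical. The chmod step is an addition of this sketch, not something the console does, and reflects the advice to store the file in a safe location.

```shell
# Create a JSON key for an existing service account and save it locally.
# Arguments: ACCOUNT_EMAIL KEY_PATH
create_service_account_key() {
  account_email="${1:-}"
  key_path="${2:-}"
  if [ -z "$account_email" ] || [ -z "$key_path" ]; then
    echo "usage: create_service_account_key ACCOUNT_EMAIL KEY_PATH" >&2
    return 64
  fi
  # Request a new key and write it to key_path.
  gcloud iam service-accounts keys create "$key_path" \
    --iam-account="$account_email" || return 1
  # Restrict the file to the current user, since the key is a secret.
  chmod 600 "$key_path"
}
```

For example, create_service_account_key speech-sa@my-project-id.iam.gserviceaccount.com "$HOME/keys/speech-sa.json" writes the key to a file under your home directory, provided that the keys directory already exists.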
Set your authentication environment variable
To set the GOOGLE_APPLICATION_CREDENTIALS environment variable, you must have a service account associated with your project and have access to the service account's JSON key.
Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. This variable applies only to your current shell session. If you want the variable to apply to future shell sessions, set the variable in your shell startup file, for example in the ~/.bashrc or ~/.profile file.
Linux or macOS
export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your credentials.
For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
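To keep the variable across shell sessions, the export line can be appended to a startup file. The following is a minimal sketch; the function name persist_credentials_path is hypothetical, and the startup file defaults to ~/.bashrc only as an assumption about your shell.

```shell
# Append the export line to a shell startup file, once.
# Arguments: KEY_PATH [STARTUP_FILE]  (STARTUP_FILE defaults to ~/.bashrc)
persist_credentials_path() {
  key_path="${1:-}"
  startup_file="${2:-$HOME/.bashrc}"
  if [ -z "$key_path" ]; then
    echo "usage: persist_credentials_path KEY_PATH [STARTUP_FILE]" >&2
    return 64
  fi
  line="export GOOGLE_APPLICATION_CREDENTIALS=\"$key_path\""
  # Append only when the exact line is missing, so repeated runs are harmless.
  if ! grep -Fqx -- "$line" "$startup_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$startup_file"
  fi
}
```

For example, persist_credentials_path "/home/user/Downloads/service-account-file.json" adds the line to ~/.bashrc. Open a new shell or source the file for the change to take effect.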
Windows
For PowerShell:
$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your credentials.
For example:
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"
For command prompt:
set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH
Replace KEY_PATH with the path of the JSON file that contains your credentials.
For more information, see the Google Cloud Platform authentication documentation.
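Before sending a request, it can help to confirm that the variable is set and points at a usable key file. The following check is a sketch for Linux or macOS shells; the function name check_credentials and its exit codes are hypothetical. It relies on the fact that a service account key downloaded as JSON contains a "type" field set to "service_account", and it does not verify that the key is still valid.

```shell
# Returns 0 when GOOGLE_APPLICATION_CREDENTIALS points at a readable
# service account key file, and a non-zero status otherwise.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Cannot read key file: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 2
  fi
  # Look for the "type": "service_account" field of a JSON key.
  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' \
      "$GOOGLE_APPLICATION_CREDENTIALS"; then
    echo "File does not look like a service account key" >&2
    return 3
  fi
  return 0
}
```

Running check_credentials after the export command prints a short message on standard error and returns a non-zero status if something is missing.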
Optional: Create a Cloud Storage bucket
If you intend to transcribe audio longer than 60 seconds or with a file size larger than 10 MB, you must store the audio data in a Cloud Storage bucket before you can transcribe it using Speech-to-Text. The following steps walk you through the process of creating a new bucket.
- For Name your bucket, enter a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
- For Choose where to store your data, do the following:
  - Select a Location type option.
  - Select a Location option.
- For Choose a default storage class for your data, select a storage class.
- For Choose how to control access to objects, select an Access control option.
- For Advanced settings (optional), specify an encryption method, a retention policy, or bucket labels.
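The size threshold and the bucket creation steps above can be sketched in shell. Several details here are assumptions: 10 MB is read as 10 * 1024 * 1024 bytes, audio duration is not checked (that would need an audio tool), the function names needs_cloud_storage and create_audio_bucket are hypothetical, and the gcloud CLI is assumed to be installed and signed in. Only the bucket name and location are set; the other console options keep their defaults.

```shell
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_INLINE_BYTES=$((10 * 1024 * 1024))

# Succeeds (status 0) when the file is larger than the size limit.
# Audio duration is not checked here.
needs_cloud_storage() {
  size_bytes="$(wc -c < "$1" | tr -d '[:space:]')"
  [ "$size_bytes" -gt "$MAX_INLINE_BYTES" ]
}

# Create a bucket for audio files.
# Arguments: PROJECT_ID BUCKET_NAME LOCATION
create_audio_bucket() {
  project_id="${1:-}"
  bucket_name="${2:-}"
  location="${3:-}"
  if [ -z "$project_id" ] || [ -z "$bucket_name" ] || [ -z "$location" ]; then
    echo "usage: create_audio_bucket PROJECT_ID BUCKET_NAME LOCATION" >&2
    return 64
  fi
  gcloud storage buckets create "gs://$bucket_name" \
    --project="$project_id" \
    --location="$location"
}
```

For example, if needs_cloud_storage recording.wav succeeds, you could call create_audio_bucket my-project-id my-audio-bucket us-central1 and then upload the file to that bucket. All three names are placeholders.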
Disable the Speech-to-Text API
Complete the following steps if you no longer need to use the Speech-to-Text API.
- Navigate to your Google Cloud dashboard and click on the Go to APIs overview link in the APIs box.
- Select Cloud Speech-to-Text API.
- Click the DISABLE API button at the top of the Cloud Speech-to-Text API page.
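The same result can likely be reached from the command line. This is a brief sketch, assuming the gcloud CLI is installed and signed in; the function name disable_speech_api is hypothetical.

```shell
# Disable the Speech-to-Text API on the given project.
disable_speech_api() {
  project_id="${1:-}"
  if [ -z "$project_id" ]; then
    echo "usage: disable_speech_api PROJECT_ID" >&2
    return 64
  fi
  gcloud services disable speech.googleapis.com --project="$project_id"
}
```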
What's next
Learn how to send a transcription request to the Speech-to-Text API using client libraries, gcloud, the command line, or the Speech-to-Text UI.