This page helps developers set up a project to use the Speech-to-Text service. This process includes creating a project, enabling the Speech-to-Text API, installing client libraries, defining environment variables, and authenticating your credentials. If you are new to Vertex AI, learn more about speech recognition features.
You set up a speech recognition project using the GDC console and gdcloud CLI as follows:
- GDC console: Enable the Speech-to-Text API and view the service status and endpoint.
- gdcloud CLI: Configure service accounts to interact with the Speech-to-Text API, install client libraries, and authenticate API requests.
Create a project
Creating a speech recognition project within your Distributed Cloud resource hierarchy organizes your Speech-to-Text resources, which include collaborators, enabled APIs, monitoring tools, billing information, authentication credentials, and access controls.
To create your project, see Set up a project for Vertex AI. You need your project ID when making API calls.
Request developer permissions
You must have the AI Speech Developer role in your project to access speech recognition features and generate an API token for request authentication and authorization.
Ask your Project IAM Admin to grant the AI Speech Developer (ai-speech-developer) role to your user or service account within your project namespace. For information about this role, see Prepare IAM permissions.
Enable the Speech-to-Text API
You must enable the Speech-to-Text pre-trained API for your project. After the API is enabled, you can view its service status and endpoint in the GDC console.
Install client libraries
Client libraries are available for the Python programming language. We recommend using these client libraries to make calls to the Speech-to-Text API because they make it easier to access APIs.
Install the Speech-to-Text client library and follow these steps to ensure you have the correct version:
Check if the Speech-to-Text client library is installed and obtain the version number:
pip freeze | grep speech
If the client library is already installed, you obtain an output similar to the following example:
google-cloud-speech==2.15.0
The version number you obtain must match the version of the client library available at the following endpoint:
https://GDC_URL/.well-known/static/client-libraries
Replace GDC_URL with the URL of your organization in GDC.
If the version numbers don't match, uninstall the client library:
pip uninstall google-cloud-speech
If you uninstalled the Speech-to-Text client library, you must reinstall it by specifying the filename corresponding to your operating system.
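The version check described above can also be done from Python instead of piping pip freeze through grep. The following is a minimal sketch, not part of the official procedure: the helper names installed_version and needs_reinstall are hypothetical, and EXPECTED_VERSION is a placeholder that you would replace with the version listed at the client library endpoint for your organization.

```python
from importlib import metadata

PACKAGE = "google-cloud-speech"


def installed_version(package: str = PACKAGE):
    """Return the installed version string of the package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_reinstall(expected: str, package: str = PACKAGE) -> bool:
    """Return True when the package is missing or its version differs from expected."""
    return installed_version(package) != expected


if __name__ == "__main__":
    # Placeholder (assumption): use the version published at your GDC endpoint.
    EXPECTED_VERSION = "2.15.0"
    print("Installed version:", installed_version())
    if needs_reinstall(EXPECTED_VERSION):
        print("Version mismatch or library missing: uninstall and reinstall it.")
    else:
        print("The installed client library matches the expected version.")
```

The comparison is an exact string match, which mirrors the requirement that the version numbers match.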
Set your environment variables
After installing the Speech-to-Text client library, you can interact with the API from a Python script.
If you set up a service account in your project to make authorized API calls programmatically, you can define environment variables in the Python script to access values such as the service account keys when running.
Follow these steps to set the required environment variables in a Python script:
Create a JupyterLab notebook to interact with the Speech-to-Text pre-trained API.
Create a Python script in the JupyterLab notebook.
Add the following code to the Python script:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"
Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the name of the JSON file that contains the service account keys you created in the project, such as my-service-key.json.
Save the Python script with a name, such as speech.py.
Run the Python script to set the environment variables:
python SCRIPT_NAME
Replace SCRIPT_NAME with the name you gave to your Python script, such as speech.py.
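A mistyped key filename usually surfaces only later, when an API call fails to authenticate. As a minimal sketch, the steps above could be wrapped in a helper that confirms the key file exists before setting the variable. The function name set_credentials is hypothetical and is not part of any client library.

```python
import os
from pathlib import Path


def set_credentials(key_file: str) -> str:
    """Set GOOGLE_APPLICATION_CREDENTIALS to key_file after checking that it exists.

    Returns the absolute path that was stored in the environment variable.
    """
    path = Path(key_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Service account key file not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

For example, calling set_credentials("my-service-key.json") at the top of speech.py stores the absolute path of the key file. A value assigned through os.environ applies only to the running Python process and any child processes it starts, so the code that calls the Speech-to-Text API should run in the same script or notebook session.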
Set up authentication
Before you can start using the Speech-to-Text API, you must authenticate your client credentials and request account access to your project resources. For more information, see Authenticate API requests.