Set up a speech recognition project

This page helps developers set up a project to use the Speech-to-Text service. This process includes creating a project, enabling the Speech-to-Text API, installing client libraries, defining environment variables, and authenticating your credentials. If you are new to Vertex AI, learn more about speech recognition features.

Set up a speech recognition project using the GDC console and the gdcloud CLI as follows:

GDC console: Enable the Speech-to-Text API and view the service status and endpoint.
gdcloud CLI: Configure service accounts to interact with the Speech-to-Text API, install client libraries, and authenticate API requests.

Create a project

Creating a speech recognition project within your Distributed Cloud resource hierarchy organizes your Speech-to-Text resources, which include collaborators, enabled APIs, monitoring tools, billing information, authentication credentials, and access controls.

To create your project, see Set up a project for Vertex AI. You need your project ID when making API calls.

Request developer permissions

You must have the AI Speech Developer role in your project to access speech recognition features and generate an API token for request authentication and authorization.

Ask your Project IAM Admin to grant the AI Speech Developer (ai-speech-developer) role to your user or service account within your project namespace. For information about this role, see Prepare IAM permissions.

Enable the Speech-to-Text API

You must enable the Speech-to-Text pre-trained API for your project. After the API is enabled, you can view its service status and endpoint.

Install client libraries

Client libraries are available for the Python programming language. We recommend using them to call the Speech-to-Text API because they simplify access to the API.

Install the Speech-to-Text client library and follow these steps to ensure you have the correct version:

Check whether the Speech-to-Text client library is installed and obtain its version number:
```
pip freeze | grep speech
```
If the client library is already installed, the output is similar to the following example:
```
google-cloud-speech==2.15.0
```
The version number you obtain must match the version of the client library available at the following endpoint:
```
https://GDC_URL/.well-known/static/client-libraries
```
Replace GDC_URL with the URL of your organization in GDC.
If the version numbers don't match, uninstall the client library:
```
pip uninstall google-cloud-speech
```
If you uninstalled the Speech-to-Text client library, you must reinstall it by specifying the filename corresponding to your operating system.
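
The version check in the preceding steps can also be done from Python. The following is a minimal sketch, not part of the official procedure: it reads the installed version of google-cloud-speech with the standard-library importlib.metadata module and compares it with a version string that you supply. The helper names installed_version and matches_expected are hypothetical, and the value 2.15.0 is only the example version shown above. Replace it with the version published at your organization's client-libraries endpoint.

```python
from importlib import metadata
from typing import Optional

PACKAGE = "google-cloud-speech"


def installed_version(package: str = PACKAGE) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed: Optional[str], expected: str) -> bool:
    """Return True only when a version is installed and equals `expected` exactly."""
    return installed is not None and installed.strip() == expected.strip()


if __name__ == "__main__":
    # Assumption: placeholder value. Use the version listed at
    # https://GDC_URL/.well-known/static/client-libraries for your organization.
    expected_version = "2.15.0"

    found = installed_version()
    if found is None:
        print(f"{PACKAGE} is not installed.")
    elif matches_expected(found, expected_version):
        print(f"{PACKAGE}=={found} matches the expected version.")
    else:
        print(f"{PACKAGE}=={found} does not match {expected_version}; uninstall and reinstall it.")
```

The comparison is a plain string equality check, which mirrors the requirement that the version numbers match.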

Set your environment variables

After installing the Speech-to-Text client library, you can interact with the API from a Python script.

If you set up a service account in your project to make authorized API calls programmatically, you can define environment variables in the Python script so that values such as the service account keys are available at runtime.

Follow these steps to set the required environment variables in a Python script:

Create a JupyterLab notebook to interact with the Speech-to-Text pre-trained API.
Create a Python script in the JupyterLab notebook.
Add the following code to the Python script:
```
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"
```
Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the name of the JSON file that contains the service account keys you created in the project, such as my-service-key.json.
Save the Python script with a name, such as speech.py.
Run the Python script to set the environment variables:
```
python SCRIPT_NAME
```
Replace SCRIPT_NAME with the name you gave to your Python script, such as speech.py.
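
A mistyped filename in the previous steps is not detected when the variable is set, only later when credentials are loaded. As a hedged sketch, the script could validate the key file before setting GOOGLE_APPLICATION_CREDENTIALS. The helper set_credentials_env is a hypothetical name introduced for illustration, and my-service-key.json is the example filename used above.

```python
import json
import os
from pathlib import Path


def set_credentials_env(key_file: str) -> str:
    """Validate `key_file` and point GOOGLE_APPLICATION_CREDENTIALS at it.

    Returns the absolute path stored in the environment variable.
    """
    path = Path(key_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Service account key file not found: {path}")

    # Parse the file so that a truncated or non-JSON key fails here
    # (json.JSONDecodeError) instead of at the first API call.
    with path.open(encoding="utf-8") as handle:
        json.load(handle)

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)


if __name__ == "__main__":
    try:
        # Assumption: example filename; replace with your own key file.
        print("Using credentials at", set_credentials_env("my-service-key.json"))
    except (FileNotFoundError, json.JSONDecodeError) as err:
        print("Could not set credentials:", err)
```

Values assigned through os.environ apply only to the running Python process and the processes it starts, so call the helper in the same script or notebook session that makes the API requests.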

Set up authentication

Before you can start using the Speech-to-Text API, you must authenticate your client credentials and request account access to your project resources. For more information, see Authenticate API requests.