Try Speech-to-Text

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Speech-to-Text pre-trained API.

Before you begin

  1. Enable the Speech-to-Text pre-trained API.

  2. Download the gdcloud command-line interface (CLI).

Set up your project

Use the console to set up a project that groups your Vertex AI services. For information about creating and using projects, see Creating a project.

Set up your service account

Initialize the gdcloud CLI, sign in, and then create a service account and a key for it. Replace the following:

    • SERVICE_ACCOUNT: the name of your service account.
    • PROJECT_ID: your project ID.
    • SERVICE_KEY: the name of the JSON key file to create.

  ${HOME}/gdcloud init  # set URI and project

  ${HOME}/gdcloud auth login

  ${HOME}/gdcloud iam service-accounts create SERVICE_ACCOUNT --project=PROJECT_ID

  ${HOME}/gdcloud iam service-accounts keys create SERVICE_KEY.json --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT

Grant access to project resources

Grant your service account access to the Speech-to-Text API by binding it to the ai-speech-developer role. Provide your project ID and the name of your service account:

  ${HOME}/gdcloud iam service-accounts add-iam-policy-binding --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT --role=role/ai-speech-developer
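If you repeat this setup for several projects, you can script the gdcloud commands from the two preceding sections. The following is a minimal sketch, not an official tool. It assumes gdcloud is at ${HOME}/gdcloud and that you have already run gdcloud init and gdcloud auth login interactively. The helper names and placeholder values are hypothetical. The flags are copied from the commands above, and by default the script only prints the commands (a dry run).

```python
import os
import shlex
import subprocess

# Hypothetical placeholder values: replace them with your own.
PROJECT_ID = "PROJECT_ID"
SERVICE_ACCOUNT = "SERVICE_ACCOUNT"
SERVICE_KEY = "SERVICE_KEY"


def service_account_commands(gdcloud, project_id, account, key_name):
    """Return the gdcloud commands from this quickstart as argument lists."""
    return [
        # Create the service account in the project.
        [gdcloud, "iam", "service-accounts", "create", account,
         f"--project={project_id}"],
        # Create a JSON key for the service account.
        [gdcloud, "iam", "service-accounts", "keys", "create",
         f"{key_name}.json", f"--project={project_id}",
         f"--iam-account={account}"],
        # Bind the ai-speech-developer role to the service account.
        [gdcloud, "iam", "service-accounts", "add-iam-policy-binding",
         f"--project={project_id}", f"--iam-account={account}",
         "--role=role/ai-speech-developer"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    gdcloud = os.path.join(os.path.expanduser("~"), "gdcloud")
    run_all(service_account_commands(gdcloud, PROJECT_ID, SERVICE_ACCOUNT, SERVICE_KEY))
```

Passing argument lists to subprocess.run, instead of one shell string, avoids quoting problems if a name contains special characters. After you review the printed commands, call run_all(..., dry_run=False) to run them.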

Set your environment variables

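Before running the Speech-to-Text pre-trained service, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key file that you created, for example with export GOOGLE_APPLICATION_CREDENTIALS=SERVICE_KEY.json. The google.auth.default() calls in the scripts on this page read this variable to load your credentials.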
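To confirm that the variable is set correctly before you run the scripts, you can use a small check like the following. This minimal sketch assumes that the variable is GOOGLE_APPLICATION_CREDENTIALS, which google.auth.default() reads to locate a key file. check_credentials_env is a hypothetical helper and is not part of google-auth.

```python
import os
from pathlib import Path


def check_credentials_env(var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the key file path named by var, or raise if it is unusable."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set")
    path = Path(value).expanduser()
    # The variable must name an existing file, not a directory.
    if not path.is_file():
        raise RuntimeError(f"{var} points to {path}, which is not a file")
    return path


if __name__ == "__main__":
    try:
        print("Using credentials from", check_credentials_env())
    except RuntimeError as err:
        print("Credentials are not configured:", err)
```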


Authenticate the CLI

Before you send requests to the Speech-to-Text pre-trained service, you must get a token to authenticate them. Follow these steps:

  1. Install the google-auth client library.

    pip install google-auth
  2. Save the following code to a Python script, and replace ENDPOINT with the Speech-to-Text endpoint. Run the script to fetch the token. For more information, see View service statuses and endpoints.

    import google.auth
    from google.auth.transport import requests

    api_endpoint = "https://ENDPOINT"

    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(api_endpoint)

    def test_get_token():
      req = requests.Request()
      # Exchange the service account key for an access token.
      creds.refresh(req)
      print(creds.token)

    if __name__=="__main__":
      test_get_token()
  3. Save the following recognize_request.json file:

    cat <<- EOF > recognize_request.json
    {
      "config": {
        "encoding": "LINEAR16",
        "sample_rate_hertz": 16000,
        "audio_channel_count": 1,
        "language_code": "en-US"
      },
      "audio": {
        "content": "CONTENT"
      }
    }
    EOF

    Replace CONTENT with the base64-encoded audio content.

  4. If you don't have grpcurl installed, download and install it from a resource outside of GDCH.

  5. Make the grpcurl request:

    grpcurl -vv -insecure -H "Authorization: Bearer TOKEN" -authority ENDPOINT.GDCH_URL -d @ ENDPOINT.GDCH_URL:443 google.cloud.speech.v1.Speech/Recognize < recognize_request.json

    Replace the following:

    • TOKEN: the token that you fetched in step 2.
    • ENDPOINT: the Speech-to-Text endpoint that you use for your organization.
    • GDCH_URL: the URL of your organization in GDCH, for example, org-1.zone1.gdch.test.
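Instead of pasting CONTENT into recognize_request.json by hand, you can generate the file from a WAV recording. The following minimal sketch uses only the Python standard library and assumes a 16-bit PCM WAV file. The helper names are hypothetical. The script reads the sample rate and channel count from the WAV header instead of hard-coding 16000 Hz mono. It sends only the PCM samples, not the WAV header, and base64-encodes them as a JSON request requires for audio bytes.

```python
import base64
import json
import sys
import wave


def build_recognize_request(wav_path, language_code="en-US"):
    """Read a 16-bit PCM WAV file and return the recognize request as a dict."""
    with wave.open(wav_path, "rb") as wav:
        # LINEAR16 is 16-bit signed little-endian PCM: 2 bytes per sample.
        if wav.getsampwidth() != 2:
            raise ValueError("LINEAR16 requires 16-bit samples")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        pcm = wav.readframes(wav.getnframes())
    return {
        "config": {
            "encoding": "LINEAR16",
            "sample_rate_hertz": rate,
            "audio_channel_count": channels,
            "language_code": language_code,
        },
        # Bytes fields travel as base64 text in the JSON request.
        "audio": {"content": base64.b64encode(pcm).decode("ascii")},
    }


def write_request(wav_path, out_path="recognize_request.json"):
    """Write the request built from wav_path to out_path."""
    with open(out_path, "w") as f:
        json.dump(build_recognize_request(wav_path), f, indent=2)


if __name__ == "__main__" and len(sys.argv) == 2:
    write_request(sys.argv[1])
```

For example, python make_request.py speech.wav writes recognize_request.json, which you can then pass to the grpcurl command. The script name make_request.py is only illustrative.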

Run the Speech-to-Text pre-trained API sample script

This example shows you how to call the Speech-to-Text pre-trained API by using the Python client library.

  1. Check whether the Speech-to-Text client library is already installed:

      pip freeze | grep speech
      # output example: google-cloud-speech==2.15.0

    If the installed version doesn't match the client library published at https://CONSOLE_ENDPOINT/.well-known/static/client-libraries, uninstall it:

      pip uninstall google-cloud-speech
  2. Download the Speech-to-Text client library from the console. Replace CONSOLE_ENDPOINT with the console endpoint of your organization.

       wget https://CONSOLE_ENDPOINT/.well-known/static/client-libraries/google-cloud-speech
  3. Extract the tar file and install the library with pip. Replace CLIENT_LIBRARY with the name of the downloaded tar file and FOLDER with the directory that it extracts to. If pip reports that a package can't be found, install the missing dependencies and rerun the command.

    tar -xvzf CLIENT_LIBRARY
    pip install -r FOLDER/requirements.txt --no-index --find-links FOLDER
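  4. Use the Speech-to-Text client library script in the following section to fetch a token and send requests to the Speech-to-Text service.

  5. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file, as described in Set your environment variables.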
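You can also run the version check from step 1 in Python, for example as part of a setup script. The following minimal sketch uses importlib.metadata from the standard library. The helper names and the EXPECTED_VERSION placeholder are hypothetical. Copy the expected version from the client library listed at https://CONSOLE_ENDPOINT/.well-known/static/client-libraries.

```python
from importlib import metadata


def installed_version(dist="google-cloud-speech"):
    """Return the installed version of dist, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def needs_reinstall(installed, expected):
    """True when some version other than the expected one is installed."""
    return installed is not None and installed != expected


if __name__ == "__main__":
    # Hypothetical placeholder: the version of the library published at
    # https://CONSOLE_ENDPOINT/.well-known/static/client-libraries
    EXPECTED = "EXPECTED_VERSION"
    current = installed_version()
    print("Installed google-cloud-speech:", current)
    if needs_reinstall(current, EXPECTED):
        print("Versions differ; run: pip uninstall google-cloud-speech")
```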


Speech-to-Text sample

Replace the ENDPOINT with the Speech-to-Text endpoint that you use for your organization.

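import base64

from google.cloud import speech_v1
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

# Replace ENDPOINT with your Speech-to-Text endpoint.
audience = "https://ENDPOINT"
# The gRPC client takes host:port, without the https:// scheme.
api_endpoint = "ENDPOINT:443"

def get_client(creds):
  opts = ClientOptions(api_endpoint=api_endpoint)
  return speech_v1.SpeechClient(credentials=creds, client_options=opts)

def main():
  creds = None
  try:
    # Load the service account credentials and scope them to the endpoint.
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(audience)
    # Fetch a token to confirm that authentication works.
    req = requests.Request()
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception" + str(e))
    raise
  return creds

def speech_func(creds):
  tc = get_client(creds)

  # Replace CONTENT with base64-encoded LINEAR16 audio.
  content = "CONTENT"
  audio = speech_v1.RecognitionAudio()
  audio.content = base64.standard_b64decode(content)

  # Match these settings to your audio.
  config = speech_v1.RecognitionConfig()
  config.encoding = speech_v1.RecognitionConfig.AudioEncoding.LINEAR16
  config.sample_rate_hertz = 16000
  config.audio_channel_count = 1
  config.language_code = "en-US"

  resp = tc.recognize(config=config, audio=audio)
  print(resp)

if __name__=="__main__":
  creds = main()
  speech_func(creds)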
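To pull only the text out of the response that tc.recognize(...) returns, you can walk its results. The following minimal sketch assumes the v1 RecognizeResponse shape. In that shape, results is a sequential list that covers consecutive portions of the audio, and each result holds alternatives ordered most probable first, each with transcript and confidence fields. The helper names are hypothetical. Because the helpers use only attribute access, they also work with stand-in objects in tests.

```python
def transcripts(response):
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.results:
        # Alternatives are ordered most probable first; a result may have none.
        if result.alternatives:
            best = result.alternatives[0]
            pairs.append((best.transcript, best.confidence))
    return pairs


def full_text(response):
    """Join the top transcripts of consecutive results into one string."""
    return " ".join(text.strip() for text, _ in transcripts(response))
```

For example, after tc.recognize(...) returns resp, print(full_text(resp)) prints the whole transcript on one line.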

What's next