Try Speech-to-Text

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Speech-to-Text pre-trained API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Follow these steps before trying Speech-to-Text:

Set up a project using the GDC console to group the Vertex AI services. For information about creating and using projects, see Create a project.
Ask your Project IAM Admin to grant you the AI Speech Developer (ai-speech-developer) role in your project namespace.
Enable the Speech-to-Text pre-trained API.
Download the gdcloud command-line interface (CLI).

Set up your service account

Create a service account and a service key for your project. Replace SERVICE_ACCOUNT with the name of your service account, PROJECT_ID with your project ID, and SERVICE_KEY with the name of the key file.

  ${HOME}/gdcloud init  # set URI and project

  ${HOME}/gdcloud auth login

  ${HOME}/gdcloud iam service-accounts create SERVICE_ACCOUNT  --project=PROJECT_ID

  ${HOME}/gdcloud iam service-accounts keys create "SERVICE_KEY".json --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT

Grant access to project resources

Grant your service account access to the Speech-to-Text API by providing your project ID, the name of your service account, and the role ai-speech-developer.

  ${HOME}/gdcloud iam service-accounts add-iam-policy-binding --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT --role=role/ai-speech-developer

Set your environment variables

Before running the Speech-to-Text pre-trained service, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service key file.

  export GOOGLE_APPLICATION_CREDENTIALS="SERVICE_KEY".json
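
A missing or mistyped key path usually only surfaces later, when the token request fails. The following is a minimal sketch, not part of the product, that checks the variable before you continue. The function name credentials_file_problem is hypothetical.

```python
import os
from pathlib import Path


def credentials_file_problem(env=os.environ):
    """Return a short description of the problem, or None if the key file looks usable."""
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    path = Path(value).expanduser()
    if not path.is_file():
        return f"{path} does not exist or is not a file"
    return None


if __name__ == "__main__":
    # Prints the problem, if any; otherwise confirms that the file was found.
    print(credentials_file_problem() or "Credentials file found")
```

The check only confirms that the file exists. It does not validate the contents of the key.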

Authenticate the gdcloud CLI

You must get a token to authenticate the requests that you send to the Speech-to-Text pre-trained service. Follow these steps:

Install the google-auth client library.
```
pip install google-auth
```

Save the following code to a Python script, and replace ENDPOINT with the Speech-to-Text endpoint. For more information, see View service statuses and endpoints.

import google.auth
from google.auth.transport import requests

api_endpoint = "https://ENDPOINT"

creds, project_id = google.auth.default()
creds = creds.with_gdch_audience(api_endpoint)

def test_get_token():
  req = requests.Request()
  creds.refresh(req)
  print(creds.token)

if __name__=="__main__":
  test_get_token()

Run the script to fetch the token.

You must add the fetched token to the header of the grpcurl request as in the following example:
```
-H "Authorization: Bearer TOKEN"
```

Save the following recognize_request.json file:

cat <<- EOF > recognize_request.json
{
  "config": {
    "encoding": "LINEAR16",
    "sample_rate_hertz": 16000,
    "audio_channel_count": 1,
    "language_code": "en-US"
  },
  "audio": {
    "content": "CONTENT"
  }
}
EOF

Replace CONTENT with the base64-encoded audio content.
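
As an alternative to pasting the audio into the file by hand, the request file can be generated from Python. This is a minimal sketch, assuming that the content field carries base64-encoded audio bytes, which matches the sample script later in this guide that base64-decodes the same value. The function names and the input file name audio.raw are hypothetical.

```python
import base64
import json


def build_recognize_request(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Build the same structure as recognize_request.json from raw audio bytes."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sample_rate_hertz": sample_rate_hertz,
            "audio_channel_count": 1,
            "language_code": language_code,
        },
        # JSON cannot carry raw bytes, so the audio is base64-encoded as text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def write_recognize_request(audio_path, out_path="recognize_request.json"):
    """Read an audio file and write the request file that grpcurl reads."""
    with open(audio_path, "rb") as audio_file:
        request = build_recognize_request(audio_file.read())
    with open(out_path, "w") as out_file:
        json.dump(request, out_file, indent=2)
```

For example, write_recognize_request("audio.raw") writes recognize_request.json to the current directory.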

If you don't have grpcurl installed, download and install it from a resource outside of Distributed Cloud (https://github.com/fullstorydev/grpcurl#from-source).

Important: Use version 1.8.7 or earlier of grpcurl. Later versions fail when calling pre-trained endpoints.

Make the grpcurl request:
```
grpcurl -vv -H "Authorization: Bearer TOKEN" -authority ENDPOINT -d @ ENDPOINT:443 google.cloud.speech.v1.Speech.Recognize < recognize_request.json
```
Replace the following:
- TOKEN: the token that you fetched with the Python script.
- ENDPOINT: the Speech-to-Text endpoint that you use for your organization.
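
If you prefer to drive grpcurl from a script instead of pasting the token into a shell, the same command can be assembled in Python. This is a sketch that only mirrors the flags of the command shown in this section. The function names are hypothetical, and grpcurl must already be on your PATH.

```python
import subprocess


def build_grpcurl_command(token, endpoint):
    """Return the argument list for the Recognize call shown in this section."""
    return [
        "grpcurl", "-vv",
        "-H", f"Authorization: Bearer {token}",
        "-authority", endpoint,
        "-d", "@",  # "@" tells grpcurl to read the request body from stdin
        f"{endpoint}:443",
        "google.cloud.speech.v1.Speech.Recognize",
    ]


def run_recognize(token, endpoint, request_path="recognize_request.json"):
    """Run grpcurl with the request file on stdin and return the completed process."""
    with open(request_path, "rb") as request_file:
        return subprocess.run(
            build_grpcurl_command(token, endpoint),
            stdin=request_file,
            capture_output=True,
            text=True,
            check=False,
        )
```

Passing the arguments as a list avoids shell quoting problems with the token.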

Run the Speech-to-Text pre-trained API sample script

This example shows you how to interact with a Speech-to-Text pre-trained API.

Check whether a Speech-to-Text client library is already installed.
```
  pip freeze | grep speech
  # output example: google-cloud-speech==2.15.0
```
If the existing version doesn't match the client library in https://CONSOLE_ENDPOINT/.well-known/static/client-libraries, uninstall the client library using the following command:
```
  pip uninstall google-cloud-speech
```
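The installed-version check can also be done from Python, which is convenient inside a setup script. This is a minimal sketch using the standard library. The function names are hypothetical, and expected_version stands for whatever version your console endpoint provides.

```python
from importlib import metadata


def installed_version(package="google-cloud-speech"):
    """Return the installed version of the package, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_reinstall(expected_version, package="google-cloud-speech"):
    """True when a version is installed and it differs from the expected one."""
    found = installed_version(package)
    return found is not None and found != expected_version
```

When needs_reinstall returns True, uninstall the existing library before you install the one from the console endpoint.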
Download the Speech-to-Text client library, replacing CONSOLE_ENDPOINT with your console endpoint:
```
   wget https://CONSOLE_ENDPOINT/.well-known/static/client-libraries/google-cloud-speech
```
Note: If the error message "x509: certificate signed by unknown authority" is displayed, your workstation doesn't trust the CA certificate used in Distributed Cloud. Follow your organization's procedure to check the trusted certificate store for your workstation.
Warning: Using --login-config-cert with an unverified certificate makes your workstation vulnerable to man-in-the-middle attacks. Ensure that you rely only on your workstation's trust store instead of trusting a CA certificate from unknown sources.
Extract the tar file, and install the library using pip. If the installation fails because a package isn't found, install the missing dependencies.
```
tar -xvzf CLIENT_LIBRARY

pip install -r FOLDER/requirements.txt --no-index --find-links FOLDER
```
Use the Speech-to-Text client library script to generate the token and make requests to the Speech-to-Text service.

Set up your environment variable.

import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "SERVICE_KEY.json"

Speech-to-Text sample

Replace the ENDPOINT with the Speech-to-Text endpoint that you use for your organization.

import base64

from google.cloud import speech_v1
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint = "ENDPOINT:443"

def get_client(creds):
  # Point the client at the Speech-to-Text endpoint.
  opts = ClientOptions(api_endpoint=api_endpoint)
  return speech_v1.SpeechClient(credentials=creds, client_options=opts)

def main():
  # Load the service account credentials and fetch a token for the audience.
  creds = None
  try:
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(audience)
    req = requests.Request()
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception: " + str(e))
    raise
  return creds

def speech_func(creds):
  tc = get_client(creds)
  content = "CONTENT"

  # CONTENT is base64-encoded audio; decode it to raw bytes for the request.
  audio = speech_v1.RecognitionAudio()
  audio.content = base64.standard_b64decode(content)

  config = speech_v1.RecognitionConfig()
  config.encoding = speech_v1.RecognitionConfig.AudioEncoding.LINEAR16
  config.sample_rate_hertz = 16000
  config.language_code = "en-US"
  config.audio_channel_count = 1

  resp = tc.recognize(config=config, audio=audio)
  print(resp)

if __name__=="__main__":
  creds = main()
  speech_func(creds)
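
The sample prints the whole response object. To read only the transcribed text, you can walk the results yourself. This is a sketch, assuming the response follows the google.cloud.speech.v1 RecognizeResponse shape, where each result holds a list of alternatives with transcript and confidence fields. The function name extract_transcripts is hypothetical.

```python
def extract_transcripts(response):
    """Return (transcript, confidence) for the first alternative of each result."""
    transcripts = []
    for result in response.results:
        # Skip results that carry no alternatives.
        if result.alternatives:
            best = result.alternatives[0]
            transcripts.append((best.transcript, best.confidence))
    return transcripts
```

In speech_func, you could call extract_transcripts(resp) after the recognize call and print each pair.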

What's next

Learn more about how to Transcribe audio.