Try out Speech-to-Text

This guide shows how to run a Speech-to-Text test using Google's Vertex AI Speech service.

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

Create a Python file named speech-to-text-test.py. Replace the audio_file_uri value with the URI of a source audio file, as shown:

from google.cloud import speech

def transcribe_gcs_audio(gcs_uri: str) -> speech.RecognizeResponse:
    """Transcribe the audio file at gcs_uri and print each result."""
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code="en-US",  # Specify the language code (e.g., "en-US" for US English)
        # You can add more features here, e.g.:
        # enable_automatic_punctuation=True,
        # model="default"  # or "latest_long", "phone_call", "video", "chirp" (v2 API)
    )

    # Performs synchronous speech recognition on the audio file
    response = client.recognize(config=config, audio=audio)

    # Print the transcription
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        if result.alternatives[0].confidence:
            print(f"Confidence: {result.alternatives[0].confidence:.2f}")

    return response

if __name__ == "__main__":
    # Replace with the URI of your audio file in Google Cloud Storage
    audio_file_uri = "AUDIO_FILE_URI"

    print(f"Transcribing audio from: {audio_file_uri}")
    transcribe_gcs_audio(audio_file_uri)

Replace the following:

AUDIO_FILE_URI: the URI of an audio file in Cloud Storage, for example "gs://your-bucket/your-audio.flac".
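
A malformed URI otherwise only surfaces as an API error at request time, so it can help to check the value before calling recognize. The following is a minimal sketch, not part of the original sample. The helper name validate_gcs_uri is hypothetical, and it assumes the audio is referenced by a gs:// URI as in the script above.

```python
from urllib.parse import urlparse


def validate_gcs_uri(uri: str) -> tuple[str, str]:
    """Return (bucket, object_name) for a gs:// URI or raise ValueError.

    Hypothetical helper: it checks only the shape of the URI, not whether
    the bucket or object actually exists.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"Expected a gs:// URI, got: {uri!r}")
    bucket = parsed.netloc
    object_name = parsed.path.lstrip("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri!r}")
    return bucket, object_name
```

Calling validate_gcs_uri(audio_file_uri) at the start of the main block would stop the script early if the AUDIO_FILE_URI placeholder was never replaced.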

Create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app

COPY speech-to-text-test.py /app/

# Install the Speech-to-Text client library that the script imports
RUN pip install --no-cache-dir google-cloud-speech

CMD ["python", "speech-to-text-test.py"]

Build the Docker image for the Speech-to-Text application:
```
docker build -t speech-to-text-app .
```
Follow the instructions in Configure Docker to:
1. Configure Docker.
2. Create a secret.
3. Upload the image to HaaS.

Sign in to the user cluster and generate its kubeconfig file with a user identity. Make sure that you set the kubeconfig path as an environment variable:
```
export KUBECONFIG=${CLUSTER_KUBECONFIG_PATH}
```
Create a Kubernetes secret by running the following command in your terminal, replacing PASTE_YOUR_API_KEY_HERE with your API key:
```
kubectl create secret generic gcp-api-key-secret \
  --from-literal=GCP_API_KEY='PASTE_YOUR_API_KEY_HERE'
```
This command creates a secret named gcp-api-key-secret with a key GCP_API_KEY.
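
The sample script creates SpeechClient() without reading GCP_API_KEY, so the key injected from this secret is only used if the script picks it up from the environment. One hedged way to do that is sketched below. It assumes that the installed google-cloud-speech and google-api-core versions accept an api_key entry in client_options, which you should verify for your environment; both helper names are hypothetical.

```python
import os


def api_key_client_options(env=os.environ) -> dict:
    """Build client options from the GCP_API_KEY environment variable."""
    api_key = env.get("GCP_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "GCP_API_KEY is not set; check the secret and the Job manifest."
        )
    return {"api_key": api_key}


def make_speech_client():
    """Create a SpeechClient that authenticates with the API key."""
    # Imported here so the helper above works without the library installed.
    from google.cloud import speech

    # Assumption: this client version supports API-key authentication
    # through client_options.
    return speech.SpeechClient(client_options=api_key_client_options())
```

With this in place, the script would call make_speech_client() where it currently calls speech.SpeechClient().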

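Save the following Kubernetes manifest to a file, for example speech-to-text-job.yaml, and apply it with kubectl apply -f speech-to-text-job.yaml: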

apiVersion: batch/v1
kind: Job
metadata:
  name: speech-to-text-test-job
spec:
  template:
    spec:
      containers:
      - name: speech-to-text-test-container
        image: HARBOR_INSTANCE_URL/HARBOR_PROJECT/speech-to-text-app:latest # Your image path
        # Expose every key in the secret to the container as an environment
        # variable; here that is GCP_API_KEY.
        envFrom:
        - secretRef:
            name: gcp-api-key-secret
      # imagePullSecrets is a pod-level field, so it sits beside containers
      imagePullSecrets:
      - name: SECRET
      restartPolicy: Never
  backoffLimit: 4

Replace the following:

HARBOR_INSTANCE_URL: the Harbor instance URL.
HARBOR_PROJECT: the Harbor project.
SECRET: the name of the secret created to store the Docker credentials.

Check the job status:

kubectl get jobs/speech-to-text-test-job
# It will show 0/1 completions, then 1/1 after it succeeds
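
The status check above can also be scripted. The sketch below is an illustration only: it assumes kubectl is on the PATH and KUBECONFIG is set as described earlier, and the function names job_state and get_job are hypothetical. It classifies the Job from the conditions in its status.

```python
import json
import subprocess


def job_state(job: dict) -> str:
    """Classify a Job object as 'succeeded', 'failed', or 'running'."""
    conditions = job.get("status", {}).get("conditions") or []
    for condition in conditions:
        if condition.get("status") == "True":
            if condition.get("type") == "Complete":
                return "succeeded"
            if condition.get("type") == "Failed":
                return "failed"
    # No terminal condition yet, so the Job is pending or still running.
    return "running"


def get_job(name: str) -> dict:
    """Fetch the Job as JSON using kubectl and the current KUBECONFIG."""
    output = subprocess.run(
        ["kubectl", "get", "job", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(output)
```

For example, job_state(get_job("speech-to-text-test-job")) returns "running" until the Job reports a Complete or Failed condition.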

After the job has completed, you can view the output in the pod logs:
```
kubectl logs -l job-name=speech-to-text-test-job
```