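Try out Speech-to-Text

This guide walks you through running a Speech-to-Text test using Google's Vertex AI Speech service.

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

Create a Python file named speech-to-text-test.py. Replace the audio_file_uri value
with the URI of a source audio file, as shown: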

    from google.cloud import speech


    def transcribe_gcs_audio(gcs_uri: str) -> speech.RecognizeResponse:
        client = speech.SpeechClient()

        audio = speech.RecognitionAudio(uri=gcs_uri)
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
            sample_rate_hertz=16000,
            language_code="en-US",  # Specify the language code (e.g., "en-US" for US English)
            # You can add more features here, e.g.:
            # enable_automatic_punctuation=True,
            # model="default"  # or "latest_long", "phone_call", "video", "chirp" (v2 API)
        )

        # Performs synchronous speech recognition on the audio file
        response = client.recognize(config=config, audio=audio)

        # Print the transcription
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")
            if result.alternatives[0].confidence:
                print(f"Confidence: {result.alternatives[0].confidence:.2f}")

        return response


    if __name__ == "__main__":
        # Replace with the URI of your audio file in Google Cloud Storage
        audio_file_uri = "AUDIO_FILE_URI"

        print(f"Transcribing audio from: {audio_file_uri}")
        transcribe_gcs_audio(audio_file_uri)

Replace the following:
AUDIO_FILE_URI: the URI of an audio file in Cloud Storage, for example
"gs://your-bucket/your-audio.flac".
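Because the script passes the URI straight to the recognizer, a malformed value only shows up as an API error at run time. As a minimal sketch, a check like the following could catch a leftover placeholder or a missing gs:// prefix earlier. The helper name split_gcs_uri is hypothetical and is not part of the Speech client library.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split a gs://bucket/object URI into (bucket, object_name).

    Hypothetical helper for illustration only; it does not contact
    Cloud Storage and only checks the shape of the string.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"Expected a gs:// URI, got: {uri!r}")
    bucket = parsed.netloc
    object_name = parsed.path.lstrip("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri!r}")
    return bucket, object_name
```

Calling split_gcs_uri(audio_file_uri) before transcribe_gcs_audio would raise a ValueError for the unreplaced AUDIO_FILE_URI placeholder instead of sending it to the service.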
Create a Dockerfile:

    FROM python:3.9-slim

    WORKDIR /app

    COPY speech-to-text-test.py /app/

    # Install the Speech-to-Text client library that the script imports
    RUN pip install --no-cache-dir google-cloud-speech

    CMD ["python", "speech-to-text-test.py"]

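Build the Docker image for the Speech-to-Text application:

    docker build -t speech-to-text-app .

Follow the instructions at Configure Docker to configure Docker, create a secret, and upload the image to HaaS.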
Sign in to the user cluster and generate its kubeconfig file with a
user identity. Make sure that you set the kubeconfig path as an environment
variable:

    export KUBECONFIG=${CLUSTER_KUBECONFIG_PATH}

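Create a Kubernetes secret by running the following command in your
terminal. Replace PASTE_YOUR_API_KEY_HERE with your API key:

    kubectl create secret generic gcp-api-key-secret \
      --from-literal=GCP_API_KEY='PASTE_YOUR_API_KEY_HERE'

This command creates a secret named gcp-api-key-secret with a key
GCP_API_KEY.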
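The job manifest in this guide exposes the secret's GCP_API_KEY key to the container as an environment variable. The sample script doesn't read that variable explicitly, so the following is only a sketch of how a script could pick it up and fail fast when it is missing. The function name load_api_key is an assumption made for illustration.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "GCP_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is absent.

    Hypothetical helper: it only reads the variable and does not
    configure the Speech client in any way.
    """
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "check that the secret exists and is referenced by the job."
        )
    return value
```

Passing a plain dictionary as env makes the helper easy to exercise outside the cluster, for example load_api_key({"GCP_API_KEY": "test-key"}).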
Apply the Kubernetes manifest:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: speech-to-text-test-job
    spec:
      template:
        spec:
          containers:
          - name: speech-to-text-test-container
            image: HARBOR_INSTANCE_URL/HARBOR_PROJECT/speech-to-text-app:latest # Your image path
            # Expose the API key from the secret to the container
            # as an environment variable named GCP_API_KEY.
            envFrom:
            - secretRef:
                name: gcp-api-key-secret
          imagePullSecrets:
          - name: SECRET
          restartPolicy: Never
      backoffLimit: 4

Replace the following:
HARBOR_INSTANCE_URL: the Harbor instance URL.
HARBOR_PROJECT: the Harbor project.
SECRET: the name of the secret created to store Docker credentials.
Check the job status:

    kubectl get jobs/speech-to-text-test-job
    # It will show 0/1 completions, then 1/1 after it succeeds

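If you want to check the job from a script rather than by eye, the same information is available in the job's JSON representation. The sketch below assumes the standard batch/v1 Job fields (spec.completions, status.succeeded, and the Complete and Failed condition types). The function name summarize_job and the sample document are made up for illustration.

```python
import json


def summarize_job(job: dict) -> str:
    """Summarize a batch/v1 Job object as 'state (succeeded/wanted)'."""
    status = job.get("status", {})
    wanted = job.get("spec", {}).get("completions") or 1
    succeeded = status.get("succeeded", 0)
    for condition in status.get("conditions", []):
        # Only conditions whose status is the string "True" are in effect.
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return f"complete ({succeeded}/{wanted})"
        if condition.get("type") == "Failed":
            return f"failed ({succeeded}/{wanted})"
    return f"running ({succeeded}/{wanted})"


# Illustrative input only, not captured from a real cluster.
sample = json.loads(
    '{"spec": {"completions": 1},'
    ' "status": {"succeeded": 1,'
    ' "conditions": [{"type": "Complete", "status": "True"}]}}'
)
print(summarize_job(sample))  # prints "complete (1/1)"
```

The input for a real job could come from running kubectl get jobs/speech-to-text-test-job -o json and loading the output with json.loads.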
After the job has completed, you can view the output in the pod logs:

    kubectl logs -l job-name=speech-to-text-test-job

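The sample script prints each result as a "Transcript:" line, optionally followed by a "Confidence:" line. If you need the results in structured form, a small parser along these lines could work. It is a sketch that assumes the log contains exactly the lines the script prints, and the function name parse_transcription_log is hypothetical.

```python
from typing import Optional


def parse_transcription_log(log_text: str) -> list[dict]:
    """Collect transcript/confidence pairs from the script's printed output."""
    results: list[dict] = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Transcript:"):
            transcript = line[len("Transcript:"):].strip()
            confidence: Optional[float] = None
            results.append({"transcript": transcript, "confidence": confidence})
        elif line.startswith("Confidence:") and results:
            # A confidence line belongs to the most recent transcript line.
            results[-1]["confidence"] = float(line[len("Confidence:"):].strip())
    return results
```

Lines that match neither prefix, such as the "Transcribing audio from:" banner, are ignored.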
Last updated 2025-09-03 UTC.