In this tutorial, you create an app that performs real-time transcription of an audio stream using Speech-to-Text, Google Kubernetes Engine (GKE), and Memorystore. The app is designed to be highly available and resilient, and provides a baseline for a production transcription app. The code for the app that you create in this tutorial is in a GitHub repository.
This tutorial is a companion to the architecture for production-ready live audio transcription using Speech-to-Text guide. For a deeper discussion of the use case and design decisions, see that document.
Objectives
- Create a GKE cluster and a Memorystore instance.
- Build and deploy app microservices to the GKE cluster.
- Verify app behavior by streaming an audio file.
- Test app resilience by introducing failures.
Costs
This tutorial uses the following billable components of Google Cloud:
- Speech-to-Text
- GKE
- Memorystore
- Cloud Load Balancing
- Cloud Build
- Compute Engine network egress
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
Before you begin
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Cloud Build API and the Speech-to-Text, GKE, and Memorystore APIs.
Architecture
The following diagram describes the architecture that you deploy in this tutorial.
The architecture includes the following features:
- Three app microservices:
  - Ingestor: This service consumes the source audio stream.
  - Transcriber: This service calls Speech-to-Text and emits transcription results.
  - Reviewer: This service displays the transcriptions in a web app for review.
- GKE, which hosts the app microservices in a regional GKE cluster that spans multiple zones. The app microservices are deployed across zones.
- Memorystore for Redis, which is used as fast intermediate storage. This is deployed in a high-availability configuration.
- Load balancers that expose app functionality to the internet in order to do the following:
  - Provide an IP address that the source audio stream can be directed to.
  - Serve the reviewer web app.
In this tutorial, you use the following clients to test app functionality:
- An audio client script that streams an audio file to the Ingestor service.
- A demo web client that displays the transcriptions from the Reviewer service.
Creating the Google Cloud infrastructure
In Cloud Shell, create a variable for your Cloud project ID:
export PROJECT_ID=project-id
Replace project-id with the ID of the Cloud project that you created or selected for this tutorial.
Set the project for your active Cloud Shell session:
gcloud config set project $PROJECT_ID
Create and launch a regional GKE cluster:
gcloud container clusters create transcription-cluster \
    --release-channel=regular \
    --region=us-central1 \
    --node-locations=us-central1-a,us-central1-b \
    --num-nodes=1 \
    --machine-type=n1-highcpu-2 \
    --scopes=cloud-platform \
    --enable-ip-alias \
    --metadata disable-legacy-endpoints=true
For this tutorial, you create resources in the us-central1 region. One node in each of two zones within the region is sufficient for the tasks in the tutorial. This task can take a few minutes to complete.
Create a standard tier Memorystore instance in the same region as the GKE cluster:
gcloud redis instances create redis-captions \
    --tier=standard \
    --region=us-central1 \
    --zone=us-central1-a
Standard Tier instances provide high availability using replication and automatic failover. The instance consists of a primary node in the specified zone, and a replica in a different zone within the region. This task can take a few minutes to complete.
Create an environment variable to store the IP address of the Memorystore instance:
export REDIS_HOST=`gcloud redis instances describe redis-captions \
    --region=us-central1 --format='value(host)'`
Deploying the microservices
In Cloud Shell, clone the GitHub repository that contains the code for the app you will deploy:
git clone https://github.com/GoogleCloudPlatform/solutions-speech-productionized-transcription
Change to the repository directory:
cd solutions-speech-productionized-transcription
Start a Cloud Build pipeline to build the Docker containers for the app microservices:
gcloud builds submit --config cloudbuild.yaml
The built containers are saved to your project's Container Registry.
Change the Kubernetes YAML config files to use your Cloud project ID:
sed -i "s/myproject/$PROJECT_ID/" k8s/*.yaml
Change the Kubernetes YAML config files to set the IP address of your Memorystore instance:
sed -i "s/redisHost=.*/redisHost=$REDIS_HOST/" k8s/*.yaml
Deploy the Ingestor, Transcriber, and Reviewer microservices to the GKE cluster:
kubectl apply -f \
    k8s/ingestor.yaml,k8s/transcriber.yaml,k8s/reviewer.yaml
Verify that all three of the Kubernetes Deployments have been created in the GKE cluster:
kubectl get deployments
The output is similar to the following. Note that each deployment has two pods.
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
ingestor-deployment      2/2     2            2           13s
reviewer-deployment      2/2     2            2           12s
transcriber-deployment   2/2     2            2           12s
Verify that the Ingestor and Reviewer Kubernetes services have been created in the GKE cluster:
kubectl get services
The output is similar to the following:
NAME               TYPE           CLUSTER-IP   EXTERNAL-IP
ingestor-service   LoadBalancer   10.0.9.193   35.224.219.112
kubernetes         ClusterIP      10.0.0.1     <none>
reviewer-service   LoadBalancer   10.0.9.203   35.223.138.37
Set an environment variable to the external IP address of the Ingestor service:
export INGEST_IP=`kubectl get services ingestor-service \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`
Get the contents of the Ingestor page to check that the page is available:
curl $INGEST_IP; echo
If the page is working, you see a Hello message.
Set an environment variable with the external IP address of the Reviewer service, and display the variable value in the console:
export REVIEWER_IP=`kubectl get services reviewer-service \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`; \
    echo $REVIEWER_IP
Copy the REVIEWER_IP address that was displayed by the preceding command.
On your local machine, open a browser tab and go to the IP address that you copied.
You see a reviewer web page that displays the text Live Transcription demo. Later on, transcriptions will be written to this web page, so keep the page open.
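The jsonpath expression used in the preceding steps reads the first load balancer address from the service status. The following Python sketch performs the same lookup on JSON shaped like the output of `kubectl get service <name> -o json`. The function name `external_ip` and the trimmed sample documents are illustrative assumptions, not part of the tutorial repository. The sketch also handles the case where the load balancer has not been assigned an address yet, in which case the shell variables above would be empty.

```python
import json
from typing import Optional


def external_ip(service_json: str) -> Optional[str]:
    """Return the first load balancer IP of a Service, or None if none is assigned."""
    service = json.loads(service_json)
    # Mirrors the path .status.loadBalancer.ingress[0].ip
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None
    return ingress[0].get("ip")


# Hypothetical, trimmed service documents used only for illustration.
READY = json.dumps(
    {"status": {"loadBalancer": {"ingress": [{"ip": "35.224.219.112"}]}}}
)
PENDING = json.dumps({"status": {"loadBalancer": {}}})

if __name__ == "__main__":
    print(external_ip(READY))
    print(external_ip(PENDING))
```

If the lookup returns no address, waiting briefly and running the `kubectl get services` command again is usually enough.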
Verifying transcription of an audio stream
In this section, you stream an audio file to the app that you just deployed and you examine the generated transcriptions. The audio is a reading of the first few sentences of the Humpty Dumpty chapter from the book Through the Looking-Glass by Lewis Carroll. For comparison against how Speech-to-Text transcribes the audio, you can see the text of the chapter on the Gutenberg.org site.
The audio client streams the audio to the Ingestor service using the SocketIO library, which is an open source package that provides real-time, reliable, bi-directional communication. Similarly, the Reviewer service delivers the transcriptions to the demo web client using SocketIO.
The Transcriber service includes configuration for Speech-to-Text that matches the characteristics of the sample audio file. Specifically, the configuration defines the audio sample rate (44,100 Hz, that is, 44.1 kHz), the number of audio channels (single channel; mono), and the input language (US English). If you want to stream a different audio file, you might need to update the configuration to match your input audio.
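If you stream a different file, one way to find the values to configure is to read them from the WAV header. The following sketch uses the Python standard library `wave` module and returns a dictionary whose keys follow the Speech-to-Text `RecognitionConfig` field names. The helper names and the `LINEAR16` encoding are assumptions for illustration: the tutorial does not show how the Transcriber service stores its configuration, and the encoding applies only to 16-bit PCM audio.

```python
import io
import wave


def recognition_config_for(wav_file, language_code="en-US"):
    """Build a config dictionary from a WAV header (16-bit PCM assumed)."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        return {
            "encoding": "LINEAR16",  # assumption: uncompressed 16-bit PCM
            "sample_rate_hertz": wav.getframerate(),
            "audio_channel_count": wav.getnchannels(),
            "language_code": language_code,
        }


def make_silent_wav(sample_rate=44100, channels=1, frames=441):
    """Create a short in-memory WAV file of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames * channels)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(recognition_config_for(make_silent_wav()))
```

You can pass a file path such as `humptydumpty.wav` instead of the in-memory buffer.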
In Cloud Shell, change to the client directory:
cd client
Create and activate a new Python 3 virtual environment. The virtualenv utility is already installed in Cloud Shell.
virtualenv -p python3 venv
source venv/bin/activate
Install the Python packages that are required by the audio client:
pip install -r requirements.txt
Run the audio client script to stream an audio file to the Ingestor service:
python audio_client.py --targetip $INGEST_IP \
    --file humptydumpty.wav
On your local machine, return to the reviewer web page.
You see the transcriptions streaming to the page.
Compare the Speech-to-Text transcriptions to the chapter text to determine how accurate the transcriptions are.
The following screenshot shows the text and an example transcription.
In Cloud Shell, press Control+C to stop the audio stream.
Testing Transcriber failover
As described in the
companion guide,
the Transcriber
service uses a leader election pattern to ensure that only a
single Transcriber
pod is connected to Speech-to-Text, and that the
remaining pods act as hot standbys for efficient failover. In this section, you
verify the failover behavior by monitoring transcription output when the leader
pod is deleted.
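To make the leader election idea concrete, the following sketch models a lease that one candidate holds and must keep renewing, while the others keep trying to acquire it. This is a generic, in-memory illustration only. The tutorial does not describe the mechanism that the Transcriber service uses, so the class name `LeaseLock`, the time-to-live value, and the pod names are all hypothetical.

```python
class LeaseLock:
    """In-memory stand-in for a shared lease record (illustration only)."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        """Acquire or renew the lease; return True if candidate is now the leader."""
        lease_is_free = self.holder is None or now >= self.expires_at
        if lease_is_free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + self.ttl_seconds
            return True
        return False


if __name__ == "__main__":
    lock = LeaseLock(ttl_seconds=2.0)
    print(lock.try_acquire("transcriber-a", now=0.0))  # first candidate wins
    print(lock.try_acquire("transcriber-b", now=1.0))  # standby keeps waiting
    # transcriber-a is deleted and stops renewing, so the lease expires.
    print(lock.try_acquire("transcriber-b", now=2.5))  # standby takes over
```

In this model, a shorter time-to-live means faster failover, at the cost of more frequent renewals by the leader.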
When a new Transcriber
pod is elected as leader, as part of the recovery
process, the new leader replays the last few seconds of the most recently
received audio. This helps minimize audio loss when the previous leader goes
offline. This approach can result in some duplicate transcribed words, because
the previous audio is replayed. In a production app, the processes that consume
the transcriptions must reconcile any duplicates.
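As a rough illustration of what reconciling duplicates can involve, the following sketch merges a replayed transcript with the words that were already received by dropping the longest overlap between the end of the earlier text and the start of the replayed text. The function name `merge_transcripts` is hypothetical, and exact word matching is a simplifying assumption: replayed audio might be transcribed slightly differently the second time, which a production consumer would need to tolerate.

```python
def merge_transcripts(previous_words, replayed_words):
    """Append replayed_words to previous_words, dropping the overlapping prefix."""
    max_overlap = min(len(previous_words), len(replayed_words))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if previous_words[-size:] == replayed_words[:size]:
            return previous_words + replayed_words[size:]
    # No overlap found: keep everything.
    return previous_words + replayed_words


if __name__ == "__main__":
    before = "humpty dumpty sat on a wall".split()
    replayed = "on a wall humpty dumpty had".split()
    print(" ".join(merge_transcripts(before, replayed)))
```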
On your local machine, refresh the browser tab to clear any existing transcriptions from the reviewer web page.
In Cloud Shell, restart the audio client:
python audio_client.py --targetip $INGEST_IP \
    --file humptydumpty.wav
You see transcriptions appearing in the reviewer web page.
Open another Cloud Shell tab so that you have two Cloud Shell tabs open.
In the new Cloud Shell tab, set an environment variable to your project ID:
export PROJECT_ID=project-id
Replace project-id with the ID of the Cloud project that you created or selected for this tutorial.
Change to the repository directory:
cd solutions-speech-productionized-transcription
Delete the leader transcriber pod:
python3 deleter.py --leader --iterations 3 --delay 15
The script queries the Kubernetes control plane to get the identity of the current leader pod, and then it deletes the pod. The command deletes the leader three times, waiting 15 seconds between each iteration.
Monitor the reviewer web page.
You see a [REPLAY] notification in the transcription stream. This indicates that a new Transcriber pod has been elected as leader, and that the last few seconds of audio data are being replayed. You might see some duplicated words in the transcriptions. Observe that the failover is very fast, and there is limited disruption to the transcription output.
Verify that the Transcriber deployment still has two pods:
kubectl get pods -l=app=transcriber
The output is similar to the following:
NAME                                      READY   STATUS
transcriber-deployment-7f57746c7c-rjwm5   2/2     Running
transcriber-deployment-7f57746c7c-t7srr   2/2     Running
Because the Transcriber is a Kubernetes Deployment, Kubernetes automatically creates new pods so that the specified number of pod replicas is satisfied.
Return to the first Cloud Shell tab and stop the audio client by pressing Control+C.
Deleting other microservice pods
In the previous section, you verified that transcription delivery is not
disrupted when Transcriber
pods are deleted. In this section, you test app
behavior when other microservice pods are deleted.
The Ingestor
and Reviewer
pods are exposed to the internet by
Kubernetes services
of type
LoadBalancer.
Clients connect to each Kubernetes service using a stable IP address, and
Kubernetes routes the traffic to an available pod. When Ingestor
or
Reviewer
pods are deleted, Kubernetes updates the corresponding service
so that traffic is not directed to a nonexistent pod. Similarly, when
new pods are then created to satisfy the configured number of pod replicas,
the service is updated so that traffic can be sent to the new pod.
In this app, you rely on this behavior to keep traffic moving through the app.
Because both the Ingestor
and Reviewer
services have two pods each, when a
pod is removed, Kubernetes can quickly redirect traffic to another pod that's
ready and that can start processing traffic.
The audio client and demo web client both use
SocketIO
to connect with the Ingestor
and Reviewer
services. When SocketIO loses a
connection, it automatically attempts to reconnect. That way, if the current pod
is deleted, the clients reconnect to a new pod using the same service IP
address.
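The reconnect behavior described above can be pictured as a retry loop around the connection attempt. The following sketch shows one such loop with an increasing delay between attempts. It is not the SocketIO implementation, which handles reconnection internally. The function name `connect_with_retry`, its parameters, and the delay values are assumptions chosen for illustration.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0,
                       max_delay=5.0, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_connect():
        # Simulates a pod that is unavailable for the first two attempts.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("pod was deleted")
        return "connected"

    print(connect_with_retry(flaky_connect, sleep=lambda seconds: None))
```

Because the clients always connect to the stable service IP address, each retry can land on whichever pod is ready at that moment.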
Delete Ingestor pods
On your local machine, refresh the browser tab to clear existing transcriptions from the reviewer web page.
In Cloud Shell, restart the audio client:
python audio_client.py --targetip $INGEST_IP \
    --file humptydumpty.wav
You see transcriptions appearing in the reviewer page.
Switch to the other Cloud Shell tab.
Delete an Ingestor pod:
python3 deleter.py --applabel ingestor --iterations 5 --delay 15
The script queries the Kubernetes control plane to get the names of all Ingestor pods, and then deletes a randomly selected pod. The command deletes a pod five times, waiting 15 seconds between each iteration.
Because the script randomly selects an Ingestor pod to delete, the pod that the audio client is connected to is deleted only about half the time. Deleting the pod that the audio client is not connected to has no impact.
Monitor the reviewer web page. The transcriptions continue streaming to the page without major disruption even though pods are being deleted.
Switch to the other Cloud Shell tab that's running the audio client.
Observe that the client displays messages whenever the connection to the Ingestor pod is dropped and re-established.
Press Control+C to stop the audio client.
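The "about half the time" estimate in the preceding steps follows from choosing one of two pods at random. The following sketch works through the arithmetic for a run of five deletions. It assumes that each selection is uniform and independent and that two pods are ready at every iteration, which the 15-second delay makes plausible but does not guarantee. The function name `disconnect_stats` is hypothetical.

```python
from fractions import Fraction


def disconnect_stats(pods, iterations):
    """Return per-iteration hit chance, expected hits, and chance of at least one hit."""
    p_hit = Fraction(1, pods)  # chance that the connected pod is the one deleted
    expected_hits = p_hit * iterations
    p_at_least_one = 1 - (1 - p_hit) ** iterations
    return p_hit, expected_hits, p_at_least_one


if __name__ == "__main__":
    p_hit, expected_hits, p_any = disconnect_stats(pods=2, iterations=5)
    print(p_hit, expected_hits, p_any)
```

Under these assumptions, a five-iteration run is expected to drop the client connection two or three times, and a run with no reconnect messages at all is possible but unlikely.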
Delete Reviewer pods
On your local machine, refresh the browser tab to clear existing transcriptions from the reviewer web page.
In Cloud Shell, restart the audio client:
python audio_client.py --targetip $INGEST_IP \
    --file humptydumpty.wav
You see transcriptions appearing in the reviewer web page.
Switch to the other Cloud Shell tab.
Delete a Reviewer pod:
python3 deleter.py --applabel reviewer --iterations 5 --delay 15
The script queries the Kubernetes control plane to get the names of all Reviewer pods, and then deletes a randomly selected pod. The command deletes a pod five times, waiting 15 seconds between each iteration.
Monitor the reviewer web page.
The transcriptions continue streaming to the page without major disruption, even though pods are being deleted. The page displays the name of the Reviewer pod that it's connected to. You see this field change when the pod is deleted and when a connection is established with another pod.
As before, because a Reviewer pod is randomly selected for deletion, the pod that the reviewer web page is connected to is deleted only about half the time. Deleting the pod that the demo web client is not connected to has no impact on transcription delivery.
Switch to the other Cloud Shell tab and press Control+C to stop the audio client.
Testing Memorystore for Redis failover
The app uses Memorystore for Redis for fast, in-memory storage. It uses a Memorystore for Redis instance in the standard tier, which provides high availability through replication and automatic failover. A standard tier instance is automatically configured as a primary and replica pair. The replica acts as a standby, and it's located in a different zone than the primary. If the primary fails, requests are automatically redirected to the replica.
In this section, you test the Memorystore for Redis failover behavior by initiating a manual failover. During the time that the Memorystore for Redis service promotes the replica to the primary, the Memorystore for Redis instance is temporarily unavailable. This means that transcriptions stop for the duration of the failover.
On your local machine, refresh the browser tab to clear existing transcriptions from the reviewer web page.
In Cloud Shell, restart the audio client to start streaming audio:
python audio_client.py --targetip $INGEST_IP \
    --file humptydumpty.wav
In the other Cloud Shell tab, initiate a manual failover of Memorystore for Redis and confirm that you want the failover to proceed.
gcloud redis instances failover redis-captions \
    --region us-central1 --project $PROJECT_ID
Watch the transcription output on the reviewer web page.
You see a [REDIS-FAILOVER] notification, which indicates that Memorystore is not available. As expected, the transcriptions stop while the replica is promoted to the primary. When this process is complete, the buffered audio data is processed, and the transcriptions resume.
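The pause-and-resume behavior that you observe can be modeled as a writer that holds items while the backing store is unreachable and sends them in order once it returns. The following sketch is a generic illustration only. The tutorial does not state which component buffers the audio, and the class name `BufferedWriter`, the `write` callback, and the buffer size are hypothetical.

```python
from collections import deque


class BufferedWriter:
    """Queue items while the store is unavailable and flush them in order later."""

    def __init__(self, write, max_items=1000):
        self._write = write
        # When the buffer is full, the oldest pending item is dropped.
        self._pending = deque(maxlen=max_items)

    def send(self, item):
        """Queue the item, then try to deliver everything that is pending."""
        self._pending.append(item)
        return self.flush()

    def flush(self):
        """Deliver pending items in order; stop at the first failure."""
        while self._pending:
            try:
                self._write(self._pending[0])
            except ConnectionError:
                return False  # store still unavailable, keep the backlog
            self._pending.popleft()
        return True


if __name__ == "__main__":
    stored = []
    state = {"available": True}

    def write_to_store(item):
        if not state["available"]:
            raise ConnectionError("failover in progress")
        stored.append(item)

    writer = BufferedWriter(write_to_store)
    writer.send("chunk-1")
    state["available"] = False  # failover starts
    writer.send("chunk-2")
    state["available"] = True  # replica promoted to primary
    writer.send("chunk-3")
    print(stored)
```

A bounded buffer keeps memory use predictable, but a failover that lasts longer than the buffer can absorb results in lost audio.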
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Learn about optimizing your audio files for Speech-to-Text.
- Visualize speech data with the Speech Analysis framework.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.