Implementing production-ready live audio transcription using Speech-to-Text (Tutorial)

In this tutorial, you create an app that performs real-time transcription of an audio stream using Speech-to-Text, Google Kubernetes Engine (GKE), and Memorystore. The app is designed to be highly available and resilient, and provides a baseline for a production transcription app. The code for the app that you create in this tutorial is in a GitHub repository.

This tutorial is a companion to the guide Architecture for production-ready live audio transcription using Speech-to-Text. For a deeper discussion of the use case and design decisions, see that guide.

Objectives

  • Create a GKE cluster and a Memorystore instance.
  • Build and deploy app microservices to the GKE cluster.
  • Verify app behavior by streaming an audio file.
  • Test app resilience by introducing failures.

Costs

This tutorial uses billable components of Google Cloud, including Speech-to-Text, GKE, and Memorystore.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the Cloud Build API and the Speech-to-Text, GKE, and Memorystore APIs.

Architecture

The following diagram describes the architecture that you deploy in this tutorial.

[Figure: Architecture of infrastructure on Google Cloud for a production-ready transcription app]

The architecture includes the following features:

  • Three app microservices:
    • Ingestor: This service consumes the source audio stream.
    • Transcriber: This service calls Speech-to-Text and emits transcription results.
    • Reviewer: This service displays the transcriptions in a web app for review.
  • GKE, which hosts the app microservices in a regional GKE cluster that spans multiple zones. The app microservices are deployed across zones.
  • Memorystore for Redis, which is used as fast intermediate storage. This is deployed in a high-availability configuration.
  • Load balancers that expose app functionality to the internet in order to do the following:
    • Provide an IP address that the source audio stream can be directed to.
    • Serve the reviewer web app.

In this tutorial, you use the following clients to test app functionality:

  • An audio client script that streams an audio file to the Ingestor service.
  • A demo web client that displays the transcriptions from the Reviewer service.

Creating the Google Cloud infrastructure

  1. In Cloud Shell, create a variable for your Cloud project ID:

    export PROJECT_ID=project-id
    

    Replace project-id with the ID of the Cloud project that you created or selected for this tutorial.

  2. Set the project for your active Cloud Shell session:

    gcloud config set project $PROJECT_ID
    
  3. Create and launch a regional GKE cluster:

    gcloud container clusters create transcription-cluster \
        --cluster-version=1.14 \
        --region=us-central1 \
        --node-locations=us-central1-a,us-central1-b \
        --num-nodes=1 \
        --machine-type=n1-highcpu-2 \
        --scopes=cloud-platform \
        --enable-ip-alias \
        --metadata disable-legacy-endpoints=true
    

    For this tutorial, you create resources in the us-central1 region. One node in each of two zones within the region is sufficient for the tasks in the tutorial.

    This task can take a few minutes to complete.

  4. Create a standard tier Memorystore instance in the same region as the GKE cluster:

    gcloud redis instances create redis-captions \
        --tier=standard \
        --region=us-central1 \
        --zone=us-central1-a
    

    Standard Tier instances provide high availability using replication and automatic failover. The instance consists of a primary node in the specified zone, and a replica in a different zone within the region. This task can take a few minutes to complete.

  5. Create an environment variable to store the IP address of the Memorystore instance:

    export REDIS_HOST=`gcloud redis instances describe redis-captions \
        --region=us-central1 --format='value(host)'`
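
If the describe command in step 5 fails, REDIS_HOST is left empty, and a later step would write that empty value into the Kubernetes config files. As a hedged illustration, the following Python sketch checks that the variable holds a well-formed IP address. The function names are hypothetical and are not part of the tutorial repository.

```python
import ipaddress
import os


def is_valid_ip(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def check_redis_host(env=os.environ) -> str:
    """Return REDIS_HOST from env, or raise if it is missing or malformed."""
    host = env.get("REDIS_HOST", "")
    if not is_valid_ip(host):
        raise ValueError(f"REDIS_HOST is not a valid IP address: {host!r}")
    return host.strip()
```

Calling check_redis_host() before you edit the YAML files fails early instead of deploying pods that point at an empty host. The check only validates the format of the value; it does not test connectivity to the instance.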
    

Deploying the microservices

  1. In Cloud Shell, clone the GitHub repository that contains the code for the app you will deploy:

    git clone https://github.com/GoogleCloudPlatform/solutions-speech-productionized-transcription
    
    
  2. Change to the repository directory:

    cd solutions-speech-productionized-transcription
    
  3. Start a Cloud Build pipeline to build the Docker containers for the app microservices:

    gcloud builds submit --config cloudbuild.yaml
    

    The built containers are saved to your project's Container Registry.

  4. Change the Kubernetes YAML config files to use your Cloud project ID:

    sed -i "s/myproject/$PROJECT_ID/" k8s/*.yaml
    
  5. Change the Kubernetes YAML config files to set the IP address of your Memorystore instance:

    sed -i "s/redisHost=.*/redisHost=$REDIS_HOST/" k8s/*.yaml
    
  6. Deploy the Ingestor, Transcriber, and Reviewer microservices to the GKE cluster:

    kubectl apply -f \
        k8s/ingestor.yaml,k8s/transcriber.yaml,k8s/reviewer.yaml
    
  7. Verify that all three of the Kubernetes Deployments have been created in the GKE cluster:

    kubectl get deployments
    

    The output is similar to the following. Note that each deployment has two pods.

    NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
    ingestor-deployment     2/2     2            2           13s
    reviewer-deployment     2/2     2            2           12s
    transcriber-deployment  2/2     2            2           12s
    
  8. Verify that the Ingestor and Reviewer Kubernetes services have been created in the GKE cluster:

    kubectl get services
    

    The output is similar to the following:

    NAME               TYPE           CLUSTER-IP   EXTERNAL-IP
    ingestor-service   LoadBalancer   10.0.9.193   35.224.219.112
    kubernetes         ClusterIP      10.0.0.1     <none>
    reviewer-service   LoadBalancer   10.0.9.203   35.223.138.37
    
  9. Set an environment variable to the external IP address of the Ingestor service:

    export INGEST_IP=`kubectl get services ingestor-service \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`
    
  10. Get the contents of the Ingestor page to check that the page is available:

    curl $INGEST_IP; echo
    

    If the page is working, you see a Hello message.

  11. Set an environment variable with the external IP address of the Reviewer service, and display the variable value in the console:

    export REVIEWER_IP=`kubectl get services reviewer-service \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`; \
        echo $REVIEWER_IP
    
  12. Copy the REVIEWER_IP address that was displayed by the preceding command.

  13. On your local machine, open a browser tab to the IP address that you just copied.

    You see a reviewer web page that displays the text Live Transcription demo. Later on, transcriptions will be written to this web page, so keep the page open.
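
The external IP address of a LoadBalancer service can take a short time to be assigned, so the lookups in steps 9 and 11 might return an empty value if you run them immediately after deployment. The following Python sketch polls for the address by using the same jsonpath expression as those steps. It assumes that kubectl prints an empty string while the address is pending; the function names, attempt count, and delay are illustrative and are not part of the tutorial repository.

```python
import subprocess
import time


def kubectl_external_ip(service: str) -> str:
    """Ask kubectl for the service's load balancer IP ('' while pending)."""
    result = subprocess.run(
        ["kubectl", "get", "services", service,
         "-o", "jsonpath={.status.loadBalancer.ingress[0].ip}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_external_ip(service, lookup=kubectl_external_ip,
                         attempts=30, delay_seconds=10.0, sleep=time.sleep):
    """Poll until the service has an external IP, or raise TimeoutError."""
    for attempt in range(attempts):
        ip = lookup(service)
        if ip:
            return ip
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"{service} has no external IP after {attempts} attempts")
```

For example, wait_for_external_ip("ingestor-service") returns the address as soon as it is assigned. The lookup and sleep parameters exist so that the polling logic can be exercised without a cluster.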

Verifying transcription of an audio stream

In this section, you stream an audio file to the app that you just deployed and you examine the generated transcriptions. The audio is a reading of the first few sentences of the Humpty Dumpty chapter from the book Through the Looking-Glass by Lewis Carroll. For comparison against how Speech-to-Text transcribes the audio, you can see the text of the chapter on the Gutenberg.org site.

The audio client streams the audio to the Ingestor service using the SocketIO library, which is an open source package that provides real-time, reliable, bi-directional communication. Similarly, the Reviewer service delivers the transcriptions to the demo web client using SocketIO.

The Transcriber service includes configuration for Speech-to-Text that matches the characteristics of the sample audio file. Specifically, the configuration defines the audio sample rate (44,100 Hz), the number of audio channels (one channel, mono), and the input language (US English). If you want to stream a different audio file, you might need to update the configuration to match your input audio.
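
Before you stream a different audio file, it can help to check its sample rate and channel count against the values that the Transcriber expects. The following sketch uses the wave module from the Python standard library. The constants mirror the sample rate (44,100 Hz) and channel count (mono) of the sample file; the function names are hypothetical.

```python
import wave

# Expected characteristics of the sample audio: 44,100 Hz, one channel.
EXPECTED_SAMPLE_RATE_HZ = 44100
EXPECTED_CHANNELS = 1


def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def matches_expected(path):
    """True if the file's rate and channel count match the expected values."""
    rate, channels, _ = describe_wav(path)
    return rate == EXPECTED_SAMPLE_RATE_HZ and channels == EXPECTED_CHANNELS
```

If matches_expected returns False for your file, either convert the audio or update the Transcriber configuration to the values that describe_wav reports.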

  1. In Cloud Shell, change to the client directory:

    cd client
    
  2. Create and activate a new Python 3 virtual environment. The virtualenv utility is already installed in Cloud Shell.

    virtualenv -p python3 venv
    source venv/bin/activate
    
  3. Install the Python packages that are required by the audio client script:

    pip install -r requirements.txt
    
  4. Run the audio client script to stream an audio file to the Ingestor service:

    python audio_client.py --targetip $INGEST_IP \
        --file humptydumpty.wav
    
  5. On your local machine, return to the reviewer web page.

    You see the transcriptions streaming to the page.

  6. Compare the Speech-to-Text transcriptions to the chapter text to determine how accurate the transcriptions are.

    The following screenshot shows the text and an example transcription.

    [Screenshot: the original text on the top, and a transcription below]

  7. In Cloud Shell, press Control+C to stop the audio stream.
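
To simulate a live stream from a file, a client typically reads the audio in short chunks and sends each chunk at roughly the rate at which it would be spoken. The following sketch shows only the chunking part, by using the standard library. It is an assumption about how such a client could work, not a description of audio_client.py; the function name and the 100 ms chunk size are illustrative.

```python
import wave


def iter_audio_chunks(path, chunk_ms=100):
    """Yield (raw_bytes, duration_seconds) for successive chunks of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, rate * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break  # end of file
            frames_read = len(data) // bytes_per_frame
            yield data, frames_read / rate
```

A streaming client could send each chunk and then sleep for the returned duration so that the audio arrives in approximately real time.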

Testing Transcriber failover

As described in the companion guide, the Transcriber service uses a leader election pattern to ensure that only a single Transcriber pod is connected to Speech-to-Text, and that the remaining pods act as hot standbys for efficient failover. In this section, you verify the failover behavior by monitoring transcription output when the leader pod is deleted.

When a new Transcriber pod is elected as leader, as part of the recovery process, the new leader replays the last few seconds of the most recently received audio. This helps minimize audio loss when the previous leader goes offline. This approach can result in some duplicate transcribed words, because the previous audio is replayed. In a production app, the processes that consume the transcriptions must reconcile any duplicates.
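
One simple way for a consumer to reconcile duplicates is to look for the longest run of words at the start of the replayed transcription that repeats the end of the text already received, and to drop that run. The following sketch illustrates the idea under the assumption that the replayed words are recognized identically the second time, which is not guaranteed in practice. The function name and the max_overlap limit are hypothetical.

```python
def merge_replayed(previous_words, replayed_words, max_overlap=20):
    """Append replayed_words to previous_words, dropping the longest prefix of
    replayed_words that repeats the end of previous_words."""
    limit = min(max_overlap, len(previous_words), len(replayed_words))
    # Try the longest candidate overlap first, then shrink.
    for size in range(limit, 0, -1):
        if previous_words[-size:] == replayed_words[:size]:
            return previous_words + replayed_words[size:]
    # No overlap found: keep everything.
    return previous_words + replayed_words


before = "humpty dumpty sat on a wall".split()
replay = "on a wall humpty dumpty had".split()
print(" ".join(merge_replayed(before, replay)))
# humpty dumpty sat on a wall humpty dumpty had
```

A production consumer would likely need fuzzier matching, for example by comparing word timestamps or by tolerating small recognition differences.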

  1. On your local machine, refresh the browser tab to clear any existing transcriptions from the reviewer web page.

  2. In Cloud Shell, restart the audio client:

    python audio_client.py --targetip $INGEST_IP \
        --file humptydumpty.wav
    

    You see transcriptions appearing in the reviewer web page.

  3. Open another Cloud Shell tab so that you have two Cloud Shell tabs open.

  4. In the new Cloud Shell tab, set an environment variable to your project ID:

    export PROJECT_ID=project-id
    

    Replace project-id with the ID of the Cloud project that you created or selected for this tutorial.

  5. Change to the repository directory:

    cd solutions-speech-productionized-transcription
    
  6. Delete the leader transcriber pod:

    python3 deleter.py --leader --iterations 3 --delay 15
    

    The script queries the Kubernetes control plane to get the identity of the current leader pod, and then it deletes the pod. The command deletes the leader three times, waiting 15 seconds between each iteration.

  7. Monitor the reviewer web page.

    You see a [REPLAY] notification in the transcription stream. This indicates that a new Transcriber pod has been elected as leader, and that the last few seconds of audio data are being replayed. You might see some duplicated words in the transcriptions. Observe that the failover completes quickly, with limited disruption to the transcription output.

  8. Verify that the Transcriber deployment still has two pods:

    kubectl get pods -l=app=transcriber
    

    The output is similar to the following:

    NAME                                      READY   STATUS
    transcriber-deployment-7f57746c7c-rjwm5   2/2     Running
    transcriber-deployment-7f57746c7c-t7srr   2/2     Running
    

    Because the Transcriber is a Kubernetes Deployment, Kubernetes automatically creates new pods so that the specified number of pod replicas is satisfied.

  9. Return to the first Cloud Shell tab and stop the audio client by pressing Control+C.
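
The replica check in step 8 can also be scripted. The following sketch parses the JSON that a command such as kubectl get deployment transcriber-deployment -o json prints, and compares the desired replica count with the number of ready replicas. The function names are hypothetical; the field names come from the Kubernetes Deployment resource.

```python
import json


def ready_summary(deployment_json: str):
    """Return (name, ready, desired) parsed from Deployment JSON output."""
    doc = json.loads(deployment_json)
    desired = doc["spec"].get("replicas", 1)
    # status.readyReplicas is omitted while no replica is ready.
    ready = doc.get("status", {}).get("readyReplicas", 0)
    return doc["metadata"]["name"], ready, desired


def is_fully_ready(deployment_json: str) -> bool:
    """True when every desired replica reports ready."""
    _, ready, desired = ready_summary(deployment_json)
    return ready == desired
```

Running this check shortly after the leader is deleted might briefly report fewer ready replicas than desired while Kubernetes creates the replacement pod.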

Deleting other microservice pods

In the previous section, you verified that transcription delivery is only briefly disrupted when Transcriber pods are deleted. In this section, you test app behavior when other microservice pods are deleted.

The Ingestor and Reviewer pods are exposed to the internet by Kubernetes services of type LoadBalancer. Clients connect to each Kubernetes service using a stable IP address, and Kubernetes routes the traffic to an available pod. When Ingestor or Reviewer pods are deleted, Kubernetes updates the corresponding service so that traffic is not directed to a nonexistent pod. Similarly, when new pods are then created to satisfy the configured number of pod replicas, the service is updated so that traffic can be sent to the new pod.

In this app, you rely on this behavior to keep traffic moving through the app. Because both the Ingestor and Reviewer services have two pods each, when a pod is removed, Kubernetes can quickly redirect traffic to another pod that's ready and that can start processing traffic.

The audio client and demo web client both use SocketIO to connect with the Ingestor and Reviewer services. When SocketIO loses a connection, it automatically attempts to reconnect. That way, if the current pod is deleted, the clients reconnect to a new pod using the same service IP address.
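
SocketIO clients handle reconnection internally, so you don't need to write this logic yourself. To illustrate the general pattern, the following sketch retries a connection with an increasing delay. The function name, attempt limit, and delays are illustrative assumptions and do not reflect SocketIO's actual defaults or implementation.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0,
                       max_delay=5.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Because the clients reconnect to the same service IP address, each retry can reach whichever pod Kubernetes currently considers ready.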

Delete Ingestor pods

  1. On your local machine, refresh the browser tab to clear existing transcriptions from the reviewer web page.

  2. In Cloud Shell, restart the audio client:

    python audio_client.py --targetip $INGEST_IP \
        --file humptydumpty.wav
    

    You see transcriptions appearing in the reviewer page.

  3. Switch to the other Cloud Shell tab.

  4. Delete an Ingestor pod:

    python3 deleter.py --applabel ingestor --iterations 5 --delay 15
    

    The script queries the Kubernetes control plane to get the names of all Ingestor pods, and then deletes a randomly selected pod. The command deletes a pod five times, waiting 15 seconds between each iteration.

    Because the script randomly selects an Ingestor pod to delete, the pod that the audio client is connected to is deleted only about half the time. Deleting the pod that the audio client is not connected to has no impact.

  5. Monitor the reviewer web page. The transcriptions continue streaming to the page without major disruption even though pods are being deleted.

  6. Switch to the other Cloud Shell tab that's running the audio client.

    Observe that the client displays messages whenever the connection to the Ingestor pod is dropped and re-established.

  7. Press Control+C to stop the audio client.
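
The deletion loop that is described in step 4 can be approximated with standard kubectl commands. The following sketch is a hypothetical equivalent, not the implementation of deleter.py. It assumes that pods carry an app label (as in the kubectl get pods -l=app=transcriber command used earlier); all function names are illustrative.

```python
import random
import subprocess
import time


def run_kubectl(args):
    """Run kubectl with the given arguments and return its stdout."""
    return subprocess.run(["kubectl", *args], capture_output=True,
                          text=True, check=True).stdout


def list_pods(app_label, kubectl=run_kubectl):
    """Return the names of pods whose `app` label equals app_label."""
    out = kubectl(["get", "pods", "-l", f"app={app_label}",
                   "-o", "jsonpath={.items[*].metadata.name}"])
    return out.split()


def delete_random_pods(app_label, iterations, delay_seconds,
                       kubectl=run_kubectl, sleep=time.sleep, rng=random):
    """Delete one randomly chosen pod per iteration; return the deleted names."""
    deleted = []
    for i in range(iterations):
        pods = list_pods(app_label, kubectl)
        if pods:
            victim = rng.choice(pods)
            kubectl(["delete", "pod", victim])
            deleted.append(victim)
        # Wait between iterations, but not after the last one.
        if i < iterations - 1:
            sleep(delay_seconds)
    return deleted
```

For example, delete_random_pods("ingestor", iterations=5, delay_seconds=15) mirrors the parameters of the command in step 4.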

Delete Reviewer pods

  1. On your local machine, refresh the browser tab to clear existing transcriptions from the reviewer web page.

  2. In Cloud Shell, restart the audio client:

    python audio_client.py --targetip $INGEST_IP \
        --file humptydumpty.wav
    

    You see transcriptions appearing in the reviewer web page.

  3. Switch to the other Cloud Shell tab.

  4. Delete a Reviewer pod:

    python3 deleter.py --applabel reviewer --iterations 5 --delay 15
    

    The script queries the Kubernetes control plane to get the names of all Reviewer pods, and then deletes a randomly selected pod. The command deletes a pod five times, waiting 15 seconds between each iteration.

  5. Monitor the reviewer web page.

    The transcriptions continue streaming to the page without major disruption, even though pods are being deleted. The page displays the name of the Reviewer pod that it's connected to. You see this field change when the pod is deleted and when a connection is established with another pod.

    As before, because a Reviewer pod is randomly selected for deletion, the pod that the reviewer web page is connected to is deleted only about half the time. Deleting the pod that the demo web client is not connected to has no impact on transcription delivery.

  6. Switch to the other Cloud Shell tab and press Control+C to stop the audio client.
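
In both deletion tests, each iteration removes the pod that a client is connected to only about half the time. Over five iterations, it is still very likely that the connected pod is deleted at least once. The following sketch works through that arithmetic, assuming that two pods are ready at the start of every iteration and that each deletion is chosen independently and uniformly at random. The function name is hypothetical.

```python
def p_hit_at_least_once(iterations, pods=2):
    """Probability that the connected pod is chosen at least once when one of
    `pods` equally likely pods is deleted in each of `iterations` rounds."""
    return 1 - (1 - 1 / pods) ** iterations


print(p_hit_at_least_once(5))  # 0.96875
```

Under these assumptions, a run of five iterations disrupts the connected client at least once in about 97% of runs, which is why the reconnect messages are usually visible during the test.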

Testing Memorystore for Redis failover

The app uses Memorystore for Redis for fast, in-memory storage. It uses a Memorystore for Redis instance in the standard tier, which provides high availability through replication and automatic failover. A standard tier instance is automatically configured as a primary and replica pair. The replica acts as a standby, and it's located in a different zone than the primary. If the primary fails, requests are automatically redirected to the replica.

In this section, you test the Memorystore for Redis failover behavior by initiating a manual failover. During the time that the Memorystore for Redis service promotes the replica to the primary, the Memorystore for Redis instance is temporarily unavailable. This means that transcriptions stop for the duration of the failover.

  1. On your local machine, refresh the browser tab to clear existing transcriptions from the reviewer web page.
  2. In Cloud Shell, restart the audio client to start streaming audio:

    python audio_client.py --targetip $INGEST_IP \
        --file humptydumpty.wav
    
  3. In the other Cloud Shell tab, initiate a manual failover of Memorystore for Redis and confirm that you want the failover to proceed:

    gcloud redis instances failover redis-captions \
        --region us-central1 --project $PROJECT_ID
    
  4. Watch the transcription output on the reviewer web page.

    You see a [REDIS-FAILOVER] notification, which indicates that Memorystore is not available. As expected, the transcriptions stop while the replica is promoted to the primary. When this process is complete, the buffered audio data is processed, and the transcriptions resume.
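
The tutorial doesn't describe where the app buffers audio during the failover. As a generic illustration of the pattern, the following sketch queues items while writes fail and flushes them in order when the store becomes available again. The class name, the max_buffered limit, and the use of ConnectionError to signal an unavailable store are all assumptions.

```python
from collections import deque


class BufferedWriter:
    """Queue items while the store is unavailable; flush in order on recovery."""

    def __init__(self, write, max_buffered=1000):
        self._write = write  # callable that raises ConnectionError when down
        # When the buffer is full, the oldest items are dropped.
        self._pending = deque(maxlen=max_buffered)

    def send(self, item):
        """Queue the item, then try to drain the backlog."""
        self._pending.append(item)
        return self.flush()

    def flush(self):
        """Write queued items in order; stop at the first failure."""
        while self._pending:
            try:
                self._write(self._pending[0])
            except ConnectionError:
                return False  # still unavailable; keep the backlog
            self._pending.popleft()
        return True

    @property
    def backlog(self):
        return len(self._pending)
```

Bounding the buffer trades completeness for memory safety: during a long outage, the oldest audio is discarded instead of exhausting memory.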

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the project that contains the resources.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete and then click Delete.
  3. In the dialog, type the project ID and then click Shut down to delete the project.

What's next