Using Cloud Pub/Sub for Long-running Tasks

This solution explains how to use Google Cloud Pub/Sub as a queuing system for processing potentially long-running tasks. The solution uses automatic transcription of audio files as an example.

While audio is a great addition to various types of media, sometimes it's useful to transcribe spoken words to text. Use cases include video accessibility through subtitles or analysis of call-center calls. When hours of videos are uploaded every second, or hours of helpdesk calls are created every day, automatic batch processing can save many hours when compared to manual transcription.

In this solution, you learn about an implementation of a media processing infrastructure where audio files are uploaded to storage in the cloud, for example after a phone call has been recorded. There, they are transcribed and the results saved into a database for further analysis. The architecture described here also works well with larger images or videos, which can require hours of processing.

Architecture and workflow

The following diagram shows the architecture and workflow:

Overview of file upload and processing. Steps are described in the following text.

In this diagram, after a user uploads content, the system reacts as follows:

  1. A watcher app picks up the new media that has been uploaded.
  2. A webhook linked to the watcher adds a message with the media details to the queue.
  3. One of the workers subscribed to the queue pulls the message.
  4. The same worker gets the appropriate media from storage and processes it.
  5. After it's finished, the worker saves the result to the output storage location.
  6. The worker then acknowledges the message so it gets removed from the queue.
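The six steps above can be sketched end to end with in-memory stand-ins. The `queue.Queue` models the Pub/Sub subscription, plain dicts model the input and output storage, and the message fields and filenames are illustrative, not part of the real infrastructure:

```python
import queue

# In-memory stand-ins for the cloud components: a queue.Queue models the
# Pub/Sub subscription, and plain dicts model the input and output storage.
task_queue = queue.Queue()
input_storage = {"call-001.wav": b"...audio bytes..."}
output_storage = {}

def publish_new_media(filename):
    """Steps 1-2: the watcher's webhook enqueues a message with media details."""
    task_queue.put({"bucket": "incoming-media", "name": filename})

def worker_loop():
    """Steps 3-6: pull a message, fetch and process the media, save, acknowledge."""
    while not task_queue.empty():
        message = task_queue.get()                          # step 3: pull
        audio = input_storage[message["name"]]              # step 4: fetch the media
        transcript = "transcript of %d bytes" % len(audio)  # step 4: process it
        output_storage[message["name"]] = transcript        # step 5: save the result
        task_queue.task_done()                              # step 6: acknowledge

publish_new_media("call-001.wav")
worker_loop()
```

The rest of this solution maps each stand-in onto a real Google Cloud Platform component.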

Requirements and infrastructure

This architecture requires several components.


Media storage

Media needs to be stored so it can be easily accessed, processed, and delivered. The storage should be highly available with low latency, able to handle terabytes, if not petabytes, of data, and globally accessible.

Google Cloud Storage offers three object-storage solutions with the same availability and durability:

  • Standard. Useful for processing and storing data. Data is accessible in milliseconds. It also leverages a no-cost content-delivery network (CDN) for public content delivery using Google's edge network.
  • Nearline. Nearline storage offers a low-cost, durable backup option. The first byte of data can be accessed in a few milliseconds, offering quick recovery options.
  • Coldline. Coldline storage is a good choice for very-low-cost storage of data that you plan to access once a year, at most. Coldline is useful for long-term data archiving, online backup, and disaster recovery. Coldline also offers fast data retrieval.

In this solution, standard storage is used to save the files to be processed.


Watcher

Files can arrive in the object store at any time. In order to respond quickly, the system needs a component that watches the storage location and automatically creates an event when a new media file is added. You can implement a watcher using Cloud Storage object change notification in combination with Google App Engine. This architecture optimizes the notification component by running it within a serverless system that requires minimal management, has low latency, and automatically scales to meet demand.

Object change notification can be configured to watch a bucket and fire an event when the content inside the bucket changes. You can attach a webhook to that event in order to perform a task.

Using Google App Engine to deploy the webhook requires minimal IT operations and provides a platform that scales automatically, both of which let developers focus on their code. In this solution, the webhook calls the Publisher module, which then writes a new message to Cloud Pub/Sub.
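As a sketch of what the Publisher module does, the following builds a Pub/Sub-style message from the notification JSON that the webhook receives. Only the `bucket` and `name` fields are relied on; the helper name and the surrounding shape are illustrative:

```python
import base64
import json

def notification_to_pubsub_message(notification_body):
    """Build the Cloud Pub/Sub message the Publisher module would send,
    from the JSON document that object change notification POSTs to the
    webhook. Only the `bucket` and `name` fields are relied on here."""
    resource = json.loads(notification_body)
    media_details = {
        "bucket": resource["bucket"],
        "name": resource["name"],
        "contentType": resource.get("contentType", "application/octet-stream"),
    }
    # Message data is base64-encoded, as required when publishing over the
    # Pub/Sub HTTP API.
    data = base64.b64encode(json.dumps(media_details).encode("utf-8"))
    return {"data": data.decode("ascii")}

body = json.dumps({"bucket": "incoming-media", "name": "call-001.wav",
                   "contentType": "audio/l16"})
message = notification_to_pubsub_message(body)
```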

Message queue

If there are many incoming media files to process, the system will require multiple workers to process the media quickly. The more files waiting for processing, the more workers are required. To reduce costs and stay under provisioning limits, you can use a processing queue to minimize the number of workers you need to run concurrently.

Google Cloud Pub/Sub is a highly available messaging and queuing system. It can be accessed through an HTTP API and offers communication configurations such as one-to-many (fan-out), many-to-one (fan-in), and many-to-many. Cloud Pub/Sub supports both push and pull subscription models.

This solution uses Cloud Pub/Sub to create a fault-tolerant and efficient queue that has the following features:

  • A single publisher that publishes a message for each new file, which contains metadata describing the file and its location.

  • A single queue that contains all of the published messages.

  • A set of workers that pull messages from the queue. Using a pull subscription model ensures that workers have time to finish processing a media file before getting the next one to process.

  • If a worker pulls a message from the queue but does not acknowledge that it has finished that media file, the queuing system assumes the worker has failed and returns the message to the queue.

  • A worker acknowledges a message only after it finishes processing the media. The acknowledgement causes Cloud Pub/Sub to remove the message from the queue.

  • For processing that takes a long time, you can refresh the acknowledgement deadline periodically so that Cloud Pub/Sub does not return the message to the queue while the original worker is still processing it.

  • Each message has a lifetime. If a message has not been processed before it expires, it is deleted from the queue. This prevents the queue from growing too large.
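The deadline-refresh behavior described above can be sketched as a lease keep-alive loop. The `extend_deadline` callable stands in for a real call to the subscription's modifyAckDeadline method, and the timing values are illustrative:

```python
import threading
import time

def keep_lease_alive(extend_deadline, done, interval_seconds):
    """Refresh the acknowledgement deadline periodically until the worker
    signals that processing is finished. With a real client,
    `extend_deadline` would call the subscription's modifyAckDeadline
    method; here it is any callable."""
    while not done.wait(interval_seconds):
        extend_deadline()

# Simulate a long-running job: record each refresh instead of calling an API.
refreshes = []
done = threading.Event()
lease = threading.Thread(
    target=keep_lease_alive,
    args=(lambda: refreshes.append(time.monotonic()), done, 0.05),
)
lease.start()
time.sleep(0.25)   # stand-in for slow media processing
done.set()         # processing finished: stop refreshing, then acknowledge
lease.join()
```

Running the keep-alive on a background thread leaves the worker free to spend all of its time on the media itself.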

The following diagram illustrates how this solution uses Cloud Pub/Sub to process messages.

Processing messages. Steps are described in following text.

In the preceding diagram, you can see that after a worker pulls a message, the message remains in the queue. It is marked as claimed and is unavailable to other workers. The message becomes available again only if the subscription does not receive an acknowledgement from the worker that claimed it before the acknowledgement deadline expires.


Workers

Processing the files associated with the message queue has several steps:

  1. Read the queue.
  2. Read the audio file.
  3. Transcribe the audio.
  4. Save the result.


In order to perform all of these steps in a timely fashion, the system needs a set of workers that can scale to meet the incoming demand. Scaling can be handled either manually or automatically. Manual scaling is appropriate if the number and size of incoming files is fairly constant and you know how many workers are needed to handle the processing load. Automatic scaling can handle sudden changes in the number of incoming files.
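The manual-scaling case can be sketched as a back-of-the-envelope calculation, assuming a steady arrival rate and a known average processing time (both numbers below are illustrative):

```python
import math

def workers_needed(files_per_minute, minutes_per_file):
    """Manual-scaling estimate: to keep the queue bounded, the pool must
    process files at least as fast as they arrive, so the required worker
    count is the arrival rate multiplied by the per-file processing time
    (an application of Little's law), rounded up."""
    return math.ceil(files_per_minute * minutes_per_file)

# Example: 30 recordings arrive per minute and each takes 2 minutes to
# transcribe, so at least 60 workers are needed to keep up.
```

If either number fluctuates significantly, this estimate under- or over-provisions, which is exactly the case automatic scaling handles.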

Based on the preceding requirements, the system requires computing power to transcribe the audio files, as well as the ability to automatically scale horizontally to provide additional workers as needed.

Compute Engine instances are created based on a template.

Google Compute Engine allows you to create virtual machine instances to perform computational tasks.

Managed instance groups work with both manual scaling and autoscaling. The instance group manager creates groups of identical machines from an instance template, which can be based on a machine image or a container image.

With manual scaling, the instance group manager ensures that the specified number of virtual machine instances is always up and running, creating new instances as needed and recreating non-responsive instances.

When you use autoscaling, the instance group manager creates or terminates instances based on metrics you select, such as average CPU, load on the HTTP load balancer, instance custom metrics, or Cloud Pub/Sub queue size.

The accompanying tutorial uses Compute Engine instances in a managed instance group. The group uses autoscaling based on a combination of Cloud Pub/Sub queue size and average CPU utilization. That way, the number of workers can scale up and down based on the queue size, and an instance is not terminated while it is still processing a media file. See the Scaling Based on a Queue-Based Workload page for more details.
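The sizing decision behind queue-based autoscaling can be sketched as follows. The function and its inputs are illustrative and do not mirror the autoscaler's actual configuration surface:

```python
import math

def desired_instances(undelivered_messages, messages_per_instance, max_instances):
    """Sketch of queue-based autoscaling: grow the group in proportion to
    the backlog, but keep at least one instance running and never exceed
    the configured maximum. The real autoscaler's signals and configuration
    differ; this only illustrates the idea."""
    target = math.ceil(undelivered_messages / messages_per_instance)
    return max(1, min(max_instances, target))
```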

Audio processing

With the computational platform configured, the next consideration is how to process the media file. In some cases, the processing job must be customized and can require significant computing power. This solution, which focuses on the queueing architecture rather than on media processing, uses the Cloud Speech API to process audio files synchronously.

Synchronous transcription takes long enough to keep a worker busy, which lets the queue grow, while keeping the tutorial simple to implement.
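The following sketch builds the JSON body for a synchronous Speech API `recognize` call using only the standard library; a real worker would POST this JSON to the API with appropriate credentials. The field names follow the v1 REST API, but verify them against the current API reference before relying on this:

```python
import json

def build_recognize_request(gcs_uri, sample_rate_hertz=16000, language_code="en-US"):
    """Build the JSON body for a synchronous Speech API `recognize` call.
    The field names (`config.encoding`, `config.sampleRateHertz`,
    `config.languageCode`, `audio.uri`) follow the v1 REST API; verify
    them against the current API reference before relying on this sketch."""
    return {
        "config": {
            "encoding": "LINEAR16",              # uncompressed 16-bit PCM
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Pointing at a Cloud Storage URI avoids base64-encoding the audio
        # into the request body itself.
        "audio": {"uri": gcs_uri},
    }

request_body = json.dumps(build_recognize_request("gs://incoming-media/call-001.wav"))
```

Referencing the file by its Cloud Storage URI fits this architecture well, since the worker already knows the bucket and object name from the queue message.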

Data storage

After the media has been processed, in this case an audio file, the results can be saved for further analysis. BigQuery is a good fit: it offers a simple SQL interface and provides storage that other Google tools can build on.
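A minimal sketch of shaping one result as a row for a BigQuery streaming insert: each row is a plain JSON-compatible mapping whose keys must match the table schema. The column names here are illustrative, not prescribed by the solution:

```python
import datetime

def transcription_row(filename, transcript, confidence):
    """Shape one transcription result as a row for a BigQuery streaming
    insert: a plain JSON-compatible mapping whose keys must match the
    table schema. The column names here are illustrative."""
    return {
        "filename": filename,
        "transcript": transcript,
        "confidence": confidence,
        "processed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

row = transcription_row("call-001.wav", "hello world", 0.94)
```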

Google Cloud Platform infrastructure

The following diagram illustrates how to build a media-processing solution using Google Cloud Platform.

Media processing solution architecture.


To walk through a sample media processing solution that uses this architecture, see Processing Media Using Cloud Pub/Sub and Google Compute Engine.

What's next

  • Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.
