This article explains the architecture and workflow for how to use Cloud Pub/Sub as a queuing system for processing potentially long-running tasks. The article uses automatic transcription of audio files as an example.
Although audio is a great addition to various types of media, sometimes it's useful to transcribe spoken words to text. Use cases include video accessibility through subtitles or analysis of call-center calls. When hours of videos are uploaded every second, or hours of helpdesk calls are created every day, automatic batch processing can save many hours when compared to manual transcription.
In this article, you learn about an implementation of a media processing infrastructure where audio files are uploaded to storage in the cloud—for example, after a phone call has been recorded. Once in the cloud, the files are transcribed and the results saved into a database for further analysis. The architecture described here also works well with larger images or videos, which can require hours of processing.
Architecture and workflow
The following diagram shows the architecture and workflow:
In this diagram, after a user uploads content, the system reacts as follows:
- A watcher app picks up the new media that has been uploaded.
- A webhook linked to the watcher adds a message with the media details to the queue.
- One of the workers that's subscribed to the queue pulls the message.
- The same worker gets the appropriate media from storage and processes it.
- After the processing is finished, the worker saves the result to the output storage location.
- The worker then acknowledges the message so it gets removed from the queue.
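The workflow above can be sketched as a minimal in-memory simulation. The storage dictionary, queue, and function names here are illustrative stand-ins, not actual Cloud Storage or Cloud Pub/Sub APIs:

```python
import queue

# Hypothetical in-memory stand-ins for Cloud Storage and the output database.
storage = {"uploads/call-001.wav": b"...audio bytes..."}
results = {}
task_queue = queue.Queue()

def on_upload(object_name):
    """Watcher webhook: add a message with the media details to the queue."""
    task_queue.put({"bucket": "uploads", "name": object_name})

def worker():
    """Pull one message, fetch and process the media, save the result, ack."""
    message = task_queue.get()                        # pull the message
    audio = storage[message["name"]]                  # get the media from storage
    transcript = f"transcript of {message['name']}"   # placeholder for processing
    results[message["name"]] = transcript             # save to output storage
    task_queue.task_done()                            # acknowledge the message

on_upload("uploads/call-001.wav")
worker()
print(results)
```

In the real system, each step maps to a managed service: the queue is a Cloud Pub/Sub topic and subscription, and the storage dictionaries are Cloud Storage buckets.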
Requirements and infrastructure
This architecture requires several components.
Media needs to be stored so it can be easily accessed, processed, and delivered. The storage should be highly available with low latency, able to handle terabytes or petabytes of data, and globally accessible.
Cloud Storage offers three object-storage solutions with the same availability and durability:
- Standard. Useful for processing and storing data. Data is accessible in milliseconds. It also leverages a no-cost content-delivery network (CDN) for public content delivery using Google's edge network.
- Nearline. Nearline storage offers a low-cost, durable backup option. The first byte of data can be accessed in a few milliseconds, offering quick recovery options.
- Coldline. Coldline storage is a good choice for very-low-cost storage of data that you plan to access once a year, at most. Coldline is useful for long-term data archiving, online backup, and disaster recovery. Coldline also offers fast data retrieval.
In this solution, standard storage is used to save the files to be processed.
Files can arrive in the object store at any time. In order to respond quickly, the system needs a component that watches the storage location and automatically creates an event when a new media file is added. You can configure the Cloud Pub/Sub notifications for Cloud Storage feature to watch a bucket and fire an event to Cloud Pub/Sub when the content in the bucket changes.
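A notification delivered this way arrives as a Cloud Pub/Sub message whose attributes describe the event. The sketch below parses such a message locally; the sample payload values are illustrative, although `eventType`, `bucketId`, and `objectId` are the attribute names that Cloud Storage notifications use:

```python
import json

# Shape of a Cloud Storage notification delivered via Cloud Pub/Sub:
# the event type and object location arrive as message attributes.
sample_message = {
    "attributes": {
        "eventType": "OBJECT_FINALIZE",   # fired when an object is created
        "bucketId": "incoming-media",
        "objectId": "calls/2023-01-15/call-001.flac",
    },
    "data": json.dumps({"size": "1048576", "contentType": "audio/flac"}),
}

def extract_new_media(message):
    """Return (bucket, object) for newly created objects, else None."""
    attrs = message["attributes"]
    if attrs["eventType"] != "OBJECT_FINALIZE":
        return None  # ignore deletes, metadata updates, and archive events
    return attrs["bucketId"], attrs["objectId"]

print(extract_new_media(sample_message))
```

Filtering on `OBJECT_FINALIZE` ensures that only newly uploaded files are queued for processing.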
If there are many incoming media files to process, the system requires multiple workers to process the media quickly. The more files waiting for processing, the more workers are required. To reduce costs and stay under provisioning limits, you can use a processing queue to minimize the number of workers you need to run concurrently.
Cloud Pub/Sub is a highly available messaging and queuing system. It can be accessed through an HTTP API and offers communication configurations such as one-to-many (fan-out), many-to-one (fan-in), and many-to-many. Cloud Pub/Sub supports push and pull subscription models.
This solution uses Cloud Pub/Sub to create a fault-tolerant and efficient queue that has the following features:
- A single publisher that publishes a message for each new file. The message contains metadata describing the file and its location.
- A single queue that contains all of the published messages.
- A set of workers that pull messages from the queue. Using a pull subscription model ensures that workers have time to finish processing a media file before getting the next one to process.
- If a worker pulls a message from the queue but does not acknowledge that it has finished processing that media file, the queuing system assumes the worker has failed and returns the message to the queue.
- A worker acknowledges a message only after it finishes processing the media. The acknowledgment causes Cloud Pub/Sub to remove the message from the queue.
- For processing that takes a long time, the worker can extend the acknowledgment deadline periodically so that Cloud Pub/Sub doesn't put the message back into the queue while the original worker is still processing it.
- Each message has a lifetime. If a message has not been processed before it expires, it's deleted from the queue, which prevents the queue from growing indefinitely.
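The lease-and-acknowledge behavior described above can be modeled with a small in-memory sketch. This is a toy model of the semantics, not the Cloud Pub/Sub client API:

```python
import time

class LeasedQueue:
    """Toy model of pull delivery with an acknowledgment deadline."""

    def __init__(self, ack_deadline=1.0):
        self.ack_deadline = ack_deadline
        self.messages = {}   # message id -> payload
        self.leases = {}     # message id -> lease expiry timestamp
        self.next_id = 0

    def publish(self, payload):
        self.messages[self.next_id] = payload
        self.next_id += 1

    def pull(self):
        now = time.monotonic()
        for msg_id, payload in self.messages.items():
            if self.leases.get(msg_id, 0) <= now:      # unclaimed or expired
                self.leases[msg_id] = now + self.ack_deadline
                return msg_id, payload
        return None

    def modify_ack_deadline(self, msg_id):
        """Extend the lease while processing is still in progress."""
        self.leases[msg_id] = time.monotonic() + self.ack_deadline

    def ack(self, msg_id):
        """Acknowledge: remove the message so it is never redelivered."""
        self.messages.pop(msg_id, None)
        self.leases.pop(msg_id, None)

q = LeasedQueue(ack_deadline=0.05)
q.publish("transcribe call-001.flac")
first = q.pull()           # worker A claims the message
assert q.pull() is None    # worker B sees nothing while the lease holds
time.sleep(0.06)           # worker A crashes; the deadline passes
redelivered = q.pull()     # the message is redelivered to worker B
q.ack(redelivered[0])      # worker B finishes and acknowledges
```

A long-running worker would call `modify_ack_deadline` on a timer instead of letting the lease lapse, which mirrors the deadline-extension technique described above.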
The following diagram illustrates how this solution uses Cloud Pub/Sub to process messages.
In the preceding diagram, after a worker pulls a message, the message remains in the queue but is marked as claimed and is unavailable to other workers. It becomes available again only if the subscription does not receive an acknowledgment from the worker that claimed it before the acknowledgment deadline passes.
Processing the files associated with the message queue has several steps:
- Read the queue.
- Read the audio file.
- Transcribe the audio.
- Save the result.
To perform all of these steps in a timely fashion, the system needs a set of workers that can scale to meet the incoming demand. Scaling can be handled either manually or automatically. Manual scaling is appropriate if the number and size of incoming files are fairly constant and you know how many workers are needed to handle the processing load. Automatic scaling can handle sudden changes in the number of incoming files.
Based on the preceding requirements, the system requires computing power to convert the media files, as well as the ability to automatically scale horizontally to provide additional workers as needed.
Compute Engine allows you to create virtual machine instances to perform computational tasks.
Managed instance groups work with both manual scaling and autoscaling. The instance group manager creates groups of identical machines by using an instance template, which can be based on a machine image or a container image.
With manual scaling, the instance group manager keeps the specified number of virtual machine instances up and running by creating new instances as needed and recreating non-responsive instances.
When you use autoscaling, the instance group manager creates or terminates instances based on metrics you select, such as average CPU, load on the HTTP load balancer, instance custom metrics, or Cloud Pub/Sub queue size.
A managed instance group can use autoscaling based on the size of the Cloud Pub/Sub queue. That way, the number of workers scales up and down with the number of messages waiting in the queue, and the group will not terminate an instance while the VM is still processing a media file.
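The core of queue-based scaling is a simple calculation from backlog size to worker count. The Compute Engine autoscaler performs this for you when configured with a Cloud Pub/Sub metric; the function below and its `messages_per_worker` knob are purely illustrative:

```python
import math

def target_worker_count(queue_size, messages_per_worker=10,
                        min_workers=1, max_workers=20):
    """Derive a worker count from the queue backlog.

    messages_per_worker is a tuning knob: how many undelivered
    messages one worker is expected to absorb before another
    worker should be added.
    """
    wanted = math.ceil(queue_size / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))

print(target_worker_count(0))     # quiet queue -> 1 (the minimum)
print(target_worker_count(55))    # 55 messages -> 6 workers
print(target_worker_count(1000))  # burst -> capped at 20
```

The minimum keeps at least one worker warm for new uploads, and the maximum acts as a cost and quota guard during bursts.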
With the computational platform configured, your next consideration is how to process the media file. In some cases, the processing job must be customized and can require significant computing power. This solution, which focuses on the queuing architecture rather than on media processing, uses Cloud Speech-to-Text to process audio files synchronously. Synchronous recognition takes long enough to keep a worker busy, which lets the queue grow, while keeping the solution simple.
After the media has been processed, in this case an audio file, the results can be saved for further analysis. BigQuery can be quite useful because it offers a simple SQL interface, and also provides storage that can be leveraged by other Google tools.
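Before the result can be queried, it has to be shaped as a table row. The sketch below builds one such row; the column names are illustrative, not a required schema, and the commented-out client call is an assumption about how the row would be streamed with the BigQuery client library:

```python
import datetime
import json

def transcript_row(object_name, transcript, confidence):
    """Shape one transcription result as a flat row for a BigQuery table.

    The columns here (source_file, transcript, confidence, processed_at)
    are illustrative, not a required schema.
    """
    return {
        "source_file": object_name,
        "transcript": transcript,
        "confidence": confidence,
        "processed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

row = transcript_row("calls/call-001.flac", "hello, thanks for calling", 0.92)
print(json.dumps(row, indent=2))

# With the google-cloud-bigquery client (requires credentials), such rows
# could then be streamed into a table, for example:
#   client.insert_rows_json(table_ref, [row])
```

Storing one row per processed file makes it easy to join transcripts with other datasets in SQL later.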
Google Cloud Platform infrastructure
The following diagram illustrates how to build a media-processing solution using Google Cloud Platform (GCP).
- Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.