Best practices for using Pub/Sub metrics as a scaling signal

If you use Pub/Sub metrics as a signal to autoscale your pipeline, here are some recommendations.

Use more than one signal to autoscale your pipeline

Don't rely on Pub/Sub metrics alone to autoscale your pipeline. Doing so makes those metrics a single point of failure for your autoscaling decisions. Instead, use a combination of signals to trigger autoscaling. An example of an additional signal is the client's CPU utilization level. This signal can indicate whether the client tasks are keeping up with the work and whether scaling up would let them handle more. Some examples of signals from other Cloud products that you can use for your pipeline are as follows:

  • Compute Engine (GCE) supports autoscaling based on signals such as CPU utilization and Monitoring metrics. Compute Engine also supports multiple metrics and multiple signals for better reliability.

    For more information about scaling with Monitoring metrics, see Scale based on Monitoring metrics. For more information about scaling with CPU utilization, see Scale based on CPU utilization.

  • Google Kubernetes Engine (GKE) Horizontal Pod autoscaling (HPA) supports autoscaling based on resource usage such as CPU and memory usage, custom Kubernetes metrics, and external metrics such as Monitoring metrics for Pub/Sub. It also supports multiple signals.

    For more information, see Horizontal Pod autoscaling.
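The following Python sketch illustrates one way to combine several signals into a single scaling decision. It is a simplified illustration, not the actual GCE or GKE autoscaler logic: each signal proposes a replica count in proportion to how far it is from its target, and the largest proposal wins, so a stale or missing signal cannot by itself hold the pipeline back. The function name, the signal names, and the numbers are assumptions made for this example.

```python
import math
from typing import Mapping, Tuple


def desired_replicas(
    current_replicas: int,
    signals: Mapping[str, Tuple[float, float]],
) -> int:
    """Return the largest replica count proposed by any signal.

    `signals` maps a signal name to a (current_value, target_value) pair.
    Signals that are unavailable should simply be left out of the mapping.
    """
    if not signals:
        # No usable signal at all: keep the current size.
        return current_replicas

    proposals = []
    for name, (current_value, target_value) in signals.items():
        if target_value <= 0:
            raise ValueError(f"target for {name!r} must be positive")
        # Scale the replica count by the ratio of observed value to target.
        proposals.append(math.ceil(current_replicas * current_value / target_value))

    # Taking the maximum means the most demanding signal drives the decision.
    return max(proposals)


# Hypothetical readings: CPU at 90% against a 60% target, and a backlog of
# 250 messages per replica against a target of 100 per replica.
print(desired_replicas(4, {
    "cpu_percent": (90, 60),
    "backlog_per_replica": (250, 100),
}))
```

With these hypothetical readings, CPU utilization alone would propose 6 replicas and the backlog signal would propose 10, so the sketch returns 10.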

Handle gaps in metrics

Don't assume that the absence of metrics means that there are no messages to process. For example, if you scale processing tasks down to zero in response to missing metrics, messages that are already in the backlog or that are published during this time are not consumed until tasks are running again. This increases the end-to-end latency. To minimize latency, set a minimum task count greater than zero so that you are always prepared to handle published messages, even if recent Pub/Sub metrics indicate an empty queue.

Both GCE autoscalers and GKE HPAs are designed to maintain the current replica count when metrics are unavailable. This provides a safety net if no metrics are available.
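If you implement your own scaling logic, the two recommendations above can be sketched as follows. This is a minimal illustration under stated assumptions, not how the GCE or GKE autoscalers are implemented: the function name, the parameters, and the default limits are hypothetical, and a missing backlog reading is represented as None.

```python
import math
from typing import Optional


def next_replica_count(
    current_replicas: int,
    backlog: Optional[int],
    target_backlog_per_replica: int,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Pick the next replica count from a backlog reading that may be missing."""

    def clamp(value: int) -> int:
        # Never go below the floor (which is greater than zero) or above the cap.
        return min(max(value, min_replicas), max_replicas)

    if backlog is None:
        # Metrics gap: a missing reading is not the same as an empty backlog,
        # so hold the current size instead of scaling down.
        return clamp(current_replicas)

    wanted = math.ceil(backlog / target_backlog_per_replica)
    return clamp(wanted)


print(next_replica_count(5, None, 100))  # gap in metrics: size is held
print(next_replica_count(5, 0, 100))     # empty backlog: floor applies
print(next_replica_count(5, 750, 100))   # backlog present: scale to fit
```

Because min_replicas defaults to 1, even a reported backlog of zero leaves one task running to pick up newly published messages.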

You can also implement Pub/Sub flow control mechanisms to help prevent tasks from being overwhelmed if they are unintentionally downscaled due to missing metrics.
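As an illustration, the following sketch configures subscriber-side flow control with the google-cloud-pubsub Python client library, which this example assumes is installed. The limits shown are placeholder values rather than recommendations, and the helper and callback names are hypothetical. Flow control caps how many messages, and how many bytes, a single client holds outstanding at once, so a reduced set of tasks pulls work at a rate it can sustain.

```python
from typing import Any, Callable, Tuple


def handle_message(message: Any) -> None:
    """Hypothetical callback: process the payload, then acknowledge it."""
    payload = message.data  # raw bytes of the Pub/Sub message
    print(f"Received {len(payload)} bytes")
    message.ack()


def subscribe_with_flow_control(
    project_id: str,
    subscription_id: str,
    callback: Callable[[Any], None] = handle_message,
    max_messages: int = 100,
    max_bytes: int = 50 * 1024 * 1024,
) -> Tuple[Any, Any]:
    """Start a streaming pull that limits outstanding messages per client."""
    # Imported here so that the module loads even where the client library
    # is not installed; this is a choice made for this sketch only.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_id)

    # The client stops pulling new messages once either limit is reached and
    # resumes when outstanding messages are acknowledged.
    flow_control = pubsub_v1.types.FlowControl(
        max_messages=max_messages,
        max_bytes=max_bytes,
    )

    streaming_pull_future = subscriber.subscribe(
        subscription_path,
        callback=callback,
        flow_control=flow_control,
    )
    return subscriber, streaming_pull_future
```

To use the sketch, call subscribe_with_flow_control with your own project ID and subscription ID, then block on the returned future, for example with streaming_pull_future.result(), for as long as the task should keep consuming messages.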