This page shows you how to automatically increase or decrease the number of replicas of a given workload using custom, external, or Prometheus metrics.
Why autoscale based on metrics
Consider an application that pulls tasks from a queue and completes them. The application might have a service-level objective (SLO) for the time to process a task, or for the number of tasks pending. If the queue is growing, adding replicas of the workload might help it meet its SLO. If the queue is empty or is shrinking more quickly than expected, you could save money by running fewer replicas, while still meeting the workload's SLO.
About custom, Prometheus, and external metrics
You can scale workloads based on custom, Prometheus, or external metrics.
A custom metric is reported from your application running in Kubernetes. To learn more, see Custom and Prometheus metrics.
Metrics coming from Managed Service for Prometheus are considered a type of custom metric.
An external metric is reported from an application or service not running on your cluster, but whose performance impacts your Kubernetes application. For example, you can autoscale on any metric in Cloud Monitoring, including Pub/Sub or Dataflow metrics. In contrast, Prometheus metrics contain data emitted from within your cluster that you can also use to autoscale on. To learn more, see External metrics.
Custom and Prometheus metrics
We recommend that you use Managed Service for Prometheus to create and manage custom metrics. You can use Prometheus Query Language (PromQL) to query all metrics in Monitoring. For more information, see Horizontal Pod autoscaling for Managed Service for Prometheus.
Your application can report a custom metric to Monitoring. You can configure Kubernetes to respond to these metrics and scale your workload automatically. For example, you can scale your application based on metrics such as queries per second, writes per second, network performance, latency when communicating with a different application, or other metrics that make sense for your workload. For more information, see Optimize Pod autoscaling based on metrics.
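As an illustration, the following is a sketch of an autoscaling/v2 HorizontalPodAutoscaler that scales on a custom Pods metric. The Deployment name (queue-worker), the metric name (queries_per_second), and the target value are placeholders; the actual metric name depends on how your metric is exported and on the metrics adapter installed in your cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker     # placeholder workload name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: queries_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"        # scale so each Pod averages 100 QPS
```

With a Pods metric, the Horizontal Pod Autoscaler compares the average value reported across all Pods against the target and adjusts the replica count accordingly.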
If you need to scale your workload based on the performance of an application or service outside of Kubernetes, you can configure an external metric. For example, you might need to increase the capacity of your application to ingest messages from Pub/Sub if the number of undelivered messages is trending upward. The external application needs to export the metric to a Monitoring instance that the cluster can access. The trend of each metric over time causes the Horizontal Pod Autoscaler to change the number of replicas in the workload automatically. For more information, see Optimize Pod autoscaling based on metrics.
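For the Pub/Sub case, an External metric in an autoscaling/v2 HorizontalPodAutoscaler might look like the following sketch. The subscription ID (my-subscription) and the target value are placeholders; the metric name assumes the Cloud Monitoring metric for undelivered Pub/Sub messages is exposed to the cluster through a metrics adapter.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub-consumer   # placeholder workload name
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: my-subscription  # placeholder
      target:
        type: AverageValue
        averageValue: "30"   # aim for ~30 undelivered messages per replica
```

Because an External metric is not tied to any one Pod, targeting an AverageValue divides the metric's value by the current replica count, which is usually what you want for queue-draining workloads.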
Import metrics to Monitoring
To import metrics to Monitoring, you can either:
- Configure Managed Service for Prometheus (recommended), or
- Export metrics from the application using the Cloud Monitoring API.
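For the second option, the following is a minimal Python sketch of writing a data point with the google-cloud-monitoring client library. The project ID and the metric name (queue_depth) are hypothetical, and the call requires application default credentials with permission to write time series.

```python
import time


def metric_type(name: str) -> str:
    """Fully qualified type for a user-defined Cloud Monitoring metric."""
    return f"custom.googleapis.com/{name}"


def write_metric(project_id: str, name: str, value: int) -> None:
    """Write one int64 data point to Cloud Monitoring.

    Assumes the google-cloud-monitoring package is installed and
    credentials are available in the environment.
    """
    from google.cloud import monitoring_v3  # third-party client library

    client = monitoring_v3.MetricServiceClient()

    series = monitoring_v3.TimeSeries()
    series.metric.type = metric_type(name)
    series.resource.type = "global"

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
    )
    point = monitoring_v3.Point(
        {"interval": interval, "value": {"int64_value": value}}
    )
    series.points = [point]

    client.create_time_series(
        name=f"projects/{project_id}", time_series=[series]
    )
```

A workload would call write_metric periodically (for example, once per sampling interval) so that the Horizontal Pod Autoscaler has a fresh time series to evaluate.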
What's next
- Learn how to enable horizontal Pod autoscaling for Managed Service for Prometheus.
- Learn more about Horizontal Pod Autoscaling.
- Learn more about Vertical Pod autoscaling.