Configuring horizontal Pod autoscaling

Autopilot Standard

This page shows you how to scale your Deployments in Google Kubernetes Engine (GKE) by automatically adjusting your resources using metrics like resource allocation, load balancer traffic, custom metrics, or multiple metrics simultaneously. This page also provides step-by-step instructions for configuring a Horizontal Pod Autoscaler (HPA) profile, including how to view, delete, clean up, and troubleshoot your HPA object. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster.

This page is for Operators and Developers who manage application scaling in GKE and want to understand how to dynamically optimize performance and maintain cost efficiency through horizontal Pod autoscaling. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you use primarily zonal clusters, set the compute/zone instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

Ensure that you have an existing Autopilot or Standard cluster. If you need one, create an Autopilot cluster.

API versions for `HorizontalPodAutoscaler` objects

When you use the Google Cloud console, HorizontalPodAutoscaler objects are created using the autoscaling/v2 API.

When you use kubectl to create or view information about a Horizontal Pod Autoscaler, you can specify either the autoscaling/v1 API or the autoscaling/v2 API.

apiVersion: autoscaling/v1 is the default, and lets you autoscale based only on CPU utilization. To autoscale based on other metrics, using apiVersion: autoscaling/v2 is recommended. The example in Create the example Deployment uses apiVersion: autoscaling/v1.
apiVersion: autoscaling/v2 is recommended for creating new HorizontalPodAutoscaler objects. It lets you autoscale based on multiple metrics, including custom or external metrics. All other examples in this page use apiVersion: autoscaling/v2.

To check which API versions are supported, use the kubectl api-versions command.

You can specify which API to use when viewing details about a Horizontal Pod Autoscaler that uses apiVersion: autoscaling/v2.

Create the example Deployment

Before you can create a Horizontal Pod Autoscaler, you must create the workload it monitors. The examples in this page apply different Horizontal Pod Autoscaler configurations to the following nginx Deployment. Separate examples show a Horizontal Pod Autoscaler based on resource utilization, based on a custom or external metric, and based on multiple metrics.

Save the following to a file named nginx.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"

This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you don't specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
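
As a rough illustration of why the request matters, the percentage that the Horizontal Pod Autoscaler works with is a Pod's CPU usage divided by its CPU request. The following sketch uses a hypothetical helper named cpu_utilization_percent and made-up usage figures; only the 250m request comes from the manifest above. Integer division truncates, so the result is approximate for values that don't divide evenly.

```shell
# Hypothetical helper: CPU utilization as a percentage of the CPU request.
# Both arguments are in milliCPU (for example, 250 for "250m").
cpu_utilization_percent() (
  usage_millicpu=$1
  request_millicpu=$2
  echo $(( usage_millicpu * 100 / request_millicpu ))
)

# A Pod that uses 125m against the 250m request is at 50% utilization.
cpu_utilization_percent 125 250   # prints 50
# A Pod that uses 300m against the same request is at 120% utilization.
cpu_utilization_percent 300 250   # prints 120
```

Without a request there is no denominator for this calculation, which is why only absolute targets (such as milliCPUs) remain available in that case.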

To create the Deployment, apply the nginx.yaml manifest:

kubectl apply -f nginx.yaml

The Deployment has spec.replicas set to 3, so three Pods are deployed. You can verify this using the kubectl get deployment nginx command.

Each of the examples in this page applies a different Horizontal Pod Autoscaler to an example nginx Deployment.

Autoscaling based on resource utilization

This example creates a HorizontalPodAutoscaler object that autoscales the nginx Deployment when average CPU utilization surpasses 50%, and keeps the Deployment between a minimum of 1 replica and a maximum of 10 replicas.

You can create a Horizontal Pod Autoscaler that targets CPU using the Google Cloud console, the kubectl apply command, or for average CPU only, the kubectl autoscale command.
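
To make the 50% target concrete, the following sketch approximates how a replica count could be derived from the current average utilization. It assumes the proportional rule described in the Kubernetes documentation (current replicas multiplied by the current value, divided by the target value, rounded up) and deliberately ignores tolerance, Pod readiness, and stabilization windows. Treat it as an approximation, not the controller's exact behavior. The desired_replicas helper and the utilization readings are hypothetical.

```shell
# Hypothetical helper: approximate the replica count the autoscaler aims for.
# Arguments: current replicas, current average utilization (%),
#            target utilization (%), minimum replicas, maximum replicas.
desired_replicas() (
  current=$1; utilization=$2; target=$3; min=$4; max=$5
  # Integer ceiling of current * utilization / target.
  desired=$(( (current * utilization + target - 1) / target ))
  # Keep the result within the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
)

desired_replicas 3 90 50 1 10    # prints 6
desired_replicas 3 0 50 1 10     # prints 1 (raised to the minimum)
desired_replicas 3 400 50 1 10   # prints 10 (capped at the maximum)
```

In a real cluster, scale-down is not immediate, so an idle Deployment can stay at its current replica count for a while even though this calculation returns 1.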

Console

Go to the Workloads page in the Google Cloud console.

Click the name of the nginx Deployment.
Click Actions > Autoscale.
Specify the following values:
- Minimum number of replicas: 1
- Maximum number of replicas: 10
- Autoscaling metric: CPU
- Target: 50
- Unit: %
Click Done.
Click Autoscale.

`kubectl apply`

Save the following YAML manifest as a file named nginx-hpa.yaml:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  # Set the minimum and maximum number of replicas the Deployment can scale to.
  minReplicas: 1
  maxReplicas: 10
  # The target average CPU utilization percentage across all Pods.
  targetCPUUtilizationPercentage: 50

To create the HPA, apply the manifest using the following command:

kubectl apply -f nginx-hpa.yaml
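
Because autoscaling/v2 is recommended for new objects, you might prefer to express the same CPU target with that API. The following sketch writes an equivalent manifest to a hypothetical file named nginx-hpa-v2.yaml; its metrics block mirrors the CPU entry in the multiple-metrics example later on this page.

```shell
# Write an autoscaling/v2 manifest with the same 50% CPU target.
# The file name nginx-hpa-v2.yaml is an assumption made for this sketch.
cat > nginx-hpa-v2.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
```

Apply this file with kubectl apply -f in place of nginx-hpa.yaml, not alongside it, because both manifests define an object named nginx.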

`kubectl autoscale`

To create a HorizontalPodAutoscaler object that only targets average CPU utilization, you can use the kubectl autoscale command:

kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10

To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:

kubectl get hpa

The output is similar to the following:

NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/50%    1         10        3          61s

To get details about the Horizontal Pod Autoscaler, you can use the Google Cloud console or the kubectl command.

Console

Go to the Workloads page in the Google Cloud console.

Click the name of the nginx Deployment.
View the Horizontal Pod Autoscaler configuration in the Autoscaler section.
View more details about autoscaling events in the Events tab.

`kubectl get`

To get details about the Horizontal Pod Autoscaler, you can use kubectl get hpa with the -o yaml flag. The status field contains information about the current number of replicas and any recent autoscaling events.

kubectl get hpa nginx -o yaml

The output is similar to the following:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent
      recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the
      HPA was able to successfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the
      desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3

Before following the remaining examples in this page, delete the HPA:

kubectl delete hpa nginx

When you delete a Horizontal Pod Autoscaler, the number of replicas of the Deployment remains the same. A Deployment does not automatically revert to its state before the Horizontal Pod Autoscaler was applied.

To learn more, see Deleting a Horizontal Pod Autoscaler.

Autoscaling based on load balancer traffic

Traffic-based autoscaling is a capability of GKE that integrates traffic utilization signals from load balancers to autoscale Pods.

Using traffic as an autoscaling signal might be helpful since traffic is a leading indicator of load that is complementary to CPU and memory. Built-in integration with GKE ensures that the setup is easy and that autoscaling reacts to traffic spikes quickly to meet demand.

Traffic-based autoscaling is enabled by the Gateway controller and its global traffic management capabilities. To learn more, see Traffic-based autoscaling.

Autoscaling based on load balancer traffic is only available for Gateway workloads.

Requirements

Traffic-based autoscaling has the following requirements:

Supported on GKE versions 1.31 and later.
Gateway API enabled in your GKE cluster.
Supported for traffic that goes through load balancers deployed using the Gateway API and either the gke-l7-global-external-managed, gke-l7-regional-external-managed, gke-l7-rilb, or the gke-l7-gxlb GatewayClass.

Limitations

Traffic-based autoscaling has the following limitations:

Not supported by the multi-cluster GatewayClasses (gke-l7-global-external-managed-mc, gke-l7-regional-external-managed-mc, gke-l7-rilb-mc, and gke-l7-gxlb-mc).
Not supported for traffic using Services of type LoadBalancer.
There must be a clear and isolated relationship between the components involved in traffic-based autoscaling. One Horizontal Pod Autoscaler must be dedicated to scaling a single Deployment (or any scalable resource) exposed by a single Service.
After configuring the capacity of your Service using the maxRatePerEndpoint field, allow sufficient time (usually one minute, but potentially up to 15 minutes in large clusters) for the load balancer to be updated with this change, before configuring the Horizontal Pod Autoscaler with traffic-based metrics. This ensures your service won't temporarily experience a situation where your cluster tries to autoscale based on metrics emitted by a load balancer still undergoing configuration.
If traffic-based autoscaling is used on a Service served by multiple load balancers (for example, by both an Ingress and a Gateway, or by two Gateways), the Horizontal Pod Autoscaler might consider the highest traffic value from individual load balancers to make scaling decisions, rather than the sum of traffic values from all load balancers.

Deploy traffic-based autoscaling

The following exercise uses the HorizontalPodAutoscaler to autoscale the store-autoscale Deployment based on the traffic it receives. A Gateway accepts ingress traffic from the internet for the Pods. The autoscaler compares traffic signals from the Gateway with the per-Pod traffic capacity that is configured on the store-autoscale Service resource. By generating traffic to the Gateway, you influence the number of Pods deployed.

The following diagram demonstrates how traffic-based autoscaling works:

Figure: HorizontalPodAutoscaler scaling a Deployment based on traffic.

To deploy traffic-based autoscaling, perform the following steps:

For Standard clusters, confirm that the GatewayClasses are installed in your cluster. For Autopilot clusters, the GatewayClasses are installed by default.

kubectl get gatewayclass

The output confirms that the GKE GatewayClass resources are ready to use in your cluster:

NAME                               CONTROLLER                  ACCEPTED   AGE
gke-l7-global-external-managed     networking.gke.io/gateway   True       16h
gke-l7-regional-external-managed   networking.gke.io/gateway   True       16h
gke-l7-gxlb                        networking.gke.io/gateway   True       16h
gke-l7-rilb                        networking.gke.io/gateway   True       16h

If you don't see this output, enable the Gateway API in your GKE cluster.

Deploy the sample application and Gateway load balancer to your cluster:
```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-autoscale.yaml
```
The sample application creates:
- A Deployment with 2 replicas.
- A Service with an associated GCPBackendPolicy that sets maxRatePerEndpoint to 10. To learn more about Gateway capabilities, see GatewayClass capabilities.
- An external Gateway for accessing the application on the internet. To learn more about how to use Gateway load balancers, see Deploying Gateways.
- An HTTPRoute that matches all traffic and sends it to the store-autoscale Service.
The Service capacity is a critical element when using traffic-based autoscaling because it determines the amount of per-Pod traffic that triggers an autoscaling event. It is configured using a maxRatePerEndpoint field on a GCPBackendPolicy associated with the Service, which defines the maximum traffic a Service should receive in requests per second, per Pod. Service capacity is specific to your application.

For more information, see Determining your Service's capacity.
Save the following manifest as hpa.yaml:
```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: store-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-autoscale
  # Set the minimum and maximum number of replicas the Deployment can scale to.
  minReplicas: 1
  maxReplicas: 10
  # This section defines that scaling should be based on the fullness of load balancer
  # capacity, using the following configuration.
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: store-autoscale
      metric:
        # The name of the custom metric which measures how "full" a backend is
        # relative to its configured capacity.
        name: "autoscaling.googleapis.com|gclb-capacity-fullness"
      target:
        # The target average value for the metric. The autoscaler adjusts the number
        # of replicas to maintain an average capacity fullness of 70% across all Pods.
        averageValue: 70
        type: AverageValue
```
Note: If you previously used the autoscaling.googleapis.com|gclb-capacity-utilization metric name, we recommend that you switch to the autoscaling.googleapis.com|gclb-capacity-fullness metric name instead.

This manifest describes a HorizontalPodAutoscaler with the following properties:
- minReplicas and maxReplicas: sets the minimum and maximum number of replicas for this Deployment. In this configuration, the number of Pods can scale from 1 to 10 replicas.
- describedObject.name: store-autoscale: the reference to the store-autoscale Service that defines the traffic capacity.
- scaleTargetRef.name: store-autoscale: the reference to the store-autoscale Deployment that defines the resource that is scaled by the Horizontal Pod Autoscaler.
- averageValue: 70: target average value of 70% capacity utilization. This gives the Horizontal Pod Autoscaler a growth margin so that the running Pods can process excess traffic while new Pods are being created.
Note: A Deployment or a Service cannot be referenced by more than one Horizontal Pod Autoscaler. If this condition is not met, the Horizontal Pod Autoscaler stops autoscaling and errors appear in the Horizontal Pod Autoscaler events.

The Horizontal Pod Autoscaler results in the following traffic behavior:

The number of Pods is adjusted between 1 and 10 replicas to achieve 70% of the max rate per endpoint. This results in 7 RPS per Pod when maxRatePerEndpoint=10.
At more than 7 RPS per Pod, Pods are scaled up until they've reached their maximum of 10 replicas or until the average traffic is 7 RPS per Pod.
If traffic is reduced, Pods scale down to a reasonable rate using the Horizontal Pod Autoscaler algorithm.

You can also deploy a traffic generator to validate traffic-based autoscaling behavior.

At 30 RPS, the Deployment is scaled to 5 replicas so that each replica ideally receives 6 RPS of traffic, which would be 60% utilization per Pod. This is under the 70% target utilization and so the Pods are scaled appropriately. Depending on traffic fluctuations, the number of autoscaled replicas might also fluctuate. For a more detailed description of how the number of replicas is computed, see Autoscaling behavior.
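
The arithmetic behind this example can be sketched as follows. The traffic_replicas helper is hypothetical and only approximates the outcome: it divides the total traffic by the per-Pod target (maxRatePerEndpoint multiplied by the target fullness), rounds up, and clamps the result to the configured bounds. It doesn't model traffic fluctuations or the autoscaler's stabilization behavior.

```shell
# Hypothetical helper: approximate replicas for traffic-based autoscaling.
# Arguments: total requests per second, maxRatePerEndpoint,
#            target capacity fullness (%), minimum replicas, maximum replicas.
traffic_replicas() (
  total_rps=$1; max_rate=$2; target_percent=$3; min=$4; max=$5
  # Each Pod should carry max_rate * target_percent / 100 RPS (7 RPS here).
  # Scale both sides by 100 to stay in integer arithmetic, then round up.
  divisor=$(( max_rate * target_percent ))
  replicas=$(( (total_rps * 100 + divisor - 1) / divisor ))
  if [ "$replicas" -lt "$min" ]; then replicas=$min; fi
  if [ "$replicas" -gt "$max" ]; then replicas=$max; fi
  echo "$replicas"
)

traffic_replicas 30 10 70 1 10    # prints 5 (6 RPS per Pod, 60% fullness)
traffic_replicas 100 10 70 1 10   # prints 10 (capped at maxReplicas)
```

With 4 replicas, 30 RPS would mean 7.5 RPS per Pod, which is above the 7 RPS target, so 5 is the smallest replica count that stays under it.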

Autoscaling based on a custom or external metric

To create horizontal Pod autoscalers for custom metrics and external metrics, see Optimize Pod autoscaling based on metrics.

Autoscaling based on multiple metrics

This example creates a Horizontal Pod Autoscaler that autoscales based on CPU utilization, memory usage, and, optionally, a custom metric named packets_per_second.

If you followed the previous example and still have a Horizontal Pod Autoscaler named nginx, delete it before following this example.

This example requires apiVersion: autoscaling/v2. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.

Save this YAML manifest as a file named nginx-multiple.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics: # The metrics to base the autoscaling on.
  - type: Resource
    resource:
      name: cpu # Scale based on CPU utilization.
      target:
        type: Utilization
        averageUtilization: 50
        # The HPA will scale the replicas to try and maintain an average
        # CPU utilization of 50% across all Pods.
  - type: Resource
    resource:
      name: memory # Scale based on memory usage.
      target:
        type: AverageValue
        averageValue: 100Mi
        # The HPA will scale the replicas to try and maintain an average
        # memory usage of 100 Mebibytes (MiB) across all Pods.
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100

Apply the YAML manifest:

kubectl apply -f nginx-multiple.yaml

When created, the Horizontal Pod Autoscaler monitors the nginx Deployment for average CPU utilization, average memory usage, and (if you uncommented it) the custom packets_per_second metric. The Horizontal Pod Autoscaler autoscales the Deployment based on the metric whose value would create the larger autoscale event.
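
As a simplified illustration of that selection, the following sketch computes one recommendation per metric and keeps the largest. The helpers ceil_div and largest_recommendation are hypothetical, the readings (2 replicas, 80% CPU, 150 MiB average memory) are assumed for the example, and tolerance and stabilization are ignored.

```shell
# Round-up integer division for positive integers.
ceil_div() ( echo $(( ($1 + $2 - 1) / $2 )) )

# Hypothetical helper: print the largest of the per-metric recommendations.
largest_recommendation() (
  largest=0
  for recommendation in "$@"; do
    if [ "$recommendation" -gt "$largest" ]; then largest=$recommendation; fi
  done
  echo "$largest"
)

# Assumed readings for 2 replicas: 80% CPU against the 50% target, and
# 150 MiB average memory against the 100Mi target.
cpu_recommendation=$(ceil_div $(( 2 * 80 )) 50)       # 4
memory_recommendation=$(ceil_div $(( 2 * 150 )) 100)  # 3
largest_recommendation "$cpu_recommendation" "$memory_recommendation"   # prints 4
```

Here CPU drives the decision because it asks for more replicas than memory does.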

Configure the Performance HPA profile

The Performance HPA profile improves the reaction time of the Horizontal Pod Autoscaler, enabling it to quickly recalculate a large number of HorizontalPodAutoscaler objects (up to 1,000 objects in minor versions 1.31-1.32 and 5,000 objects in version 1.33 or later).

This profile is automatically enabled on qualifying Autopilot clusters with a control plane running GKE version 1.32 or later. For Standard clusters, the profile is automatically enabled on qualifying clusters with a control plane running GKE version 1.33 or later.

A Standard cluster is exempt from auto-enablement of the Performance HPA profile if it meets all of the following conditions:

The cluster is upgrading from an earlier version to version 1.33 or later.
The cluster has at least one node pool with any of the following machine types: e2-micro, e2-custom-micro, g1-small, f1-micro.
Node auto-provisioning is not enabled.

You can also enable the Performance HPA profile on existing clusters if they meet the requirements.

Requirements

To enable the Performance HPA profile, verify that your Autopilot and Standard clusters meet the following requirements:

Your control plane is running GKE version 1.31 or later.
If your control plane is running GKE version 1.31, enable system metric collection.
The Autoscaling API is enabled in your cluster.
All node Service Accounts have the roles/autoscaling.metricsWriter role assigned.
If you use VPC Service Controls, verify that the Autoscaling API is included in your service perimeter.

Enable the Performance HPA profile

To enable the Performance HPA profile in your cluster, use the following command:

gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=performance

Replace:

CLUSTER_NAME: The name of the cluster.
LOCATION: Compute zone or region (e.g. us-central1-a or us-central1) for the cluster.
PROJECT_ID: Your Google Cloud project ID.

Disable the Performance HPA profile

To disable the Performance HPA profile in a cluster, use the following command:

gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=none

Replace:

CLUSTER_NAME: The name of the cluster.
LOCATION: Compute zone or region (e.g. us-central1-a or us-central1) for the cluster.
PROJECT_ID: Your Google Cloud project ID.

Viewing details about a Horizontal Pod Autoscaler

To view a Horizontal Pod Autoscaler's configuration and statistics, use the following command:

kubectl describe hpa HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

If the Horizontal Pod Autoscaler uses apiVersion: autoscaling/v2 and is based on multiple metrics, the kubectl describe hpa command only shows the CPU metric. To see all metrics, use the following command instead:

kubectl describe hpa.v2.autoscaling HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

Each Horizontal Pod Autoscaler's current status is shown in the Conditions field, and autoscaling events are listed in the Events field.

The output is similar to the following:

Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp:                                     Tue, 05 May 2020 20:07:11 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource memory on pods:                             2220032 / 100Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:                                                <none>

Deleting a Horizontal Pod Autoscaler

You can delete a Horizontal Pod Autoscaler using the Google Cloud console or the kubectl delete command.

Console

To delete the nginx Horizontal Pod Autoscaler:

Go to the Workloads page in the Google Cloud console.

Click the name of the nginx Deployment.
Click Actions > Autoscale.
Click Delete.

`kubectl delete`

To delete the nginx Horizontal Pod Autoscaler, use the following command:

kubectl delete hpa nginx

When you delete a Horizontal Pod Autoscaler, the Deployment (or other scalable object) remains at its existing scale, and does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the kubectl scale command:

kubectl scale deployment nginx --replicas=3

Cleaning up

Delete the Horizontal Pod Autoscaler, if you have not done so:
```
kubectl delete hpa nginx
```
Delete the nginx Deployment:
```
kubectl delete deployment nginx
```
Optionally, delete the cluster.

Troubleshooting

For advice on troubleshooting, see Troubleshoot horizontal Pod autoscaling.

What's next

Learn more about Horizontal Pod Autoscaling.
Learn more about Vertical Pod Autoscaling.
Learn how to optimize Pod autoscaling based on metrics.
Learn more about autoscaling Deployments with Custom Metrics.
Learn how to Assign CPU Resources to Containers and Pods.
Learn how to Assign Memory Resources to Containers and Pods.
