Cluster autoscaler

This page explains how to automatically resize your Standard Google Kubernetes Engine (GKE) cluster's node pools based on the demands of your workloads. When demand is high, the cluster autoscaler adds nodes to the node pool. When demand is low, the cluster autoscaler scales back down to a minimum size that you designate. This can increase the availability of your workloads when you need it, while controlling costs. To learn how to configure cluster autoscaler, see Autoscaling a cluster.

With Autopilot clusters, you don't need to worry about provisioning nodes or managing node pools because node pools are automatically provisioned through node auto-provisioning, and are automatically scaled to meet the requirements of your workloads.

Overview

GKE's cluster autoscaler automatically resizes the number of nodes in a given node pool, based on the demands of your workloads. You don't need to manually add or remove nodes or over-provision your node pools. Instead, you specify a minimum and maximum size for the node pool, and the rest is automatic.

If resources are deleted or moved when autoscaling your cluster, your workloads might experience transient disruption. For example, if your workload consists of a controller with a single replica, that replica's Pod might be rescheduled onto a different node if its current node is deleted. Before enabling cluster autoscaler, design your workloads to tolerate potential disruption or ensure that critical Pods are not interrupted.

You can improve cluster autoscaler performance with Image streaming, which remotely streams required image data from eligible container images while simultaneously caching the image locally, so that workloads on new nodes start faster.

How cluster autoscaler works

Cluster autoscaler works on a per-node pool basis. When you configure a node pool with cluster autoscaler, you specify a minimum and maximum size for the node pool.

Cluster autoscaler increases or decreases the size of the node pool automatically by adding or removing virtual machine (VM) instances in the underlying Compute Engine Managed Instance Group (MIG) for the node pool. Cluster autoscaler makes these scaling decisions based on the resource requests (rather than actual resource utilization) of Pods running on that node pool's nodes. It periodically checks the status of Pods and nodes, and takes action:

  • If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool.
  • If nodes are underutilized, and all Pods could be scheduled even with fewer nodes in the node pool, cluster autoscaler removes nodes, down to the minimum size of the node pool. If there are Pods on a node that cannot move to other nodes in the cluster, cluster autoscaler does not attempt to scale down that node. If Pods can be moved to other nodes, but the node cannot be drained gracefully after a timeout period (currently 10 minutes), the node is forcibly terminated. The grace period is not configurable for GKE clusters. For more information about how scale down works, see the cluster autoscaler documentation.
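
The two rules above can be summarized in a small model. This is a simplified sketch, not the autoscaler's actual algorithm: the function name and its parameters (`nodes_needed`, `nodes_removable`) are hypothetical inputs that stand in for the autoscaler's own analysis of unschedulable Pods and underutilized nodes.

```python
def next_pool_size(current, minimum, maximum, nodes_needed, nodes_removable):
    """Return the node pool size after one simplified autoscaling decision."""
    if nodes_needed > 0:
        # Scale up, but never past the maximum (and never shrink while scaling up).
        return max(current, min(current + nodes_needed, maximum))
    if nodes_removable > 0:
        # Scale down, but never below the minimum (and never grow while scaling down).
        return min(current, max(current - nodes_removable, minimum))
    return current


print(next_pool_size(3, 1, 4, nodes_needed=2, nodes_removable=0))  # 4
print(next_pool_size(3, 1, 4, nodes_needed=0, nodes_removable=5))  # 1
```

In both calls the requested change is larger than the limits allow, so the result stops at the node pool's maximum or minimum size.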

If your Pods have requested too few resources (or haven't changed the defaults, which might be insufficient) and your nodes are experiencing shortages, cluster autoscaler does not correct the situation. You can help ensure cluster autoscaler works as accurately as possible by making explicit resource requests for all of your workloads.
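
As a rough illustration of why accurate requests matter, the following sketch computes how full a node looks when only resource requests are counted. The function name and the numbers are assumptions chosen for the example; it does not reproduce how cluster autoscaler evaluates nodes.

```python
def requested_fraction(pod_requests_millicpu, node_allocatable_millicpu):
    """Fraction of a node's allocatable CPU that is claimed by Pod requests."""
    if node_allocatable_millicpu <= 0:
        raise ValueError("allocatable CPU must be positive")
    return sum(pod_requests_millicpu) / node_allocatable_millicpu


# Two Pods that each request 100m CPU on a node with 4000m allocatable.
# Even if those Pods actually consume far more CPU than they requested,
# a request-based view still sees the node as only 5% allocated.
fraction = requested_fraction([100, 100], 4000)
print(f"{fraction:.0%}")  # 5%
```

A node in this state can be starved of CPU in practice while still looking nearly empty to any decision that is based on requests.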

Operating criteria

Cluster autoscaler makes the following assumptions when resizing a node pool:

  • All replicated Pods can be restarted on some other node, possibly causing a brief disruption. If your services are not disruption-tolerant, using cluster autoscaler is not recommended.
  • Users or administrators are not manually managing nodes. Cluster autoscaler can override any manual node management operations that you perform.
  • All nodes in a single node pool have the same set of labels.
  • Cluster autoscaler considers the relative cost of the instance types in the various pools, and attempts to expand the least expensive possible node pool. The reduced cost of node pools containing Spot VMs is taken into account.
  • Labels manually added after initial cluster or node pool creation are not tracked. Nodes created by cluster autoscaler are assigned labels specified with --node-labels at the time of node pool creation.
  • In GKE version 1.21 or earlier, cluster autoscaler considers the taint information of the existing nodes from a node pool to represent the whole node pool. Starting in GKE version 1.22, cluster autoscaler combines information from existing nodes in the cluster and the node pool. Cluster autoscaler detects manual node and node pool changes to scale up.

Balancing across zones

If your node pool contains multiple managed instance groups with the same instance type, cluster autoscaler attempts to keep these managed instance group sizes balanced when scaling up. This helps prevent an uneven distribution of nodes among managed instance groups in multiple zones of a node pool. GKE does not consider the autoscaling policy when scaling down.
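
One way to picture balanced scale-up is a greedy loop that always adds the next node to the smallest group. This is only a sketch of the idea under that assumption. The function name is hypothetical, and the real autoscaler weighs additional factors such as capacity and Pod constraints.

```python
def balance_scale_up(group_sizes, nodes_to_add):
    """Add nodes one at a time to whichever group is currently smallest."""
    sizes = dict(group_sizes)  # copy so the caller's mapping is not modified
    for _ in range(nodes_to_add):
        # Break ties by zone name so the result is deterministic.
        smallest = min(sizes, key=lambda zone: (sizes[zone], zone))
        sizes[smallest] += 1
    return sizes


before = {"us-central1-a": 3, "us-central1-b": 1, "us-central1-f": 1}
print(balance_scale_up(before, 3))
# {'us-central1-a': 3, 'us-central1-b': 3, 'us-central1-f': 2}
```

The model has no matching step for scale-down, which is why node counts can drift apart between zones after nodes are removed.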

Location policy

Starting in GKE version 1.24.1-gke.800, you can change the location policy of the GKE cluster autoscaler. You can control the cluster autoscaler distribution policy by specifying the location_policy flag with any of the following values:

  • BALANCED: the autoscaler considers Pod requirements and the availability of resources in each zone. This does not guarantee similar node groups will have exactly the same sizes, as the autoscaler considers many factors, including available capacity in a given zone and zone affinities of Pods that triggered scale up.
  • ANY: the autoscaler prioritizes utilization of unused reservations and accounts for current constraints of available resources.

    This policy is recommended if you are using Spot VMs or if you want to use VM reservations that are not equal between zones.

Default values

For Spot VM node pools, the default cluster autoscaler distribution policy is ANY. With this policy, Spot VMs have a lower risk of being preempted.

For non-preemptible node pools, the default cluster autoscaler distribution policy is BALANCED.

Minimum and maximum node pool size

You can specify the minimum and maximum size for each node pool in your cluster, and cluster autoscaler makes rescaling decisions within these boundaries. If the current node pool size is lower than the specified minimum or greater than the specified maximum when you enable autoscaling, the autoscaler waits to take effect until a new node is needed in the node pool or until a node can be safely deleted from the node pool.

Autoscaling limits

You can set the minimum and maximum number of nodes for the cluster autoscaler to use when scaling a node pool. Use the --min-nodes and --max-nodes flags to set the minimum and maximum number of nodes per zone.

Starting in GKE version 1.24, you can use the --total-min-nodes and --total-max-nodes flags for new clusters. These flags set the minimum and maximum total number of nodes in the node pool across all zones.

Min and max nodes example

The following command creates an autoscaling multi-zonal cluster with six nodes across three zones initially, with a minimum of one node per zone and a maximum of four nodes per zone:

gcloud container clusters create example-cluster \
    --num-nodes=2 \
    --zone=us-central1-a \
    --node-locations=us-central1-a,us-central1-b,us-central1-f \
    --enable-autoscaling --min-nodes=1 --max-nodes=4

In this example, the total size of the cluster can be between three and twelve nodes, spread across the three zones. If one of the zones fails, the total size of the cluster can be between two and eight nodes.
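
The totals in this example follow from multiplying the per-zone limits by the number of available zones. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def total_bounds(min_per_zone, max_per_zone, zones):
    """Total node count range implied by per-zone limits."""
    return min_per_zone * zones, max_per_zone * zones


print(total_bounds(1, 4, zones=3))  # (3, 12)
print(total_bounds(1, 4, zones=2))  # (2, 8), with one zone unavailable
```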

Total nodes example

The following command, available in GKE version 1.24 or later, creates an autoscaling multi-zonal cluster with six nodes across three zones initially, with a minimum of three nodes and a maximum of twelve nodes in the node pool across all zones:

gcloud container clusters create example-cluster \
    --num-nodes=2 \
    --zone=us-central1-a \
    --node-locations=us-central1-a,us-central1-b,us-central1-f \
    --enable-autoscaling --total-min-nodes=3 --total-max-nodes=12

In this example, the total size of the cluster can be between three and twelve nodes, regardless of spreading between zones.
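
To see how total limits differ from per-zone limits, the following sketch checks one uneven node layout against both kinds of limit. The helper names and the layout are illustrative assumptions, not part of any GKE API.

```python
def within_per_zone_limits(nodes_by_zone, min_nodes, max_nodes):
    """True if every zone's node count is inside the per-zone range."""
    return all(min_nodes <= n <= max_nodes for n in nodes_by_zone)


def within_total_limits(nodes_by_zone, total_min, total_max):
    """True if the node count summed over all zones is inside the total range."""
    return total_min <= sum(nodes_by_zone) <= total_max


layout = [8, 2, 1]  # 11 nodes, unevenly spread across three zones

print(within_per_zone_limits(layout, 1, 4))  # False: one zone has 8 nodes
print(within_total_limits(layout, 3, 12))    # True: 11 is between 3 and 12
```

The same layout is rejected by per-zone limits of one to four nodes but accepted by total limits of three to twelve nodes.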

Autoscaling profiles

The decision of when to remove a node is a trade-off between optimizing for utilization or the availability of resources. Removing underutilized nodes improves cluster utilization, but new workloads might have to wait for resources to be provisioned again before they can run.

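You can specify which autoscaling profile to use when making such decisions. The currently available profiles are:

  • balanced: the default profile.
  • optimize-utilization: prioritizes optimizing utilization over keeping spare resources in the cluster.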

In GKE version 1.18 and later, when you enable the optimize-utilization autoscaling profile, GKE prefers to schedule Pods in nodes that already have high allocation of CPU or memory. In GKE version 1.22 and later, if the optimize-utilization autoscaling profile is enabled, GKE also considers high GPU allocation when scheduling Pods.

The optimize-utilization autoscaling profile helps the cluster autoscaler to identify and remove underutilized nodes. To achieve this optimization, GKE sets the scheduler name in the Pod spec to gke.io/optimize-utilization-scheduler. Pods that specify a custom scheduler are not affected.

The following command enables the optimize-utilization autoscaling profile in an existing cluster:

gcloud container clusters update CLUSTER_NAME \
    --autoscaling-profile optimize-utilization

Considering Pod scheduling and disruption

When scaling down, cluster autoscaler respects scheduling and eviction rules set on Pods. These restrictions can prevent a node from being deleted by the autoscaler. A node's deletion could be prevented if it contains a Pod with any of these conditions:

  • The Pod's affinity or anti-affinity rules prevent rescheduling.
  • The Pod is not managed by a controller such as a Deployment, StatefulSet, Job, or ReplicaSet.
  • The Pod has local storage and the GKE control plane version is lower than 1.22. In GKE clusters with control plane version 1.22 or later, Pods with local storage no longer block scaling down.
  • The Pod has the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation.
  • The node's deletion would exceed the configured PodDisruptionBudget.
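
The conditions above can be expressed as a simple checklist. In this sketch a Pod is a plain dictionary whose keys (`affinity_prevents_rescheduling`, `managed_by_controller`, `has_local_storage`, `eviction_would_violate_pdb`) are hypothetical stand-ins for facts the autoscaler derives from the cluster. Only the annotation key is taken from the list above.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def scale_down_blockers(pod, control_plane_version=(1, 22)):
    """Return the reasons this Pod could prevent its node from being removed."""
    reasons = []
    if pod.get("affinity_prevents_rescheduling", False):
        reasons.append("affinity or anti-affinity rules prevent rescheduling")
    if not pod.get("managed_by_controller", True):
        reasons.append("not managed by a controller")
    # Local storage only blocks scale-down before control plane version 1.22.
    if pod.get("has_local_storage", False) and control_plane_version < (1, 22):
        reasons.append("local storage on a control plane earlier than 1.22")
    if pod.get("annotations", {}).get(SAFE_TO_EVICT) == "false":
        reasons.append("safe-to-evict annotation is false")
    if pod.get("eviction_would_violate_pdb", False):
        reasons.append("eviction would exceed the PodDisruptionBudget")
    return reasons


pod = {
    "managed_by_controller": True,
    "has_local_storage": True,
    "annotations": {SAFE_TO_EVICT: "false"},
}
print(scale_down_blockers(pod, control_plane_version=(1, 21)))
print(scale_down_blockers(pod, control_plane_version=(1, 22)))
```

An empty result means that, in this simplified model, the Pod does not stop its node from being considered for removal.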

For more information about cluster autoscaler and preventing disruptions, see the Cluster autoscaler FAQ.

Additional information

You can find more information about cluster autoscaler in the Autoscaling FAQ in the open-source Kubernetes project.

Limitations

Cluster autoscaler has the following limitations:

Known issues

  • In GKE control plane versions earlier than 1.22, GKE cluster autoscaler stops scaling up all node pools on empty (zero node) clusters. This behavior doesn't occur in GKE version 1.22 and later.

What's next