This document describes autoscaling for a user cluster in Google Distributed Cloud.
Cluster autoscaling increases or decreases the number of nodes in a node pool based on the demands of your workloads.
Before you begin
Read about the limitations of the cluster autoscaler.
The cluster autoscaler makes the following assumptions:
- All replicated Pods can be restarted on some other node, which might cause a brief disruption. If your services cannot tolerate disruption, we do not recommend using the cluster autoscaler.
- Users or administrators do not manually manage nodes. If autoscaling is enabled for a node pool, you cannot override the replicas field of the node pool.
- All nodes in a single node pool have the same set of labels.
How cluster autoscaling works
The cluster autoscaler works on a node-pool basis. When you enable autoscaling for a node pool, you specify a minimum and maximum number of nodes for the pool.
The cluster autoscaler increases or decreases the number of nodes in the pool automatically, based on the resource requests (rather than actual resource utilization) of Pods running on the nodes. It periodically checks the status of Pods and nodes, and takes action:
If Pods are unschedulable because there are not enough nodes in the pool, the cluster autoscaler adds nodes, up to the specified maximum.
If nodes are under-utilized, and all Pods could be scheduled with fewer nodes in the pool, the cluster autoscaler removes nodes, down to the specified minimum. If a node can't be drained gracefully, the node is forcibly terminated, and the attached Kubernetes-managed disk is safely detached.
If your Pods have requested too few resources (or haven't changed the defaults, which might be insufficient) and your nodes are experiencing shortages, the cluster autoscaler does not correct that situation. You can help ensure that the cluster autoscaler works accurately by making explicit resource requests for all of your workloads.
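For example, a Deployment might declare explicit requests like the following sketch. The name, image, and request values are illustrative assumptions, not recommendations; size the requests for your own workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-workload
  template:
    metadata:
      labels:
        app: my-workload
    spec:
      containers:
      - name: app
        image: example.com/my-app:1.0
        resources:
          requests:
            # The cluster autoscaler schedules against these requests, not observed usage.
            cpu: 250m
            memory: 256Mi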
For an individual node pool, minReplicas must be ≥ 1. However, the sum of the untainted user cluster nodes at any given time must be at least 3. This means the sum of the minReplicas values for all autoscaled node pools, plus the sum of the replicas values for all non-autoscaled node pools, must be at least 3.
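For example, the following hypothetical pair of node pools satisfies the constraint: minReplicas for the autoscaled pool (2) plus replicas for the non-autoscaled pool (1) is at least 3.

nodePools:
- name: autoscaled-pool
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 2
    maxReplicas: 10
- name: static-pool
  ...
  replicas: 1
  ...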
The cluster autoscaler considers the relative cost of the instance types in the various node pools, and attempts to expand the node pool in a way that causes the least waste possible.
Create a user cluster with autoscaling
To create a user cluster with autoscaling enabled for a node pool, fill in the autoscaling section for the node pool in the user cluster configuration file. For example:
nodePools:
- name: pool-1
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 1
    maxReplicas: 5
The preceding configuration creates a node pool with 3 replicas and applies autoscaling with a minimum node pool size of 1 and a maximum node pool size of 5.
The minReplicas value must be ≥ 1. A node pool can't scale down to zero nodes.
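You then create the cluster as usual. Assuming the standard flags, the command looks like this (substitute your own file paths):

gkectl create cluster --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --config USER_CLUSTER_CONFIG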
Add a node pool with autoscaling
To add a node pool with autoscaling to an existing cluster:
Edit the user cluster configuration file to add a new node pool, and include an autoscaling section for the pool. Set the values of replicas, minReplicas, and maxReplicas as desired. For example:

nodePools:
- name: my-new-node-pool
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 2
    maxReplicas: 6
Update the cluster:
gkectl update cluster --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG
Enable autoscaling for an existing node pool
To enable autoscaling for a node pool in an existing cluster:
Edit a specific nodePool in the user cluster configuration file, and include the autoscaling section. Set the values of minReplicas and maxReplicas as desired. For example:

nodePools:
- name: my-existing-node-pool
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 1
    maxReplicas: 5
Update the cluster:
gkectl update cluster --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG
Disable autoscaling for an existing node pool
To disable autoscaling for a specific node pool:
Edit the user cluster configuration file and remove the autoscaling section for that node pool.
Run gkectl update cluster.
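For example, the node pool from the previous section would look like this once the autoscaling section is removed; the pool then stays fixed at its replicas value:

nodePools:
- name: my-existing-node-pool
  ...
  replicas: 3
  ...

Then update the cluster:

gkectl update cluster --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG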
Check cluster autoscaler behavior
You can determine what the cluster autoscaler is doing in several ways.
Check cluster autoscaler logs
First, find the name of the cluster autoscaler Pod:
kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG get pods -n USER_CLUSTER_NAME | grep cluster-autoscaler
Check logs on the cluster autoscaler Pod:
kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG logs cluster-autoscaler-POD_NAME --container cluster-autoscaler -n USER_CLUSTER_NAME
Replace POD_NAME with the name of the cluster autoscaler Pod.
Check the configuration map
The cluster autoscaler publishes the kube-system/cluster-autoscaler-status configuration map.
To see the configuration map:
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get configmap cluster-autoscaler-status -n kube-system -o yaml
Check cluster autoscaler events
You can check cluster autoscaler events:
- On Pods (particularly those that cannot be scheduled, or on underutilized nodes)
- On nodes
- On the kube-system/cluster-autoscaler-status config map
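For example, assuming USER_CLUSTER_KUBECONFIG points at your user cluster, the following commands show those events in their describe output (POD_NAME, NAMESPACE, and NODE_NAME are placeholders):

kubectl --kubeconfig USER_CLUSTER_KUBECONFIG describe pod POD_NAME -n NAMESPACE
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG describe node NODE_NAME
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG describe configmap cluster-autoscaler-status -n kube-system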
Limitations
The cluster autoscaler has the following limitations:
Custom scheduling with altered filters is not supported.
Nodes do not scale up if Pods have a PriorityClass value below -10. Learn more in How does Cluster Autoscaler work with Pod Priority and Preemption? (an illustrative PriorityClass appears after this list).
Autoscaling for Windows node pools is not supported.
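As an illustration of the PriorityClass cutoff, a class like the following hypothetical one has a value below -10, so Pods that reference it through priorityClassName do not trigger a scale-up:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-batch
value: -50
globalDefault: false
description: "Best-effort batch work; the cluster autoscaler will not add nodes for these Pods."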
Troubleshooting
Occasionally, the cluster autoscaler cannot scale down completely and an extra node exists after scaling down. This can occur when required system Pods are scheduled onto different nodes, because there is no trigger for any of those Pods to be moved to a different node. See I have a couple of nodes with low utilization, but they are not scaled down. Why?. To work around this limitation, you can configure a Pod disruption budget.
If you are having problems with downscaling your cluster, see Pod scheduling and disruption. You might have to add a PodDisruptionBudget for the kube-system Pods. For more information about manually adding a PodDisruptionBudget for the kube-system Pods, see the Kubernetes cluster autoscaler FAQ; a sample manifest appears at the end of this section.
When scaling down, the cluster autoscaler respects the scheduling and eviction rules set on Pods. These restrictions can prevent the autoscaler from deleting a node. A node's deletion could be prevented if it contains a Pod with any of these conditions:
The Pod's affinity or anti-affinity rules prevent rescheduling.
The Pod has local storage.
The Pod is not managed by a controller such as a Deployment, StatefulSet, Job, or ReplicaSet.
The Pod is in the kube-system namespace and does not have a PodDisruptionBudget.
An application's PodDisruptionBudget might prevent autoscaling. If deleting nodes would cause the budget to be exceeded, the cluster does not scale down.
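As a sketch of the PodDisruptionBudget workaround mentioned above, a budget for a kube-system workload might look like the following. The name and label selector are assumptions; match them to the actual Pods you need to cover, and make sure the budget still leaves room for eviction:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-system-pdb
  namespace: kube-system
spec:
  # Allow at most one matching Pod to be evicted at a time during scale-down.
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: example-component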
More information
For more information about cluster autoscaler and preventing disruptions, see the following questions in the Kubernetes cluster autoscaler FAQ:
- How does scale-down work?
- Does Cluster autoscaler work with PodDisruptionBudget in scale-down?
- What types of Pods can prevent Cluster autoscaler from removing a node?