This document describes autoscaling for a user cluster in Google Distributed Cloud.
Cluster autoscaling increases or decreases the number of nodes in a node pool based on the demands of your workloads.
Before you begin
Read about the limitations of the cluster autoscaler.
The cluster autoscaler makes the following assumptions:
- All replicated Pods can be restarted on some other node, which might cause a brief disruption. If your services cannot tolerate disruption, we do not recommend using the cluster autoscaler.
- Users or administrators do not manually manage nodes. If autoscaling is enabled for a node pool, you cannot override the replicas field of the node pool.
- All nodes in a single node pool have the same set of labels.
How cluster autoscaling works
The cluster autoscaler works on a node-pool basis. When you enable autoscaling for a node pool, you specify a minimum and maximum number of nodes for the pool.
The cluster autoscaler increases or decreases the number of nodes in the pool automatically, based on the resource requests (rather than actual resource utilization) of Pods running on the nodes. It periodically checks the status of Pods and nodes, and takes action:
If Pods are unschedulable because there are not enough nodes in the pool, the cluster autoscaler adds nodes, up to the specified maximum.
If nodes are under-utilized, and all Pods could be scheduled with fewer nodes in the pool, the cluster autoscaler removes nodes, down to the specified minimum. If a node can't be drained gracefully, the node is forcibly terminated, and the attached Kubernetes-managed disk is safely detached.
If your Pods have requested too few resources (or haven't changed the defaults, which might be insufficient) and your nodes are experiencing shortages, the cluster autoscaler does not correct that situation. You can help ensure that the cluster autoscaler works accurately by making explicit resource requests for all of your workloads.
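For example, a Deployment might declare explicit requests like the following sketch. The name, image, and request values are illustrative assumptions, not recommendations; size the requests for your own workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-workload
  template:
    metadata:
      labels:
        app: my-workload
    spec:
      containers:
      - name: app
        image: example.com/my-app:1.0
        resources:
          requests:
            # The cluster autoscaler schedules against these requests, not observed usage.
            cpu: 250m
            memory: 256Mi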
For an individual node pool, minReplicas must be ≥ 1. However, the sum of the untainted user cluster nodes at any given time must be at least 3. This means the sum of the minReplicas values for all autoscaled node pools, plus the sum of the replicas values for all non-autoscaled node pools, must be at least 3.
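For example, the following hypothetical pair of node pools satisfies the constraint: minReplicas for the autoscaled pool (2) plus replicas for the non-autoscaled pool (1) is at least 3.

nodePools:
- name: autoscaled-pool
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 2
    maxReplicas: 10
- name: static-pool
  ...
  replicas: 1
  ...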
The cluster autoscaler considers the relative cost of the instance types in the various node pools, and attempts to expand the node pool in a way that causes the least waste possible.
Create a user cluster with autoscaling
To create a user cluster with autoscaling enabled for a node pool, fill in the autoscaling section for the node pool in the user cluster configuration file. For example:
nodePools:
- name: pool-1
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 1
    maxReplicas: 5
The preceding configuration creates a node pool with 3 replicas and applies autoscaling with a minimum node pool size of 1 and a maximum node pool size of 5.
The minReplicas value must be ≥ 1. A node pool can't scale down to zero nodes.
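You then create the cluster as usual. Assuming the standard flags, the command looks like this (substitute your own file paths):

gkectl create cluster --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
    --config USER_CLUSTER_CONFIG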
Add a node pool with autoscaling
To add a node pool with autoscaling to an existing cluster:
Edit the user cluster configuration file to add a new node pool, and include an autoscaling section for the pool. Set the values of replicas, minReplicas, and maxReplicas as desired. For example:

nodePools:
- name: my-new-node-pool
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 2
    maxReplicas: 6
Update the cluster:
gkectl update cluster --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG
Enable autoscaling for an existing node pool
To enable autoscaling for a node pool in an existing cluster:
Edit a specific nodePool in the user cluster configuration file, and include the autoscaling section. Set the values of minReplicas and maxReplicas as desired. For example:

nodePools:
- name: my-existing-node-pool
  ...
  replicas: 3
  ...
  autoscaling:
    minReplicas: 1
    maxReplicas: 5
Update the cluster:
gkectl update cluster --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG
Disable autoscaling for an existing node pool
To disable autoscaling for a specific node pool:
Edit the user cluster configuration file and remove the autoscaling section for that node pool.
Run gkectl update cluster.
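For example, the node pool from the previous section would look like this once the autoscaling section is removed; the pool then stays fixed at its replicas value:

nodePools:
- name: my-existing-node-pool
  ...
  replicas: 3
  ...

Then update the cluster:

gkectl update cluster --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG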
Check cluster autoscaler behavior
You can determine what the cluster autoscaler is doing in several ways.
Check cluster autoscaler logs
First, find the name of the cluster autoscaler Pod:
kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG get pods -n USER_CLUSTER_NAME | grep cluster-autoscaler
Check logs on the cluster autoscaler Pod:
kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG logs cluster-autoscaler-POD_NAME --container cluster-autoscaler -n USER_CLUSTER_NAME
Replace POD_NAME with the name of the cluster autoscaler Pod.
Check the configuration map
The cluster autoscaler publishes the kube-system/cluster-autoscaler-status configuration map.
To see the configuration map:
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG get configmap cluster-autoscaler-status -n kube-system -o yaml
Check cluster autoscaler events
You can check cluster autoscaler events:
- On Pods (particularly those that cannot be scheduled, or on underutilized nodes)
- On nodes
- On the kube-system/cluster-autoscaler-status config map
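For example, assuming USER_CLUSTER_KUBECONFIG points at your user cluster, the following commands show those events in their describe output (POD_NAME, NAMESPACE, and NODE_NAME are placeholders):

kubectl --kubeconfig USER_CLUSTER_KUBECONFIG describe pod POD_NAME -n NAMESPACE
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG describe node NODE_NAME
kubectl --kubeconfig USER_CLUSTER_KUBECONFIG describe configmap cluster-autoscaler-status -n kube-system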
Limitations
The cluster autoscaler has the following limitations:
Custom scheduling with altered filters is not supported.
Nodes do not scale up if Pods have a PriorityClass value below -10. Learn more in How does Cluster Autoscaler work with Pod Priority and Preemption? (an illustrative PriorityClass appears after this list).
Autoscaling for Windows node pools is not supported.
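As an illustration of the PriorityClass cutoff, a class like the following hypothetical one has a value below -10, so Pods that reference it through priorityClassName do not trigger a scale-up:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-batch
value: -50
globalDefault: false
description: "Best-effort batch work; the cluster autoscaler will not add nodes for these Pods."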
Troubleshooting
Occasionally, the cluster autoscaler cannot scale down completely and an extra node exists after scaling down. This can occur when required system Pods are scheduled onto different nodes, because there is no trigger for any of those Pods to be moved to a different node. See I have a couple of nodes with low utilization, but they are not scaled down. Why?. To work around this limitation, you can configure a Pod disruption budget.
If you are having problems with downscaling your cluster, see Pod scheduling and disruption. You might have to add a PodDisruptionBudget for the kube-system Pods. For more information about manually adding a PodDisruptionBudget for the kube-system Pods, see the Kubernetes cluster autoscaler FAQ; a sample manifest appears at the end of this section.
When scaling down, the cluster autoscaler respects the scheduling and eviction rules set on Pods. These restrictions can prevent the autoscaler from deleting a node. A node's deletion could be prevented if it contains a Pod with any of these conditions:
The Pod's affinity or anti-affinity rules prevent rescheduling.
The Pod has local storage.
The Pod is not managed by a controller such as a Deployment, StatefulSet, Job, or ReplicaSet.
The Pod is in the kube-system namespace and does not have a PodDisruptionBudget.
An application's PodDisruptionBudget might prevent autoscaling. If deleting nodes would cause the budget to be exceeded, the cluster does not scale down.
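As a sketch of the PodDisruptionBudget workaround mentioned above, a budget for a kube-system workload might look like the following. The name and label selector are assumptions; match them to the actual Pods you need to cover, and make sure the budget still leaves room for eviction:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-system-pdb
  namespace: kube-system
spec:
  # Allow at most one matching Pod to be evicted at a time during scale-down.
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: example-component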
More information
For more information about cluster autoscaler and preventing disruptions, see the following questions in the Kubernetes cluster autoscaler FAQ:
- How does scale-down work?
- Does Cluster autoscaler work with PodDisruptionBudget in scale-down?
- What types of Pods can prevent Cluster autoscaler from removing a node?