When you need to repair or maintain nodes, you should first put the nodes into maintenance mode. Putting nodes into maintenance mode safely drains their pods/workloads and excludes the nodes from pod scheduling. In maintenance mode, you can work on your nodes without a risk of disrupting pod traffic.
How it works
Maintenance mode for Anthos clusters on bare metal is similar to running kubectl cordon and kubectl drain for a specific node. Here are a few details that are relevant to maintenance mode:
Specified nodes are marked as unschedulable (node.spec.unschedulable is true), which is what kubectl cordon does.
Node taints are added to specified nodes to indicate that no pods can be scheduled or executed on the nodes. This action is similar to kubectl drain, but kubectl drain uses the Eviction API to terminate pods running on the node.
A 20-minute timeout is enforced to ensure nodes don't get stuck waiting for pods to terminate. Pods might not terminate if they are configured to tolerate all taints or they have finalizers. Anthos clusters on bare metal attempts to terminate all pods, but if the timeout is exceeded, the node is put into maintenance mode regardless. This timeout prevents running pods from blocking upgrades.
If you have a VM-based workload running on the node, Anthos clusters on bare metal applies a NodeSelector to the virtual machine instance (VMI) Pod, then stops the Pod. The NodeSelector ensures that the VMI Pod is restarted on the same node when the node is removed from maintenance mode.
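For comparison, the manual commands that maintenance mode resembles can be sketched as a small shell function. This is only a rough analogue under stated assumptions, not what Anthos clusters on bare metal runs internally: the function name manual_maintenance and the KUBECTL override variable are hypothetical, and the 20-minute drain timeout is borrowed from the timeout described above.

```shell
# Hypothetical helper: a manual approximation of maintenance mode for one node.
# KUBECTL can be overridden (for example, KUBECTL=echo) to do a dry run.
KUBECTL="${KUBECTL:-kubectl}"

manual_maintenance() {
  local node="$1"
  # Mark the node unschedulable (sets node.spec.unschedulable to true).
  "$KUBECTL" cordon "$node" &&
    # Evict pods through the Eviction API; give up after 20 minutes.
    # DaemonSet-managed pods are skipped, otherwise drain refuses to proceed.
    "$KUBECTL" drain "$node" --ignore-daemonsets --timeout=20m
}
```

Unlike this sketch, maintenance mode relies on node taints rather than the Eviction API, so the two approaches can treat pods that tolerate all taints differently.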
Put a node into maintenance mode
Choose the nodes you want to put into maintenance mode by specifying IP ranges for the selected nodes under maintenanceBlocks in your cluster configuration file. The nodes you choose must be in a ready state and functioning in the cluster.
To put nodes into maintenance mode:
Edit the cluster configuration file to select the nodes you want to put into maintenance mode.
You can edit the configuration file with an editor of your choice, or you can edit the cluster custom resource directly by running the following command:
kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME
Replace the following:
CLUSTER_NAMESPACE: the namespace of the cluster.
CLUSTER_NAME: the name of the cluster.
Add the maintenanceBlocks section to the cluster configuration file to specify either a single IP address, or an address range, for nodes you want to put into maintenance mode.
The following sample shows how to select multiple nodes by specifying a range of IP addresses:
metadata:
  name: my-cluster
  namespace: cluster-my-cluster
spec:
  maintenanceBlocks:
    cidrBlocks:
    - 172.16.128.1-172.16.128.64
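Before applying the change, it can help to confirm which node addresses a range entry actually covers. The following is a minimal sketch, assuming entries are either a single IPv4 address or a START-END range as in the sample above; the helper names ip_to_int and in_range are hypothetical, and CIDR notation is not handled.

```shell
# Convert a dotted IPv4 address to an integer so addresses can be compared.
# Assumes octets have no leading zeros.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_range NODE_IP ENTRY
# ENTRY is "START-END" or a single address. Succeeds if NODE_IP is covered.
in_range() {
  local ip start end
  ip=$(ip_to_int "$1")
  start=$(ip_to_int "${2%-*}")   # text before the dash (or the whole entry)
  end=$(ip_to_int "${2#*-}")     # text after the dash (or the whole entry)
  [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}
```

For example, in_range 172.16.128.10 172.16.128.1-172.16.128.64 succeeds, while in_range 172.16.128.65 172.16.128.1-172.16.128.64 fails because the address lies just past the end of the range.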
Save and apply the updated cluster configuration.
Anthos clusters on bare metal starts putting the nodes into maintenance mode.
Run the following command to get the status of the nodes in your cluster:
kubectl get nodes
The response is something like the following:
NAME                       STATUS                     ROLES    AGE     VERSION
user-anthos-baremetal-01   Ready                      master   2d22h   v1.17.8-gke.16
user-anthos-baremetal-04   Ready                      <none>   2d22h   v1.17.8-gke.16
user-anthos-baremetal-05   Ready,SchedulingDisabled   <none>   2d22h   v1.17.8-gke.16
user-anthos-baremetal-06   Ready                      <none>   2d22h   v1.17.8-gke.16
A status of SchedulingDisabled indicates that a node is in maintenance mode.
Run the following command to get the number of nodes in maintenance mode:
kubectl get nodepools
The response should look something like the following output:
NAME   READY   RECONCILING   STALLED   UNDERMAINTENANCE   UNKNOWN
np1    3       0             0         1                  0
The UNDERMAINTENANCE column in this sample shows that one node is in maintenance mode.
Anthos clusters on bare metal also adds the following taints to nodes when they are put into maintenance mode:
baremetal.cluster.gke.io/maintenance:NoExecute
baremetal.cluster.gke.io/maintenance:NoSchedule
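The status checks in this procedure can be scripted by filtering the kubectl output. The sketch below assumes the default table output shown in the samples; the function names nodes_in_maintenance and count_under_maintenance are hypothetical, and both read the command output from standard input.

```shell
# Print the names of nodes whose STATUS column contains SchedulingDisabled.
# Usage: kubectl get nodes | nodes_in_maintenance
nodes_in_maintenance() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Sum the UNDERMAINTENANCE column across all node pools.
# The column is located by its header name, so column order does not matter.
# Usage: kubectl get nodepools | count_under_maintenance
count_under_maintenance() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "UNDERMAINTENANCE") col = i; next }
    col     { total += $col }
    END     { print total + 0 }
  '
}
```

Parsing table output is convenient for quick checks but depends on the column layout, so treat these helpers as a starting point rather than a stable interface.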
Remove a node from maintenance mode
To remove nodes from maintenance mode:
Edit the cluster configuration file to clear the nodes you want to remove from maintenance mode.
You can edit the configuration file with an editor of your choice, or you can edit the cluster custom resource directly by running the following command:
kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME
Replace the following:
CLUSTER_NAMESPACE: the namespace of the cluster.
CLUSTER_NAME: the name of the cluster.
Either edit the IP addresses to remove specific nodes from maintenance mode, or remove the maintenanceBlocks section to remove all nodes from maintenance mode.
Save and apply the updated cluster configuration.
Use kubectl commands to check the status of your nodes.
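One way to confirm that a node has left maintenance mode is to check that the maintenance taints are gone. The following sketch assumes the taint key baremetal.cluster.gke.io/maintenance listed earlier; the function name has_maintenance_taint is hypothetical, and it expects one KEY:EFFECT pair per line on standard input.

```shell
# Succeeds if any input line is a maintenance taint in KEY:EFFECT form.
# One way to produce that input for a node (NODE_NAME is a placeholder):
#   kubectl get node NODE_NAME \
#     -o jsonpath='{range .spec.taints[*]}{.key}{":"}{.effect}{"\n"}{end}'
has_maintenance_taint() {
  grep -q '^baremetal\.cluster\.gke\.io/maintenance:'
}
```

Piping the jsonpath output from the comment into has_maintenance_taint returns success while the node is still tainted and failure once the taints have been removed. The node should also no longer show SchedulingDisabled in the kubectl get nodes output.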