Removing nodes for maintenance

When you need to repair or maintain nodes, you can put them into maintenance mode. Maintenance mode safely removes the nodes and their workloads from your Anthos on bare metal cluster, so that you can work on the nodes before you restore them to the cluster.

To choose the nodes to put into maintenance mode, specify their IP addresses or IP ranges in your cluster config file. After you update the config file, Anthos on bare metal drains the workloads from the selected nodes and safely removes the nodes from the cluster. The nodes you choose must be in the Ready state and functioning in the cluster.
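To preview which nodes a given address or range would select, you can compare each node's IP address against it. The following is a minimal bash sketch, not part of Anthos on bare metal: ip_to_int and ip_in_block are hypothetical helpers, and the sketch assumes IPv4 addresses without leading zeros and entries that are either a single address or a FIRST-LAST range treated as inclusive at both ends. It does not handle CIDR notation or IPv6.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for previewing which node IPs a maintenance entry
# would match. Assumes IPv4 and entries that are either a single address
# or an inclusive "FIRST-LAST" range.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit status 0) if the IP in $1 falls inside the entry in $2.
ip_in_block() {
  local ip block lo hi n
  ip=$1 block=$2
  n=$(ip_to_int "$ip")
  case $block in
    *-*) lo=$(ip_to_int "${block%%-*}"); hi=$(ip_to_int "${block#*-}") ;;
    *)   lo=$(ip_to_int "$block"); hi=$lo ;;
  esac
  [ "$n" -ge "$lo" ] && [ "$n" -le "$hi" ]
}

# Example: check two candidate node IPs against the sample range.
for ip in 172.16.128.10 172.16.128.65; do
  if ip_in_block "$ip" 172.16.128.1-172.16.128.64; then
    echo "$ip: selected"      # prints "172.16.128.10: selected"
  else
    echo "$ip: not selected"  # prints "172.16.128.65: not selected"
  fi
done
```

Comparing integers rather than strings avoids lexical mistakes such as treating 172.16.128.9 as greater than 172.16.128.10.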

To put nodes in maintenance mode:

  1. Edit the cluster config file to select the nodes you want to put into maintenance mode. You can use an editor of your choice, or run the following command:

      kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME

  2. Add a maintenanceBlocks entry to the cluster config file that specifies either a single IP address or an address range for the nodes you want to put into maintenance mode. A sample entry follows (the IP range is an example only):

      metadata:
        name: my-cluster
        namespace: my-namespace
      spec:
        maintenanceBlocks:
          cidrBlocks:
          - 172.16.128.1-172.16.128.64

    After you update the cluster config, Anthos on bare metal starts putting the selected nodes into maintenance mode.

  3. Check the status of the nodes you put into maintenance mode with the kubectl get nodes command:

      kubectl get nodes -n CLUSTER_NAME

    The selected nodes are listed with the SchedulingDisabled status, which indicates that they are being put into maintenance mode. The output is similar to the following:

      NAME                              STATUS                     ROLES    AGE     VERSION
      user-anthos-baremetal-01          Ready                      master   2d22h   v1.17.8-gke.16
      user-anthos-baremetal-04          Ready                      <none>   2d22h   v1.17.8-gke.16
      user-anthos-baremetal-05          Ready,SchedulingDisabled   <none>   2d22h   v1.17.8-gke.16
      user-anthos-baremetal-06          Ready                      <none>   2d22h   v1.17.8-gke.16

  4. To show the number of nodes in maintenance mode, run the kubectl get nodepools command on the cluster. The output is similar to the following:

      NAME   READY   RECONCILING   STALLED   UNDERMAINTENANCE   UNKNOWN
      np1    3       0             0         1                  0

In addition to the UNDERMAINTENANCE count reported by kubectl get nodepools, a node in maintenance mode has the following taints: baremetal.cluster.gke.io/maintenance:NoExecute and baremetal.cluster.gke.io/maintenance:NoSchedule.
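To script these checks, you can filter the kubectl output. The following bash sketch defines three hypothetical helpers, assuming the default tabular output shown in the examples above: cordoned_nodes prints nodes whose STATUS includes SchedulingDisabled (any cordoned node shows this status, not only nodes in maintenance mode), under_maintenance prints each node pool's UNDERMAINTENANCE count by locating that column in the header, and has_maintenance_taints succeeds only if both maintenance taints appear in a list of key:effect lines.

```shell
#!/usr/bin/env bash
# Print names of nodes whose STATUS column contains SchedulingDisabled.
# Reads `kubectl get nodes` output (header optional) on stdin.
cordoned_nodes() {
  awk 'NR == 1 && $1 == "NAME" { next } $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Print "<nodepool> <count>" from the UNDERMAINTENANCE column, which is
# located by header name. Reads `kubectl get nodepools` output on stdin.
under_maintenance() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "UNDERMAINTENANCE") col = i; next }
       col { print $1, $col }'
}

# Succeed only if both maintenance taints appear among the key:effect
# lines read from stdin (exact, whole-line matches).
has_maintenance_taints() {
  local taints
  taints=$(cat)
  grep -qxF 'baremetal.cluster.gke.io/maintenance:NoExecute' <<<"$taints" &&
    grep -qxF 'baremetal.cluster.gke.io/maintenance:NoSchedule' <<<"$taints"
}

# Example usage against a live cluster:
#   kubectl get nodes | cordoned_nodes
#   kubectl get nodepools | under_maintenance
#   kubectl get node NODE_NAME \
#     -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}' |
#     has_maintenance_taints && echo "NODE_NAME has both maintenance taints"
```

Against the sample listings above, cordoned_nodes prints user-anthos-baremetal-05 and under_maintenance prints np1 1.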