Put nodes into maintenance mode

When you need to repair or maintain nodes, you should first put the nodes into maintenance mode. This gracefully drains existing pods and workloads, excluding critical system pods like the API server. Maintenance mode also prevents the node from receiving new pod assignments. In maintenance mode, you can work on your nodes without a risk of disrupting pod traffic.

How it works

GKE on Bare Metal provides a way to place nodes into maintenance mode. This approach lets other cluster components correctly know that the node is in maintenance mode. When you place a node in maintenance mode, no additional pods can be scheduled on the node, and existing pods are stopped.

Instead of using maintenance mode, you can manually use Kubernetes commands such as kubectl cordon and kubectl drain on a specific node. If you run GKE on Bare Metal version 1.12.0 (anthosBareMetalVersion: 1.12.0) or lower, see the known issue on Nodes uncordoned if you don't use the maintenance mode procedure.

When you use the maintenance mode process, GKE on Bare Metal does the following:

  • Node taints are added to specified nodes to indicate that no pods can be scheduled or executed on the nodes.

  • A 20-minute timeout is enforced to ensure nodes don't get stuck waiting for pods to stop. Pods might not stop if they are configured to tolerate all taints or they have finalizers. GKE on Bare Metal attempts to stop all pods, but if the timeout is exceeded, the node is put into maintenance mode. This timeout prevents running pods from blocking upgrades.

Put a node into maintenance mode

Choose the nodes you want to put into maintenance mode by specifying IP ranges for the selected nodes under maintenanceBlocks in your cluster configuration file. The nodes you choose must be in a ready state, and functioning in the cluster.

To put nodes into maintenance mode:

  1. Edit the cluster configuration file to select the nodes you want to put into maintenance mode.

    You can edit the configuration file with an editor of your choice, or you can edit the cluster custom resource directly by running the following command:

    kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME
    

    Replace the following:

    • CLUSTER_NAMESPACE: the namespace of the cluster.
    • CLUSTER_NAME: the name of the cluster.
  2. Add the maintenanceBlocks section to the cluster configuration file to specify either a single IP address, or an address range, for nodes you want to put into maintenance mode.

    The following sample shows how to select multiple nodes by specifying a range of IP addresses:

    metadata:
      name: my-cluster
      namespace: cluster-my-cluster
    spec:
      maintenanceBlocks:
        cidrBlocks:
        - 172.16.128.1-172.16.128.64
    
  3. Save and apply the updated cluster configuration.

    GKE on Bare Metal starts putting the nodes into maintenance mode.

  4. Run the following command to get the status of the nodes in your cluster:

    kubectl get nodes --kubeconfig=KUBECONFIG
    

    The response is something like the following:

    NAME                       STATUS   ROLES           AGE     VERSION
    user-anthos-baremetal-01   Ready    control-plane   2d22h   v1.27.4-gke.1600
    user-anthos-baremetal-04   Ready    worker          2d22h   v1.27.4-gke.1600
    user-anthos-baremetal-05   Ready    worker          2d22h   v1.27.4-gke.1600
    user-anthos-baremetal-06   Ready    worker          2d22h   v1.27.4-gke.1600
    

    Note that the nodes are still schedulable, but taints keep any pods (without an appropriate toleration) from being scheduled on the node.

  5. Run the following command to get the number of nodes in maintenance mode:

    kubectl get nodepools --kubeconfig ADMIN_KUBECONFIG 
    

    The response should look something like the following example:

    NAME   READY   RECONCILING   STALLED   UNDERMAINTENANCE   UNKNOWN
    np1    3       0             0         1                  0
    

    This UNDERMAINTENANCE column in this sample shows that one node is in maintenance mode.

    GKE on Bare Metal also adds the following taints to nodes when they are put into maintenance mode:

    • baremetal.cluster.gke.io/maintenance:NoExecute
    • baremetal.cluster.gke.io/maintenance:NoSchedule

Billing and maintenance mode

Billing for GKE on Bare Metal is based on the number of vCPUs your cluster has for Nodes capable of running workloads. When you put a Node into maintenance mode, NoExecute and NoSchedule taints are added to the Node, but they don't disable billing. After putting a node into maintenance mode, cordon the node (kubectl cordon NODE_NAME) to mark it as unschedulable. Once a node is marked as unschedulable, the Node and its associated vCPUs are excluded from billing.

As described on the pricing page, you can use kubectl to see the vCPU capacity (used for Anthos billing) of each of your user clusters. The command doesn't take whether or not the Node is schedulable into consideration, it provides a vCPU count per node only.

To identify the number of vCPUs per node for your user cluster:

kubectl get nodes \
    --kubeconfig USER_KUBECONFIG \
    -o=jsonpath="{range .items[*]}{.metadata.name}{\"\t\"} \
    {.status.capacity.cpu}{\"\n\"}{end}"

Replace USER_KUBECONFIG with the path of the kubeconfig file for your user cluster.

Remove a node from maintenance mode

To remove nodes from maintenance mode:

  1. Edit the cluster configuration file to clear the nodes you want to remove from maintenance mode.

    You can edit the configuration file with an editor of your choice, or you can edit the cluster custom resource directly by running the following command:

    kubectl -n CLUSTER_NAMESPACE edit cluster CLUSTER_NAME
    

    Replace the following:

    • CLUSTER_NAMESPACE: the namespace of the cluster.
    • CLUSTER_NAME: the name of the cluster.
  2. Either edit the IP addresses to remove specific nodes from maintenance mode or remove the maintenanceBlocks section remove all does from maintenance mode.

  3. Save and apply the updated cluster configuration.

  4. Use kubectl commands to check the status of your nodes.