Reset a failed node in GKE on Bare Metal

If a node in GKE on Bare Metal fails, for example because of a storage, network, or OS misconfiguration, you want to restore cluster health quickly. After the cluster is healthy again, you can troubleshoot the node failure.

This document shows you how to recover from node failure scenarios by resetting a node, and forcefully removing the node if needed.

If you want to add or remove nodes from a cluster under normal circumstances when a node hasn't failed, see Update clusters.

Overview

When a node fails, you sometimes can't run reset commands on it because the node is unreachable. In that case, you might need to forcefully remove the node from the cluster.

When you cleanly reset a node and update the cluster, the following actions happen:

  1. The node resets, similar to kubeadm reset, and the machine reverts to the pre-installed state.
  2. The related references to the node are removed from the nodepool and cluster custom resources.
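
To confirm that the second action happened, you can look for the node's IP address in the NodePool resources. The following is a minimal sketch, not an official procedure: node_referenced is a hypothetical helper, and the usage comment assumes that NodePool resources are exposed as nodepools.baremetal.cluster.gke.io in a namespace named cluster-CLUSTER_NAME and list each node as an "address: IP" entry.

```shell
#!/bin/sh
# node_referenced (hypothetical helper): reads YAML on stdin and succeeds
# only if some line ends with "address: <ip>" for the exact IP given.
node_referenced() {
  awk -v ip="$1" '
    NF >= 2 && $NF == ip && $(NF-1) == "address:" { found = 1 }
    END { exit !found }
  '
}

# Usage (assumed resource name and namespace, adjust to your cluster):
#   kubectl --kubeconfig ADMIN_KUBECONFIG \
#     get nodepools.baremetal.cluster.gke.io -n cluster-CLUSTER_NAME -o yaml \
#     | node_referenced 10.200.0.8 && echo "node is still referenced"
```

The comparison is on whole fields, so a check for 10.200.0.8 does not match 10.200.0.80.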

Worker node

To remove a node from a cluster, first try to cleanly remove it:

  1. Try to cleanly reset the node. After the node is reset, the node is removed from the cluster:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following values:

    • COMMA_SEPARATED_IPS: the IP addresses of the nodes to reset, such as 10.200.0.8,10.200.0.9.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    If the reset succeeds, you can now diagnose the node and fix any misconfigurations that caused the initial failure. Skip the remaining steps in this section.

  2. If the previous step to reset the node fails, you can forcefully remove the node from the cluster. This forceful removal skips the reset commands on the node and only removes the related references to the node from the nodepool and cluster custom resources:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      --kubeconfig ADMIN_KUBECONFIG \
      --force
    

    You can now diagnose the node and fix any misconfigurations that caused the initial failure.

  3. If you forcefully removed the node from the cluster in the previous step, run the bmctl reset command again to reset the nodes:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      --kubeconfig ADMIN_KUBECONFIG
    
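The first two steps above can be sketched as a single shell function. This is an illustration under stated assumptions, not part of bmctl: reset_with_fallback is a hypothetical name, and the sketch assumes that bmctl exits with a non-zero status when a reset fails. It uses only the flags shown in this section.

```shell
#!/bin/sh
# Sketch only: reset_with_fallback is a hypothetical helper, not a bmctl feature.
# Assumption: bmctl exits with a non-zero status when a reset fails.
reset_with_fallback() {
  addresses="$1"   # for example 10.200.0.8,10.200.0.9
  cluster="$2"     # CLUSTER_NAME
  kubeconfig="$3"  # ADMIN_KUBECONFIG

  # Step 1: clean reset. bmctl output is sent to stderr so that stdout
  # carries only the outcome keyword printed below.
  if bmctl reset nodes --addresses "$addresses" \
      --cluster "$cluster" --kubeconfig "$kubeconfig" >&2; then
    echo "clean-reset"
    return 0
  fi

  # Step 2: forced removal, which only drops the references to the node.
  if bmctl reset nodes --addresses "$addresses" \
      --cluster "$cluster" --kubeconfig "$kubeconfig" --force >&2; then
    # Step 3 is still pending: rerun the plain reset against the node later.
    echo "forced-removal"
    return 0
  fi

  echo "failed"
  return 1
}

# Usage:
#   reset_with_fallback 10.200.0.8,10.200.0.9 CLUSTER_NAME ADMIN_KUBECONFIG
```

The function prints forced-removal rather than running step 3 itself, because the follow-up reset belongs after you have diagnosed the node.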

Single control plane node failure

The process is the same as for worker nodes. For control plane nodes, bmctl also cleans the etcd membership.

To remove a node from a cluster, first try to cleanly remove it:

  1. Try to cleanly reset the node. After the node is reset, the node is removed from the cluster:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following values:

    • COMMA_SEPARATED_IPS: the IP addresses of the nodes to reset, such as 10.200.0.8,10.200.0.9.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    If the reset succeeds, you can now diagnose the node and fix any misconfigurations that caused the initial failure. Skip the remaining steps in this section.

  2. If the previous step to reset the node fails, you can forcefully remove the node from the cluster. This forceful removal skips the reset commands on the node and only removes the related references to the node from the nodepool and cluster custom resources:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      --kubeconfig ADMIN_KUBECONFIG \
      --force
    

    You can now diagnose the node and fix any misconfigurations that caused the initial failure.

  3. If you forcefully removed the node from the cluster in the previous step, run the bmctl reset command again to reset the nodes:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      --kubeconfig ADMIN_KUBECONFIG
    
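Both procedures pass the same comma-separated value to --addresses, so a small pre-check can catch a mistyped address before bmctl runs. The following sketch is an assumption-laden convenience, not something bmctl requires: join_addresses is a hypothetical helper, it assumes IPv4 addresses like the ones in the examples above, and it checks only the dotted shape of each address, not the range of each octet.

```shell
#!/bin/sh
# join_addresses (hypothetical helper): validates that each argument looks
# like a dotted IPv4 address, then prints the arguments joined by commas.
join_addresses() {
  joined=""
  for ip in "$@"; do
    # Shape check only: four groups of 1 to 3 digits separated by dots.
    if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
      echo "invalid address: $ip" >&2
      return 1
    fi
    # Append with a comma separator, except before the first address.
    joined="${joined:+$joined,}$ip"
  done
  printf '%s\n' "$joined"
}

# Usage:
#   addresses=$(join_addresses 10.200.0.8 10.200.0.9) &&
#     bmctl reset nodes --addresses "$addresses" \
#       --cluster CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
```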

Quorum lost in HA control plane

If too many control plane nodes in an HA cluster enter a failed state, the cluster loses quorum and becomes unavailable.

  1. To recover a cluster that has lost quorum, run the following command on a remaining healthy node:

    bmctl restore --control-plane-node CONTROL_PLANE_NODE \
      --cluster CLUSTER_NAME \
      [--kubeconfig KUBECONFIG_FILE]
    

    Replace the following values:

    • CONTROL_PLANE_NODE: the IP address of a healthy node that remains as part of the cluster.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • KUBECONFIG_FILE: if recovering a user cluster, the path to the user cluster kubeconfig file.

  2. After you recover the failed nodes, run the bmctl reset command to reset the nodes:

    bmctl reset nodes \
      --addresses COMMA_SEPARATED_IPS \
      --cluster CLUSTER_NAME \
      [--kubeconfig KUBECONFIG_FILE]
    

    Replace the following values:

    • COMMA_SEPARATED_IPS: the IP addresses of the nodes to reset, such as 10.200.0.8,10.200.0.9.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • KUBECONFIG_FILE: the path to the admin cluster kubeconfig file.

    If the failed nodes were part of the load balancer nodepools, after the nodes recover they might contend for the control plane virtual IP address and make the new cluster unstable. Run the reset commands against the failed nodes as soon as possible after you recover the nodes.
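
The square brackets in the commands above mark --kubeconfig as optional. As a minimal sketch of how a wrapper might handle that, the following hypothetical functions (restore_quorum and reset_recovered_nodes are illustrative names, not bmctl features) add the flag only when a kubeconfig path is supplied. They are kept separate because the reset runs only after the failed nodes have been recovered.

```shell
#!/bin/sh
# Hypothetical wrappers around the two commands in this section.
# Each one appends --kubeconfig only when a path is passed as the last argument.

# Step 1: restore quorum from a remaining healthy control plane node.
restore_quorum() {
  healthy_node="$1"; cluster="$2"; kubeconfig="${3:-}"
  set -- restore --control-plane-node "$healthy_node" --cluster "$cluster"
  if [ -n "$kubeconfig" ]; then
    set -- "$@" --kubeconfig "$kubeconfig"
  fi
  bmctl "$@"
}

# Step 2: reset the previously failed nodes once they are recovered.
reset_recovered_nodes() {
  addresses="$1"; cluster="$2"; kubeconfig="${3:-}"
  set -- reset nodes --addresses "$addresses" --cluster "$cluster"
  if [ -n "$kubeconfig" ]; then
    set -- "$@" --kubeconfig "$kubeconfig"
  fi
  bmctl "$@"
}

# Usage:
#   restore_quorum CONTROL_PLANE_NODE CLUSTER_NAME KUBECONFIG_FILE
#   reset_recovered_nodes 10.200.0.8,10.200.0.9 CLUSTER_NAME KUBECONFIG_FILE
```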

What's next

For more information on how to add or remove nodes from a cluster when there isn't a failure, and how to check node status, see Update clusters.