Reset a failed node in GKE on Bare Metal

When nodes in GKE on Bare Metal fail, which can happen because of issues with storage, network, or OS misconfiguration, you want to efficiently restore cluster health. After you restore the cluster health, you can troubleshoot the node failure. This document shows you how to recover from node failure scenarios by resetting a node, and forcefully removing the node if needed.

If you want to add or remove nodes from a cluster when a node hasn't failed, see Update clusters.

If you need additional assistance, reach out to Cloud Customer Care.

Reset nodes

When a node fails, it might be unreachable, in which case you can't run reset commands on it. In that situation, you might need to forcefully remove the node from the cluster.

When you cleanly reset a node and update the cluster, the following actions happen:

  1. The node resets, similar to kubeadm reset, and the machine reverts to the pre-installed state.
  2. The related references to the node are removed from the nodepool and cluster custom resources.

Some of the following bmctl commands to reset nodes use the --force parameter, which controls whether bmctl skips the reset commands (step 1). With the --force parameter, bmctl performs only the removal step (step 2) and doesn't run the reset commands.

Remove worker node

To remove a worker node from a cluster, complete the following steps:

  1. Try to cleanly reset the node. After the node is reset, the node is removed from the cluster:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following:

    • COMMA_SEPARATED_IPS: the IP addresses of the nodes to reset, such as 10.200.0.8,10.200.0.9.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    If this command succeeds, you can now diagnose the node and fix any misconfigurations that caused the initial failure. Skip the remaining steps in this section.

  2. If the previous step to reset the node fails, forcefully remove the node from the cluster. This forceful removal skips the reset commands from the previous step and only removes the related references to the node from the nodepool and cluster custom resources:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG \
        --force
    

    You can now diagnose the node and fix any misconfigurations that caused the initial failure.

  3. If you forcefully removed the node from the cluster in the previous step, run the bmctl reset command again to reset the nodes:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG
    
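The first two steps above can be sketched as a small shell helper. This is a minimal sketch, not part of bmctl: the function name reset_with_fallback is hypothetical, and it assumes that bmctl exits with a non-zero status when a reset fails. It uses only the bmctl reset nodes parameters shown in this section.

```shell
# Sketch only: reset_with_fallback is a hypothetical helper, not a bmctl feature.
# Assumption: bmctl returns a non-zero exit status when the reset fails.
reset_with_fallback() {
  local addresses="$1"
  local cluster="$2"
  local kubeconfig="$3"

  # Step 1: try a clean reset, which also removes the node from the cluster.
  if bmctl reset nodes \
      --addresses "$addresses" \
      --cluster "$cluster" \
      --kubeconfig "$kubeconfig"; then
    echo "clean reset succeeded"
    return 0
  fi

  # Step 2: the clean reset failed, so only remove the node references.
  echo "clean reset failed; trying forced removal" >&2
  if bmctl reset nodes \
      --addresses "$addresses" \
      --cluster "$cluster" \
      --kubeconfig "$kubeconfig" \
      --force; then
    echo "forced removal done; rerun the clean reset once the node is reachable"
    return 0
  fi

  # Both attempts failed; leave further diagnosis to the operator.
  return 1
}
```

For example, reset_with_fallback 10.200.0.8 CLUSTER_NAME ADMIN_KUBECONFIG tries the clean reset first and uses --force only if that attempt fails. Step 3 is left as a manual action because it depends on when the node becomes reachable again.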

Remove single control plane node

The process is the same as for worker nodes. For control plane nodes, bmctl also cleans the etcd membership.

The cluster stops being in a highly available (HA) state after you remove the failed node. To return to an HA state, add a healthy node to the cluster.

To remove a control plane node from a cluster, complete the following steps:

  1. Try to cleanly reset the node. After the node is reset, the node is removed from the cluster:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following:

    • COMMA_SEPARATED_IPS: the IP addresses of the nodes to reset, such as 10.200.0.8,10.200.0.9.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    If this command succeeds, you can now diagnose the node and fix any misconfigurations that caused the initial failure. Skip the remaining steps in this section.

  2. If the previous step to reset the node fails, you can forcefully remove the node from the cluster. This forceful removal skips the reset commands from the previous step and only removes the related references to the node from the nodepool and cluster custom resources:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG \
        --force
    

    You can now diagnose the node and fix any misconfigurations that caused the initial failure.

  3. If you forcefully removed the node from the cluster in the previous step, run the bmctl reset command again to reset the nodes:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        --kubeconfig ADMIN_KUBECONFIG
    
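Because removing a control plane node takes the cluster out of an HA state, it can help to count how many control plane nodes are still Ready. The following is a rough sketch, not an official check: count_ready_nodes is a hypothetical helper, CLUSTER_KUBECONFIG is a placeholder for the kubeconfig of the cluster that contains the nodes, and the node-role.kubernetes.io/control-plane label is an assumption that might not match every cluster version.

```shell
# Sketch only: count_ready_nodes is a hypothetical helper.
# It reads `kubectl get nodes --no-headers` output on stdin and prints how many
# nodes report a STATUS (second column) of Ready. Cordoned nodes that show
# "Ready,SchedulingDisabled" are counted as Ready too.
count_ready_nodes() {
  awk '$2 == "Ready" || $2 ~ /^Ready,/ { n++ } END { print n + 0 }'
}

# Example usage (assumed label and placeholder kubeconfig path):
#   kubectl get nodes \
#       -l node-role.kubernetes.io/control-plane \
#       --no-headers \
#       --kubeconfig CLUSTER_KUBECONFIG | count_ready_nodes
```

If the count is lower than the number of control plane nodes the cluster was created with, the cluster is not yet back in an HA state.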

Quorum lost in HA control plane

If too many control plane nodes in an HA cluster enter a failed state, the cluster loses quorum and becomes unavailable.
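
As a rough illustration of "too many", the arithmetic below assumes the usual etcd majority rule, where quorum is more than half of the members. The function names quorum_size and has_quorum are hypothetical and exist only for this sketch.

```shell
# Sketch only: quorum_size and has_quorum are hypothetical helpers.
# Assumption: quorum follows the etcd majority rule, floor(members / 2) + 1.

# Print the number of healthy members needed for quorum.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeed if the healthy member count ($2) still meets quorum for $1 members.
has_quorum() {
  [ "$2" -ge "$(quorum_size "$1")" ]
}
```

Under that assumption, a 3-node control plane needs 2 healthy members, so has_quorum 3 2 succeeds and has_quorum 3 1 fails: the cluster tolerates one failed control plane node but not two.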

When you need to restore management clusters, don't provide the kubeconfig file in the reset commands. If you provide the kubeconfig file for a management cluster, it forces a new cluster to perform the reset operation. When you restore a user cluster, provide the path to the kubeconfig file.

  1. To recover a cluster that has lost quorum, run the following command on a remaining healthy node:

    bmctl restore --control-plane-node CONTROL_PLANE_NODE \
        --cluster CLUSTER_NAME \
        [--kubeconfig KUBECONFIG_FILE]
    

    Replace the following:

    • CONTROL_PLANE_NODE: the IP address of a healthy node that remains as part of the cluster.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • KUBECONFIG_FILE: if recovering a user cluster, the path to the user cluster kubeconfig file.

  2. After you recover the failed nodes, run the bmctl reset command to reset the nodes:

    bmctl reset nodes \
        --addresses COMMA_SEPARATED_IPS \
        --cluster CLUSTER_NAME \
        [--kubeconfig KUBECONFIG_FILE]
    

    Replace the following:

    • COMMA_SEPARATED_IPS: the IP addresses of the nodes to reset, such as 10.200.0.8,10.200.0.9.
    • CLUSTER_NAME: the name of the target cluster that contains the failed nodes.
    • KUBECONFIG_FILE: the path to the admin cluster kubeconfig file.

    If the failed nodes were part of the load balancer node pools, after the nodes recover they might contend for the control plane virtual IP address and make the new cluster unstable. Run the reset commands against the failed nodes as soon as possible after you recover the nodes.

This process handles disaster recovery only for HA deployments with a 3-node control plane. It doesn't support recovery for HA setups with 5 or more nodes.
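
Because the kubeconfig argument is optional in the restore command, a small helper can make the management and user cluster cases explicit. This is a sketch under stated assumptions: build_restore_cmd is a hypothetical name, and it only prints the command for review rather than running it.

```shell
# Sketch only: build_restore_cmd is a hypothetical helper that prints the
# bmctl restore command for review. It does not run bmctl.
# Note: values are joined with spaces, so paths that contain spaces would
# need quoting before the printed command is run.
build_restore_cmd() {
  local node="$1"
  local cluster="$2"
  local kubeconfig="${3:-}"
  local cmd="bmctl restore --control-plane-node $node --cluster $cluster"

  # Add --kubeconfig only when a path is given (the user cluster case).
  if [ -n "$kubeconfig" ]; then
    cmd="$cmd --kubeconfig $kubeconfig"
  fi

  printf '%s\n' "$cmd"
}
```

Calling build_restore_cmd CONTROL_PLANE_NODE CLUSTER_NAME prints the form without a kubeconfig file, as described for management clusters. Passing KUBECONFIG_FILE as a third argument prints the user cluster form.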

What's next

  • For more information about how to add or remove nodes from a cluster when a node hasn't failed, and how to check node status, see Update clusters.

  • If you need additional assistance, reach out to Cloud Customer Care.