Reset cluster nodes

While a cluster is being installed, binaries and systemd services are installed on the nodes hosting that cluster, and services begin to listen on ports on the nodes.

However, if a cluster installation fails, all these binaries and services need to be deleted. In other words, the nodes need to be reset or wiped clean to prepare them for a reattempt at installing the cluster. If nodes aren't reset in this way, the next attempt to install a cluster on them fails.

The bmctl reset command performs this cleanup operation on nodes. You can run the bmctl reset command on an entire cluster or on specific nodes of a cluster. This document explains how to run the command in both modes.

Note that applying the bmctl reset command to a cluster deletes that cluster, because the command wipes the nodes clean of all cluster binaries and services.

Reset clusters with bmctl reset cluster

Resetting a cluster causes it to be deleted. Once the cluster has been deleted, you can reinstall it after making any needed configuration changes.

Reset self-managed clusters

To reset admin, hybrid, or standalone clusters, run the following command:

bmctl reset --cluster CLUSTER_NAME

In the command, replace CLUSTER_NAME with the name of the cluster you want to reset.

When the cluster reset is complete, you can create a new cluster. For details, see Cluster creation overview.

Reset user clusters

You can reset or delete user clusters with the bmctl reset command or with the kubectl delete command. In both cases, the cluster is deleted. We recommend that you use bmctl reset.

Using bmctl to reset/delete a user cluster

Run the following command to reset/delete a user cluster with bmctl:

bmctl reset --cluster CLUSTER_NAME --admin-kubeconfig ADMIN_KUBECONFIG_PATH

In the command, replace the following entries with information specific to your environment:

  • CLUSTER_NAME: the name of the user cluster you're resetting.

  • ADMIN_KUBECONFIG_PATH: the path to the associated admin cluster's kubeconfig file. bmctl supports the use of --kubeconfig as an alias for the --admin-kubeconfig flag.

Using kubectl to delete a user cluster

To use kubectl to delete a user cluster, you must first delete the cluster object, and then delete its namespace. If you delete the namespace first, the jobs that reset the machines can't be created, and the deletion process might get stuck indefinitely.

To delete a user cluster with kubectl:

  1. Run the following command to delete the cluster object:

    kubectl delete cluster CLUSTER_NAME -n CLUSTER_NAMESPACE \
        --kubeconfig ADMIN_KUBECONFIG_PATH
    

    In the command, replace the following entries with information specific to your environment:

    • CLUSTER_NAME: the name of the user cluster you're deleting.

    • CLUSTER_NAMESPACE: the namespace for the cluster. By default, the namespace for a GKE on Bare Metal cluster is the name of the cluster prefixed with cluster-. For example, if you name your cluster test, the namespace is named cluster-test.

    • ADMIN_KUBECONFIG_PATH: the path to the associated admin cluster's kubeconfig file.

  2. After the cluster is deleted successfully, run the following command to delete the namespace:

    kubectl delete namespace CLUSTER_NAMESPACE --kubeconfig ADMIN_KUBECONFIG_PATH
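
The two-step order above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names (cluster_namespace, run, delete_user_cluster) are hypothetical, the cluster is assumed to use the default cluster-CLUSTER_NAME namespace, and kubectl delete is assumed to return only after the object is gone (its default --wait behavior).

```shell
# Sketch only: cluster_namespace, run, and delete_user_cluster are
# hypothetical helper names, not part of bmctl or kubectl.

# Default namespace: the cluster name prefixed with "cluster-".
cluster_namespace() {
  printf 'cluster-%s\n' "$1"
}

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = user cluster name, $2 = path to the admin cluster's kubeconfig file.
delete_user_cluster() {
  ns="$(cluster_namespace "$1")"
  # The && ensures the namespace is deleted only after the cluster object
  # has been deleted successfully.
  run kubectl delete cluster "$1" -n "$ns" --kubeconfig "$2" &&
    run kubectl delete namespace "$ns" --kubeconfig "$2"
}
```

Setting DRY_RUN=1 before calling delete_user_cluster prints the two kubectl commands without running them, which is a cheap way to confirm the derived namespace before deleting anything.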
    

Reset specific cluster nodes

You might want to reset specific nodes of a cluster if, for example, an admin cluster has been deleted but the user clusters managed by that admin cluster remain. In this case, the user clusters as a whole can't be deleted because the admin cluster has been deleted. Consequently, the nodes of the user clusters have to be individually reset.

Reset nodes using the GCR service account's JSON key

To reset individual nodes of a cluster, run the following command:

bmctl reset nodes --addresses NODE_1_IP_ADDRESS,NODE_2_IP_ADDRESS \
    --ssh-private-key-path SSH_KEY_PATH \
    --gcr-service-account-key SERVICE_ACCOUNT_KEY_PATH \
    --login-user root

In the command, replace the following entries with information specific to your environment:

  • NODE_1_IP_ADDRESS,NODE_2_IP_ADDRESS: a comma-separated list of the IP addresses of the nodes you want to reset.

  • SSH_KEY_PATH: the path to the SSH private key.

  • SERVICE_ACCOUNT_KEY_PATH: the path to the JSON file that contains the service account key. This key gives bmctl permission to pull images from the Google Container Registry (GCR). You can create a service account key using the Google Cloud console, the gcloud CLI, the serviceAccounts.keys.create() method, or one of the client libraries. For details, see Creating and managing service account keys. Alternatively, running the create config command with the --create-service-accounts flag also creates the service account key file. For details about that command, see Create an admin cluster config with bmctl.
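
If the node addresses are kept in a file, the comma-separated value for --addresses can be built rather than typed by hand. The following is a minimal sketch; join_addresses is a hypothetical helper name, and the file nodes.txt (one IP address per line) is an assumption made for illustration.

```shell
# join_addresses is a hypothetical helper: it reads one IP address per line
# from standard input, drops blank lines, and prints the remaining lines
# joined with commas.
join_addresses() {
  grep -v '^[[:space:]]*$' | paste -sd, -
}

# Example use with the command shown above (nodes.txt is an assumed file):
#   bmctl reset nodes --addresses "$(join_addresses < nodes.txt)" \
#       --ssh-private-key-path SSH_KEY_PATH \
#       --gcr-service-account-key SERVICE_ACCOUNT_KEY_PATH \
#       --login-user root
```

The helper does not validate that each line is a well-formed IP address; it only joins the non-blank lines.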

Reset cluster details

Output from the bmctl reset --cluster command looks similar to this sample:

bmctl reset --cluster cluster1
Creating bootstrap cluster... OK
Deleting GKE Hub member admin in project my-gcp-project...
Successfully deleted GKE Hub member admin in project my-gcp-project
Loading images... OK
Starting reset jobs...
Resetting: 1    Completed: 0    Failed: 0
...
Resetting: 0    Completed: 1    Failed: 0
Flushing logs... OK
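
When a reset runs from a script, the final progress line in the sample above can be checked for failed jobs. This sketch assumes the output keeps the "Failed: N" layout shown in the sample, which is not guaranteed, and failed_count is a hypothetical helper name.

```shell
# failed_count is a hypothetical helper: it reads captured bmctl output on
# standard input and prints the number that follows "Failed:" on the last
# progress line. It prints 0 when no such line is present.
failed_count() {
  awk '/Failed:/ {
         for (i = 1; i < NF; i++)
           if ($i == "Failed:") n = $(i + 1)
       }
       END { print n + 0 }'
}

# Example use (reset.log is an assumed file holding the captured output):
#   [ "$(failed_count < reset.log)" -eq 0 ] || echo "some reset jobs failed"
```

Because this depends on human-readable output, treat it as a convenience check and still rely on the exit status of bmctl itself.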

During the reset operation, bmctl first attempts to delete the GKE Hub membership registration, and then cleans up the affected nodes. The reset also deletes storage mounts and data from the anthos-system StorageClass.

For all nodes, bmctl runs kubeadm reset, removes the tunnel interfaces used for cluster networking, and deletes the following directories:

  • /etc/kubernetes
  • /etc/cni/net.d
  • /root/.kube
  • /var/lib/kubelet
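
To confirm on a node that these directories are gone after a reset, a check along the following lines may help. It is a sketch rather than anything bmctl provides: leftover_dirs is a hypothetical helper name, and the optional root prefix argument exists only so the check can be tried against a scratch directory.

```shell
# Directories that bmctl deletes on every node, as listed above.
RESET_DIRS="/etc/kubernetes /etc/cni/net.d /root/.kube /var/lib/kubelet"

# leftover_dirs is a hypothetical helper. It prints each listed directory
# that still exists under the optional root prefix ($1, empty by default)
# and returns non-zero if any were found.
leftover_dirs() {
  found=0
  for dir in $RESET_DIRS; do
    if [ -d "${1:-}$dir" ]; then
      printf '%s\n' "$dir"
      found=1
    fi
  done
  return "$found"
}
```

Run leftover_dirs with no arguments as root on the node, because /root/.kube is not readable by other users. No output and a zero exit status mean none of the listed directories remain.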

For load balancer nodes, bmctl also performs the following actions:

  • Disables keepalived and haproxy services.
  • Deletes the configuration files for keepalived and haproxy.

The bmctl reset command expects the cluster configuration file to be in the current working directory. By default, the path looks like the following: bmctl-workspace/CLUSTER_NAME/CLUSTER_NAME.yaml

If you used the --workspace-dir flag to specify a different directory during cluster creation, you must pass the same flag during cluster reset to specify that directory.
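
Before running a reset, it can be useful to check that the configuration file is where bmctl expects it. The sketch below uses hypothetical helper names (cluster_config_path, config_exists) and assumes that a custom workspace directory follows the same CLUSTER_NAME/CLUSTER_NAME.yaml layout as the default bmctl-workspace directory.

```shell
# cluster_config_path is a hypothetical helper.
# $1 = cluster name, $2 = workspace directory (default: bmctl-workspace).
# Assumption: a custom workspace uses the same layout as the default one.
cluster_config_path() {
  printf '%s/%s/%s.yaml\n' "${2:-bmctl-workspace}" "$1" "$1"
}

# config_exists succeeds only if the expected configuration file is present.
config_exists() {
  [ -f "$(cluster_config_path "$@")" ]
}

# Example use from the directory that contains the workspace:
#   config_exists cluster1 || echo "missing $(cluster_config_path cluster1)"
```

With the default workspace, cluster_config_path cluster1 prints bmctl-workspace/cluster1/cluster1.yaml, which is a relative path and therefore depends on the directory you run the check from.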