Version 1.7. This version is no longer supported. For information about how to upgrade to version 1.8, see Upgrading Anthos on bare metal in the 1.8 documentation. For more information about supported and unsupported versions, see the Version history page in the latest documentation.
# Force-removing broken nodes in Google Distributed Cloud

When a node is broken and needs to be removed from a cluster for repair or
replacement, you can force its removal from the cluster.
Force-removing worker nodes
---------------------------
In Google Distributed Cloud, you can add an annotation to mark a node for
force removal.
After removing the node from the parent nodepool, run the following command
to annotate the corresponding failing machine with the
`baremetal.cluster.gke.io/force-remove` annotation. The value of the annotation
itself does not matter:

```
kubectl --kubeconfig ADMIN_KUBECONFIG -n CLUSTER_NAMESPACE \
    annotate machine 10.200.0.8 baremetal.cluster.gke.io/force-remove=true
```
Google Distributed Cloud removes the node successfully.
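The annotate step above is easy to get wrong when the kubeconfig path, namespace, and machine name all vary per cluster. The following sketch assembles the command from those three inputs; `admin-kubeconfig` and `cluster-ns` are hypothetical placeholder values, not names from this page:

```shell
#!/bin/sh
# Sketch: assemble the force-remove annotate command from its three
# cluster-specific inputs. The annotation key is the one documented above;
# its value is arbitrary (the mere presence of the annotation triggers removal).
build_force_remove_cmd() {
  kubeconfig="$1"; namespace="$2"; machine="$3"
  printf 'kubectl --kubeconfig %s -n %s annotate machine %s baremetal.cluster.gke.io/force-remove=true' \
    "$kubeconfig" "$namespace" "$machine"
}

# Example with hypothetical placeholder values:
cmd=$(build_force_remove_cmd admin-kubeconfig cluster-ns 10.200.0.8)
echo "$cmd"
```

Review the printed command before running it against a real cluster.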
Force-removing Control Plane nodes
----------------------------------
Force-removing a control plane node is similar to
performing a kubeadm reset on control plane nodes, and requires additional steps.
To force-remove a control plane node from the node pools, you need to take
the following actions against the cluster that contains
the failing control plane node:
- remove the failing `etcd` member running on the failing node from the `etcd` cluster
- update the `ClusterStatus` in the `kubeadm-config` config map to remove the corresponding `apiEndpoint`
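The first of these actions boils down to finding the failing member's ID in `etcdctl member list` output. A minimal offline sketch of that parsing step, using this page's sample output embedded as a string (in practice you would pipe real `etcdctl` output through the same `awk`):

```shell
#!/bin/sh
# Sketch: extract the etcd member ID whose peer URL contains the failing IP.
# The member list below is the sample output shown later on this page.
FAILING_IP=10.200.0.8
member_list='23da9c3f2594532a, started, 7d7c21db88b3, https://10.200.0.6:2380, https://10.200.0.6:2379, false
772c1a54956b7f51, started, 357b68f4ecf0, https://10.200.0.7:2380, https://10.200.0.7:2379, false
f64f66ad8d3e7960, started, b049141e0802, https://10.200.0.8:2380, https://10.200.0.8:2379, false'

# Field 4 is the peer URL; match "//<ip>:" literally to avoid regex surprises.
failing_id=$(printf '%s\n' "$member_list" |
  awk -v ip="$FAILING_IP" -F', ' 'index($4, "//" ip ":") { print $1 }')
echo "$failing_id"
```

The printed ID is what you pass to `etcdctl member remove` in the procedure below.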
### Removing a failing `etcd` member
To remove the failing control plane node, first run `etcdctl` on the
remaining healthy etcd pods. For more general information on this operation, see
this Kubernetes documentation.
In the following procedure, CLUSTER_KUBECONFIG is the path
to the kubeconfig file of the cluster.
1. Look up the `etcd` pod with the following command:

   ```
   kubectl --kubeconfig CLUSTER_KUBECONFIG get \
       pod -n kube-system -l component=etcd -o wide
   ```

   The command returns the following list of nodes. For this example,
   assume node **10.200.0.8** is inaccessible and unrecoverable:

   ```
   NAME                READY   STATUS    RESTARTS   AGE     IP           NODE
   etcd-357b68f4ecf0   1/1     Running   0          9m2s    10.200.0.6   357b68f4ecf0
   etcd-7d7c21db88b3   1/1     Running   0          33m     10.200.0.7   7d7c21db88b3
   etcd-b049141e0802   1/1     Running   0          8m22s   10.200.0.8   b049141e0802
   ```

2. Exec into one of the remaining healthy `etcd` pods:

   ```
   kubectl --kubeconfig CLUSTER_KUBECONFIG exec -it -n \
       kube-system etcd-357b68f4ecf0 -- /bin/sh
   ```

3. Look up the current members to find the ID of the failing member. The command
   returns a list:

   ```
   etcdctl --endpoints=https://10.200.0.6:2379,https://10.200.0.7:2379 --key=/etc/kubernetes/pki/etcd/peer.key \
       --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt member list
   ```

   This command returns, for example:

   ```
   23da9c3f2594532a, started, 7d7c21db88b3, https://10.200.0.6:2380, https://10.200.0.6:2379, false
   772c1a54956b7f51, started, 357b68f4ecf0, https://10.200.0.7:2380, https://10.200.0.7:2379, false
   f64f66ad8d3e7960, started, b049141e0802, https://10.200.0.8:2380, https://10.200.0.8:2379, false
   ```

4. Remove the failing member:

   ```
   etcdctl --endpoints=https://10.200.0.6:2379,https://10.200.0.7:2379 --key=/etc/kubernetes/pki/etcd/peer.key \
       --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt \
       member remove f64f66ad8d3e7960
   ```

### Updating `ClusterStatus` and removing the failing `apiEndpoint`

In the following procedure, CLUSTER_KUBECONFIG is the path to the
`kubeconfig` file of the cluster.

1. Look up the `ClusterStatus` section inside the `kubeadm-config` config map:

   ```
   kubectl --kubeconfig CLUSTER_KUBECONFIG describe configmap -n \
       kube-system kubeadm-config
   ```

   The command returns results similar to those shown below:

   ```
   ...
   ClusterStatus:
   ----
   apiEndpoints:
     7d7c21db88b3:
       advertiseAddress: 10.200.0.6
       bindPort: 6444
     357b68f4ecf0:
       advertiseAddress: 10.200.0.7
       bindPort: 6444
     b049141e0802:
       advertiseAddress: 10.200.0.8
       bindPort: 6444
   apiVersion: kubeadm.k8s.io/v1beta2
   kind: ClusterStatus
   ...
   ```

2. Edit the config map to remove the section that contains the failing IP (this
   example shows the results of removing `10.200.0.8` using the `kubectl edit` command):

   ```
   kubectl --kubeconfig CLUSTER_KUBECONFIG edit configmap \
       -n kube-system kubeadm-config
   ```

   After editing, the config map looks similar to the following:

   ```
   ...
   ClusterStatus: |
     apiEndpoints:
       7d7c21db88b3:
         advertiseAddress: 10.200.0.6
         bindPort: 6444
       357b68f4ecf0:
         advertiseAddress: 10.200.0.7
         bindPort: 6444
     apiVersion: kubeadm.k8s.io/v1beta2
     kind: ClusterStatus
   ...
   ```

3. When you save the edited config map, the failing node is removed from the cluster.
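The config-map edit amounts to deleting one three-line `apiEndpoints` entry. The sketch below applies that transformation offline to this page's sample fragment so you can see exactly which lines go away; it is an illustration only, not a substitute for making the change interactively with `kubectl edit`:

```shell
#!/bin/sh
# Sketch: drop the apiEndpoints entry whose advertiseAddress matches the
# failing IP from a ClusterStatus fragment (sample data from this page).
FAILING_IP=10.200.0.8
cluster_status='apiEndpoints:
  7d7c21db88b3:
    advertiseAddress: 10.200.0.6
    bindPort: 6444
  357b68f4ecf0:
    advertiseAddress: 10.200.0.7
    bindPort: 6444
  b049141e0802:
    advertiseAddress: 10.200.0.8
    bindPort: 6444'

remaining=$(printf '%s\n' "$cluster_status" | awk -v ip="$FAILING_IP" '
  /^  [^ ]/ { node=$0; addr=""; next }   # node-name key line: buffer it
  /advertiseAddress:/ { addr=$2; next }  # buffer the address
  /bindPort:/ {                          # entry complete: emit unless failing
    if (addr != ip) {
      print node
      print "    advertiseAddress: " addr
      print $0
    }
    next
  }
  { print }                              # e.g. the apiEndpoints: header
')
printf '%s\n' "$remaining"
```

The output keeps the `10.200.0.6` and `10.200.0.7` entries and omits the `10.200.0.8` block, mirroring the before/after config maps shown above.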
Last updated 2025-09-04 UTC.