Replacing a failed etcd replica

This page describes how to replace a failed etcd replica in a high availability (HA) user cluster for GKE on VMware.

Before you begin

  • Make sure the admin cluster is working correctly.

  • Make sure the other two etcd members in the user cluster are working correctly. If more than one etcd member has failed, see Recovery from etcd data corruption or loss.

Replacing a failed etcd replica

  1. Back up a copy of the etcd PodDisruptionBudget (PDB) so you can restore it later.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME get pdb kube-etcd-pdb -o yaml > /path/to/etcdpdb.yaml

    Where:

    • ADMIN_CLUSTER_KUBECONFIG is the path to the kubeconfig file for the admin cluster.

    • USER_CLUSTER_NAME is the name of the user cluster that contains the failed etcd replica.

  2. Delete the etcd PodDisruptionBudget (PDB).

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME delete pdb kube-etcd-pdb
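
    If you script steps 1 and 2, you can refuse to delete the PDB unless the backup file was actually written. The following POSIX shell function is a minimal sketch of that guard, not part of the documented procedure. The function name backup_and_delete_etcd_pdb and the PDB_BACKUP variable (standing in for /path/to/etcdpdb.yaml) are hypothetical. The function also assumes that ADMIN_CLUSTER_KUBECONFIG and USER_CLUSTER_NAME are set as shell variables.

    ```shell
    # Hypothetical helper: back up kube-etcd-pdb, then delete it only if the
    # backup file exists and is non-empty. Assumes ADMIN_CLUSTER_KUBECONFIG,
    # USER_CLUSTER_NAME and PDB_BACKUP (e.g. /path/to/etcdpdb.yaml) are set.
    backup_and_delete_etcd_pdb() {
      # Step 1: save the PDB manifest so it can be restored later.
      kubectl --kubeconfig "$ADMIN_CLUSTER_KUBECONFIG" -n "$USER_CLUSTER_NAME" \
        get pdb kube-etcd-pdb -o yaml > "$PDB_BACKUP" || return 1

      # Guard: never delete the PDB without a usable backup.
      if [ ! -s "$PDB_BACKUP" ]; then
        echo "backup $PDB_BACKUP is empty; not deleting kube-etcd-pdb" >&2
        return 1
      fi

      # Step 2: delete the PDB.
      kubectl --kubeconfig "$ADMIN_CLUSTER_KUBECONFIG" -n "$USER_CLUSTER_NAME" \
        delete pdb kube-etcd-pdb
    }
    ```

    For example, set PDB_BACKUP=/path/to/etcdpdb.yaml and then run backup_and_delete_etcd_pdb.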

  3. Run the following command to open the kube-etcd StatefulSet in your text editor:

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME edit statefulset kube-etcd

    Change the value of the --initial-cluster-state flag to existing.

    containers:
        - name: kube-etcd
          ...
          args:
            - --initial-cluster-state=existing
          ...
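
    As a non-interactive alternative to kubectl edit, you could pipe the StatefulSet manifest through a filter that rewrites the flag and then replace the object. This is a sketch, not the documented method. The set_initial_cluster_state function is hypothetical, and it assumes that the flag appears in the manifest exactly as --initial-cluster-state=VALUE.

    ```shell
    # Hypothetical filter: read a kube-etcd StatefulSet manifest on stdin and
    # rewrite --initial-cluster-state=<anything> to the state given as $1.
    set_initial_cluster_state() {
      case "$1" in
        new|existing) ;;
        *) echo "usage: set_initial_cluster_state new|existing" >&2; return 2 ;;
      esac
      sed -E "s/(--initial-cluster-state=)[a-z]+/\1$1/"
    }

    # Usage sketch (this step uses "existing"; step 10 uses "new"):
    # kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME \
    #   get statefulset kube-etcd -o yaml \
    #   | set_initial_cluster_state existing \
    #   | kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME replace -f -
    ```

    Because the manifest from get still carries its resourceVersion, kubectl replace should fail rather than overwrite the object if the StatefulSet changed in the meantime.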
     
  4. Drain the failed etcd replica node.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG drain NODE_NAME --ignore-daemonsets --delete-local-data

    Where NODE_NAME is the name of the failed etcd replica node.
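
    If you know which kube-etcd Pod failed but not which node it was scheduled on, the Pod's spec.nodeName field records the node. The following is a minimal sketch. The etcd_pod_node function is hypothetical, and it assumes that the failed Pod object still exists in the USER_CLUSTER_NAME namespace of the admin cluster.

    ```shell
    # Hypothetical helper: print the node that a kube-etcd Pod is scheduled on.
    # $1 = admin cluster kubeconfig, $2 = user cluster name,
    # $3 = Pod name (for example kube-etcd-2)
    etcd_pod_node() {
      kubectl --kubeconfig "$1" -n "$2" get pod "$3" -o jsonpath='{.spec.nodeName}'
    }

    # Usage sketch:
    # NODE_NAME=$(etcd_pod_node ADMIN_CLUSTER_KUBECONFIG USER_CLUSTER_NAME kube-etcd-2)
    ```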

  5. Open a shell in the kube-etcd container of one of the working kube-etcd Pods.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG exec -it \
       KUBE_ETCD_POD --container kube-etcd --namespace USER_CLUSTER_NAME \
       -- /bin/sh

    Where KUBE_ETCD_POD is the name of a working kube-etcd Pod, for example, kube-etcd-0.
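
    The etcdctl commands in the following substeps all repeat the same endpoint and certificate flags. Once you are inside the Pod shell, you could optionally define a small wrapper. The name etcdctl_local is hypothetical; the endpoint and certificate paths are the ones this procedure already uses.

    ```shell
    # Hypothetical wrapper for use inside the kube-etcd container: runs etcdctl
    # (v3 API) against the local member with the TLS files used in this procedure.
    etcdctl_local() {
      ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
        --cacert=/etcd.local.config/certificates/etcdCA.crt \
        --cert=/etcd.local.config/certificates/etcd.crt \
        --key=/etcd.local.config/certificates/etcd.key \
        "$@"
    }

    # Usage sketch:
    # etcdctl_local member list -w fields
    # etcdctl_local member remove MEMBER_ID
    ```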

    From this new shell, run the following commands:

    1. Remove the failed etcd replica node from the etcd cluster.

      ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etcd.local.config/certificates/etcdCA.crt --cert=/etcd.local.config/certificates/etcd.crt --key=/etcd.local.config/certificates/etcd.key member remove MEMBER_ID

      Where MEMBER_ID is the hexadecimal member ID of the failed etcd replica. To find the ID, run the following command:

      ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etcd.local.config/certificates/etcdCA.crt --cert=/etcd.local.config/certificates/etcd.crt --key=/etcd.local.config/certificates/etcd.key member list -w fields

      The previous command displays all the members of the etcd cluster. The output is similar to the following:

      sh-5.0# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etcd.local.config/certificates/etcdCA.crt --cert=/etcd.local.config/certificates/etcd.crt --key=/etcd.local.config/certificates/etcd.key member list -w fields
      
      "ClusterID" : 6963206042588294154
      "MemberID" : 4645269864592341793
      "Revision" : 0
      "RaftTerm" : 15
      "ID" : 2279696924967222455
      "Name" : "kube-etcd-2"
      "PeerURL" : "https://kube-etcd-2.kube-etcd:2380"
      "ClientURL" : "https://kube-etcd-2.kube-etcd:2379"
      "IsLearner" : false
      
      "ID" : 3728561467092418843
      "Name" : "kube-etcd-1"
      "PeerURL" : "https://kube-etcd-1.kube-etcd:2380"
      "ClientURL" : "https://kube-etcd-1.kube-etcd:2379"
      "IsLearner" : false
      
      "ID" : 4645269864592341793
      "Name" : "kube-etcd-0"
      "PeerURL" : "https://kube-etcd-0.kube-etcd:2380"
      "ClientURL" : "https://kube-etcd-0.kube-etcd:2379"
      "IsLearner" : false
      
      sh-5.0#
      

      The MemberID field in the header is the member ID of the working kube-etcd Pod that answered the request. To find the ID of the failed member, look it up by its Name field. In the preceding example, kube-etcd-0 has the ID 4645269864592341793, kube-etcd-1 has the ID 3728561467092418843, and kube-etcd-2 has the ID 2279696924967222455.

      The member list command prints member IDs in decimal, but the member remove command expects a hexadecimal ID, so convert the failed member's ID with printf. For example, for kube-etcd-2:

      printf '%x\n' 2279696924967222455
      

      The output of this command is the MEMBER_ID to pass to the member remove command.
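
      You can also combine the lookup and the conversion. The following sketch reads the -w fields output of member list on standard input, finds the ID whose Name matches, and prints that ID in hex. The member_hex_id function is hypothetical. It assumes the field layout shown in the preceding example output, where each member's "ID" line comes before its "Name" line.

      ```shell
      # Hypothetical helper: print the hex member ID for the member named $1,
      # reading `etcdctl member list -w fields` output on stdin.
      member_hex_id() {
        awk -v name="\"$1\"" '
          $1 == "\"ID\"" { id = $3 }                        # remember the latest member ID
          $1 == "\"Name\"" && $3 == name { print id; exit } # matching name: emit its ID
        ' | {
          # Convert decimal to hex; print nothing if no member matched.
          read -r dec && printf '%x\n' "$dec"
        }
      }

      # Usage sketch (inside the kube-etcd container):
      # ETCDCTL_API=3 etcdctl ... member list -w fields | member_hex_id kube-etcd-2
      ```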

    2. Add a new member with the same name and peer URL as the failed replica node.

      ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etcd.local.config/certificates/etcdCA.crt --cert=/etcd.local.config/certificates/etcd.crt --key=/etcd.local.config/certificates/etcd.key member add MEMBER_NAME --peer-urls=https://MEMBER_NAME.kube-etcd:2380

      Where MEMBER_NAME is the name of the failed kube-etcd replica, for example, kube-etcd-1 or kube-etcd-2.

  6. Follow steps 1-3 of Deploying the utility Pods to create a utility Pod in the admin cluster. This Pod is used to access the PersistentVolume (PV) of the failed etcd member in the user cluster.

  7. Clean up the etcd data directory from within the utility Pod.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG exec -it -n USER_CLUSTER_NAME etcd-utility-MEMBER_NUMBER -- bash -c 'rm -rf /var/lib/etcd/*'

    Where MEMBER_NUMBER is the number of the failed etcd member, for example, 2 for kube-etcd-2.

  8. Delete the utility Pod.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG delete pod -n USER_CLUSTER_NAME etcd-utility-MEMBER_NUMBER
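
    Steps 7 and 8 both depend on the number of the failed member. The following minimal sketch derives that number from the member name and rejects names that don't end in -N, which catches typos such as kube-etcd2. The function names are hypothetical. The sketch assumes that the utility Pod for kube-etcd-N is named etcd-utility-N and that ADMIN_CLUSTER_KUBECONFIG and USER_CLUSTER_NAME are set as shell variables. It drops -it from exec because the cleanup command needs no terminal.

    ```shell
    # Hypothetical helper: print the number N from a member name like kube-etcd-N.
    member_number() {
      n=${1##*-}                      # text after the last '-'
      case "$n" in
        ''|*[!0-9]*) echo "not a kube-etcd-N name: $1" >&2; return 1 ;;
      esac
      printf '%s\n' "$n"
    }

    # Hypothetical helper: wipe the failed member's data directory through its
    # utility Pod (step 7), then delete the utility Pod (step 8).
    clean_etcd_member_data() {
      n=$(member_number "$1") || return 1
      kubectl --kubeconfig "$ADMIN_CLUSTER_KUBECONFIG" exec -n "$USER_CLUSTER_NAME" \
        "etcd-utility-$n" -- bash -c 'rm -rf /var/lib/etcd/*' || return 1
      kubectl --kubeconfig "$ADMIN_CLUSTER_KUBECONFIG" delete pod -n "$USER_CLUSTER_NAME" \
        "etcd-utility-$n"
    }

    # Usage sketch:
    # clean_etcd_member_data kube-etcd-2
    ```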

  9. Uncordon the failed node.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG uncordon NODE_NAME

  10. Open the kube-etcd StatefulSet in your text editor.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME edit statefulset kube-etcd

    Change the value of the --initial-cluster-state flag to new.

    containers:
        - name: kube-etcd
          ...
          args:
            - --initial-cluster-state=new
          ...
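
    Before you restore the PDB, you might confirm that the change took effect. The following sketch prints the flag value found in a StatefulSet manifest read on standard input. The function name current_initial_cluster_state is hypothetical, and the sketch assumes that the flag is written as --initial-cluster-state=VALUE.

    ```shell
    # Hypothetical check: print the --initial-cluster-state value(s) found in a
    # StatefulSet manifest read on stdin.
    current_initial_cluster_state() {
      sed -n -E 's/.*--initial-cluster-state=([a-z]+).*/\1/p'
    }

    # Usage sketch: after this step, the command is expected to print "new".
    # kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG -n USER_CLUSTER_NAME \
    #   get statefulset kube-etcd -o yaml | current_initial_cluster_state
    ```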
     
  11. Restore the etcd PDB that you backed up in step 1 and deleted in step 2.

    kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG apply -f /path/to/etcdpdb.yaml