This document shows how to back up and restore the etcd store for an admin cluster created with Google Distributed Cloud (software only) for VMware. This document also provides a script that you can use to automatically back up a cluster's etcd store. You can also back up and restore an admin cluster using the gkectl command-line tool.
You can create a backup file for recovery from unexpected disasters that might damage your cluster's etcd data. Store the backup file in a location that is outside of the cluster and isn't dependent on the cluster's operation.
Limitations
The backup and restore procedure described in this document has the following limitations:
- This procedure doesn't back up application-specific data. 
- This procedure doesn't back up your PersistentVolumes. 
- Workloads scheduled after you create a backup aren't restored with that backup. 
- You can't restore a cluster after a failed upgrade. 
- This procedure isn't intended to restore a deleted cluster. 
- Don't use this procedure for clusters with advanced cluster enabled. Instead, refer to Back up and restore advanced clusters with gkectl. 
For more information about limitations, see Infrastructure incompatibility.
Backing up an admin cluster
An admin cluster backup contains the following:
- A snapshot of the admin cluster's etcd.
- Admin control plane's Secrets, which are required for authenticating to the admin and user clusters.
Complete the following steps before you create an admin cluster backup:
- Find the admin cluster's external IP address, which is used to SSH in to the admin cluster control plane:
    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get nodes -n kube-system -o wide | grep master
  where [ADMIN_CLUSTER_KUBECONFIG] is the admin cluster's kubeconfig file.
- Create an SSH key called vsphere_tmp from the admin cluster's private key.
  You can find the private key in the admin cluster's Secrets:
    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get secrets sshkeys -n kube-system -o yaml
  In the command output, you can find the private key in the vsphere_tmp field.
  Copy the private key to vsphere_tmp:
    echo "[PRIVATE_KEY]" | base64 -d > vsphere_tmp; chmod 600 vsphere_tmp
- Check that you can shell into the admin control plane using this private key:
    ssh -i vsphere_tmp ubuntu@[EXTERNAL_IP]
- Exit the admin control plane node:
    exit
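The key-extraction step above can be scripted so that the base64 value doesn't have to be copied by hand. The following is a minimal sketch, not an official procedure: the function name extract_admin_ssh_key is hypothetical, and the jsonpath assumes that the vsphere_tmp field sits under .data in the sshkeys Secret, which you should confirm against the -o yaml output first.

```shell
# Hypothetical helper: write the admin cluster's SSH private key to a file.
# Assumption: the key is stored base64-encoded at .data.vsphere_tmp in the
# sshkeys Secret; check the "-o yaml" output before relying on this path.
extract_admin_ssh_key() (
    kubeconfig="$1"                # path to the admin cluster kubeconfig
    key_file="${2:-vsphere_tmp}"   # output file for the private key

    umask 077                      # create the key file owner-only
    kubectl --kubeconfig "$kubeconfig" get secrets sshkeys -n kube-system \
        -o jsonpath='{.data.vsphere_tmp}' | base64 -d > "$key_file"

    # An empty file means the Secret or the field was not found.
    [ -s "$key_file" ] || { echo "no key data written to $key_file" >&2; exit 1; }
    chmod 600 "$key_file"
)
```

A call such as `extract_admin_ssh_key [ADMIN_CLUSTER_KUBECONFIG]` would leave the key in vsphere_tmp, ready for the ssh check in the list above.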
Backing up an admin cluster's etcd store
To back up the admin cluster's etcd store:
- Get the etcd Pod's name:
    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get pods \
        -n kube-system -l component=etcd,tier=control-plane -ojsonpath='{$.items[*].metadata.name}{"\n"}'
- Shell into the Pod's kube-etcd container:
    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] exec -it \
        -n kube-system [ADMIN_ETCD_POD] -- /bin/sh
  where [ADMIN_ETCD_POD] is the name of the etcd Pod.
- From the shell, use etcdctl to create a backup named snapshot.db in the container's /tmp directory:
    ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
        --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save /tmp/snapshot.db
- Exit the container:
    exit
- Copy the backup out of the kube-etcd container using kubectl cp:
    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] cp \
        kube-system/[ADMIN_ETCD_POD]:tmp/snapshot.db [RELATIVE_DIRECTORY]
  where [RELATIVE_DIRECTORY] is a path where you want to store your backup.
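Because the backup file is meant to live outside the cluster, it can help to give each copied-out snapshot a unique name and record a checksum before moving it elsewhere. The sketch below illustrates one way to do that; the function name archive_snapshot and the file naming scheme are hypothetical and are not part of the product.

```shell
# Hypothetical helper: copy a snapshot to an archive directory under a
# timestamped name and record its SHA-256 checksum next to it.
archive_snapshot() (
    snapshot="$1"   # e.g. the snapshot.db copied out of the container
    dest_dir="$2"   # directory outside the cluster

    [ -s "$snapshot" ] || { echo "snapshot missing or empty: $snapshot" >&2; exit 1; }
    mkdir -p "$dest_dir" || exit 1

    name="snapshot-$(date -u +%Y%m%dT%H%M%SZ).db"
    cp "$snapshot" "$dest_dir/$name" || exit 1

    # Store the checksum relative to dest_dir so "sha256sum -c" works there.
    ( cd "$dest_dir" && sha256sum "$name" > "$name.sha256" ) || exit 1

    # Print the path of the archived copy for the caller.
    echo "$dest_dir/$name"
)
```

Later, running `sha256sum -c` on the .sha256 file from inside the archive directory would show whether the stored snapshot still matches what was copied out.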
Backing up an admin cluster's Secrets
To back up the admin control plane's Secrets:
- Use SSH to connect to the admin control plane node:
    ssh -i vsphere_tmp ubuntu@EXTERNAL_IP
  Replace EXTERNAL_IP with the admin control plane's external IP address, which you noted previously.
- Optional but highly recommended: Create a local backup directory. You need to change the permissions of the backup Secrets to copy them out of the node.
    mkdir backup
- Locally copy the Secrets to the local backup directory:
    sudo cp -r /etc/kubernetes/pki/* backup/
- Change the permissions of the backup Secrets:
    sudo chmod -R a+rX backup/
- Exit the admin control plane node:
    exit
- Run scp to copy the backup folder out of the admin control plane node:
    sudo scp -r -i vsphere_tmp ubuntu@EXTERNAL_IP:backup/ RELATIVE_DIRECTORY
  Replace RELATIVE_DIRECTORY with a path where you want to store your backup.
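The copied-out directory contains private keys, and the chmod step above made the copies readable by everyone. As a hedged sketch of one way to tighten that up once the files are off the node, the hypothetical function pack_secrets_backup below packs the directory into a tarball that only the current user can read. The function name and archive name are illustrative assumptions.

```shell
# Hypothetical helper: pack a copied-out Secrets directory into a tarball
# that only the current user can read.
pack_secrets_backup() (
    backup_dir="$1"   # directory copied out of the node, e.g. backup/
    archive="$2"      # output file outside backup_dir, e.g. admin-pki.tar.gz

    [ -d "$backup_dir" ] || { echo "not a directory: $backup_dir" >&2; exit 1; }

    umask 077   # the archive holds private keys, so create it owner-only
    tar -czf "$archive" -C "$backup_dir" . || exit 1

    # Listing the archive confirms that it was written and is readable.
    tar -tzf "$archive" > /dev/null
)
```

For example, `pack_secrets_backup RELATIVE_DIRECTORY/backup admin-pki.tar.gz` would produce a single file to store alongside the etcd snapshot.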
Restoring an admin cluster
The following procedure recreates a backed-up admin cluster and all of the user control planes it managed when its etcd snapshot was created.
- Run scp to copy snapshot.db to the admin control plane:
    sudo scp -i vsphere_tmp snapshot.db ubuntu@[EXTERNAL_IP]:
  where [EXTERNAL_IP] is the admin control plane's external IP address, which you gathered previously.
- Shell into the admin control plane:
    sudo ssh -i vsphere_tmp ubuntu@[EXTERNAL_IP]
- Copy snapshot.db to /mnt:
    sudo cp snapshot.db /mnt/
- Make a temporary directory, like backup:
    mkdir backup
- Exit the admin control plane:
    exit
- Copy the certificates to backup/:
    sudo scp -r -i vsphere_tmp [BACKUP_CERT_FILE] ubuntu@[EXTERNAL_IP]:backup/
- Shell into the admin control plane node:
    ssh -i vsphere_tmp ubuntu@[EXTERNAL_IP]
  where [EXTERNAL_IP] is the admin control plane's external IP address, which you gathered previously.
- Stop kube-etcd and kube-apiserver by moving their manifests out of /etc/kubernetes/manifests:
    sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
    sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/kube-apiserver.yaml
- Copy the backup Secrets to /etc/kubernetes/pki/:
    sudo cp -r backup/* /etc/kubernetes/pki/
- Run etcdctl restore:
    ETCDCTL_API=3 sudo etcdctl snapshot restore /backup/snapshot.db
    sudo rm -r /var/lib/etcd/*
    sudo mv /default.etcd/member/ /var/lib/etcd/
- Restart kube-etcd and kube-apiserver by moving the manifests back:
    sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml
    sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/kube-apiserver.yaml
- Verify that kube-etcd and kube-apiserver have started:
    sudo crictl ps -a
- Copy /etc/kubernetes/admin.conf to a .kube folder so it can be accessed from the admin workstation:
    mkdir -p [HOME]/.kube
    sudo cp -i /etc/kubernetes/admin.conf [HOME]/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Exit the admin control plane:
    exit
- Copy the newly generated kubeconfig file out of the admin node:
    sudo scp -i vsphere_tmp ubuntu@[EXTERNAL_IP]:[HOME]/.kube/config kubeconfig
    sudo chown $(id -u):$(id -g) kubeconfig
  where:
  - [EXTERNAL_IP] is the admin control plane's external IP address.
  - [HOME] is the home directory on the admin node.
  Now you can use this new kubeconfig file to access the restored cluster.
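After the manifests are moved back, the API server can take a while to start answering requests, so the first command against the restored cluster might fail. A small retry loop is one way to wait for it. The following is a generic sketch; the function name retry and the attempt counts in the usage note are illustrative assumptions, not documented values.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
retry() (
    attempts="$1"   # maximum number of tries
    delay="$2"      # seconds to wait between tries
    shift 2         # the remaining arguments are the command to run

    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && exit 0
        # Only wait if another attempt is still to come.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $attempts attempts: $*" >&2
    exit 1
)
```

For example, `retry 30 10 kubectl --kubeconfig kubeconfig get nodes` would poll the restored cluster with the new kubeconfig file for up to about five minutes.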
Troubleshooting an admin cluster restore
If you encounter an issue when restoring the admin cluster, you must contact Google Support to resolve the issue with the admin cluster.
In the meantime, you can check the following to further troubleshoot.
- Find the etcd container ID:
    sudo crictl ps -a | grep [ADMIN_ETCD_POD]
  where [ADMIN_ETCD_POD] is the name of the etcd Pod.
- Examine the logs from the etcd container:
    sudo crictl logs [ETCD_CONTAINER_ID]
  where [ETCD_CONTAINER_ID] is the ID of the etcd container.
- Look for permission denied log messages like the following:
    etcdserver: create snapshot directory error:mkdir /var/lib/etcd/member/snap: permission denied
- If permission denied messages are found, update the ownership of /opt/data/var/lib/etcd/:
    sudo chown -R 2001:2001 /opt/data/var/lib/etcd/
- Verify that kube-etcd and kube-apiserver have started:
    sudo crictl ps
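The log check in the steps above can be reduced to a filter that reads log text on standard input. This is a minimal sketch; the function name has_permission_denied is hypothetical, and the match is a plain case-insensitive text search rather than anything etcd-specific.

```shell
# Hypothetical helper: read container logs on stdin and succeed only if
# they contain a "permission denied" message.
has_permission_denied() {
    grep -qi 'permission denied'
}

# Example use on the admin control plane node (not run here):
#   sudo crictl logs [ETCD_CONTAINER_ID] 2>&1 | has_permission_denied \
#       && echo "check ownership of /opt/data/var/lib/etcd/"
```

The `2>&1` in the example sends both output streams through the filter, so the message is found regardless of which stream it was written to.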
Automatic cluster backup
You can use the following script as an example of how to automatically back up your clusters. The script is not supported and should be used only as a reference for writing a more robust and complete script. Before you run the script, fill in values for the variables at the beginning of the script:
- Set BACKUP_DIR to the path where you want to store the admin and user cluster backups. This path should not already exist.
- Set ADMIN_CLUSTER_KUBECONFIG to the path of the admin cluster's kubeconfig file.
- Set USER_CLUSTER_NAMESPACE to the name of your user cluster. The name of your user cluster is a namespace in the admin cluster.
- Set EXTERNAL_IP to the VIP that you reserved for the admin control plane service.
- Set SSH_PRIVATE_KEY to the path of your SSH key.
- If you are using a private network, set JUMP_IP to the IP address of your network's jump server.
#!/usr/bin/env bash
# Automates manual steps for taking backups of user and admin clusters.
# Fill in the variables below before running the script.
BACKUP_DIR=""                       # path to store user and admin cluster backups
ADMIN_CLUSTER_KUBECONFIG=""         # path to admin cluster kubeconfig
USER_CLUSTER_NAMESPACE=""           # user cluster namespace
EXTERNAL_IP=""                      # admin control plane node external ip - follow steps in documentation
SSH_PRIVATE_KEY=""                  # path to vsphere_tmp ssh private key - follow steps in documentation
JUMP_IP=""                          # network jump server IP - leave empty string if not using private network.
# Create the backup directories (quoted so that paths with spaces work).
mkdir -p "${BACKUP_DIR}/pki"
# USER CLUSTER BACKUP
# Snapshot user cluster etcd
kubectl --kubeconfig=${ADMIN_CLUSTER_KUBECONFIG} exec -it -n ${USER_CLUSTER_NAMESPACE} kube-etcd-0 -c kube-etcd -- /bin/sh -ec "export ETCDCTL_API=3; etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etcd.local.config/certificates/etcdCA.crt --cert=/etcd.local.config/certificates/etcd.crt --key=/etcd.local.config/certificates/etcd.key snapshot save /tmp/${USER_CLUSTER_NAMESPACE}_snapshot.db"
kubectl --kubeconfig=${ADMIN_CLUSTER_KUBECONFIG} cp ${USER_CLUSTER_NAMESPACE}/kube-etcd-0:tmp/${USER_CLUSTER_NAMESPACE}_snapshot.db $BACKUP_DIR/user-cluster_${USER_CLUSTER_NAMESPACE}_snapshot.db 
# ADMIN CLUSTER BACKUP
# Set up ssh options
SSH_OPTS=(-oStrictHostKeyChecking=no -i "${SSH_PRIVATE_KEY}")
if [ "${JUMP_IP}" != "" ]; then
    SSH_OPTS+=(-oProxyCommand="ssh -oStrictHostKeyChecking=no -i ${SSH_PRIVATE_KEY} -W %h:%p ubuntu@${JUMP_IP}")
fi
# Copy admin certs
ssh "${SSH_OPTS[@]}" "ubuntu@${EXTERNAL_IP}" 'sudo chmod -R a+rX /etc/kubernetes/pki/*'
# Quote the remote path so that the glob is expanded on the node, not locally.
scp -r "${SSH_OPTS[@]}" "ubuntu@${EXTERNAL_IP}:/etc/kubernetes/pki/*" "${BACKUP_DIR}/pki/"
# Snapshot admin cluster etcd
admin_etcd=$(kubectl --kubeconfig=${ADMIN_CLUSTER_KUBECONFIG} get pods -n kube-system -l component=etcd,tier=control-plane -ojsonpath='{$.items[*].metadata.name}{"\n"}')
kubectl --kubeconfig=${ADMIN_CLUSTER_KUBECONFIG} exec -it -n kube-system ${admin_etcd} -- /bin/sh -ec "export ETCDCTL_API=3; etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save /tmp/admin_snapshot.db"
kubectl --kubeconfig=${ADMIN_CLUSTER_KUBECONFIG} cp -n kube-system ${admin_etcd}:tmp/admin_snapshot.db $BACKUP_DIR/admin-cluster_snapshot.db
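Because the script runs unattended, an empty or mistyped variable can produce a partial backup without an obvious error. One way to make a derived script more robust is a pre-flight check of the variables before any command runs. The sketch below is an illustration only; the function name check_backup_config and the specific checks are assumptions layered on the variable descriptions above.

```shell
# Hypothetical pre-flight check for the variables used by the backup script.
# Prints every problem it finds and returns non-zero if there is at least one.
check_backup_config() (
    status=0
    fail() { echo "config error: $*" >&2; status=1; }

    [ -n "$BACKUP_DIR" ] || fail "BACKUP_DIR is empty"
    [ -z "$BACKUP_DIR" ] || [ ! -e "$BACKUP_DIR" ] || fail "BACKUP_DIR already exists"
    [ -f "$ADMIN_CLUSTER_KUBECONFIG" ] || fail "ADMIN_CLUSTER_KUBECONFIG is not a file"
    [ -n "$USER_CLUSTER_NAMESPACE" ] || fail "USER_CLUSTER_NAMESPACE is empty"
    [ -n "$EXTERNAL_IP" ] || fail "EXTERNAL_IP is empty"
    [ -f "$SSH_PRIVATE_KEY" ] || fail "SSH_PRIVATE_KEY is not a file"
    # JUMP_IP may legitimately be empty, so it is not checked.

    exit "$status"
)
```

Calling `check_backup_config || exit 1` right after the variable assignments would stop the script before it creates any directories or contacts the cluster.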
What's next
- Back up and restore a user cluster
- Diagnose cluster issues
- Learn about augur, an open-source tool for restoring individual objects from etcd backups.