Backing up user clusters on AWS

This page shows how to back up the etcd data store of a user cluster in your GKE on AWS installation, so that you can recover from events that damage the cluster's etcd data.

Limitations

  • Using a backup file to restore your etcd data is a last resort. We do not recommend restoring from a backup file unless the cluster is completely broken. Contact Google support for help in deciding the best course of action.

  • This procedure does not back up data from your workloads, including PersistentVolumes.

  • You cannot use this backup to restore a cluster that runs a different version of GKE on AWS than the cluster the backup was taken from.

Backing up a user cluster

A user cluster backup is a snapshot of the user cluster's etcd store. The etcd store contains all of the Kubernetes objects and custom objects that represent the cluster's state. The snapshot contains the data required to recreate the cluster's stateless workloads.

To create a snapshot of the etcd data store, perform the following steps:

  1. Open a shell on the management service instance running etcd for your cluster.

    1. On your local machine, find the private IP address of your cluster's management service instance.

      export CLUSTER_ID=$(terraform output cluster_id)
      export MANAGEMENT_IP=$(aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=$CLUSTER_ID-management-0" \
        --query "Reservations[*].Instances[*].PrivateIpAddress" \
        --output text)
      
    2. Use the ssh tool to open a connection to the management service instance.

      If you connect directly:

      ssh -i ~/.ssh/anthos-gke ubuntu@$MANAGEMENT_IP

      If you connect through a bastion host:

      export BASTION_DNS=$(terraform output bastion_dns_name)
      ssh -i ~/.ssh/anthos-gke -J ubuntu@$BASTION_DNS ubuntu@$MANAGEMENT_IP
      
  2. Create a directory to store the etcd backup data.

    mkdir ./etcd-backups
    
  3. Use the ps command-line tool to find the process ID of the etcd process on that instance.

    ps -e | grep etcd
    

    The output shows details of your etcd process. The first element is etcd's process ID. In the following steps, replace ETCD_PID with this process ID.

  4. Create a script that runs etcdctl to connect to the etcd daemon and save a snapshot of the etcd database, then move the script into the etcd container's filesystem.

    cat << EOT > /tmp/etcdbackup.sh
    #!/bin/sh
    # Extract a snapshot of the anthos-gke etcd state database
    
    export ETCDCTL_API=3
    
    etcdctl \
     --endpoints=https://127.0.0.1:2379 \
     --cacert=/secrets/server-ca.crt \
     --cert=/secrets/server.crt \
     --key=/secrets/server.key \
     snapshot save /tmp/snapshot.db
    EOT
    
    chmod a+x /tmp/etcdbackup.sh
    sudo mv /tmp/etcdbackup.sh /proc/ETCD_PID/root/tmp/etcdbackup.sh
    
  5. Use the nsenter command to run the script within the etcd container to create the snapshot.

    sudo nsenter --all --target ETCD_PID /tmp/etcdbackup.sh
    
  6. Copy the snapshot file out of the etcd container.

    sudo cp /proc/ETCD_PID/root/tmp/snapshot.db ./etcd-backups
    
  7. Copy all files in the /secrets directory of the etcd container to your backup directory. These files contain the certificates that encrypt and validate communication between etcd and other processes in the cluster. Together, the snapshot file and the certificate files are a full backup of your etcd cluster state.

    sudo cp -r /proc/ETCD_PID/root/secrets ./etcd-backups
    
  8. Use the tar tool to bundle the contents of the etcd-backups directory into a single tar file. The copied files are owned by root, so run tar with sudo.

    sudo tar -cvf etcd-backup.tar etcd-backups
    
  9. Exit to your local machine and use the scp tool to copy the etcd-backup.tar file from the management service instance to your current directory. This example uses the BASTION_DNS and MANAGEMENT_IP environment variables defined earlier. If you connect directly rather than through a bastion host, omit the -J option and its argument.

    scp -i ~/.ssh/anthos-gke -J ubuntu@$BASTION_DNS \
     ubuntu@$MANAGEMENT_IP:~/etcd-backup.tar .
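
Step 3 relies on reading the PID from the ps output by eye. The following minimal sketch picks it out automatically, assuming the default ps -e columns (PID, TTY, TIME, CMD) and an etcd process whose command name is exactly etcd. The etcd_pid function name is hypothetical. Where pgrep is installed, pgrep -x etcd is a shorter alternative.

```shell
#!/bin/sh
# Sketch: extract the etcd PID from `ps -e` output (step 3).
# Assumes the default ps -e columns (PID TTY TIME CMD) and a process whose
# command name is exactly "etcd"; names that merely contain "etcd" are skipped.

# etcd_pid (hypothetical helper): read `ps -e` output on stdin and print
# the PID of every process named exactly "etcd", one per line.
etcd_pid() {
  # NR > 1 skips the header row; $NF is the CMD column, $1 is the PID.
  awk 'NR > 1 && $NF == "etcd" { print $1 }'
}
```

On the management service instance, a call such as ETCD_PID=$(ps -e | etcd_pid) would store the PID for the later steps. If the function prints more than one PID, confirm which process serves your cluster before continuing.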
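
Steps 2 through 8 can also be collected into one script on the management service instance. This is a sketch under stated assumptions, not a supported tool: backup_etcd, snapshot_script, and run are hypothetical helper names, the script expects to run from the ubuntu user's home directory with the same sudo access the manual steps use, and it assumes the /secrets certificate paths shown in step 4. Setting DRY_RUN=1 prints the privileged commands instead of running them, so you can review them before touching etcd.

```shell
#!/bin/sh
# Sketch of steps 2-8: snapshot etcd and bundle it with its certificates.
# Assumptions: runs from the ubuntu user's home directory on the management
# service instance, sudo works as in the manual steps, and the etcd container
# keeps its TLS files under /secrets. All helper names are hypothetical.
set -eu

# snapshot_script: print the script that step 4 places inside the etcd container.
snapshot_script() {
  cat <<'EOT'
#!/bin/sh
# Extract a snapshot of the anthos-gke etcd state database
export ETCDCTL_API=3
etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/secrets/server-ca.crt \
  --cert=/secrets/server.crt \
  --key=/secrets/server.key \
  snapshot save /tmp/snapshot.db
EOT
}

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# backup_etcd ETCD_PID: perform steps 2-8 for the etcd process with that PID.
backup_etcd() {
  local pid="${1:?usage: backup_etcd ETCD_PID}"
  local root="/proc/${pid}/root"   # the etcd container's filesystem, seen from the host
  local tmp_script
  tmp_script="$(mktemp)"
  snapshot_script > "$tmp_script"
  chmod 755 "$tmp_script"

  run mkdir -p ./etcd-backups
  run sudo mv "$tmp_script" "${root}/tmp/etcdbackup.sh"
  run sudo nsenter --all --target "$pid" /tmp/etcdbackup.sh
  run sudo cp "${root}/tmp/snapshot.db" ./etcd-backups
  run sudo cp -r "${root}/secrets" ./etcd-backups
  run sudo tar -cvf etcd-backup.tar etcd-backups

  # In dry-run mode nothing moved the temporary script, so remove it here.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    rm -f "$tmp_script"
  fi
}
```

For example, after saving the script to a file (the name etcd_backup_sketch.sh is illustrative), (. ./etcd_backup_sketch.sh && DRY_RUN=1 backup_etcd "$ETCD_PID") previews the commands, and the same call without DRY_RUN=1 performs the backup. Running it in a subshell keeps set -eu from changing the behavior of your login shell.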
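
Before you store the archive, you can check on your local machine that it contains both parts of the backup described in step 7. This sketch assumes the archive was created from the etcd-backups directory made in step 2, so it expects an etcd-backups/snapshot.db entry and at least one file under etcd-backups/secrets/. The check_backup_tar name is hypothetical, and the function inspects only the archive listing, not the snapshot data itself.

```shell
#!/bin/sh
# Sketch: check that a downloaded etcd-backup.tar holds both the snapshot and
# the certificates. Assumes the archive was built from the etcd-backups directory.
set -eu

# check_backup_tar FILE (hypothetical helper): return 0 only if FILE lists
# the expected entries, 1 otherwise.
check_backup_tar() {
  local listing
  listing="$(tar -tf "${1:?usage: check_backup_tar FILE}")" || return 1
  # The etcd snapshot saved inside the container and copied out in step 6.
  printf '%s\n' "$listing" | grep -qxF 'etcd-backups/snapshot.db' || return 1
  # At least one certificate file copied from /secrets in step 7.
  printf '%s\n' "$listing" | grep -q '^etcd-backups/secrets/.' || return 1
}
```

For example, (. ./check_backup.sh && check_backup_tar etcd-backup.tar) && echo 'backup archive looks complete', where check_backup.sh is an illustrative file name. To look inside the snapshot itself, etcdctl snapshot status (or etcdutl snapshot status in etcd 3.5 and later) reports its hash, revision, and key count.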
    

For More Information