This guide describes how to back up the data in your GKE-connected Parallelstore instance to a Cloud Storage bucket, and how to prevent potential data loss by configuring a GKE CronJob that backs up the data automatically on a schedule. It also describes how to recover data for a Parallelstore instance.
Before you begin
Follow Create and connect to a Parallelstore instance from GKE to set up your GKE cluster and Parallelstore instance.
Data backup
The following section describes how to set up a GKE CronJob that periodically backs up data from a Parallelstore instance in the GKE cluster to prevent data loss.
Connect to your GKE cluster
Get the credentials for your GKE cluster:
gcloud container clusters get-credentials CLUSTER_NAME \
--project=PROJECT_ID \
--location=CLUSTER_LOCATION
Replace the following:
- CLUSTER_NAME: the GKE cluster name.
- PROJECT_ID: the Google Cloud project ID.
- CLUSTER_LOCATION: the Compute Engine zone containing the cluster. Your cluster must be in a supported zone for the Parallelstore CSI driver.
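For example, with hypothetical values (a cluster named my-cluster in zone us-central1-a under project my-project), followed by a quick connectivity check:
gcloud container clusters get-credentials my-cluster \
    --project=my-project \
    --location=us-central1-a
# Confirm kubectl can now reach the cluster
kubectl get nodes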
Provision required permissions
Your GKE CronJob needs the roles/parallelstore.admin and roles/storage.admin roles to import and export data between Cloud Storage and Parallelstore.
Create a Google Cloud service account
gcloud iam service-accounts create parallelstore-sa \
--project=PROJECT_ID
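As an optional check before granting roles, you can confirm the service account exists by describing it:
gcloud iam service-accounts describe \
    parallelstore-sa@PROJECT_ID.iam.gserviceaccount.com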
Grant the Google Cloud service account roles
Grant Parallelstore Admin and Cloud Storage Admin roles to the service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
--member=serviceAccount:parallelstore-sa@PROJECT_ID.iam.gserviceaccount.com \
--role=roles/parallelstore.admin
gcloud projects add-iam-policy-binding PROJECT_ID \
--member=serviceAccount:parallelstore-sa@PROJECT_ID.iam.gserviceaccount.com \
--role=roles/storage.admin
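To verify that both bindings landed, you can list the roles granted to the service account. This flatten/filter pattern is a general gcloud idiom, not specific to Parallelstore:
gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:parallelstore-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --format="table(bindings.role)"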
Set up a GKE service account
You need to set up a GKE service account and allow it to impersonate the Google Cloud service account. Use the following steps to bind the GKE service account to the Google Cloud service account.
Create the following parallelstore-sa.yaml service account manifest:
# GKE service account used by the workload; it will have access to Parallelstore and GCS
apiVersion: v1
kind: ServiceAccount
metadata:
  name: parallelstore-sa
  namespace: default
Next, deploy it to your GKE cluster using this command:
kubectl apply -f parallelstore-sa.yaml
Allow the GKE service account to impersonate the Google Cloud service account.
# Bind the GCP SA and GKE SA
gcloud iam service-accounts add-iam-policy-binding parallelstore-sa@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[default/parallelstore-sa]"

# Annotate the GKE SA with the GCP SA
kubectl annotate serviceaccount parallelstore-sa \
    --namespace default \
    iam.gke.io/gcp-service-account=parallelstore-sa@PROJECT_ID.iam.gserviceaccount.com
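To confirm the annotation took effect, inspect the GKE service account and check that the iam.gke.io/gcp-service-account annotation is present:
kubectl get serviceaccount parallelstore-sa \
    --namespace default \
    -o yaml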
Grant permissions to the Parallelstore Agent service account
gcloud storage buckets add-iam-policy-binding GCS_BUCKET \
--member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-parallelstore.iam.gserviceaccount.com \
--role=roles/storage.admin
Replace the following:
- GCS_BUCKET: the Cloud Storage bucket URI in the format gs://<bucket_name>.
- PROJECT_NUMBER: the Google Cloud project number (see the lookup command after this list).
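If you don't have the project number handy, you can look it up from the project ID:
gcloud projects describe PROJECT_ID \
    --format="value(projectNumber)"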
Start the CronJob
Configure and start a GKE CronJob to periodically export data from Parallelstore to Cloud Storage.
Create the configuration file ps-to-gcs-backup.yaml for the CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ps-to-gcs-backup
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  schedule: "0 * * * *"   # run at minute 0 of every hour
  successfulJobsHistoryLimit: 3
  suspend: false
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            gke-parallelstore/cpu-limit: "0"
            gke-parallelstore/ephemeral-storage-limit: "0"
            gke-parallelstore/memory-limit: "0"
            gke-parallelstore/volumes: "true"
        spec:
          serviceAccountName: parallelstore-sa
          containers:
          - name: pstore-backup
            image: google/cloud-sdk:slim
            imagePullPolicy: IfNotPresent
            command:
            - /bin/bash
            - -c
            - |
              #!/bin/bash
              set -ex
              # Retrieve the modification timestamp of the most recently modified folder, truncated to the minute
              latest_folder_timestamp=$(find $PSTORE_MOUNT_PATH/$SOURCE_PARALLELSTORE_PATH -type d -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2- | xargs -I{} stat -c %x {} | xargs -I{} date -d {} +"%Y-%m-%d %H:%M")
              # Start exporting from Parallelstore to Cloud Storage
              operation=$(gcloud beta parallelstore instances export-data $PSTORE_NAME \
                --location=$PSTORE_LOCATION \
                --source-parallelstore-path=$SOURCE_PARALLELSTORE_PATH \
                --destination-gcs-bucket-uri=$DESTINATION_GCS_URI \
                --async \
                --format="value(name)")
              # Wait until the operation completes
              while true; do
                status=$(gcloud beta parallelstore operations describe $operation \
                  --location=$PSTORE_LOCATION \
                  --format="value(done)")
                if [ "$status" == "True" ]; then
                  break
                fi
                sleep 60
              done
              # Check whether the export succeeded
              error=$(gcloud beta parallelstore operations describe $operation \
                --location=$PSTORE_LOCATION \
                --format="value(error)")
              if [ "$error" != "" ]; then
                echo "!!! ERROR while exporting data !!!"
              fi
              # Delete the old files from Parallelstore if requested
              # This will not delete the folder with the latest modification timestamp
              if $DELETE_AFTER_BACKUP && [ "$error" == "" ]; then
                find $PSTORE_MOUNT_PATH/$SOURCE_PARALLELSTORE_PATH -mindepth 1 -type d |
                while read dir; do
                  # Only delete folders modified earlier than the latest modification timestamp
                  folder_timestamp=$(stat -c %y "$dir")
                  if [ $(date -d "$folder_timestamp" +%s) -lt $(date -d "$latest_folder_timestamp" +%s) ]; then
                    echo "Deleting $dir"
                    rm -rf "$dir"
                  fi
                done
              fi
            env:
            - name: PSTORE_MOUNT_PATH # mount path of the Parallelstore instance; must match the volumeMount defined for this container
              value: "PSTORE_MOUNT_PATH"
            - name: PSTORE_NAME # name of the Parallelstore instance that needs backup
              value: "PSTORE_NAME"
            - name: PSTORE_LOCATION # location/zone of the Parallelstore instance that needs backup
              value: "PSTORE_LOCATION"
            - name: SOURCE_PARALLELSTORE_PATH # absolute path on the Parallelstore instance, without the volume mount path
              value: "SOURCE_PARALLELSTORE_PATH"
            - name: DESTINATION_GCS_URI # Cloud Storage bucket URI used for storing backups, starting with "gs://"
              value: "DESTINATION_GCS_URI"
            - name: DELETE_AFTER_BACKUP # delete old data from Parallelstore after backup if true
              value: "DELETE_AFTER_BACKUP"
            volumeMounts:
            - mountPath: PSTORE_MOUNT_PATH # must match the value of the env var PSTORE_MOUNT_PATH
              name: PSTORE_PV_NAME
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
          volumes:
          - name: PSTORE_PV_NAME
            persistentVolumeClaim:
              claimName: PSTORE_PVC_NAME
Replace the following variables:
- PSTORE_MOUNT_PATH: the mount path of the Parallelstore instance; it must match the volumeMount defined for this container.
- PSTORE_PV_NAME: the name of the GKE PersistentVolume that points to your Parallelstore instance. This should have been set up in your GKE cluster as part of the prerequisites.
- PSTORE_PVC_NAME: the name of the GKE PersistentVolumeClaim that requests the use of the Parallelstore PersistentVolume. This should have been set up in your GKE cluster as part of the prerequisites.
- PSTORE_NAME: the name of the Parallelstore instance that needs backup.
- PSTORE_LOCATION: the location of the Parallelstore instance that needs backup.
- SOURCE_PARALLELSTORE_PATH: the absolute path on the Parallelstore instance, without the volume mount path; it must start with /.
- DESTINATION_GCS_URI: the URI of a Cloud Storage bucket, or a path within a bucket, in the format gs://<bucket_name>/<optional_path_inside_bucket>.
- DELETE_AFTER_BACKUP: whether to delete old data from Parallelstore after the backup to free up space; supported values are true and false.
Deploy the CronJob to your GKE cluster using the following command:
kubectl apply -f ps-to-gcs-backup.yaml
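To verify the CronJob and inspect its runs, standard kubectl commands work; the one-off job name below is an example:
# Confirm the CronJob is registered and view its schedule
kubectl get cronjob ps-to-gcs-backup
# List the Jobs it has spawned
kubectl get jobs
# Optionally trigger a one-off run without waiting for the schedule
kubectl create job --from=cronjob/ps-to-gcs-backup manual-backup-1
# Tail the logs of that run
kubectl logs job/manual-backup-1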
See CronJob for more details about setting up a CronJob.
Detecting data loss
When the state of a Parallelstore instance is FAILED, the data on the instance may no longer be accessible. You can use the following Google Cloud CLI command to check the state of the Parallelstore instance:
gcloud beta parallelstore instances describe PARALLELSTORE_NAME \
--location=PARALLELSTORE_LOCATION \
--format="value(state)"
Data recovery
When a disaster happens or the Parallelstore instance fails for any reason, you can either use the GKE Volume Populator to automatically preload data from Cloud Storage into a GKE-managed Parallelstore instance, or manually create a new Parallelstore instance and import data from a Cloud Storage backup.
If you are recovering from a checkpoint of your workload, you need to decide which checkpoint to recover from by providing the path inside the Cloud Storage bucket.
GKE Volume Populator
GKE Volume Populator can be used to preload data from a Cloud Storage bucket path into a newly created Parallelstore instance. For instructions, see Preload Parallelstore.
Manual recovery
You can also create a Parallelstore instance manually and import data from a Cloud Storage bucket with the following steps.
Create a new Parallelstore instance:
gcloud beta parallelstore instances create PARALLELSTORE_NAME \
    --capacity-gib=CAPACITY_GIB \
    --location=PARALLELSTORE_LOCATION \
    --network=NETWORK_NAME \
    --project=PROJECT_ID
Import data from Cloud Storage:
gcloud beta parallelstore instances import-data PARALLELSTORE_NAME \
    --location=PARALLELSTORE_LOCATION \
    --source-gcs-bucket-uri=SOURCE_GCS_URI \
    --destination-parallelstore-path=DESTINATION_PARALLELSTORE_PATH \
    --async
Replace the following:
- PARALLELSTORE_NAME: the name of this Parallelstore instance.
- CAPACITY_GIB: the storage capacity of the Parallelstore instance in GiB; a value from 12000 to 100000, in multiples of 4000.
- PARALLELSTORE_LOCATION: the location of the Parallelstore instance; it must be in a supported zone.
- NETWORK_NAME: the name of the VPC network that you created during Configure a VPC network; it must be the same network your GKE cluster uses and have Private Services Access enabled.
- SOURCE_GCS_URI: the URI of a Cloud Storage bucket, or a path within a bucket, where the data you want to import from is stored, in the format gs://<bucket_name>/<optional_path_inside_bucket>.
- DESTINATION_PARALLELSTORE_PATH: the absolute path on the Parallelstore instance where you want to import the data; it must start with /.
More details about importing data into a Parallelstore instance can be found in Transfer data to or from Cloud Storage.
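Because --async returns as soon as the operation is created, you may want to poll it until completion. A minimal sketch, mirroring the polling loop used in the backup script earlier in this guide:
# Capture the long-running operation name from the async import
operation=$(gcloud beta parallelstore instances import-data PARALLELSTORE_NAME \
    --location=PARALLELSTORE_LOCATION \
    --source-gcs-bucket-uri=SOURCE_GCS_URI \
    --destination-parallelstore-path=DESTINATION_PARALLELSTORE_PATH \
    --async \
    --format="value(name)")
# Poll once a minute until the operation reports done
while [ "$(gcloud beta parallelstore operations describe $operation \
    --location=PARALLELSTORE_LOCATION \
    --format='value(done)')" != "True" ]; do
  sleep 60
done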