This page provides an overview of restoring Cassandra in Apigee hybrid.
Why use restore?
You can use backups to restore Apigee infrastructure from the ground up in the event of catastrophic failures, such as irrecoverable data loss in your Apigee hybrid instance from a disaster. Restoration takes your data from the backup location and restores the data into a new Cassandra cluster with the same number of nodes. No cluster data is taken from the old Cassandra cluster. The goal of the restoration process is to bring an Apigee hybrid installation back to a previously operational state using backup data from a snapshot.
The use of backups to restore is not recommended for the following scenarios:
- Cassandra node failures.
- Accidental deletion of data like apps, developers, and api_credentials.
- One or more regions going down in a multi-region hybrid deployment.
Apigee Cassandra deployments and operational architecture take care of redundancy and fault tolerance for a single region. In most cases, the recommended multi-region production implementation of hybrid means that a region failure can be recovered from another live region using region decommissioning and expansion procedures instead of restoring from a backup.
Before you begin implementing a restore from a Cassandra backup, be aware of the following:
- Downtime: There will be downtime for the duration of the restoration.
- Data loss: There will be data loss between the last valid backup and the time the restoration is complete.
- Restoration time: Restoration time depends on the size of the data and cluster.
- Cherry-picking data: You cannot select only specific data to restore. Restoration restores the entire backup you select.
Multi-region restores
If you installed Apigee hybrid into multiple regions, you must check the overrides file for the region you are restoring to make sure that cassandra:hostNetwork is set to false before you perform the restoration. For more information, see Restoring in multiple regions.
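As a minimal sketch of that check, the following shell function prints the hostNetwork value found under the top-level cassandra: block of an overrides file. The function name cassandra_host_network and the file name in the usage note are hypothetical, and the sketch assumes a conventionally indented YAML file with hostNetwork written on its own line.

```shell
#!/usr/bin/env bash
# Sketch only: cassandra_host_network is a hypothetical helper, not Apigee tooling.
# Prints the value of hostNetwork under the top-level "cassandra:" key,
# or nothing if the property is not set there.
cassandra_host_network() {
  # $1: path to the overrides file for the region being restored
  awk '
    # Any non-indented, non-comment line starts a new top-level key.
    /^[^ \t#]/ { in_cassandra = ($0 ~ /^cassandra:/) }
    in_cassandra && /^[ \t]+hostNetwork:/ {
      sub(/^[ \t]+hostNetwork:[ \t]*/, "")   # drop the key
      sub(/[ \t]*(#.*)?$/, "")               # drop trailing spaces and comments
      print
      exit
    }
  ' "$1"
}
```

For example, cassandra_host_network overrides.yaml should print false before you start a multi-region restore. Empty output means the property is not set under cassandra: in that file.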
Prerequisites
Check that all of the following prerequisites are met. Investigate any failures before proceeding with the restoration.
- Verify all Cassandra pods are up and running with the following command.
kubectl get pods -n apigee -l app=apigee-cassandra
Your output should look something like the following example:
NAME                         READY   STATUS    RESTARTS   AGE
apigee-cassandra-default-0   1/1     Running   0          14m
apigee-cassandra-default-1   1/1     Running   0          13m
apigee-cassandra-default-2   1/1     Running   0          11m
- Verify the Cassandra statefulset shows all pods are running with the following command.
kubectl get sts -n apigee -l app=apigee-cassandra
Your output should look something like the following example:
NAME                       READY   AGE
apigee-cassandra-default   3/3     15m
- Verify the ApigeeDatastore resource is in a running state with the following command.
kubectl get apigeeds -n apigee
Your output should look something like the following example:
NAME      STATE     AGE
default   running   16m
- Verify all Cassandra PVCs are in Bound status with the following command.
kubectl get pvc -n apigee -l app=apigee-cassandra
Your output should look something like the following example:
NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-apigee-cassandra-default-0   Bound    pvc-a14184e7-8745-4b30-8069-9d50642efe04   10Gi       RWO            standard-rwo   17m
cassandra-data-apigee-cassandra-default-1   Bound    pvc-ed129dcb-4706-4bad-a692-ac7c78bad64d   10Gi       RWO            standard-rwo   15m
cassandra-data-apigee-cassandra-default-2   Bound    pvc-faed0ad1-9019-4def-adcd-05e7e8bb8279   10Gi       RWO            standard-rwo   13m
- Verify all Cassandra PVs are in Bound status with the following command.
kubectl get pv -n apigee
Your output should look something like the following example:
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                              STORAGECLASS   REASON   AGE
pvc-a14184e7-8745-4b30-8069-9d50642efe04   10Gi       RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-0   standard-rwo            17m
pvc-ed129dcb-4706-4bad-a692-ac7c78bad64d   10Gi       RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-1   standard-rwo            16m
pvc-faed0ad1-9019-4def-adcd-05e7e8bb8279   10Gi       RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-2   standard-rwo            14m
- Verify the Apigee Controller resource is in Running status with the following command.
kubectl get pods -n apigee-system -l app=apigee-controller
Your output should look something like the following example:
NAME                                         READY   STATUS    RESTARTS   AGE
apigee-controller-manager-856d9bb7cb-cfvd7   2/2     Running   0          20m
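If you prefer to script some of these checks rather than read the tables by eye, the following sketch shows one possible approach. It uses kubectl's jsonpath output to print each resource name and its status phase, then fails if any value differs from the expected one. The function names expect_all and run_prerequisite_checks are hypothetical, and the sketch covers only the Cassandra pods and PVCs. A pod phase of Running does not guarantee that every container is ready, so this is a coarse gate and not a substitute for the full list above.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers, not Apigee tooling.

# expect_all VALUE
# Reads "name<TAB>value" lines on stdin. Succeeds only if at least one line
# is present and every value equals VALUE.
expect_all() {
  awk -F '\t' -v want="$1" '
    $2 != want { print "not " want ": " $1 " (" $2 ")"; bad = 1 }
    END {
      if (NR == 0) { print "no resources found"; bad = 1 }
      exit bad
    }
  '
}

# Runs the pod and PVC checks against the apigee namespace used on this page.
run_prerequisite_checks() {
  local ns=apigee
  local fmt='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
  kubectl get pods -n "$ns" -l app=apigee-cassandra -o jsonpath="$fmt" | expect_all Running &&
  kubectl get pvc -n "$ns" -l app=apigee-cassandra -o jsonpath="$fmt" | expect_all Bound
}
```

Calling run_prerequisite_checks returns a non-zero exit status and prints the offending resource names if any Cassandra pod is not in the Running phase or any Cassandra PVC is not Bound.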
How to restore?
Cassandra's restoration steps differ slightly depending on whether your Apigee hybrid is deployed in a single region or multiple regions. For the detailed restoration steps, see the following documentation: