Cassandra restore overview

This page provides an overview of restoring Cassandra in Apigee hybrid.

Why use restore?

You can use backups to restore Apigee infrastructure from the ground up after a catastrophic failure, such as irrecoverable data loss in your Apigee hybrid instance caused by a disaster. Restoration takes your data from the backup location and restores it into a new Cassandra cluster with the same number of nodes. No cluster data is taken from the old Cassandra cluster. The goal of the restoration process is to bring an Apigee hybrid installation back to a previously operational state using backup data from a snapshot.

Restoring from a backup is not recommended in the following scenarios:

Cassandra node failures.
Accidental deletion of data such as apps, developers, and api_credentials.
One or more regions going down in a multi-region hybrid deployment.

Apigee Cassandra deployments and operational architecture provide redundancy and fault tolerance within a single region. In most cases, with the recommended multi-region production implementation of hybrid, you can recover from a region failure by using another live region and the region decommissioning and expansion procedures instead of restoring from a backup.

Before you begin implementing a restore from a Cassandra backup, be aware of the following:

Downtime: There will be downtime for the duration of the restoration.
Data loss: Any data written between the last valid backup and the time the restoration completes is lost.
Restoration time: Restoration time depends on the size of the data and cluster.
Cherry-picking data: You cannot select only specific data to restore. Restoration restores the entire backup you select.

Multi-region restores

If you installed Apigee hybrid into multiple regions, check the overrides file for the region you are restoring and make sure that cassandra:hostNetwork is set to false before you perform the restoration. For more information, see Restoring in multiple regions.
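
One way to confirm the setting before restoring is to read it from the overrides file. The following is a minimal sketch, not an Apigee tool: the function name `cassandra_host_network` and the file name `overrides.yaml` are assumptions, and the awk logic only handles a plainly indented `cassandra:` block at the top level of the file.

```shell
# Hypothetical helper: print the hostNetwork value that appears under the
# top-level "cassandra:" key of an overrides file.
# Assumes simple block-style YAML; prints nothing if the key is absent.
cassandra_host_network() {
  awk '
    # A line starting in column 1 (not a comment) opens a new top-level key.
    /^[^ \t#]/ { in_cassandra = ($1 == "cassandra:") }
    in_cassandra && $1 == "hostNetwork:" { print $2; exit }
  ' "$1"
}

# Example (overrides.yaml is an assumed file name):
#   cassandra_host_network overrides.yaml
```

If the function prints nothing, the property is not set explicitly in that file; in that case, check the configuration property reference for the default rather than assuming a value.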

Prerequisites

Confirm that all of the following prerequisite checks succeed. Investigate any failure before proceeding with the restoration.

Verify all Cassandra pods are up and running with the following command.

```
kubectl get pods -n apigee -l app=apigee-cassandra
```

Your output should look something like the following example:

```
NAME                         READY   STATUS    RESTARTS   AGE
apigee-cassandra-default-0   1/1     Running   0          14m
apigee-cassandra-default-1   1/1     Running   0          13m
apigee-cassandra-default-2   1/1     Running   0          11m
```
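
The pod check can also be scripted rather than read by eye. A minimal sketch, assuming the default table output of `kubectl get pods` shown above; the function name `all_pods_ready` is hypothetical and is not part of kubectl or Apigee.

```shell
# Hypothetical helper: read `kubectl get pods` table output on stdin and
# succeed only if every pod is Running with all of its containers ready.
all_pods_ready() {
  awk '
    NR == 1 { next }                      # skip the header row
    {
      seen = 1
      split($2, ready, "/")               # READY column, for example 1/1
      if ($3 != "Running" || ready[1] != ready[2]) {
        print "not ready: " $1
        bad = 1
      }
    }
    END { exit (bad || !seen) }           # fail on any bad pod or on no pods
  '
}

# Example:
#   kubectl get pods -n apigee -l app=apigee-cassandra | all_pods_ready \
#     && echo "Cassandra pods OK"
```

The function treats an empty listing as a failure, so a mistyped label selector does not pass silently.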

Verify the Cassandra statefulset shows all pods are running with the following command.
```
kubectl get sts -n apigee -l app=apigee-cassandra
```
Your output should look something like the following example:
```
NAME                       READY   AGE
apigee-cassandra-default   3/3     15m
```
Verify the ApigeeDatastore resource is in a running state with the following command.
```
kubectl get apigeeds -n apigee
```
Your output should look something like the following example:
```
NAME      STATE     AGE
default   running   16m
```

Verify all Cassandra PVCs are in Bound status with the following command.

```
kubectl get pvc -n apigee -l app=apigee-cassandra
```

Your output should look something like the following example:

```
NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-apigee-cassandra-default-0   Bound    pvc-a14184e7-8745-4b30-8069-9d50642efe04   10Gi       RWO            standard-rwo   17m
cassandra-data-apigee-cassandra-default-1   Bound    pvc-ed129dcb-4706-4bad-a692-ac7c78bad64d   10Gi       RWO            standard-rwo   15m
cassandra-data-apigee-cassandra-default-2   Bound    pvc-faed0ad1-9019-4def-adcd-05e7e8bb8279   10Gi       RWO            standard-rwo   13m
```

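Verify all Cassandra PVs are in Bound status with the following command. PVs are cluster-scoped, so the output can include volumes that do not belong to Cassandra; use the CLAIM column to identify the Cassandra volumes.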

```
kubectl get pv -n apigee
```

Your output should look something like the following example:

```
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                              STORAGECLASS   REASON   AGE
pvc-a14184e7-8745-4b30-8069-9d50642efe04   10Gi       RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-0   standard-rwo            17m
pvc-ed129dcb-4706-4bad-a692-ac7c78bad64d   10Gi       RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-1   standard-rwo            16m
pvc-faed0ad1-9019-4def-adcd-05e7e8bb8279   10Gi       RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-2   standard-rwo            14m
```
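
The Bound checks for PVCs and PVs can be scripted in the same spirit. A minimal sketch, assuming the listing is produced with kubectl's `-o custom-columns` and `--no-headers` options so that each line contains only a name and a phase; the function name `all_bound` and the column labels are hypothetical choices.

```shell
# Hypothetical helper: read "NAME PHASE" lines on stdin and succeed only if
# at least one volume is listed and every phase is Bound.
all_bound() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    { seen = 1; if ($2 != "Bound") { print $1 " is " $2; bad = 1 } }
    END { exit (bad || !seen) }
  '
}

# Example for the Cassandra PVCs:
#   kubectl get pvc -n apigee -l app=apigee-cassandra \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers \
#     | all_bound
```

Selecting two explicit columns avoids parsing the default table, where the `ACCESS MODES` header contains a space and an empty `REASON` cell shifts the fields. The same pipeline works for `kubectl get pv`, but it then covers every PV in the cluster, not only the Cassandra volumes.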

Verify the Apigee Controller resource is in Running status with the following command.

```
kubectl get pods -n apigee-system -l app=apigee-controller
```

Your output should look something like the following example:

```
NAME                                         READY   STATUS    RESTARTS   AGE
apigee-controller-manager-856d9bb7cb-cfvd7   2/2     Running   0          20m
```
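
Because every prerequisite should pass before you restore, it can help to run the checks together and count the failures. The following is a minimal sketch under stated assumptions: `run_check` and `failed_checks` are hypothetical names, and the commands in the usage comments are examples of checks whose exit status reflects the result, not an official Apigee preflight script.

```shell
# Hypothetical helper: run one prerequisite check, print PASS or FAIL, and
# count the failures so they can be investigated before restoring.
# Usage: run_check "description" command [args...]
failed_checks=0
run_check() {
  description=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $description"
  else
    echo "FAIL: $description"
    failed_checks=$((failed_checks + 1))
  fi
}

# Example usage (the timeouts are arbitrary):
#   run_check "Cassandra pods ready" \
#     kubectl wait --for=condition=Ready pod -l app=apigee-cassandra \
#       -n apigee --timeout=60s
#   run_check "Controller pods ready" \
#     kubectl wait --for=condition=Ready pod -l app=apigee-controller \
#       -n apigee-system --timeout=60s
#   [ "$failed_checks" -eq 0 ] || echo "Investigate failures before restoring"
```

The helper only looks at the exit status of the command it runs, so each check should be a command that fails when the condition is not met. A plain `kubectl get` succeeds even when a pod is not ready, which is why the example uses `kubectl wait`.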

How to restore?

The Cassandra restoration steps differ slightly depending on whether Apigee hybrid is deployed in a single region or in multiple regions. For the detailed restoration steps, see the following documentation: