Cassandra backup overview

The Apigee hybrid backup and restore feature lets you create backups of hybrid data on demand and, in a disaster scenario, restore the data to a previous working snapshot. Backup availability and retention depend on the backup infrastructure that you provide.

A typical installation of Apigee hybrid consists of the following components:

  • MART (admin service)
  • Controller and Watcher (manage Kubernetes objects)
  • Istio (manages Ingress)
  • Runtime, Sync, and UDCA (one per environment)
  • Telemetry (monitoring and logging)
  • Cert manager (manages certificates)
  • Datastores (Cassandra and Redis databases)

All components except Cassandra are stateless and do not persist any data. Backup and restoration are therefore not necessary for those components; reapplying the existing overrides is sufficient.

Why take backups of Cassandra?

Backups are an important measure of protection against disaster scenarios. Each backup is a consistent snapshot of the Cassandra data as it existed at the time the backup was created. This includes the Cassandra data along with the schema and metadata within the Cassandra cluster. In the event of a disaster, backups let you restore your Apigee hybrid instance to a previously known good state. Depending on the size of the hybrid instance, a single backup set can consist of one or several backup files.

What do you need to know about Cassandra backups?

Cassandra is a replicated database that is configured to have at least three copies of your data in each region or data center. Cassandra uses streaming replication and read repairs to maintain the data replicas in each region or data center at any given point.

In hybrid, Cassandra backups are not enabled by default. It is good practice, however, to enable them in case your data is lost due to a catastrophic failure. Cassandra backups are intended for disaster recovery, not for recovering data lost through accidental deletion.

Backups are created according to the schedule set in your overrides.yaml file. Once a backup schedule has been applied to your hybrid cluster, a Kubernetes backup job runs periodically according to that schedule. The job triggers a backup script on each Cassandra node in your hybrid cluster. The script collects all the data on the node, creates an archive file of the data, and sends the archive to the Cloud Storage bucket specified in the Cassandra backup configuration in your overrides.yaml file.
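
As a rough illustration of the Cloud Storage case, the backup settings in overrides.yaml might look like the following sketch. It assumes the cassandra.backup properties enabled, cloudProvider, serviceAccountPath, dbStorageBucket, and schedule; confirm the exact property names against the configuration property reference for your hybrid version. The key file path, bucket name, and schedule are placeholders, not recommended values.

```yaml
cassandra:
  backup:
    enabled: true                  # backups are not enabled by default
    cloudProvider: "GCP"           # assumed value for Cloud Storage backups
    # Placeholder path to a service account key that can write to the bucket.
    serviceAccountPath: "/path/to/backup-service-account.json"
    # Placeholder bucket; the archive from each Cassandra node is sent here.
    dbStorageBucket: "gs://example-backup-bucket"
    # Placeholder cron expression: every day at 02:00.
    schedule: "0 2 * * *"
```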

What is backed up?

The Apigee hybrid scheduled backup is a complete backup of the persisted runtime data stored in Apigee's Cassandra at the time of the backup. Data modified after the backup time is not included in the backup. The scheduled backup consists of the following entities:

  • Cassandra schema, including the user schema (Apigee keyspace definitions).
  • Cassandra partition token information per Cassandra node in a cluster.
  • A snapshot of the Cassandra data.

Where is backup data stored?

The location of the backup data depends on your backup method. Apigee hybrid supports the following methods for taking backups:

  • Backup in Cloud Storage: The backup is stored in the configured Cloud Storage bucket in your Google Cloud project.
  • Backup in a remote server: The backup is stored in a directory that you specify on a remote server.

How is the data secured?

If you use Cloud Storage for backups, the backup data is encrypted by default. If you back up to a remote server instead, the backup data is encrypted during the transfer to that server. After the transfer, however, you are responsible for ensuring that the backup data is encrypted on the remote server.

How do you take backups?

You must schedule backups as cron jobs. The cron job reads its configuration from the overrides.yaml file that you configure. Apigee recommends that you keep a copy of the overrides.yaml file so that you can reuse it during the recovery process.
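
Because the schedule is written as a cron expression, it can help to recall the standard five-field layout. The fragment below is a sketch that assumes the schedule is set through a cassandra.backup.schedule property in overrides.yaml; the times shown are arbitrary examples, not recommendations.

```yaml
cassandra:
  backup:
    # Field order: minute  hour  day-of-month  month  day-of-week
    schedule: "0 2 * * *"        # every day at 02:00
    # schedule: "30 1 * * 0"     # every Sunday at 01:30
    # schedule: "0 */6 * * *"    # every six hours, on the hour
```

Only the uncommented schedule line is active; the commented lines show alternative expressions in the same syntax.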

The following sections describe in detail how to schedule backups to Cloud Storage and to a remote server.