Restoring in multiple regions

This page describes how to recover or restore Cassandra in multiple regions.

In a multi-region deployment, Apigee hybrid is deployed in multiple geographic locations across different datacenters. If one or more regions fail, but healthy regions remain, you can use a healthy region to recover failed Cassandra regions with the latest data.

In the event of a catastrophic failure of all hybrid regions, Cassandra can be restored from a backup. Note that if you have multiple Apigee organizations in your deployment, the restore process restores data for all of them. In a multi-organization setup, restoring only a specific organization is not supported.

This topic describes both approaches to bringing failed region(s) back:

  • Recover failed region(s) - Describes the steps to recover failed region(s) based on a healthy region.
  • Restore failed region(s) - Describes the steps to restore failed region(s) from a backup. This approach is only required if all hybrid regions are impacted.

Recover failed region(s)

To recover failed region(s) from a healthy region, perform the following steps:

  1. Redirect API traffic from the impacted region(s) to a healthy region. Plan capacity accordingly to support the traffic diverted from the failed region(s).
  2. Decommission the impacted region. For each impacted region, follow the steps outlined in Decommission a hybrid region. Wait for decommissioning to complete before moving on to the next step.

  3. Recreate the impacted region. To do so, add it back as a new region, as described in Multi-region deployment on GKE, GKE on-prem, and AKS.
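Before redirecting traffic back, it can help to confirm that every Cassandra node in the rebuilt region reports Up/Normal (UN) in nodetool status. The following is a minimal sketch, not part of the official procedure; the pod name and namespace shown are the hybrid defaults, and count_not_ready is a hypothetical helper for parsing the nodetool output:

```shell
# Hypothetical helper: count Cassandra nodes that are NOT in the
# UN (Up/Normal) state, reading `nodetool status` output on stdin.
# Node lines start with a two-letter state code such as UN, UJ, DN.
count_not_ready() {
  awk '/^[UD][NLJM] /{ if ($1 != "UN") n++ } END { print n+0 }'
}

# Against a live cluster you would pipe real output, for example:
#   kubectl exec -n apigee apigee-cassandra-default-0 -- \
#     nodetool -u APIGEE_JMX_USER -pw APIGEE_JMX_PASSWORD status | count_not_ready
# A result of 0 means every node is Up/Normal.
```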

Restoring from a backup

The Cassandra backup can either reside on Cloud Storage or on a remote server based on your configuration. To restore Cassandra from a backup, perform the following steps:

  1. Delete the Apigee hybrid deployment from all regions:
    apigeectl delete -f overrides.yaml
  2. Restore the desired region from a backup. For more information, see Restoring a region from a backup.

  3. Remove references to the deleted region(s) and add references to the restored region(s) in the keyspaces metadata.
  4. Get the region name by using the nodetool status command.
    kubectl exec -n apigee -it apigee-cassandra-default-0 -- bash
    nodetool -u APIGEE_JMX_USER -pw APIGEE_JMX_PASSWORD status | grep -i Datacenter

    where:

    • APIGEE_JMX_USER is the username for the Cassandra JMX operations user, used to authenticate and communicate with the Cassandra JMX interface. See cassandra.auth.jmx.username.
    • APIGEE_JMX_PASSWORD is the password for the Cassandra JMX operations user. See cassandra.auth.jmx.password.
  5. Update the keyspace replication settings.
    1. Create a client container and connect to the Cassandra cluster through the CQL interface.
    2. Get the list of user keyspaces from the CQL interface:
      cqlsh CASSANDRA_SEED_HOST -u APIGEE_DDL_USER -p APIGEE_DDL_PASSWORD \
        --ssl -e "select keyspace_name from system_schema.keyspaces;" | grep -v system

      where:

      • CASSANDRA_SEED_HOST is the Cassandra multi-region seed host. For most multi-region installations, use the IP address of a host in your first region. See Configure Apigee hybrid for multi-region and cassandra.externalSeedHost.
      • APIGEE_DDL_USER and APIGEE_DDL_PASSWORD are the admin username and password for the Cassandra Data Definition Language (DDL) user. The default values are "ddl_user" and "iloveapis123".

        See cassandra.auth.ddl.password in the Configuration properties reference and Command Line Options in the Apache Cassandra cqlsh documentation.

    3. For each keyspace, run the following command from the CQL interface to update the replication settings:
      ALTER KEYSPACE KEYSPACE_NAME WITH replication = {'class': 'NetworkTopologyStrategy', 'REGION_NAME':3};

      where:

      • KEYSPACE_NAME is the name of the keyspace listed in the previous step's output.
      • REGION_NAME is the region name obtained in Step 4.
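Steps 5b and 5c can be combined into a small script that emits one ALTER KEYSPACE statement per user keyspace. The following is a sketch only, not part of the official procedure: gen_alter_statements is a hypothetical helper, dc-1 is an example datacenter name, and the statements should be reviewed before you run them through cqlsh:

```shell
# REGION_NAME is the datacenter name from `nodetool status` (step 4);
# dc-1 is an example value only.
REGION_NAME=dc-1

# Hypothetical helper: read keyspace names on stdin, one per line,
# and emit an ALTER KEYSPACE statement for each.
gen_alter_statements() {
  while IFS= read -r ks; do
    [ -n "$ks" ] || continue
    printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s':3};\n" "$ks" "$REGION_NAME"
  done
}

# Against a live cluster you would feed the keyspace list from step 5b,
# filtering out system keyspaces and cqlsh table decoration, for example:
#   cqlsh CASSANDRA_SEED_HOST -u APIGEE_DDL_USER -p APIGEE_DDL_PASSWORD --ssl \
#     -e "select keyspace_name from system_schema.keyspaces;" \
#     | grep -v -e system -e keyspace_name -e '^-' -e '^(' -e '^$' \
#     | tr -d ' ' | gen_alter_statements
```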