Scheduling backups in Cloud Storage

This page describes how to schedule backups for Cassandra in Cloud Storage. In this method, backups are stored in the specified Cloud Storage bucket.

To schedule Cassandra backups, perform the following steps:

  1. Run the following create-service-account command to create a Google Cloud service account (SA) with the standard roles/storage.objectAdmin role. This role allows the SA to write backup data to Cloud Storage. Execute the command in the directory appropriate for your management tool:
    • Helm charts: $APIGEE_HELM_CHARTS_HOME/apigee-operator/etc/
    • apigeectl: HYBRID_BASE_DIRECTORY/hybrid-files/
    ./tools/create-service-account --env non-prod --dir ./service-accounts
    This command creates a single service account named apigee-non-prod for use in non-production environments and places the downloaded key file in the ./service-accounts directory. For more information about Google Cloud service accounts, see Creating and managing service accounts.
  2. The create-service-account command saves a JSON file containing the service account private key. The file is saved in the directory given by --dir (./service-accounts in the command above). You will need the path to this file in the following steps.
  3. Create a Cloud Storage bucket. Specify a reasonable data retention policy for the bucket. Apigee recommends a data retention policy of 15 days.
  4. Open your overrides.yaml file.
  5. Add the following cassandra.backup properties to enable backup. Do not remove any of the properties that are already configured.

    Parameters

    cassandra:
        ...
    
        backup:
          enabled: true
          serviceAccountPath: SA_JSON_FILE_PATH
          dbStorageBucket: CLOUD_STORAGE_BUCKET_PATH
          schedule: BACKUP_SCHEDULE_CODE
          cloudProvider: "GCP" # required verbatim "GCP" (all caps)
    
        ...
        

    Example

      ...
    
      cassandra:
        storage:
          type: gcepd
          capacity: 50Gi
          gcepd:
            replicationType: regional-pd
        auth:
          default:
            password: "abc123"
          admin:
            password: "abc234"
          ddl:
            password: "abc345"
          dml:
            password: "abc456"
        nodeSelector:
          key: cloud.google.com/gke-nodepool
          value: apigee-data
        backup:
          enabled: true
          serviceAccountPath: "my-cassandra-backup-sa.json"
          dbStorageBucket: "gs://myname-cassandra-backup"
          schedule: "45 23 * * 6"
          cloudProvider: "GCP"
          
    
    
        ... 
  6. Where:

    Property (value)
        Description

    backup:enabled
        Backup is disabled by default. You must set this property to true.

    backup:serviceAccountPath (SA_JSON_FILE_PATH)
        The path on your filesystem to the service account JSON file that was downloaded when you ran the create-service-account command.

        For installations managed with Helm, the path must be relative to the apigee-datastore chart directory. For example, serviceAccountPath: myproject-apigee-cassandra.json.

        For installations managed with apigeectl, you can also provide a relative file path. The path will be relative to the hybrid-base-directory/hybrid-files directory.

    backup:dbStorageBucket (CLOUD_STORAGE_BUCKET_PATH)
        The Cloud Storage bucket path in this format: gs://BUCKET_NAME. The gs:// is required.

    backup:cloudProvider (GCP)
        For a backup to Cloud Storage, set the property to GCP.

    backup:schedule (BACKUP_SCHEDULE_CODE)
        The time when the backup starts, specified in standard crontab syntax. Default: 0 2 * * *


  7. Apply the configuration changes to the new cluster. For example:

    Helm

    helm upgrade datastore apigee-datastore/ \
      --namespace APIGEE_NAMESPACE \
      --atomic \
      -f OVERRIDES_FILE.yaml
    

    apigeectl

    $APIGEECTL_HOME/apigeectl apply -f OVERRIDES_FILE.yaml --datastore

    Where OVERRIDES_FILE is the path to the overrides file you just edited.

  8. Verify the backup job. For example:
    kubectl get cronjob -n APIGEE_NAMESPACE
      NAME                      SCHEDULE     SUSPEND   ACTIVE   LAST SCHEDULE   AGE
      apigee-cassandra-backup   33 * * * *   False     0        <none>          94s
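
The value rules listed in step 6 can be checked before the overrides file is applied. The following is a minimal sketch, not part of apigeectl or the Helm charts: the function names are hypothetical, and the script only checks the shape of each value (the gs:// prefix, five crontab fields, the literal GCP). It does not confirm that the bucket exists or that the service account can write to it.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for the cassandra.backup values.

# dbStorageBucket must use the gs://BUCKET_NAME form.
valid_bucket_path() {
  case "$1" in
    gs://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# A standard crontab schedule has exactly five whitespace-separated fields.
# The body runs in a subshell so that disabling globbing stays local.
valid_schedule() (
  set -f          # keep "*" fields from expanding to file names
  set -- $1       # split the schedule string into positional parameters
  [ "$#" -eq 5 ]
)

# Check all three values; report every problem found, then return non-zero.
# $1: dbStorageBucket, $2: schedule, $3: cloudProvider
check_backup_settings() {
  rc=0
  valid_bucket_path "$1" || { echo "dbStorageBucket must look like gs://BUCKET_NAME" >&2; rc=1; }
  valid_schedule "$2"    || { echo "schedule must have five crontab fields" >&2; rc=1; }
  [ "$3" = "GCP" ]       || { echo 'cloudProvider must be exactly "GCP"' >&2; rc=1; }
  return "$rc"
}

# Values taken from the example overrides shown earlier on this page.
check_backup_settings "gs://myname-cassandra-backup" "45 23 * * 6" "GCP" \
  && echo "backup settings look valid"
```

Because the helpers only inspect strings, they can run on any workstation without cluster access. The kubectl get cronjob check in step 8 is still needed to confirm that the schedule was actually applied.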

Launch a manual backup

Backup jobs are triggered automatically according to the cron schedule set in cassandra.backup.schedule in your overrides.yaml file. However, you can also initiate a backup job manually if needed using the following command:

kubectl create job -n APIGEE_NAMESPACE --from=cronjob/apigee-cassandra-backup MANUAL_BACKUP_JOB_NAME

Where MANUAL_BACKUP_JOB_NAME is the name of the manual backup job to be created.
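
Job names must be unique within a namespace, so one way to choose MANUAL_BACKUP_JOB_NAME is to append a timestamp. The sketch below assumes a hypothetical manual-backup prefix and an example namespace called apigee; neither comes from this page. It prints the kubectl command rather than running it, so the command can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helpers for naming and launching a manual backup job.

# Build a lowercase, timestamped job name from an optional prefix.
# $1: prefix (defaults to the hypothetical "manual-backup")
manual_backup_job_name() {
  printf '%s-%s\n' "${1:-manual-backup}" "$(date -u +%Y%m%d-%H%M%S)"
}

# Print the kubectl command from this page with the placeholders filled in.
# $1: namespace (APIGEE_NAMESPACE), $2: job name (MANUAL_BACKUP_JOB_NAME)
manual_backup_command() {
  printf 'kubectl create job -n %s --from=cronjob/apigee-cassandra-backup %s\n' "$1" "$2"
}

# "apigee" is an assumed namespace; substitute your own APIGEE_NAMESPACE.
manual_backup_command "apigee" "$(manual_backup_job_name)"
```

To run the job instead of printing the command, pipe the output to sh or copy it into a terminal.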