Configure Cassandra for production

This topic describes steps you must take to configure the Cassandra database component for an Apigee hybrid production installation.

Ensure high availability

Cassandra clusters need three availability zones to maintain availability in a production environment. If one zone goes down, the remaining two zones continue responding to requests while the failed zone comes back online. If two or more zones go down, Cassandra cannot respond to requests until at least two zones are online again. Apigee recommends bringing zones back online within three hours to minimize the risk of missing data updates.
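
The two-zone threshold described above is consistent with a simple majority (quorum) of three replicas. The following is a minimal sketch of that arithmetic, assuming one replica per zone and a majority-based consistency requirement. Both are assumptions for illustration; this topic does not state the replication factor or consistency level, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: majority quorum for a given number of replicas.
quorum() {
  local replicas=$1
  echo $(( replicas / 2 + 1 ))
}

# Hypothetical helper: succeeds when enough zones are up to reach quorum,
# assuming one replica per zone.
can_serve() {
  local zones_total=$1 zones_up=$2
  [ "$zones_up" -ge "$(quorum "$zones_total")" ]
}

if can_serve 3 2; then echo "3 zones, 2 up: quorum reached"; fi
if ! can_serve 3 1; then echo "3 zones, 1 up: quorum lost"; fi
```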

Configure Cassandra storage settings

For a production installation of Apigee hybrid, Google recommends that you add the following storage, resource, and heap settings to your overrides file and apply them to the cluster:

cassandra:
  ...
  replicaCount: 3
  storage:
    storageclass: your-preferred-ssd-storage #If not using default storage for your cluster
    capacity: 500Gi
  resources:
    requests:
      cpu: 7
      memory: 15Gi
  maxHeapSize: 8192M
  heapNewSize: 1200M

Apply the changes to Cassandra with the following command:

helm upgrade datastore apigee-datastore/ \
--namespace apigee \
--atomic \
-f OVERRIDES_FILE.yaml

replicaCount

The value of replicaCount must be a multiple of 3. To determine your desired replicaCount value, consider the following:

  • Estimate the traffic demands for your proxies.
  • Load test and make reasonable predictions of your CPU utilization.
  • You can specify different replicaCount values in different regions.
  • You can expand the replicaCount in the future in your overrides file.
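
The multiple-of-3 rule is easy to check before you edit the overrides file. The following is a minimal sketch; the function name is hypothetical and is not part of any Apigee tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for a positive multiple of 3.
is_valid_replica_count() {
  local count=$1
  case "$count" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$count" -gt 0 ] && [ $(( count % 3 )) -eq 0 ]
}

for n in 3 4 6; do
  if is_valid_replica_count "$n"; then
    echo "replicaCount $n: ok"
  else
    echo "replicaCount $n: not a multiple of 3"
  fi
done
```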

storageclass

For production, Cassandra storage must be an SSD StorageClass. Set the value of storageclass if you are not using the default Kubernetes StorageClass for your cluster. You can check the default StorageClass with the following command:

kubectl get storageclass

The output should look similar to the following:

NAME                     PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
premium-rwo              pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   6d23h
standard                 kubernetes.io/gce-pd    Delete          Immediate              true                   6d23h
standard-rwo (default)   pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   6d23h

Follow the instructions in StorageClass configuration if you want to change the default Kubernetes StorageClass.
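
To pick out the default class from the kubectl get storageclass table in a script, you can filter on the (default) marker. The following is a minimal sketch that assumes the table layout shown in the sample output; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get storageclass` table output on stdin
# and prints the name of the class marked "(default)".
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Example using the sample output from this topic. On a live cluster, pipe
# `kubectl get storageclass` into the function instead.
default_storage_class <<'EOF'
NAME                     PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
premium-rwo              pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   6d23h
standard                 kubernetes.io/gce-pd    Delete          Immediate              true                   6d23h
standard-rwo (default)   pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   6d23h
EOF
```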

To check the current storageclass setting, execute the following command on your cluster:

kubectl get pvc -n NAMESPACE cassandra-data-apigee-cassandra-default-0 -o=jsonpath="{['.spec.storageClassName', '.metadata.annotations.volume\.beta\.kubernetes\.io/storage-class']}"
  

storageSize

For production installations, Google recommends a storage size (the capacity property under cassandra.storage) of at least 500Gi (gibibytes). You can change the storage size in response to your cluster's storage needs. To change the storage capacity, see the instructions in Expand Cassandra persistent volumes.

To check the current size setting, execute the following command on your cluster:

kubectl get pvc -n NAMESPACE cassandra-data-apigee-cassandra-default-0 -o=jsonpath='{.spec.resources.requests.storage}'
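
To compare the reported size against the 500Gi recommendation in a script, the quantity can be converted to bytes first. The following is a minimal sketch with hypothetical function names. It handles only whole-number quantities with binary suffixes (Ki, Mi, Gi, Ti) or plain byte counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a Kubernetes binary quantity to bytes.
# Decimal suffixes and fractional values are not handled in this sketch.
to_bytes() {
  local q=$1
  case "$q" in
    *Ki) echo $(( ${q%Ki} * 1024 )) ;;
    *Mi) echo $(( ${q%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${q%Gi} * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( ${q%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   echo "$q" ;;
  esac
}

# Hypothetical helper: succeeds when the first quantity is at least the second.
meets_minimum() {
  [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}

meets_minimum 500Gi 500Gi && echo "500Gi meets the 500Gi recommendation"
meets_minimum 100Gi 500Gi || echo "100Gi is below the 500Gi recommendation"
```

On a live cluster, the first argument would be the output of the kubectl get pvc command above.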
  

cpu and memory

For production installations, Google recommends at least 7 CPUs and at least 15Gi (gibibytes) of memory per pod. When specifying cassandra.resources.requests.cpu and cassandra.resources.requests.memory, consider the traffic volume and the CPU and memory demands of your proxies.

To check the current cpu setting, execute the following command on your cluster:

kubectl get pods -n NAMESPACE apigee-cassandra-default-0 -o=jsonpath='{.spec.containers[].resources.requests.cpu}'
  

To check the current memory setting, execute the following command on your cluster:

kubectl get pods -n NAMESPACE apigee-cassandra-default-0 -o=jsonpath='{.spec.containers[].resources.requests.memory}'
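
Kubernetes can report CPU requests either as whole cores (7) or as millicores (7000m), so a scripted check against the 7 CPU recommendation needs to normalize the value first. The following is a minimal sketch with hypothetical function names; fractional values such as 0.5 are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a Kubernetes CPU quantity to millicores.
# Handles whole cores ("7") and millicores ("7000m") only.
to_millicores() {
  local q=$1
  case "$q" in
    *m) echo "${q%m}" ;;
    *)  echo $(( q * 1000 )) ;;
  esac
}

# Hypothetical helper: succeeds when the first CPU quantity is at least the second.
cpu_meets_minimum() {
  [ "$(to_millicores "$1")" -ge "$(to_millicores "$2")" ]
}

cpu_meets_minimum 7000m 7 && echo "7000m meets the 7 CPU recommendation"
cpu_meets_minimum 4 7 || echo "4 CPUs is below the 7 CPU recommendation"
```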
  

maxHeapSize and heapNewSize

These properties set the maximum JVM heap allocated to Cassandra processes and the portion of that heap reserved for newly created objects (the young generation), respectively. Heap sizes are specified in megabytes, not mebibytes. For production environments, Google recommends the following values:

  • maxHeapSize: 8192M
  • heapNewSize: 1200M

Consult your Kubernetes platform provider's documentation for optimal heap size values.

To check the current maxHeapSize setting, execute the following command on your cluster:

kubectl get sts -n NAMESPACE apigee-cassandra-default -o=jsonpath='{.spec.template.spec.containers[].env[?(@.name=="MAX_HEAP_SIZE")]}'
  

To check the current heapNewSize setting, execute the following command on your cluster:

kubectl get sts -n NAMESPACE apigee-cassandra-default -o=jsonpath='{.spec.template.spec.containers[].env[?(@.name=="HEAP_NEWSIZE")]}'
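
The two commands above print the whole environment entry rather than just its value. The following is a minimal sketch for extracting the value in a script. It assumes the entry is printed as compact JSON of the form {"name":"...","value":"..."}, which may vary with your kubectl version; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one env entry as compact JSON on stdin and
# prints its "value" field. Assumes the form {"name":"...","value":"..."}.
env_value() {
  sed -n 's/.*"value":"\([^"]*\)".*/\1/p'
}

# Example with a literal entry. On a live cluster, pipe the output of the
# kubectl get sts command into the function instead.
echo '{"name":"MAX_HEAP_SIZE","value":"8192M"}' | env_value
```

Alternatively, appending .value after the filter in the JSONPath expression should return only the value.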
  

For more information on these property settings, see the Configuration property reference.

Use SSD storage for production deployments

For the Cassandra database, the hybrid runtime supports only dynamically created persistent volumes for storing data. Local solid-state drives (SSDs) are not supported.

If you do not currently have SSD configured for Cassandra, you must configure a StorageClass definition that is backed by a solid-state drive (SSD) and make it the default class. See StorageClass configuration for detailed steps.
