Scaling Cassandra

This topic describes how to scale Cassandra horizontally and vertically, and how to scale down Cassandra.

Scaling Cassandra horizontally

To scale up Cassandra horizontally:

  1. Make sure that your apigee-data node pool has additional capacity, as needed, before scaling Cassandra. See also Configuring dedicated node pools.
  2. Set the value of the cassandra.replicaCount configuration property in your overrides file (a minimal example snippet appears after these steps). For information about this property, see the Configuration property reference. See also Manage runtime plane components.
  3. Apply the changes. For example:
    $APIGEECTL_HOME/apigeectl apply --datastore -f overrides/overrides.yaml
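
For example, a minimal overrides.yaml snippet for this property might look like the following. The replica count shown here is illustrative; as noted later in this topic, scale the ring up or down in multiples of three nodes.

    cassandra:
      replicaCount: 6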

Scaling Cassandra vertically

This section explains how to scale the Cassandra pods vertically to accommodate higher CPU and memory requirements.

Overview

For an Apigee hybrid production deployment, we recommend that you create at least two separate node pools: one for stateful services (Cassandra) and one for stateless (runtime) services. For example, see GKE production cluster requirements.

For the stateful Cassandra node pool, we recommend starting with 8 CPU cores and 30 GB of memory. Once the node pool is provisioned, these settings cannot be changed. See also Configure Cassandra for production.
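
For example, on GKE you could provision a node pool with this shape using a command similar to the following sketch. The cluster name, node pool name, and machine type are illustrative; e2-standard-8 provides 8 vCPUs and 32 GB of memory, so substitute a machine type that matches your sizing and your zone or region settings.

    gcloud container node-pools create apigee-data \
      --cluster=CLUSTER_NAME \
      --machine-type=e2-standard-8 \
      --num-nodes=3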

If you need to scale up the Cassandra pods to accommodate higher CPU and memory requirements, follow the steps described in this topic.

Scaling up the Cassandra pods

Follow these steps to increase the CPU and memory for the stateful node pool used for Cassandra:

  1. Follow your Kubernetes platform's instructions to add a new node pool to the cluster. Supported platforms are listed in the installation instructions.
  2. Verify that the new node pool is ready:
    kubectl get nodes -l NODE_POOL_LABEL_NAME=NODE_POOL_LABEL_VALUE

    Example command:

    kubectl get nodes -l cloud.google.com/gke-nodepool=apigee-data-new

    Example output:

    NAME                                                STATUS   ROLES    AGE     VERSION
    gke-apigee-data-new-441387c2-2h5n   Ready    <none>   4m28s   v1.14.10-gke.17
    gke-apigee-data-new-441387c2-6941   Ready    <none>   4m28s   v1.14.10-gke.17
    gke-apigee-data-new-441387c2-nhgc   Ready    <none>   4m29s   v1.14.10-gke.17
    
  3. Update your overrides file to use the new node pool for Cassandra, and update the pod resources to the increased CPU count and memory size that you want to use. For example, for a GKE cluster, use a configuration similar to the following. If you are on another Kubernetes platform, adjust the apigeeData.key value accordingly:
    nodeSelector:
      requiredForScheduling: true
      apigeeData:
        key: "NODE_POOL_LABEL_NAME"
        value: "NODE_POOL_LABEL_VALUE"
    
    cassandra:
      resources:
        requests:
          cpu: NODE_POOL_CPU_NUMBER
          memory: NODE_POOL_MEMORY_SIZE
    

    For example:

    nodeSelector:
      requiredForScheduling: true
      apigeeData:
        key: "cloud.google.com/gke-nodepool"
        value: "apigee-data-new"
    
    cassandra:
      resources:
        requests:
          cpu: 14
          memory: 16Gi
    
  4. Apply the overrides file to the cluster:
    $APIGEECTL_HOME/apigeectl apply -f ./overrides/overrides.yaml --datastore

When you complete these steps, the Cassandra pods will begin rolling over to the new node pool.
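
To follow the rollout, you can watch the Cassandra pods and confirm that they are rescheduled onto nodes in the new pool. This is a sketch; the apigee namespace is illustrative, and the NODE column in the wide output shows where each pod lands:

    kubectl get pods -n apigee -l app=apigee-cassandra -o wide --watch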

Scaling down Cassandra

Apigee hybrid employs a ring of Cassandra nodes as a StatefulSet. Cassandra provides persistent storage for certain Apigee entities on the runtime plane. For more information about Cassandra, see About the runtime plane.

Cassandra is a resource-intensive service and should not be deployed on a node with any other hybrid services. Depending on the load, you might want to scale down the number of Cassandra nodes in the ring in your cluster.

The general process for scaling down a Cassandra ring is:

  1. Decommission one Cassandra node.
  2. Update the cassandra.replicaCount property in overrides.yaml.
  3. Apply the configuration update.
  4. Repeat these steps for each node you want to remove.
  5. Delete the persistent volume claim or volume, depending on your cluster configuration.

What you need to know

  • Perform this task on one node at a time before proceeding to the next node.
  • If any node other than the node to be decommissioned is unhealthy, do not proceed. Kubernetes will not be able to downscale the pods from the cluster.
  • Always scale down or up in multiples of three nodes.

Prerequisites

Before you scale down the number of Cassandra nodes in the ring, verify that the cluster is healthy and that all of the nodes are up and running, as the following example shows:

 kubectl get pods -n yourNamespace -l app=apigee-cassandra
NAME                 READY   STATUS    RESTARTS   AGE
apigee-cassandra-default-0   1/1     Running   0          2h
apigee-cassandra-default-1   1/1     Running   0          2h
apigee-cassandra-default-2   1/1     Running   0          2h
apigee-cassandra-default-3   1/1     Running   0          16m
apigee-cassandra-default-4   1/1     Running   0          14m
apigee-cassandra-default-5   1/1     Running   0          13m
kubectl -n yourNamespace exec -it apigee-cassandra-default-0 -- nodetool status
Datacenter: us-east1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.16.2.6    690.17 KiB  256          48.8%             b02089d1-0521-42e1-bbed-900656a58b68  ra-1
UN  10.16.4.6    700.55 KiB  256          51.6%             dc6b7faf-6866-4044-9ac9-1269ebd85dab  ra-1
UN  10.16.11.11  144.36 KiB  256          48.3%             c7906366-6c98-4ff6-a4fd-17c596c33cf7  ra-1
UN  10.16.1.11   767.03 KiB  256          49.8%             ddf221aa-80aa-497d-b73f-67e576ff1a23  ra-1
UN  10.16.5.13   193.64 KiB  256          50.9%             2f01ac42-4b6a-4f9e-a4eb-4734c24def95  ra-1
UN  10.16.8.15   132.42 KiB  256          50.6%             a27f93af-f8a0-4c88-839f-2d653596efc2  ra-1
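
As an optional check before decommissioning, you can also wait until every Cassandra pod reports Ready. This is a sketch; the namespace and timeout are illustrative:

    kubectl wait --for=condition=Ready pods -l app=apigee-cassandra -n apigee --timeout=300s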

Decommission the Cassandra nodes

  1. Decommission the Cassandra nodes from the cluster using the nodetool command.
    kubectl -n yourNamespace exec -it nodeName -- nodetool decommission

    For example, this command decommissions apigee-cassandra-default-5, the node with the highest number value in the name:

    kubectl -n apigee exec -it apigee-cassandra-default-5 -- nodetool decommission
  2. Wait for the decommission to complete, and verify that the cluster now has one less node (a sketch for monitoring decommission progress appears at the end of this topic). For example:
    kubectl -n yourNamespace exec -it nodeName -- nodetool status
    Datacenter: us-east1
    ====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  10.16.2.6   710.37 KiB  256          59.0%             b02089d1-0521-42e1-bbed-900656a58b68  ra-1
    UN  10.16.4.6   720.97 KiB  256          61.3%             dc6b7faf-6866-4044-9ac9-1269ebd85dab  ra-1
    UN  10.16.1.11  777.11 KiB  256          58.9%             ddf221aa-80aa-497d-b73f-67e576ff1a23  ra-1
    UN  10.16.5.13  209.23 KiB  256          62.2%             2f01ac42-4b6a-4f9e-a4eb-4734c24def95  ra-1
    UN  10.16.8.15  143.23 KiB  256          58.6%             a27f93af-f8a0-4c88-839f-2d653596efc2  ra-1
    
  3. Update or add the cassandra.replicaCount property in your overrides.yaml file. For example, if the current node count is 6, change it to 5:
    cassandra:
      replicaCount: 5 # (n-1, which is 5 in this example)
  4. Apply the configuration change to your cluster:
    ./apigeectl apply -f overrides/overrides.yaml --datastore
    namespace/apigee unchanged
    secret/ssl-cassandra unchanged
    storageclass.storage.k8s.io/apigee-gcepd unchanged
    service/apigee-cassandra unchanged
    statefulset.apps/apigee-cassandra configured
  5. Verify that all of the remaining Cassandra nodes are running:
    kubectl get pods -n yourNamespace -l app=apigee-cassandra
    NAME                 READY   STATUS    RESTARTS   AGE
    apigee-cassandra-default-0   1/1     Running   0          3h
    apigee-cassandra-default-1   1/1     Running   0          3h
    apigee-cassandra-default-2   1/1     Running   0          2h
    apigee-cassandra-default-3   1/1     Running   0          25m
    apigee-cassandra-default-4   1/1     Running   0          24m
    
    
  6. Repeat Steps 1-5 for each node that you wish to decommission.
  7. When you are finished decommissioning nodes, verify that the cassandra.replicaCount value equals the number of nodes returned by the nodetool status command.

    For example, if you scaled Cassandra down to three nodes:

    kubectl -n yourNamespace exec -it apigee-cassandra-default-0 -- nodetool status
    Datacenter: us-east1
    ====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  10.16.2.6   710.37 KiB  256          59.0%             b02089d1-0521-42e1-bbed-900656a58b68  ra-1
    UN  10.16.4.6   720.97 KiB  256          61.3%             dc6b7faf-6866-4044-9ac9-1269ebd85dab  ra-1
    UN  10.16.5.13  209.23 KiB  256          62.2%             2f01ac42-4b6a-4f9e-a4eb-4734c24def95  ra-1
    
    
  8. After the Cassandra cluster is downsized, delete the PersistentVolumeClaims (PVCs) so that the next scale-up event does not reuse the same persistent volumes and the data created earlier.

    Get the names of the PVCs:

    kubectl get pvc -n yourNamespace
    NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    cassandra-data-apigee-cassandra-default-0   Bound    pvc-f9c2a5b9-818c-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   7h
    cassandra-data-apigee-cassandra-default-1   Bound    pvc-2956cb78-818d-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   7h
    cassandra-data-apigee-cassandra-default-2   Bound    pvc-79de5407-8190-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   7h
    cassandra-data-apigee-cassandra-default-3   Bound    pvc-d29ba265-81a2-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   5h
    cassandra-data-apigee-cassandra-default-4   Bound    pvc-0675a0ff-81a3-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   5h
    cassandra-data-apigee-cassandra-default-5   Bound    pvc-354afa95-81a3-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   5h

    In this example, the following PVCs correspond to the three decommissioned nodes:

    • cassandra-data-apigee-cassandra-default-5
    • cassandra-data-apigee-cassandra-default-4
    • cassandra-data-apigee-cassandra-default-3
  9. Delete the PVCs:
    kubectl -n yourNamespace delete pvc cassandra-data-apigee-cassandra-default-5
    persistentvolumeclaim "cassandra-data-apigee-cassandra-default-5" deleted
    kubectl -n yourNamespace delete pvc cassandra-data-apigee-cassandra-default-4
    persistentvolumeclaim "cassandra-data-apigee-cassandra-default-4" deleted
    kubectl -n yourNamespace delete pvc cassandra-data-apigee-cassandra-default-3
    persistentvolumeclaim "cassandra-data-apigee-cassandra-default-3" deleted
  10. Verify that the PVCs were deleted:
    kubectl get pvc -n yourNamespace
    NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    cassandra-data-apigee-cassandra-default-0   Bound    pvc-f9c2a5b9-818c-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   7h
    cassandra-data-apigee-cassandra-default-1   Bound    pvc-2956cb78-818d-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   7h
    cassandra-data-apigee-cassandra-default-2   Bound    pvc-79de5407-8190-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   7h
    cassandra-data-apigee-cassandra-default-3   Bound    pvc-d29ba265-81a2-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   5h
    cassandra-data-apigee-cassandra-default-4   Bound    pvc-0675a0ff-81a3-11e9-8862-42010a8e014a   100Gi      RWO            apigee-gcepd   5h
  11. If you are using an Anthos installation, delete the persistent volumes from the Anthos Kubernetes cluster, following the same sequence.

    Get the names of the persistent volumes:

    kubectl get pv -n yourNamespace
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      STORAGECLASS   REASON   AGE
    pvc-0675a0ff-81a3-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-4   apigee-gcepd            5h
    pvc-2956cb78-818d-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-1   apigee-gcepd            7h
    pvc-354afa95-81a3-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-5   apigee-gcepd            5h
    pvc-79de5407-8190-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-2   apigee-gcepd            7h
    pvc-d29ba265-81a2-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-3   apigee-gcepd            5h
    pvc-f9c2a5b9-818c-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-0   apigee-gcepd            7h

    In this example, the following volumes correspond to the three decommissioned nodes:

    • 5: pvc-354afa95-81a3-11e9-8862-42010a8e014a
    • 4: pvc-0675a0ff-81a3-11e9-8862-42010a8e014a
    • 3: pvc-d29ba265-81a2-11e9-8862-42010a8e014a
  12. Delete the persistent volumes:
    kubectl -n yourNamespace delete pv pvc-354afa95-81a3-11e9-8862-42010a8e014a
    kubectl -n yourNamespace delete pv pvc-0675a0ff-81a3-11e9-8862-42010a8e014a
    kubectl -n yourNamespace delete pv pvc-d29ba265-81a2-11e9-8862-42010a8e014a
  13. Verify that the persistent volumes were deleted:
    kubectl get pv -n yourNamespace
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      STORAGECLASS   REASON   AGE
    pvc-2956cb78-818d-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-1   apigee-gcepd            7h
    pvc-79de5407-8190-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-2   apigee-gcepd            7h
    pvc-f9c2a5b9-818c-11e9-8862-42010a8e014a   100Gi      RWO            Delete           Bound    apigee/cassandra-data-apigee-cassandra-default-0   apigee-gcepd            7h
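
As noted in Step 2 of the decommissioning procedure, wait for the decommission to finish before you change cassandra.replicaCount. One way to monitor progress, as a sketch, is to check the node's mode with nodetool netstats: it reports LEAVING while the node is streaming its data away and DECOMMISSIONED once it has finished. The pod name and namespace below are illustrative:

    kubectl -n apigee exec apigee-cassandra-default-5 -- nodetool netstats | grep "Mode:"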