This topic describes minimum cluster configurations for Apigee hybrid. These minimum configurations apply to all of the supported Kubernetes platforms. The recommendations in this topic apply to non-production installations, such as trial or testing scenarios. Keep these recommendations in mind when performing the Apigee hybrid installation steps.
About node pools
A node pool is a group of nodes within a cluster that all have the same configuration. By default, hybrid assigns all pods to the default node pool; however, you can create dedicated node pools and assign hybrid components to them as a way of distributing resources.
Typically, you define dedicated node pools when you have pods with differing resource requirements. For example, the apigee-cassandra pods require persistent storage, while the other Apigee hybrid pods do not. For this reason, we recommend that you create a stateful node pool for Cassandra and a stateless node pool for the rest of the hybrid runtime services. See Configure dedicated node pools for details.
The following section lists configurations for both stateful and stateless node pools.
Minimum configurations
Use these minimum configurations when setting up your cluster:
Configuration | Stateful node pool | Stateless node pool |
---|---|---|
Purpose | A stateful node pool used for the Cassandra database. | A stateless node pool used by the runtime message processor. |
Label name | apigee-data | apigee-runtime |
Number of nodes | 1 per zone (3 per region) | 1 per zone (3 per region) |
CPU | 4 | 4 |
RAM (GB) | 15 | 15 |
Storage | dynamic | Managed with the ApigeeDeployment CRD |
Minimum disk IOPS | 2000 IOPS with SAN or directly attached storage. NFS is not recommended even if it can support the required IOPS. | 2000 IOPS with SAN or directly attached storage. NFS is not recommended even if it can support the required IOPS. |
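As a rough illustration, the numeric minimums in the table can be written down as a simple check. This is a minimal sketch only: the `NodePool` class and `shortfalls` function are hypothetical names invented for this example and are not part of Apigee hybrid or any Kubernetes tooling. RAM is expressed in the same unit as the table.

```python
from dataclasses import dataclass

# Minimums copied from the table above (same values for both pools).
MIN_NODES_PER_REGION = 3   # 1 per zone, 3 zones per region
MIN_CPU = 4
MIN_RAM = 15               # same unit as the RAM row in the table
MIN_DISK_IOPS = 2000


@dataclass
class NodePool:
    """Hypothetical description of a node pool, for illustration only."""
    label: str             # for example "apigee-data" or "apigee-runtime"
    nodes_per_region: int
    cpu: int
    ram: int
    disk_iops: int


def shortfalls(pool: NodePool) -> list[str]:
    """Return a human-readable list of minimums the pool does not meet."""
    problems = []
    if pool.nodes_per_region < MIN_NODES_PER_REGION:
        problems.append(f"{pool.label}: fewer than {MIN_NODES_PER_REGION} nodes per region")
    if pool.cpu < MIN_CPU:
        problems.append(f"{pool.label}: fewer than {MIN_CPU} CPUs per node")
    if pool.ram < MIN_RAM:
        problems.append(f"{pool.label}: less than {MIN_RAM} RAM per node")
    if pool.disk_iops < MIN_DISK_IOPS:
        problems.append(f"{pool.label}: disk IOPS below {MIN_DISK_IOPS}")
    return problems


if __name__ == "__main__":
    pool = NodePool("apigee-data", nodes_per_region=3, cpu=4, ram=15, disk_iops=2000)
    print(shortfalls(pool) or "meets the minimum configuration")
```

The check covers only the numeric rows; the storage type (SAN or directly attached rather than NFS) still has to be verified separately.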
Cassandra network requirements
Cassandra uses the Gossip protocol to exchange information with other nodes about network topology.
The use of Gossip plus the distributed nature of Cassandra—which involves talking to multiple nodes for read and write operations—results in a lot of data transfer through the network.
Apigee recommends using an instance type with a minimum of 1 Gbps network bandwidth, and more than 1 Gbps for production systems.
Cassandra clusters need three availability zones to maintain availability in a production environment. If one zone goes down, the two remaining zones will continue responding to requests while the failed zone comes back online. If two or more zones go down, Cassandra will be unable to respond to requests until at least two zones are back online. Apigee recommends bringing zones back online within three hours to minimize the risk of missing data updates.
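The zone behavior described above matches a majority (quorum) rule: with three zones, requests can be served as long as at least two zones are up. The following sketch assumes one replica per zone and quorum-style reads and writes. Those are assumptions made for illustration, not settings stated in this topic, and the function names are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def can_serve(zones_up: int, total_zones: int = 3) -> bool:
    """True if enough zones are online to form a majority.

    Assumes one replica per zone, so zones and replicas are interchangeable.
    """
    return zones_up >= quorum(total_zones)


if __name__ == "__main__":
    for up in (3, 2, 1, 0):
        state = "serving" if can_serve(up) else "unavailable"
        print(f"{up} of 3 zones online: {state}")
```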
When deploying multi-region hybrid environments, Apigee recommends using a VPN or cloud solution like Google Cloud VPN to secure connectivity between the regions. Make sure there are no overlapping subnets as these may cause Cassandra connectivity issues. Ensure the current firewall configurations allow for Cassandra traffic to pass between Cassandra pods. See Secure ports usage for information on Cassandra ports.
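One way to check for overlapping subnets before connecting regions is to compare their CIDR ranges pairwise. The sketch below uses Python's standard `ipaddress` module. The region names and CIDR ranges are made-up examples, and `overlapping_subnets` is a hypothetical helper, not an Apigee tool. It assumes all ranges use the same IP version.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(subnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    # Example values only; substitute the ranges used by your own clusters.
    regions = {
        "region-a": "10.0.0.0/16",
        "region-b": "10.0.128.0/20",   # falls inside region-a's range
        "region-c": "10.1.0.0/16",
    }
    for a, b in overlapping_subnets(regions):
        print(f"{a} overlaps {b}")
```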
The maximum or 99th percentile latency for Cassandra should be below 100 milliseconds.
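To compare measured latencies against this target, you can compute the 99th percentile from a set of samples. The sketch below uses the nearest-rank method, which is one of several common percentile definitions and is chosen here only for simplicity. How the samples are collected is outside the scope of this example, and the function names are hypothetical.

```python
import math

LATENCY_TARGET_MS = 100  # target stated in this topic


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples_ms: list[float]) -> bool:
    """True if the 99th percentile latency is below the target."""
    return percentile(samples_ms, 99) < LATENCY_TARGET_MS


if __name__ == "__main__":
    samples = [12.0] * 98 + [80.0, 450.0]   # made-up measurements in milliseconds
    print(percentile(samples, 99), meets_latency_target(samples))
```

In this made-up data set the single 450 ms outlier exceeds the target as a maximum, while the 99th percentile stays under it, which is why the two measures can lead to different conclusions.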
Cassandra NTP requirements
Cassandra data synchronizes based on the timestamp of the system. Ensure that the time is synchronized across all pods and all regions within the Cassandra cluster. Time differences between nodes and regions cause data inconsistencies.
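The effect of unsynchronized clocks can be shown with a simplified model of timestamp-based conflict resolution, in which the write with the later timestamp wins. This is an illustrative sketch, not Cassandra's actual reconciliation code; the `Write` class and `resolve` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    timestamp_ms: int  # taken from the clock of the node that accepted the write


def resolve(a: Write, b: Write) -> Write:
    """Keep the write with the later timestamp (simplified last-write-wins)."""
    return a if a.timestamp_ms >= b.timestamp_ms else b


if __name__ == "__main__":
    # Real time 1000 ms: a node whose clock runs 500 ms fast accepts "old".
    stale = Write("old", timestamp_ms=1000 + 500)
    # Real time 1200 ms: a node with an accurate clock accepts "new".
    fresh = Write("new", timestamp_ms=1200)
    # The older value wins because its timestamp is later.
    print(resolve(stale, fresh).value)
```

With synchronized clocks the first write would carry timestamp 1000 and the newer value would win, which is the behavior the synchronization requirement protects.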
Scaling the configuration
If you need to scale your initial configuration based on additional capacity or throughput needs, see the following topics:
- Configuring Cassandra for production
- Scaling Cassandra pods
- Configuring dedicated node pools
- Scale and autoscale runtime services