This document describes best practices to achieve high availability (HA) with Red Hat OpenShift Container Platform workloads on Compute Engine. This document focuses on application-level strategies to help you ensure that your workloads remain highly available when failures occur. These strategies help you eliminate single points of failure and implement mechanisms for automatic failover and recovery.
This document is intended for platform and application architects and assumes that you have some experience in deploying OpenShift. For more information about how to deploy OpenShift, see the Red Hat documentation.
Spread deployments across multiple zones
We recommend that you deploy OpenShift across multiple zones within a Google Cloud region. This approach helps ensure that if a zone experiences an outage, the cluster's control plane nodes continue to function in the other zones that the deployment is spread across. To deploy OpenShift across multiple zones, specify a list of Google Cloud zones from the same region in your install-config.yaml file.
For fine-grained control over the locations where nodes are deployed, we recommend defining VM placement policies that ensure the VMs are spread across different failure domains in the same zone. Applying a spread placement policy to your cluster nodes helps reduce the number of nodes that are simultaneously impacted by location-specific disruptions. For more information on how to create a spread policy for existing clusters, see Create and apply spread placement policies to VMs.
Similarly, to prevent multiple replicas of the same application from being scheduled on the same node or in the same zone, we recommend that you use pod anti-affinity rules. These rules spread application replicas across multiple zones. The following example demonstrates how to implement pod anti-affinity rules:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Pod anti-affinity: require that replicas of this app are scheduled
      # in different zones.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: my-app-container
        image: quay.io/myorg/my-app:latest
        ports:
        - containerPort: 8080
For stateless services like web front ends or REST APIs, we recommend that you run multiple pod replicas for each service or route. This approach ensures that traffic is automatically routed to pods in available zones.
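As an illustration, a Service and an OpenShift Route similar to the following sketch could front the my-app Deployment from the earlier example; the my-app-service and my-app-route names are assumptions for this example rather than part of the original configuration.

apiVersion: v1
kind: Service
metadata:
  name: my-app-service          # Illustrative name
  namespace: my-app-namespace
spec:
  selector:
    app: my-app                 # Matches the Deployment's pod labels
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app-route            # Illustrative name
  namespace: my-app-namespace
spec:
  to:
    kind: Service
    name: my-app-service

Because the Route forwards traffic to the Service, requests are distributed across whichever replicas are currently healthy, regardless of the zone they run in.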
Proactively manage load to prevent resource over-commitment
We recommend that you proactively manage your application's load to prevent resource over-commitment. Over-commitment can lead to poor service performance under load. You can help prevent over-commitment by setting resource requests and limits; for a more detailed explanation, see managing resources for your pod. Additionally, you can use the horizontal pod autoscaler to automatically scale replicas up or down based on CPU, memory, or custom metrics.
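The following sketch shows what this might look like for the my-app Deployment from the earlier example: resource requests and limits on the container, plus a HorizontalPodAutoscaler that scales on CPU utilization. The specific request, limit, utilization, and replica values are illustrative assumptions, not recommendations.

# Excerpt from the my-app Deployment's pod template: resource requests and
# limits for the application container (values are illustrative).
containers:
- name: my-app-container
  image: quay.io/myorg/my-app:latest
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
---
# HorizontalPodAutoscaler that scales the Deployment between 3 and 10 replicas
# based on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-app-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70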
We also recommend that you use the following load balancing services:
- OpenShift Ingress Operator. The Ingress Operator deploys HAProxy-based Ingress Controllers to handle routing to your pods. Specifically, we recommend that you configure global access for the Ingress Controller, which enables clients in any region within the same VPC network as the load balancer to reach the workloads running on your cluster (see the sketch after this list). Additionally, we recommend that you implement Ingress Controller health checks to monitor the health of your pods and restart failing pods.
- Google Cloud Load Balancing. Load Balancing distributes traffic across Google Cloud zones. Choose a load balancer that meets your application's needs.
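The following sketch shows one way that global access might be configured on the default Ingress Controller when it is published through an internal load balancer. The field names follow the OpenShift IngressController API for GCP, but verify them against the documentation for your OpenShift version before applying them.

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal              # Global access applies to internal load balancers
      providerParameters:
        type: GCP
        gcp:
          clientAccess: Global     # Allow clients from any region in the same VPC network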
Define pod disruption budgets
We recommend that you define disruption budgets to specify the minimum number of pods that your application requires to be available during disruptions like maintenance events or updates. The following example shows how to define a disruption budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-app-namespace
spec:
  # Define how many pods need to remain available during a disruption.
  # At least one of "minAvailable" or "maxUnavailable" must be specified.
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
For more information, see Specifying a Disruption Budget for your Application.
Use storage that supports HA and data replication
For stateful workloads that require persistent data storage outside of containers, we recommend the following best practices.
Disk best practices
If you require disk storage, use one of the following:
- Block storage: Compute Engine regional Persistent Disk with synchronous replication
- Shared file storage: Filestore with snapshots and backups enabled
After you select a storage option, install its driver in your cluster.
Finally, define a StorageClass for your disk. The following example shows how to define a StorageClass:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-balanced
provisioner: PROVISIONER
parameters:
  type: DISK-TYPE
  replication-type: REPLICATION-TYPE
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - europe-west1-b
    - europe-west1-a
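For illustration, a PersistentVolumeClaim like the following sketch could consume the regionalpd-balanced StorageClass; the claim name and requested size are assumptions for this example.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data              # Illustrative name
  namespace: my-app-namespace
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: regionalpd-balanced
  resources:
    requests:
      storage: 50Gi              # Illustrative size

Because the StorageClass uses WaitForFirstConsumer, the volume is provisioned only after a pod that uses the claim is scheduled, which lets the scheduler account for the zones listed in allowedTopologies.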
Database best practices
If you require a database, use one of the following:
- Fully-managed database: We recommend that you use Cloud SQL or AlloyDB for PostgreSQL to manage database HA on your behalf. If you use Cloud SQL, you can use the Cloud SQL Proxy Operator to simplify connection management between your application and the database.
- Self-managed database: We recommend that you use a database that supports HA and that you deploy its operator to enable HA. For more information, see the documentation related to your database operator, such as Redis Enterprise for Kubernetes, MariaDB Operator, or CloudNative PostgreSQL Operator.
After you install your database operator, configure a cluster with multiple instances. The following example shows the configuration for a cluster with the following attributes:
- A PostgreSQL cluster named my-postgres-cluster is created with three instances for high availability.
- The cluster uses the regionalpd-balanced storage class for durable and replicated storage across zones.
- A database named mydatabase is initialized with a user myuser, whose credentials are stored in a Kubernetes secret called my-database-secret.
- Superuser access is disabled for enhanced security.
- Monitoring is enabled for the cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-cluster
  namespace: postgres-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClass: regionalpd-balanced
  bootstrap:
    initdb:
      database: mydatabase
      owner: myuser
      secret:
        name: my-database-secret
  enableSuperuserAccess: false
  monitoring:
    enablePodMonitor: true   # Expose cluster metrics through a Prometheus PodMonitor
---
apiVersion: v1
kind: Secret
metadata:
  name: my-database-secret
  namespace: postgres-namespace
type: Opaque
data:
  username: bXl1c2Vy               # Base64-encoded value of "myuser"
  password: c2VjdXJlcGFzc3dvcmQ=   # Base64-encoded value of "securepassword"
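Once the cluster is running, the operator exposes it through generated Services, including a read-write Service (my-postgres-cluster-rw in this example). The following fragment is a sketch of how an application Deployment might reference that Service and reuse the credentials from my-database-secret; the environment variable names are assumptions.

# Excerpt from an application Deployment: connect to the operator-managed
# read-write Service and reuse the credentials from my-database-secret.
env:
- name: DB_HOST
  value: my-postgres-cluster-rw.postgres-namespace.svc.cluster.local
- name: DB_NAME
  value: mydatabase
- name: DB_USER
  valueFrom:
    secretKeyRef:
      name: my-database-secret
      key: username
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: my-database-secret
      key: password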
Externalize application state
We recommend that you move session state or caching to shared in-memory stores (for example, Redis) or persistent datastores (for example, Postgres, MySQL) that are configured to run in HA mode.
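For example, rather than keeping sessions in pod memory, an application container might receive the address of a shared, HA Redis deployment through its environment, as in the following sketch; the service host, secret, and variable names are assumptions for illustration.

# Excerpt from a Deployment's pod template: the application reads its
# session-store location from the environment instead of holding state locally.
containers:
- name: my-app-container
  image: quay.io/myorg/my-app:latest
  env:
  - name: SESSION_STORE_HOST
    value: redis-ha.redis-namespace.svc.cluster.local   # Illustrative HA Redis service
  - name: SESSION_STORE_PORT
    value: "6379"
  - name: SESSION_STORE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: redis-credentials                         # Illustrative secret name
        key: password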
Summary of best practices
In summary, implement the following best practices to achieve high availability with OpenShift:
- Spread deployments across multiple zones
- Proactively manage load to prevent resource over-commitment
- Define pod disruption budgets
- Use storage that supports HA and data replication
- Externalize application state
What's next
- Learn how to install OpenShift on Google Cloud
- Learn more about Red Hat solutions on Google Cloud