GKE best practices: Designing and building highly available clusters
Kobi Magnezi
Product Manager, Google Kubernetes Engine
Like many organizations, you employ a variety of risk management and risk mitigation strategies to keep your systems running, including your Google Kubernetes Engine (GKE) environment. These strategies ensure business continuity during both predictable and unpredictable outages, and they are especially important now, when you are working to limit the impact of the pandemic on your business.
In this first of two blog posts, we’ll provide recommendations and best practices for how to set up your GKE clusters for increased availability, on so-called day 0. Then, stay tuned for a second post, which describes high availability best practices for day 2, once your clusters are up and running.
When thinking about the high availability of GKE clusters, day 0 is often overlooked because many people think about disruptions and maintenance as being part of ongoing day 2 operations. In fact, it is necessary to carefully plan the topology and configuration of your GKE cluster before you deploy your workloads.
Choosing the right topology, scale, and health checks for your workloads
Before you create your GKE environment and deploy your workloads, you need to decide on some important design points.
Pick the right topology for your cluster
GKE offers two types of clusters: regional and zonal. In a zonal cluster topology, a cluster's control plane and nodes all run in a single compute zone that you specify when you create the cluster. In a regional cluster, the control plane and nodes are replicated across multiple zones within a single region.
Regional clusters consist of a quorum of three Kubernetes control-plane replicas, offering higher availability for your cluster's control plane API than a zonal cluster can provide. And although workloads already running on the nodes aren't impacted when one or more control-plane replicas are unavailable, some applications depend heavily on the availability of the cluster API. For those workloads, you're better off using a regional cluster topology.
Of course, selecting a regional cluster isn't enough to protect a GKE cluster on its own: scaling, scheduling, and replacing Pods are the responsibilities of the control plane, and if the control plane is unavailable, those operations stop, impacting your cluster's reliability until the control plane becomes available again.
You should also remember that regional clusters have redundant nodes as well as redundant control planes. Because nodes are spread across different zones, communication between them can generate costly cross-zone network traffic.
Finally, although regional cluster autoscaling makes a best effort to spread resources among the three zones, it does not rebalance them automatically unless a scale-up or scale-down action occurs.
To summarize, for higher availability of the Kubernetes API, and to minimize disruption to the cluster during maintenance on the control plane, we recommend that you set up a regional cluster with nodes deployed in three different availability zones—and that you pay attention to autoscaling.
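For illustration, here's one way to create such a regional cluster with nodes spanning three zones using gcloud; the cluster name, region, zones, and node count are placeholders you'd replace with your own values.

# Create a regional cluster whose control plane and nodes are replicated
# across three zones of us-central1 (all names here are illustrative).
gcloud container clusters create my-regional-cluster \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-b,us-central1-c \
    --num-nodes 1   # nodes per zone, so three nodes in total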
Scale horizontally and vertically
Capacity planning is important, but you can’t predict everything. To ensure that your workloads operate properly at times of peak load—and to control costs at times of normal or low load—we recommend exploring GKE’s autoscaling capabilities that best fit your needs.
Enable Cluster Autoscaler to automatically resize your node pools based on demand.
Use Horizontal Pod Autoscaling to automatically increase or decrease the number of pods based on utilization metrics.
Use Vertical Pod Autoscaling (VPA) in conjunction with Node Auto Provisioning (NAP, also known as node pool auto-provisioning) to allow GKE to efficiently scale your cluster both horizontally (pods) and vertically (nodes). VPA automatically sets the CPU and memory requests and limits for your containers. NAP automatically manages node pools, removing the default constraint that new nodes can only be started from the set of user-created node pools.
The above recommendations optimize for cost. NAP, for instance, reduces costs by taking down nodes during underutilized periods. But perhaps you care less about cost and more about latency and availability—in this case, you may want to create a large cluster from the get-go and use GCP reservations to guarantee your desired capacity. However, this is likely a more costly approach.
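To make the horizontal case concrete, here is a minimal HorizontalPodAutoscaler sketch that scales a hypothetical Deployment named web based on CPU utilization; the name, replica bounds, and target are assumptions you'd tune for your workload.

# Keep the "web" Deployment between 3 and 10 replicas, targeting 60% average CPU.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60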
Review your default monitoring settings
Out of the box, Kubernetes observes the behavior of your workloads and ensures that load is evenly distributed. You can further improve workload availability by exposing specific signals from your workload to Kubernetes. These signals, readiness and liveness probes, give Kubernetes additional information about your workload, helping it determine whether the workload is working properly and ready to receive traffic. Let's examine the differences between readiness and liveness probes.
Every application behaves differently: some may take longer to initialize than others; some are batch processes that run for long periods and may mistakenly seem unavailable. Readiness and liveness probes are designed exactly for this purpose: to let Kubernetes know your workload's acceptable behavior. For example, an application might take a long time to start, and during that time you don't want Kubernetes to send customer traffic to it, since it's not yet ready to serve. With a readiness probe, you provide an accurate signal to Kubernetes for when an application has completed its initialization and is ready to serve your end users.
Make sure you set up readiness probes so Kubernetes knows when your workload is really ready to accept traffic. Likewise, set up a liveness probe so Kubernetes can tell whether a workload is actually unresponsive and needs to be restarted, or is just busy performing CPU-intensive work.
Finally, readiness and liveness probes are only as good as they are defined and coded. Make sure you test and validate any probes that you create.
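As a sketch, here's what readiness and liveness probes might look like in a Pod spec; the image, paths, port, and timings are assumptions to adapt to how your application actually signals health.

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: gcr.io/my-project/web:1.0   # placeholder image
    ports:
    - containerPort: 8080
    readinessProbe:            # gate traffic until the app has finished initializing
      httpGet:
        path: /healthz/ready
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
    livenessProbe:             # restart the container only if it is truly stuck
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3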
Correctly set up your deployment
Each application has a different set of characteristics. Some are batch workloads, some are based on stateless microservices, some on stateful databases. To ensure Kubernetes is aware of your application's constraints, you can use Kubernetes Deployments to manage your workloads. A Deployment describes the desired state, and works with the Kubernetes scheduler to change the actual state to match the desired state.
Is your application stateful or not?
If your application needs to save its state between sessions, e.g., a database, then consider using a StatefulSet, a Kubernetes controller that manages and maintains one or more Pods in a way that properly handles the unique characteristics of stateful applications. It is similar to other Kubernetes controllers that manage Pods, such as ReplicaSets and Deployments, but unlike a Deployment, a StatefulSet does not assume that Pods are interchangeable.
To maintain state, a StatefulSet also needs Persistent Volumes so that the hosted application can save and restore data across restarts. Kubernetes provides StorageClasses, PersistentVolumes, and PersistentVolumeClaims as an abstraction layer above the underlying storage; on GKE, these are typically backed by Compute Engine persistent disks.
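Here's a minimal StatefulSet sketch with per-replica storage; the application, image, mount path, and disk size are placeholders.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db              # headless Service that gives each Pod a stable network identity
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: gcr.io/my-project/db:1.0   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/db
  volumeClaimTemplates:        # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi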
Understanding Pod affinity
Do you want all replicas to be scheduled on the same node? What would happen if that node were to fail? Would it be ok to lose all replicas at once? You can control the placement of your Pod and any of its replicas using Kubernetes Pod affinity and anti-affinity rules.
To avoid a single point of failure, use Pod anti-affinity to instruct Kubernetes NOT to co-locate Pods on the same node. For a stateful application, this can be a crucial configuration, especially if it requires a minimum number of replicas (i.e., a quorum) to run properly.
For example, Apache ZooKeeper needs a quorum of servers to successfully commit mutations to data. For a three-server ensemble, two servers must be healthy for writes to succeed. Therefore, a resilient deployment must ensure that servers are deployed across failure domains.
Thus, to avoid an outage due to the loss of a single node, we recommend that you avoid co-locating multiple instances of an application on the same machine. You can do this by using Pod anti-affinity.
On the flip side, sometimes you want a group of Pods to be located on the same node, benefiting from their proximity: lower latency and better performance when they communicate with one another. You can achieve this using Pod affinity.
For example, Redis, another stateful application, might provide an in-memory cache for your web application. In this deployment, you would want the web server to be co-located with the cache as much as possible, to minimize latency and boost performance.
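The sketch below combines both ideas in a single hypothetical web Deployment: anti-affinity keeps its replicas on separate nodes, while affinity prefers nodes that already run a Redis Pod. All names and labels are illustrative.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:       # never place two web replicas on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname
        podAffinity:           # prefer nodes that already run the Redis cache
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: redis
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: gcr.io/my-project/web:1.0   # placeholder image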
Anticipate disruptions
Once you’ve configured your GKE cluster and the applications running on it, it’s time to think about how you will respond in the event of increased load or a disruption.
Going all digital requires better capacity planning
Running your Kubernetes clusters on GKE frees you from thinking about physical infrastructure and how to scale it. Nonetheless, capacity planning is highly recommended, especially if you anticipate increased load.
Consider using reservations to guarantee capacity for anticipated bursts in resource demand. GKE supports both specific (a particular machine type and specification) and non-specific reservations. Once a reservation is in place, nodes automatically consume it in the background from a pool of resources reserved exclusively for you.
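As a sketch, the commands below reserve zonal capacity and then create a node pool that consumes that specific reservation; the names, zone, machine type, and counts are placeholders, and the exact flags may vary with your gcloud version.

# Reserve capacity ahead of an anticipated spike (illustrative values).
gcloud compute reservations create my-reservation \
    --zone us-central1-a \
    --vm-count 10 \
    --machine-type n1-standard-4

# Create a node pool that consumes that specific reservation.
gcloud container node-pools create reserved-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --machine-type n1-standard-4 \
    --num-nodes 10 \
    --reservation-affinity specific \
    --reservation my-reservation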
Make sure you have a support plan
Google Cloud Support is a team of engineers around the globe working 24x7 to help you with any issues you may encounter. Now, before you’re up and running and in production, is a great time to make sure that you’ve secured the right Cloud Support plan to help you in the event of a problem.
Review your support plan to make sure you have the right package for your business.
Review your support user configurations to make sure your team members can open support cases.
Make sure you have GKE Monitoring and Logging enabled on your cluster; your technical support engineer will need these logs and metrics to troubleshoot your system.
If you do not have GKE Monitoring and Logging enabled, consider enabling the new beta system-only logs feature to collect only logs that are critical for troubleshooting.
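If you need to turn these on for an existing cluster, a command along these lines enables GKE's integrated monitoring and logging; the cluster name and location are placeholders, and flag names can differ between GKE versions, so check the current documentation.

# Enable Kubernetes-native monitoring and logging on an existing cluster (illustrative).
gcloud container clusters update my-cluster \
    --zone us-central1-a \
    --enable-stackdriver-kubernetes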
Bringing it all together
Containerized applications are portable, easy to deploy and scale. GKE, with its wide range of cluster management capabilities, makes it even easier to run your workloads hassle-free. You know your application best, but by following these recommendations, you can drastically improve the availability and resilience of your clusters. Have more ideas or recommendations? Let us know! And stay tuned for part two of this series, where we talk about how to respond to issues in production clusters.