Regional clusters


This page explains how regional clusters work in Google Kubernetes Engine (GKE). You can create a regional cluster or learn more about the different types of clusters.

Overview

In contrast to zonal clusters that have a single control plane in a single zone, regional clusters increase the availability of both a cluster's control plane and its nodes by replicating them across multiple zones in a region. This provides the advantages of multi-zonal clusters, with the following additional benefits:

  • If one zone in a region experiences an outage, the cluster's control plane remains accessible as long as two replicas of the control plane remain available.
  • During cluster maintenance such as a cluster upgrade, only one replica of the control plane is unavailable at a time, and the cluster is still operational.

The control plane is replicated across three zones of a region. For node pools, you can manually specify the zones in which the cluster's node pools run, or you can use the default configuration, which replicates each node pool across three zones of the control plane's region. All zones must be within the same region as the cluster's control plane.

Use regional clusters to run your production workloads, as they offer higher availability than zonal clusters.

After creating a regional cluster, you cannot change it to a zonal cluster.

How regional clusters work

Regional clusters replicate the cluster's control plane and nodes across multiple zones within a single region. For example, using the default configuration, a regional cluster in the us-east1 region creates replicas of the control plane and nodes in three us-east1 zones: us-east1-b, us-east1-c, and us-east1-d. In the event of an infrastructure outage, your workloads continue to run, and nodes can be rebalanced manually or by using the cluster autoscaler.

Benefits of using regional clusters include the following:

  • Resilience from single zone failure: Regional clusters are available across a region rather than a single zone within a region. If a single zone becomes unavailable, your control plane and your resources in the remaining zones are not impacted.
  • Continuous control plane upgrades, control plane resizes, and reduced downtime from control plane failures: With redundant replicas of the control plane, regional clusters provide higher availability of the Kubernetes API, so you can access your control plane even during upgrades.

Limitations

  • By default, regional clusters consist of nine nodes (three per zone) spread evenly across three zones in a region. This consumes nine IP addresses. You can reduce the number of nodes down to one per zone, if desired. Newly created Cloud Billing accounts are granted only eight IP addresses per region, so you may need to request an increase in your quotas for regional in-use IP addresses, depending on the size of your regional cluster. If you have too few available in-use IP addresses, cluster creation fails.

  • To run GPUs in your regional cluster, choose a region with three zones where GPUs are available. You can also specify zones using the --node-locations flag when creating the cluster.

    If the region you choose doesn't have three zones where GPUs are available, you might see an error like the following:

    ERROR: (gcloud.container.clusters.create) ResponseError: code=400, message=
        (1) accelerator type "nvidia-tesla-k80" does not exist in zone us-west1-c.
        (2) accelerator type "nvidia-tesla-k80" does not exist in zone us-west1-a.
    

    For a complete list of regions and zones where GPUs are available, refer to GPUs on Compute Engine.

  • Zones for node pools must be in the same region as the cluster's control plane. If you need to, you can change a cluster's zones, which causes all new and existing nodes to span those zones.
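
As a rough illustration of the node and IP address arithmetic in the first limitation above, the following Python sketch computes how many nodes a regional cluster creates and whether that count fits a given in-use IP address quota. It assumes one in-use IP address per node, as the nine-node example implies. The function names are hypothetical and are not part of any GKE API.

```python
def regional_cluster_nodes(nodes_per_zone: int = 3, zones: int = 3) -> int:
    """Total nodes in a regional cluster: nodes per zone times number of zones.

    Defaults mirror the default configuration described above
    (three nodes in each of three zones).
    """
    if nodes_per_zone < 1 or zones < 1:
        raise ValueError("nodes_per_zone and zones must both be at least 1")
    return nodes_per_zone * zones


def fits_ip_quota(nodes_per_zone: int, zones: int, in_use_ip_quota: int) -> bool:
    """Check the node count against a regional in-use IP address quota.

    Assumption: each node consumes exactly one in-use IP address.
    """
    return regional_cluster_nodes(nodes_per_zone, zones) <= in_use_ip_quota


# Default configuration against the eight-address quota of a new billing account.
default_fits = fits_ip_quota(nodes_per_zone=3, zones=3, in_use_ip_quota=8)

# Reduced to one node per zone, the smallest size mentioned above.
minimal_fits = fits_ip_quota(nodes_per_zone=1, zones=3, in_use_ip_quota=8)
```

Under these assumptions, the default nine-node cluster does not fit an eight-address quota, while a cluster with one node per zone does.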

Pricing

Using regional clusters requires more of your project's regional quotas than a similar zonal or multi-zonal cluster. Ensure that you understand your quotas and GKE pricing before using regional clusters. If you encounter an "Insufficient regional quota to satisfy request for resource" error, your request exceeds your available quota in the current region.

You are also charged for node-to-node traffic across zones. For example, if a workload running in one zone needs to communicate with a workload in a different zone, the cross-zone traffic incurs a cost. For more information, see Egress between zones in the same region (per GB) on the Compute Engine pricing page.

Persistent storage in regional clusters

Zonal persistent disks are zonal resources and regional persistent disks are multi-zonal resources. When you add persistent storage, GKE assigns the disk to a single, random zone unless you specify a zone. To learn how to control the zones, see Zones in persistent disks.

Autoscaling regional clusters

Keep the following considerations in mind when using the cluster autoscaler to automatically scale node pools in regional clusters.

You can also learn more about Autoscaling limits for regional clusters or about how Cluster Autoscaler balances across zones.

Overprovisioning scaling limits

To maintain capacity in the unlikely event of a zonal failure, you can allow GKE to overprovision your scaling limits. This guarantees a minimum level of availability even when some zones are unavailable.

For example, if you overprovision a three-zone cluster to 150% (50% excess capacity), you can ensure that 100% of traffic is routed to available zones if one-third of the cluster's capacity is lost. For a workload that needs 12 nodes (four per zone), you would accomplish this by specifying a maximum of six nodes per zone rather than four. If one zone fails, the cluster can scale to 12 nodes across the two remaining zones.

Similarly, if you overprovision a two-zone cluster to 200%, you can ensure that 100% of traffic is rerouted if half of the cluster's capacity is lost.
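
The overprovisioning arithmetic in the two examples above can be sketched in Python as follows. This is a minimal illustration, not a GKE API: the function names are hypothetical, and the sketch assumes that every zone has the same per-zone maximum and that the surviving zones must hold the full required node count.

```python
import math


def overprovision_percent(zones: int, zones_lost: int = 1) -> float:
    """Percentage of required capacity to provision so that losing
    `zones_lost` zones still leaves 100% of required capacity."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    return 100 * zones / surviving


def max_nodes_per_zone(required_nodes: int, zones: int, zones_lost: int = 1) -> int:
    """Per-zone maximum that lets the surviving zones run `required_nodes`."""
    surviving = zones - zones_lost
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    # Round up so the surviving zones never fall short of the requirement.
    return math.ceil(required_nodes / surviving)


# Three-zone cluster that needs 12 nodes and must tolerate losing one zone.
three_zone_percent = overprovision_percent(zones=3)            # 150.0
three_zone_max = max_nodes_per_zone(required_nodes=12, zones=3)  # 6

# Two-zone cluster that must tolerate losing one zone.
two_zone_percent = overprovision_percent(zones=2)              # 200.0
```

With these inputs, the three-zone case yields a maximum of six nodes per zone, so the two surviving zones can reach the required 12 nodes.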

You can learn more about the cluster autoscaler or read the FAQ for autoscaling in the Kubernetes documentation.

What's next