High availability

This page describes your high availability (HA) options in GKE on-prem.

For a more comprehensive guide that combines the features of GKE on-prem, vCenter, and vMotion to provide high availability and disaster recovery, see High availability and disaster recovery.

HA for user clusters

GKE on-prem architecture with highly-available user clusters

GKE on-prem supports HA user control planes. During cluster creation, you can choose to create three user control plane replicas. To do so, specify usercluster.masternode.replicas: 3 in the GKE on-prem configuration file you're using to create the user cluster.
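
As a minimal sketch, the dotted path usercluster.masternode.replicas would nest as follows in a configuration file that uses the lowercase usercluster section. Only the replicas field is shown; the other fields that a real configuration file needs are omitted here, and the exact layout may differ between GKE on-prem versions.

```yaml
# Sketch only: other required usercluster fields are omitted.
usercluster:
  masternode:
    # Three replicas request an HA user control plane.
    replicas: 3
```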

To create an HA user cluster, GKE on-prem creates three user control plane VMs (or master nodes) within the admin cluster. Each control plane VM runs the same Kubernetes control plane components.

HA for admin clusters and user cluster control planes

GKE on-prem automatically uses the VMware Distributed Resource Scheduler (DRS) to create anti-affinity rules for user cluster nodes. This means that VMs in a user cluster node pool get spread across at least three physical hosts.

Starting with version 1.5, GKE on-prem also creates VMware DRS anti-affinity rules for admin cluster nodes. Because the control plane VMs for the user cluster are in the admin cluster, the control plane VMs for an HA user cluster get spread across three physical hosts. Also, admin add-on nodes get spread across two physical hosts.

For GKE on-prem to enable DRS anti-affinity rules, you must have at least three physical hosts in your vCenter cluster. We recommend that you configure your vCenter cluster with more than three physical hosts and enable vSphere HA for extra redundancy in case a physical host fails.

If you don't have enough physical hosts in your vCenter cluster, you can disable DRS anti-affinity rules in your admin cluster or user cluster as follows:

  • For a v1 admin or user cluster configuration file, set antiAffinityGroups.enabled to false:

    antiAffinityGroups:
      enabled: false
    
  • For a v0 cluster configuration file, set admincluster.antiaffinitygroups.enabled and usercluster.antiaffinitygroups.enabled to false:

    admincluster:
      antiaffinitygroups:
        enabled: false
    ...
    usercluster:
      antiaffinitygroups:
        enabled: false
    

Protection for the admin cluster control plane

GKE on-prem does not support HA admin control planes. You can prevent a single point of failure in the admin cluster by enabling vSphere High Availability, which protects GKE on-prem admin clusters from going down in the event of an underlying host failure. To learn more, see Create a vSphere HA Cluster.