Set up topology domains

This page provides an overview of topology domains and guidelines for setting them up.

Setting up a topology domain requires that you enable advanced cluster. Note the following limitations with the advanced cluster preview:

  • You can enable advanced cluster at cluster creation time for new 1.31 clusters only.
  • After advanced cluster is enabled, you won't be able to upgrade the cluster to 1.32. Only enable advanced cluster in a test environment.

This page is for Admins and architects who define IT solutions and system architecture in accordance with company strategy, and create and manage policies related to user permissions. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.

Overview

A topology domain is a group of cluster nodes that are considered part of the same logical or physical grouping, such as a campus or data center. A topology domain should correspond to underlying hardware or software that has some possibility of correlated failure. For example:

  • Software failure boundaries, such as separate vCenter Servers
  • Hardware failure boundaries, such as separate racks, power sources, or buildings

In Google Distributed Cloud (software only) for VMware, you define a topology label as part of setting up a topology domain when you create a cluster. After the cluster is created, the topology label is applied as a label to each node in the topology domain.

To make use of a topology domain during the preview, you have the following options:

  • Use the Kubernetes cluster-level default constraint, "topology.kubernetes.io/zone", as the key in the topology label. For more information, see Built-in default constraints.

  • Configure the Pod template in your Deployment, StatefulSet, or ReplicaSet, as applicable, with the topology label key. In the Pod spec, use the key of the topology label as the value of the topologySpreadConstraints.topologyKey field. This key lets the Kubernetes scheduler distribute Pods across the topology domains, which improves availability and keeps Pods from concentrating in a single domain that might fail. For more information on configuring topologySpreadConstraints in your Pod spec, see Pod Topology Spread Constraints in the Kubernetes documentation.
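
For the first option, the topology label would use the built-in key instead of a custom one. The following is a minimal sketch, assuming the same topologyDomains structure as the example later on this page; the domain name and zone value are placeholders:

```yaml
# Sketch only: the domain name and the zone value are placeholders.
topologyDomains:
- name: "topology-domain-1"
  topologyLabels:
    # Built-in key used by the cluster-level default spread constraints.
    "topology.kubernetes.io/zone": "zone-1"
```

With this key, Pods that don't define their own topologySpreadConstraints can still be spread by the scheduler's default constraints.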

Example topology domain labels

Suppose you create the following three topology domains when creating a user cluster:

...
topologyDomains:
- name: "topology-domain-1"
  topologyLabels:
    "topology.examplepetstore.com/zone": "zone-1"
...
...
topologyDomains:
- name: "topology-domain-2"
  topologyLabels:
    "topology.examplepetstore.com/zone": "zone-2"
...
...
topologyDomains:
- name: "topology-domain-3"
  topologyLabels:
    "topology.examplepetstore.com/zone": "zone-3"
...
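
Because the topology label is copied to the nodes of each domain, a node in topology-domain-1 would be expected to carry the label shown in the following excerpt. This is an illustrative sketch of a Node object, not output captured from a real cluster; the node name is hypothetical:

```yaml
# Sketch only: excerpt of a Node object; the node name is hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: user-node-1
  labels:
    # Populated from the topologyLabels of topology-domain-1.
    "topology.examplepetstore.com/zone": "zone-1"
```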

After the cluster is created, you update the Pod spec, for example:

...
topologySpreadConstraints:
- topologyKey: "topology.examplepetstore.com/zone"
...

At a high level, the Kubernetes scheduler uses topology.examplepetstore.com/zone to separate the cluster nodes into different groups, zone-1, zone-2, and zone-3. Then the scheduler spreads the Pods across these three node groups.
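
To show where the constraint sits in a complete manifest, the following is a minimal sketch of a Deployment that spreads its Pods by the example key. The Deployment name, app label, image, replica count, and the maxSkew and whenUnsatisfiable values are assumptions chosen for illustration:

```yaml
# Sketch only: the name, labels, image, and replica count are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: petstore-web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: petstore-web
  template:
    metadata:
      labels:
        app: petstore-web
    spec:
      topologySpreadConstraints:
      # Allow at most a difference of 1 Pod between any two zones.
      - maxSkew: 1
        topologyKey: "topology.examplepetstore.com/zone"
        # Keep Pods pending rather than violate the spread.
        whenUnsatisfiable: DoNotSchedule
        # Count only this Deployment's Pods when computing the skew.
        labelSelector:
          matchLabels:
            app: petstore-web
      containers:
      - name: web
        image: registry.example.com/petstore-web:1.0
```

With three zones of similar capacity, this configuration would aim for two Pods in each zone. Using whenUnsatisfiable: ScheduleAnyway instead treats the spread as a preference rather than a hard requirement.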

Guidelines for setting up topology domains

To help the Kubernetes scheduler make effective use of all cluster resources, we recommend the following guidelines:

  • The topology domains need to be balanced. You should provide nearly equal amounts of CPU and RAM capacity in each topology domain.
  • Provide at least two and preferably three topology domains.
  • Don't spread by more than one topology key.
  • The nodes should have a similar size in each topology domain.
  • If you use taints and tolerations for workload separation within a cluster, then each node group should meet the previous requirements.

If these guidelines aren't met, then the scheduler will still try to use the full capacity of the cluster, but it might take longer to schedule Pods, and not all Pods will get the expected spreading behavior.
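
As an illustration of the taints and tolerations guideline, the following sketch shows a Pod that is confined to a dedicated node group and still spreads by a single topology key. The taint, node label, Pod name, and image are hypothetical, and the sketch assumes that the dedicated node group has balanced capacity in each topology domain:

```yaml
# Sketch only: the taint, node label, names, and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
  labels:
    app: batch-worker
spec:
  # Restrict the Pod to the dedicated node group.
  nodeSelector:
    workload-type: batch
  # Tolerate the taint that keeps other workloads off that group.
  tolerations:
  - key: "workload-type"
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
  # Spread by one topology key only, as the guidelines recommend.
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "topology.examplepetstore.com/zone"
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: batch-worker
  containers:
  - name: worker
    image: registry.example.com/batch-worker:1.0
```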

What's next