Define compact placement for GKE nodes


You can control whether your Google Kubernetes Engine (GKE) nodes are physically located relative to each other within a zone by using a compact placement policy.

Overview

When you create node pools and workloads in a GKE cluster, you can define a compact placement policy, which specifies that these nodes or workloads should be placed in closer physical proximity to each other within a zone. Having nodes closer to each other can reduce network latency between nodes, which can be especially useful for tightly-coupled batch workloads.

Use compact placement with GKE Autopilot

Limitations

  • GKE provisions workloads within a compact placement in the same zone.
  • Compact placement is available for the Balanced compute class and for A100 GPUs. To learn more, see machine types.
  • Compact placement is available for Pods grouped on up to 150 nodes.
  • Live migration for nodes is not supported.

Enable a compact placement policy

To enable compact placement for GKE Autopilot, add a nodeSelector to the Pod specification with the following keys:

  • cloud.google.com/gke-placement-group is the identifier you assign for the group of Pods that should run together, in the same compact placement group.

  • One of the following keys to define the type of resource:

    • cloud.google.com/compute-class: "Balanced"
    • cloud.google.com/gke-accelerator: "nvidia-tesla-a100"

The following example is an excerpt of a Pod specification that enables compact placement. The placement group identifier is placement-group-1 and the compute class is Balanced:

  nodeSelector:
    cloud.google.com/gke-placement-group: "placement-group-1"
    cloud.google.com/compute-class: "Balanced"

Each placement group is limited to 150 nodes. We recommend that you limit a placement group to only the workloads that benefit from the grouping, and that you distribute your workloads across separate placement groups where possible.
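For context, the nodeSelector excerpt above would appear in a complete Pod manifest like the following sketch. The Pod name, container image, and resource requests are illustrative placeholders, not values from this guide:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compact-placement-pod  # illustrative name
spec:
  nodeSelector:
    cloud.google.com/gke-placement-group: "placement-group-1"
    cloud.google.com/compute-class: "Balanced"
  containers:
  - name: app
    # Example image; replace with your workload's image.
    image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
    resources:
      requests:
        cpu: "2"
        memory: "8Gi"
```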

Use compact placement with GKE Standard

Limitations

Compact placement in GKE Standard node pools has the following limitations:

  • Node pools must use C2 machine types.
  • GKE provisions nodes within a compact placement in the same zone.

Create a compact placement policy

To create a compact placement policy, specify the --placement-type=COMPACT option in the Google Cloud CLI during node pool or cluster creation. With this setting, GKE attempts to place the nodes within a node pool in closer physical proximity to each other.

To use an existing resource policy in your cluster, specify the location of your custom policy with the --placement-policy flag during node pool or cluster creation. This gives you the flexibility to use reserved placements, multiple node pools with the same placement policy, and other advanced placement options. However, it also requires more manual operations than specifying the --placement-type=COMPACT flag. For example, you need to create, delete, and maintain your custom resource policies. You must also make sure that the maximum number of VM instances is respected across all node pools that use the resource policy. If this limit is reached while some of your node pools haven't reached their maximum size, adding any more nodes fails.

If you don't specify the --placement-type and --placement-policy flags, then by default there are no requirements on node placement.

Create a compact placement policy in a new cluster

When you create a new cluster, you can specify a compact placement policy that is applied to the default node pool. For any subsequent node pools that you create in the cluster, you must specify whether to apply compact placement.

To create a new cluster where the default node pool has a compact placement policy applied, use the following command:

gcloud container clusters create CLUSTER_NAME \
    --machine-type MACHINE_TYPE \
    --placement-type COMPACT \
    --max-surge-upgrade 0 \
    --max-unavailable-upgrade MAX_UNAVAILABLE

Replace the following:

  • CLUSTER_NAME: The name of your new cluster.
  • MACHINE_TYPE: The type of machine to use for nodes, which must be a C2 machine type (for example, c2-standard-4).
  • --placement-type COMPACT: Applies compact placement for the nodes in the default node pool.
  • MAX_UNAVAILABLE: Maximum number of nodes that can be unavailable at the same time during a node pool upgrade. For compact placement, we recommend fast, no-surge upgrades to improve the likelihood of finding colocated nodes during upgrades.

Create a compact placement policy on an existing cluster

On an existing cluster, you can create a node pool that has a compact placement policy applied.

To create a node pool that has a compact placement policy applied, use the following command:

gcloud container node-pools create NODEPOOL_NAME \
    --machine-type MACHINE_TYPE \
    --cluster CLUSTER_NAME \
    --placement-type COMPACT \
    --max-surge-upgrade 0 \
    --max-unavailable-upgrade MAX_UNAVAILABLE

Replace the following:

  • NODEPOOL_NAME: The name of your new node pool.
  • MACHINE_TYPE: The type of machine to use for nodes, which must be a C2 machine type (for example, c2-standard-4).
  • CLUSTER_NAME: The name of your existing cluster.
  • --placement-type COMPACT: Indicates to apply compact placement for the nodes in the new node pool.
  • MAX_UNAVAILABLE: Maximum number of nodes that can be unavailable at the same time during a node pool upgrade. For compact placement, we recommend fast, no-surge upgrades to improve the likelihood of finding colocated nodes during upgrades.
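After the node pool is created, you can confirm that the policy was applied. The following sketch assumes the node pool exposes its placement policy in the placementPolicy field of the describe output:

```shell
# Inspect the placement policy type on the new node pool (expected: COMPACT).
gcloud container node-pools describe NODEPOOL_NAME \
    --cluster CLUSTER_NAME \
    --format="value(placementPolicy.type)"
```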

Create node pools using a shared custom placement policy

You can manually create a resource policy and use it in multiple node pools.

  1. Create the resource policy in the cluster Google Cloud region:

    gcloud compute resource-policies create group-placement POLICY_NAME \
        --region REGION \
        --collocation collocated
    

    Replace the following:

    • POLICY_NAME: The name of your resource policy.
    • REGION: The region of your cluster.
  2. Create a node pool using the custom resource policy:

    gcloud container node-pools create NODEPOOL_NAME \
        --machine-type MACHINE_TYPE \
        --cluster CLUSTER_NAME \
        --placement-policy POLICY_NAME \
        --max-surge-upgrade 0 \
        --max-unavailable-upgrade MAX_UNAVAILABLE
    

    Replace the following:

    • NODEPOOL_NAME: The name of your new node pool.
    • MACHINE_TYPE: The type of machine to use for nodes, which must be a C2 machine type (for example, c2-standard-4).
    • CLUSTER_NAME: The name of your existing cluster.
    • MAX_UNAVAILABLE: Maximum number of nodes that can be unavailable at the same time during a node pool upgrade. For compact placement, we recommend fast, no-surge upgrades to improve the likelihood of finding colocated nodes during upgrades.
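Because you create and maintain custom resource policies yourself, you might also want to inspect or remove them. The following sketch uses standard Compute Engine commands; delete a policy only after deleting all node pools that reference it:

```shell
# View the resource policy, including its collocation setting.
gcloud compute resource-policies describe POLICY_NAME --region REGION

# Delete the policy once no node pools use it.
gcloud compute resource-policies delete POLICY_NAME --region REGION
```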

Use a Compute Engine reservation with a compact placement policy

Reservations help you guarantee that hardware is available in a specified zone, reducing the risk of node pool creation failure caused by insufficient hardware.

  1. Create a reservation that specifies a compact placement policy:

    gcloud compute reservations create RESERVATION_NAME \
        --vm-count MACHINE_COUNT \
        --machine-type MACHINE_TYPE \
        --resource-policies policy=POLICY_NAME \
        --zone ZONE \
        --require-specific-reservation
    

    Replace the following:

    • RESERVATION_NAME: The name of your reservation.
    • MACHINE_COUNT: The number of reserved nodes.
    • MACHINE_TYPE: The type of machine to use for nodes, which must be a C2 machine type. For example, to use a predefined C2 machine type with 4 vCPUs, specify c2-standard-4.
    • POLICY_NAME: The name of your resource policy.
    • ZONE: The zone where to create your reservation.
  2. Create a node pool by specifying both the compact placement policy and the reservation you created in the previous step:

    gcloud container node-pools create NODEPOOL_NAME \
        --machine-type MACHINE_TYPE \
        --cluster CLUSTER_NAME \
        --placement-policy POLICY_NAME \
        --reservation-affinity specific \
        --reservation RESERVATION_NAME \
        --max-surge-upgrade 0 \
        --max-unavailable-upgrade MAX_UNAVAILABLE
    

Replace the following:

  • NODEPOOL_NAME: The name of your new node pool.
  • MACHINE_TYPE: The type of machine to use for nodes, which must be a C2 machine type (for example, c2-standard-4).
  • CLUSTER_NAME: The name of your existing cluster.
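To confirm that the node pool is consuming the reservation, you can describe it. This sketch assumes the describe output reports the reserved and in-use instance counts, as in the Compute Engine reservations API:

```shell
# Show the reservation, including how many reserved instances are in use.
gcloud compute reservations describe RESERVATION_NAME --zone ZONE
```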

Create a workload on nodes that use compact placement

To run workloads on dedicated nodes that use compact placement, you can use Kubernetes mechanisms such as assigning Pods to specific nodes and preventing unwanted Pods from being scheduled on a group of nodes.

In the following example, we add a taint to the dedicated nodes and add a corresponding toleration and affinity to the Pods.

  1. Add a taint to nodes in the node pool that has a compact placement policy:

    kubectl taint nodes -l cloud.google.com/gke-nodepool=NODEPOOL_NAME dedicated-pool=NODEPOOL_NAME:NoSchedule
    
  2. In the workload definition, specify the necessary toleration and a node affinity. Here's an example with a single Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      ...
    spec:
      ...
      tolerations:
      - key: dedicated-pool
        operator: "Equal"
        value: "NODEPOOL_NAME"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dedicated-pool
                operator: In
                values:
                - NODEPOOL_NAME
    

In some locations, it might not be possible to create a large node pool using a compact placement policy. To limit the size of such node pools to what's necessary, consider creating a separate node pool for each workload that requires compact placement.
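To confirm that the taint and affinity from the previous steps work as intended, you can check the node taints and where the Pods were scheduled. A minimal sketch:

```shell
# List the taints applied to the dedicated nodes.
kubectl get nodes -l cloud.google.com/gke-nodepool=NODEPOOL_NAME \
    -o custom-columns="NAME:.metadata.name,TAINTS:.spec.taints[*].key"

# Verify that the workload's Pods are running on those nodes.
kubectl get pods -o wide
```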

Use compact placement for node auto-provisioning

Starting in GKE version 1.25, node auto-provisioning supports compact placement policies. With node auto-provisioning, GKE automatically provisions node pools based on cluster resource demand. For more information, see Using node auto-provisioning.

To enable compact placement for node auto-provisioning, add a nodeSelector to the Pod specification with the following keys:

  • cloud.google.com/gke-placement-group is the identifier you assign for the group of Pods that should run together, in the same compact placement group.

  • cloud.google.com/machine-family is the machine family name. Use one of the machine families that support compact placement. We recommend the C2 or C2D machine families for workloads with demanding compute and networking performance requirements.

The following example is a Pod specification that enables compact placement:

apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  ...
  nodeSelector:
    cloud.google.com/gke-placement-group: PLACEMENT_GROUP_IDENTIFIER
    cloud.google.com/machine-family: MACHINE_FAMILY

You can omit the cloud.google.com/machine-family key if the Pod configuration already defines a machine type supported with compact placement. For example, if the Pod specification includes nvidia.com/gpu and the cluster is configured to use A100 GPUs, you don't need to include the cloud.google.com/machine-family key.

The following example is a Pod specification that defines an nvidia.com/gpu request for a cluster configured to use A100 GPUs. This Pod spec doesn't include the cloud.google.com/machine-family key:

  apiVersion: v1
  kind: Pod
  metadata:
    ...
  spec:
    ...
    nodeSelector:
      cloud.google.com/gke-placement-group: PLACEMENT_GROUP_IDENTIFIER
      cloud.google.com/gke-accelerator: "nvidia-tesla-a100"
    resources:
      limits:
        nvidia.com/gpu: 2

To learn more, see how to configure Pods to consume GPUs.

Optimize placement group size

Because GKE finds the best placement for smaller deployments, we recommend that you instruct GKE to avoid running different types of Pods in the same placement group. To do so, add a toleration with the cloud.google.com/gke-placement-group key and the compact placement identifier that you defined.

The following example is a Pod specification that defines a Pod toleration with compact placement:

apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  ...
  tolerations:
  - key: cloud.google.com/gke-placement-group
    operator: "Equal"
    value: PLACEMENT_GROUP_IDENTIFIER
    effect: "NoSchedule"

For more information about node auto-provisioning with Pod tolerations, see Workload separation.

What's next