Consuming reserved zonal resources


This page shows you how to consume reserved Compute Engine zonal resources in specific GKE workloads. These capacity reservations give you a high level of assurance that specific hardware is available for your workloads.

Ensure that you're already familiar with the concepts of Compute Engine reservations, like consumption types, share types, and provisioning types. For details, see Reservations of Compute Engine zonal resources.

This page is intended for the following people:

  • Application operators who deploy workloads that should run as soon as possible, usually with specialized hardware like GPUs.
  • Platform administrators who want to obtain a high level of assurance that workloads run on optimized hardware that meets both application and organizational requirements.

About reservation consumption in GKE

Compute Engine capacity reservations let you provision specific hardware configurations in Google Cloud zones, either immediately or at a specified future time. You can then consume this reserved capacity in GKE.

Depending on your GKE mode of operation, you can consume the following reservation types:

  • Autopilot mode: specific reservations only.
  • Standard mode: specific reservations or any matching reservation.

To consume reservations when GKE creates your resources, you must specify a reservation affinity, such as any or specific.

Reservation consumption options in GKE

GKE lets you consume reservations directly in individual workloads by using Kubernetes nodeSelectors in your workload manifest or by creating Standard mode node pools that consume the reservation. This page describes the approach of directly selecting reservations in individual resources.

You can also configure GKE to consume reservations during scaling operations that create new nodes by using custom compute classes. Custom compute classes let platform administrators define a hierarchy of node configurations for GKE to prioritize during node scaling so that workloads run on your selected hardware.

You can specify reservations in your custom compute class configuration. Any GKE workload that uses that custom compute class then tells GKE to consume the reservations that are specified for that compute class.

To learn more, in the "About custom compute classes" page, see Consume Compute Engine reservations.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Consume capacity reservations in Autopilot clusters

Autopilot clusters support consuming resources from Compute Engine capacity reservations in the same project or in a shared project. You must set the consumption type property of the target reservation to specific, and you must explicitly select that reservation in your manifest. If you don't explicitly specify a reservation, Autopilot clusters won't consume reservations. To learn more about reservation consumption types, see How reservations work.

These reservations qualify for Compute flexible committed use discounts. You must use the Accelerator compute class or the Performance compute class to consume capacity reservations.

  • Before you begin, create an Autopilot cluster that runs one of the following versions:

    • To consume reserved accelerators, such as GPUs: 1.28.6-gke.1095000 or later.
    • To run Pods on a specific machine series and with each Pod on its own node: 1.28.6-gke.1369000 or later, or 1.29.1-gke.1575000 or later.

Create capacity reservations for Autopilot

Autopilot Pods can consume reservations that have the specific consumption type property in the same project as the cluster or in a shared reservation from a different project. You can consume the reserved hardware by explicitly referencing that reservation in your manifest. You can consume reservations in Autopilot for the following types of hardware:

  • Any of the following types of GPUs:
    • nvidia-h100-mega-80gb: NVIDIA H100 Mega (80GB)
    • nvidia-h100-80gb: NVIDIA H100 (80GB)
    • nvidia-a100-80gb: NVIDIA A100 (80GB)
    • nvidia-tesla-a100: NVIDIA A100 (40GB)
    • nvidia-l4: NVIDIA L4
    • nvidia-tesla-t4: NVIDIA T4

The reservation must meet the following requirements:

  • The machine types, accelerator types, and accelerator quantities match what your workloads will consume.
  • The reservation uses the specific consumption type. For example, in the gcloud CLI, you must specify the --require-specific-reservation flag when you create the reservation.

To create a capacity reservation, see the following resources:

  • Create a reservation for a single project
  • Create a shared reservation

Consume a specific reservation in the same project in Autopilot

This section shows you how to consume a specific capacity reservation that's in the same project as your cluster. You can use kubectl or Terraform.

kubectl

  1. Save the following manifest as specific-autopilot.yaml. This manifest has node selectors that consume a specific reservation. You can use VM instances or accelerators.

    VM instances

      apiVersion: v1
      kind: Pod
      metadata:
        name: specific-same-project-pod
      spec:
        nodeSelector:
          cloud.google.com/compute-class: Performance
          cloud.google.com/machine-family: MACHINE_SERIES
          cloud.google.com/reservation-name: RESERVATION_NAME
          cloud.google.com/reservation-affinity: "specific"
        containers:
        - name: my-container
          image: "registry.k8s.io/pause"
          resources:
            requests:
              cpu: 12
              memory: "50Gi"
              ephemeral-storage: "200Gi"
    

    Replace the following:

    • MACHINE_SERIES: a machine series that contains the machine type of the VMs in your specific capacity reservation. For example, if your reservation is for c3-standard-4 machine types, specify C3 in the MACHINE_SERIES field.
    • RESERVATION_NAME: the name of the Compute Engine capacity reservation.

    Accelerators

      apiVersion: v1
      kind: Pod
      metadata:
        name: specific-same-project-pod
      spec:
        nodeSelector:
          cloud.google.com/gke-accelerator: ACCELERATOR
          cloud.google.com/reservation-name: RESERVATION_NAME
          cloud.google.com/reservation-affinity: "specific"
        containers:
        - name: my-container
          image: "registry.k8s.io/pause"
          resources:
            requests:
              cpu: 12
              memory: "50Gi"
              ephemeral-storage: "200Gi"
            limits:
              nvidia.com/gpu: QUANTITY
    

    Replace the following:

    • ACCELERATOR: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
      • nvidia-h100-mega-80gb: NVIDIA H100 Mega (80GB)
      • nvidia-h100-80gb: NVIDIA H100 (80GB)
      • nvidia-a100-80gb: NVIDIA A100 (80GB)
      • nvidia-tesla-a100: NVIDIA A100 (40GB)
      • nvidia-l4: NVIDIA L4
      • nvidia-tesla-t4: NVIDIA T4
    • RESERVATION_NAME: the name of the Compute Engine capacity reservation.
    • QUANTITY: the number of GPUs to attach to the container. Must be a supported quantity for the specified GPU, as described in Supported GPU quantities.
  2. Deploy the Pod:

    kubectl apply -f specific-autopilot.yaml
    

Autopilot uses the reserved capacity in the specified reservation to provision a new node to place the Pod.
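Because Autopilot consumes a reservation only when the manifest selects it explicitly, a quick pre-flight check on the manifest can catch a missing selector before you deploy. The following is a minimal sketch in POSIX shell. The function name check_reservation_selectors is hypothetical, and the check is a plain text match on the saved manifest, not a full YAML parse.

```shell
# check_reservation_selectors FILE
# Hypothetical helper: returns 0 if FILE contains both reservation
# node selectors, and returns 1 (with a message on stderr) otherwise.
check_reservation_selectors() {
  manifest="$1"
  grep -q 'cloud.google.com/reservation-name:' "$manifest" || {
    echo "missing cloud.google.com/reservation-name in $manifest" >&2
    return 1
  }
  grep -q 'cloud.google.com/reservation-affinity: *"specific"' "$manifest" || {
    echo "missing cloud.google.com/reservation-affinity: \"specific\" in $manifest" >&2
    return 1
  }
}
```

For example, running check_reservation_selectors specific-autopilot.yaml && kubectl apply -f specific-autopilot.yaml deploys the Pod only if both selectors are present.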

Terraform

To consume a specific reservation in the same project with VM instances using Terraform, refer to the following example:

resource "kubernetes_pod_v1" "default_pod" {
  metadata {
    name = "specific-same-project-pod"
  }

  spec {
    node_selector = {
      "cloud.google.com/compute-class"        = "Performance"
      "cloud.google.com/machine-family"       = "c3"
      "cloud.google.com/reservation-name"     = google_compute_reservation.specific_pod.name
      "cloud.google.com/reservation-affinity" = "specific"
    }

    container {
      name  = "my-container"
      image = "registry.k8s.io/pause"

      resources {
        requests = {
          cpu               = 2
          memory            = "8Gi"
          ephemeral-storage = "1Gi"
        }
      }

      security_context {
        allow_privilege_escalation = false
        run_as_non_root            = false

        capabilities {
          add  = []
          drop = ["NET_RAW"]
        }
      }
    }

    security_context {
      run_as_non_root     = false
      supplemental_groups = []

      seccomp_profile {
        type = "RuntimeDefault"
      }
    }
  }

  depends_on = [
    google_compute_reservation.specific_pod
  ]
}

To consume a specific reservation in the same project with the Accelerator compute class using Terraform, refer to the following example:

resource "kubernetes_pod_v1" "default_accelerator" {
  metadata {
    name = "specific-same-project-accelerator"
  }

  spec {
    node_selector = {
      "cloud.google.com/compute-class"        = "Accelerator"
      "cloud.google.com/gke-accelerator"      = "nvidia-l4"
      "cloud.google.com/reservation-name"     = google_compute_reservation.specific_accelerator.name
      "cloud.google.com/reservation-affinity" = "specific"
    }

    container {
      name  = "my-container"
      image = "registry.k8s.io/pause"

      resources {
        requests = {
          cpu               = 2
          memory            = "7Gi"
          ephemeral-storage = "1Gi"
          "nvidia.com/gpu"  = 1

        }
        limits = {
          "nvidia.com/gpu" = 1
        }
      }

      security_context {
        allow_privilege_escalation = false
        run_as_non_root            = false

        capabilities {
          add  = []
          drop = ["NET_RAW"]
        }
      }
    }

    security_context {
      run_as_non_root     = false
      supplemental_groups = []

      seccomp_profile {
        type = "RuntimeDefault"
      }
    }
  }

  depends_on = [
    google_compute_reservation.specific_accelerator
  ]
}

To learn more about using Terraform, see Terraform support for GKE.

Consume a specific shared reservation in Autopilot

This section uses the following terms:

  • Owner project: the project that owns the reservation and shares it with other projects.
  • Consumer project: the project that runs the workloads that consume the shared reservation.

To consume a shared reservation, you must grant the GKE service agent access to the reservation in the project that owns the reservation. Do the following:

  1. Create a custom IAM role that contains the compute.reservations.list permission in the owner project:

    gcloud iam roles create ROLE_NAME \
        --project=OWNER_PROJECT_ID \
        --permissions='compute.reservations.list'
    

    Replace the following:

    • ROLE_NAME: a name for your new role.
    • OWNER_PROJECT_ID: the project ID of the project that owns the capacity reservation.
  2. Give the GKE service agent in the consumer project access to list shared reservations in the owner project:

    gcloud projects add-iam-policy-binding OWNER_PROJECT_ID \
        --project=OWNER_PROJECT_ID \
        --member=serviceAccount:service-CONSUMER_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com \
        --role='projects/OWNER_PROJECT_ID/roles/ROLE_NAME'
    

    Replace CONSUMER_PROJECT_NUMBER with the numerical project number of your consumer project. To find this number, see Identifying projects in the Resource Manager documentation.

  3. Save the following manifest as shared-autopilot.yaml. This manifest has nodeSelectors that tell GKE to consume a specific shared reservation.

    VM instances

    apiVersion: v1
    kind: Pod
    metadata:
      name: performance-pod
    spec:
      nodeSelector:
        cloud.google.com/compute-class: Performance
        cloud.google.com/machine-family: MACHINE_SERIES
        cloud.google.com/reservation-name: RESERVATION_NAME
        cloud.google.com/reservation-project: OWNER_PROJECT_ID
        cloud.google.com/reservation-affinity: "specific"
      containers:
      - name: my-container
        image: "registry.k8s.io/pause"
        resources:
          requests:
            cpu: 12
            memory: "50Gi"
            ephemeral-storage: "200Gi"
    

    Replace the following:

    • MACHINE_SERIES: a machine series that contains the machine type of the VMs in your specific capacity reservation. For example, if your reservation is for c3-standard-4 machine types, specify C3 in the MACHINE_SERIES field.
    • RESERVATION_NAME: the name of the Compute Engine capacity reservation.
    • OWNER_PROJECT_ID: the project ID of the project that owns the capacity reservation.

    Accelerators

    apiVersion: v1
    kind: Pod
    metadata:
      name: specific-same-project-pod
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: ACCELERATOR
        cloud.google.com/reservation-name: RESERVATION_NAME
        cloud.google.com/reservation-project: OWNER_PROJECT_ID
        cloud.google.com/reservation-affinity: "specific"
      containers:
      - name: my-container
        image: "registry.k8s.io/pause"
        resources:
          requests:
            cpu: 12
            memory: "50Gi"
            ephemeral-storage: "200Gi"
          limits:
            nvidia.com/gpu: QUANTITY
    

    Replace the following:

    • ACCELERATOR: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
      • nvidia-h100-mega-80gb: NVIDIA H100 Mega (80GB)
      • nvidia-h100-80gb: NVIDIA H100 (80GB)
      • nvidia-a100-80gb: NVIDIA A100 (80GB)
      • nvidia-tesla-a100: NVIDIA A100 (40GB)
      • nvidia-l4: NVIDIA L4
      • nvidia-tesla-t4: NVIDIA T4
    • RESERVATION_NAME: the name of the Compute Engine capacity reservation.
    • OWNER_PROJECT_ID: the project ID of the project that owns the capacity reservation.
    • QUANTITY: the number of GPUs to attach to the container. Must be a supported quantity for the specified GPU, as described in Supported GPU quantities.
  4. Deploy the Pod:

    kubectl apply -f shared-autopilot.yaml
    

Autopilot uses the reserved capacity in the specified reservation to provision a new node to place the Pod.
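The IAM binding in the preceding steps needs the consumer project's numerical project number, not its project ID. As a hedged sketch, the following POSIX shell helpers look up the number with the gcloud CLI and build the service agent email in the format shown in the binding command. Both function names are hypothetical.

```shell
# gke_service_agent PROJECT_NUMBER
# Prints the GKE service agent email for the given project number,
# using the address format from the add-iam-policy-binding step.
gke_service_agent() {
  printf 'service-%s@container-engine-robot.iam.gserviceaccount.com\n' "$1"
}

# consumer_project_number PROJECT_ID
# Looks up the numerical project number of a project with the gcloud CLI.
consumer_project_number() {
  gcloud projects describe "$1" --format="value(projectNumber)"
}
```

You could then pass --member="serviceAccount:$(gke_service_agent "$(consumer_project_number CONSUMER_PROJECT_ID)")" to the binding command, where CONSUMER_PROJECT_ID is a placeholder for your consumer project ID.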

Troubleshooting consuming reservations in Autopilot

  • Ensure that the machine types, accelerator types, local SSD configurations, and accelerator quantities match what your workloads will consume. For a complete list of properties that must match, see Compute Engine capacity reservation properties.
  • Ensure that the reservation is created with specific affinity.
  • When using shared reservations, ensure that the GKE service agent in the consumer project has permission to list shared reservations in the owner project.
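One way to check the second item in this list is to read the reservation's properties with the gcloud CLI. The sketch below assumes that the reservation resource exposes a specificReservationRequired field and that the value() output format prints True when the field is set. The function name is hypothetical. For a shared reservation, you would also need to add --project=OWNER_PROJECT_ID to the command.

```shell
# reservation_is_specific RESERVATION_NAME ZONE
# Hypothetical helper: succeeds only if the reservation requires
# specific targeting (the "specific" consumption type).
reservation_is_specific() {
  required=$(gcloud compute reservations describe "$1" \
      --zone="$2" \
      --format="value(specificReservationRequired)")
  [ "$required" = "True" ]
}
```

For example, reservation_is_specific RESERVATION_NAME ZONE || echo "not a specific reservation" reports a reservation that Autopilot can't consume.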

Consuming reserved instances in GKE Standard

When you create a cluster or node pool, you can indicate the reservation consumption mode by specifying the --reservation-affinity flag.

Consuming any matching reservations

You can create a reservation and create nodes that consume any matching reservation by using the gcloud CLI or Terraform.

gcloud

To consume from any matching reservations automatically, set the reservation affinity flag to --reservation-affinity=any. Since any is the default value defined in Compute Engine, you can omit the reservation affinity flag entirely.

In the any reservation consumption mode, nodes take capacity from single-project reservations first and from shared reservations only afterward, because shared reservations can also be consumed by other projects. For more information about the order in which reservations are consumed, see Consumption order.

  1. Create a reservation of three VM instances:

    gcloud compute reservations create RESERVATION_NAME \
        --machine-type=MACHINE_TYPE --vm-count=3
    

    Replace the following:

    • RESERVATION_NAME: the name of the reservation to create.
    • MACHINE_TYPE: the type of machine (name only) to use for the reservation. For example, n1-standard-2.
  2. Verify the reservation was created successfully:

    gcloud compute reservations describe RESERVATION_NAME
    
  3. Create a cluster with one node that consumes any matching reservation:

    gcloud container clusters create CLUSTER_NAME \
        --machine-type=MACHINE_TYPE --num-nodes=1 \
        --reservation-affinity=any
    

    Replace CLUSTER_NAME with the name of the cluster to create.

  4. Create a node pool with three nodes to consume any matching reservation:

    gcloud container node-pools create NODEPOOL_NAME \
        --cluster CLUSTER_NAME --num-nodes=3 \
        --machine-type=MACHINE_TYPE --reservation-affinity=any
    

    Replace NODEPOOL_NAME with the name of the node pool to create.

The total number of nodes is four, which exceeds the capacity of the reservation. Three of the nodes consume the reservation while the last node takes capacity from the general Compute Engine resource pool.
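The arithmetic behind this split can be sketched in POSIX shell. This is an illustration only: it assumes that every node matches the reservation's properties and that nothing else consumes the reservation. The function name split_nodes is hypothetical.

```shell
# split_nodes TOTAL_NODES RESERVED_CAPACITY
# Prints "<nodes from the reservation> <nodes from the general pool>".
split_nodes() {
  total="$1"
  reserved="$2"
  # Nodes consume reserved capacity first, up to the reservation size.
  if [ "$total" -le "$reserved" ]; then
    from_reservation="$total"
  else
    from_reservation="$reserved"
  fi
  echo "$from_reservation $(( total - from_reservation ))"
}

split_nodes 4 3   # prints "3 1"
```

With the one-node cluster and the three-node node pool from the preceding steps, three nodes fit in the reservation and one node falls back to the general pool.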

Terraform

To create a reservation of three VM instances using Terraform, refer to the following example:

resource "google_compute_reservation" "any_reservation" {
  name = "any-reservation"
  zone = "us-central1-a"

  specific_reservation {
    count = 3

    instance_properties {
      machine_type = "e2-medium"
    }
  }
}

To create a cluster with one node that consumes any matching reservation using Terraform, refer to the following example:

resource "google_container_cluster" "default" {
  name     = "gke-standard-zonal-cluster"
  location = "us-central1-a"

  initial_node_count = 1

  node_config {
    machine_type = "e2-medium"

    reservation_affinity {
      consume_reservation_type = "ANY_RESERVATION"
    }
  }

  depends_on = [
    google_compute_reservation.any_reservation
  ]

  # Setting `deletion_protection` to `true` ensures that you can't
  # accidentally delete this cluster by using Terraform.
  deletion_protection = false
}

To create a node pool with three nodes to consume any matching reservation using Terraform, refer to the following example:

resource "google_container_node_pool" "any_node_pool" {
  name     = "gke-standard-zonal-any-node-pool"
  cluster  = google_container_cluster.default.name
  location = google_container_cluster.default.location

  initial_node_count = 3
  node_config {
    machine_type = "e2-medium"

    reservation_affinity {
      consume_reservation_type = "ANY_RESERVATION"
    }
  }
}

To learn more about using Terraform, see Terraform support for GKE.

Consuming a specific single-project reservation

To consume a specific reservation, set the reservation affinity flag to --reservation-affinity=specific and provide the specific reservation name. In this mode, instances must take capacity from the specified reservation in the zone. The request fails if the reservation does not have sufficient capacity.
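Because the request fails when the reservation lacks capacity, it can help to check how much capacity is free before you create nodes. The following sketch assumes that the reservation resource exposes the specificReservation.count and specificReservation.inUseCount fields. The function name is hypothetical.

```shell
# reservation_free_capacity RESERVATION_NAME ZONE
# Hypothetical helper: prints the number of reserved VMs that are not
# yet in use (reserved count minus in-use count).
reservation_free_capacity() {
  count=$(gcloud compute reservations describe "$1" --zone="$2" \
      --format="value(specificReservation.count)")
  in_use=$(gcloud compute reservations describe "$1" --zone="$2" \
      --format="value(specificReservation.inUseCount)")
  # Treat an empty value as 0 so that the arithmetic stays valid.
  echo $(( ${count:-0} - ${in_use:-0} ))
}
```

If the printed value is smaller than the number of nodes that you plan to create, the node pool creation request is likely to fail.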

To create a reservation and instances to consume a specific reservation, perform the following steps. You can use the gcloud CLI or Terraform.

gcloud

  1. Create a specific reservation for three VM instances:

    gcloud compute reservations create RESERVATION_NAME \
        --machine-type=MACHINE_TYPE --vm-count=3 \
        --require-specific-reservation
    

    Replace the following:

    • RESERVATION_NAME: the name of the reservation to create.
    • MACHINE_TYPE: the type of machine (name only) to use for the reservation. For example, n1-standard-2.
  2. Create a node pool with a single node to consume a specific single-project reservation:

    gcloud container node-pools create NODEPOOL_NAME \
        --cluster CLUSTER_NAME \
        --machine-type=MACHINE_TYPE --num-nodes=1 \
        --reservation-affinity=specific --reservation=RESERVATION_NAME
    

    Replace the following:

    • NODEPOOL_NAME: the name of the node pool to create.
    • CLUSTER_NAME: the name of the cluster that you created.

Terraform

To create a specific reservation using Terraform, refer to the following example:

resource "google_compute_reservation" "specific_reservation" {
  name = "specific-reservation"
  zone = "us-central1-a"

  specific_reservation {
    count = 1

    instance_properties {
      machine_type = "e2-medium"
    }
  }

  specific_reservation_required = true
}

To create a node pool with a single node to consume a specific single-project reservation using Terraform, refer to the following example:

resource "google_container_node_pool" "specific_node_pool" {
  name     = "gke-standard-zonal-specific-node-pool"
  cluster  = google_container_cluster.default.name
  location = google_container_cluster.default.location

  initial_node_count = 1
  node_config {
    machine_type = "e2-medium"

    reservation_affinity {
      consume_reservation_type = "SPECIFIC_RESERVATION"
      key                      = "compute.googleapis.com/reservation-name"
      values                   = [google_compute_reservation.specific_reservation.name]
    }
  }

  depends_on = [
    google_compute_reservation.specific_reservation
  ]
}

To learn more about using Terraform, see Terraform support for GKE.

Consuming a specific shared reservation

To create a specific shared reservation and consume the shared reservation, perform the following steps. You can use the gcloud CLI or Terraform.

  1. Follow the steps in Allowing and restricting projects from creating and modifying shared reservations.

gcloud

  1. Create a specific shared reservation:

    gcloud compute reservations create RESERVATION_NAME \
        --machine-type=MACHINE_TYPE --vm-count=3 \
        --zone=ZONE \
        --require-specific-reservation \
        --project=OWNER_PROJECT_ID \
        --share-setting=projects \
        --share-with=CONSUMER_PROJECT_IDS
    

    Replace the following:

    • RESERVATION_NAME: the name of the reservation to create.
    • MACHINE_TYPE: the name of the type of machine to use for the reservation. For example, n1-standard-2.
    • ZONE: the zone in which to create the reservation.
    • OWNER_PROJECT_ID: the project ID of the project in which you want to create this shared reservation. If you omit the --project flag, the gcloud CLI uses the current project as the owner project by default.
    • CONSUMER_PROJECT_IDS: a comma-separated list of the project IDs of projects that you want to share this reservation with. For example, project-1,project-2. You can include 1 to 100 consumer projects. These projects must be in the same organization as the owner project. Don't include the OWNER_PROJECT_ID, because it can consume this reservation by default.
  2. Consume the shared reservation:

    gcloud container node-pools create NODEPOOL_NAME \
        --cluster CLUSTER_NAME \
        --machine-type=MACHINE_TYPE --num-nodes=1 \
        --reservation-affinity=specific \
        --reservation=projects/OWNER_PROJECT_ID/reservations/RESERVATION_NAME
    

    Replace the following:

    • NODEPOOL_NAME: the name of the node pool to create.
    • CLUSTER_NAME: the name of the cluster that you created.
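Unlike a single-project reservation, a shared reservation is referenced by its full resource path and not by its bare name. As a small sketch, a hypothetical helper like the following builds that path in the format used by the --reservation flag in the preceding step.

```shell
# reservation_path OWNER_PROJECT_ID RESERVATION_NAME
# Prints the resource path that identifies a shared reservation.
reservation_path() {
  printf 'projects/%s/reservations/%s\n' "$1" "$2"
}
```

For example, you could pass --reservation="$(reservation_path OWNER_PROJECT_ID RESERVATION_NAME)" when you create the node pool.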

Terraform

To create a specific shared reservation using Terraform, refer to the following example:

resource "google_compute_reservation" "specific_reservation" {
  name = "specific-reservation"
  zone = "us-central1-a"

  specific_reservation {
    count = 1

    instance_properties {
      machine_type = "e2-medium"
    }
  }

  specific_reservation_required = true
}

To consume the specific shared reservation using Terraform, refer to the following example:

resource "google_container_node_pool" "specific_node_pool" {
  name     = "gke-standard-zonal-specific-node-pool"
  cluster  = google_container_cluster.default.name
  location = google_container_cluster.default.location

  initial_node_count = 1
  node_config {
    machine_type = "e2-medium"

    reservation_affinity {
      consume_reservation_type = "SPECIFIC_RESERVATION"
      key                      = "compute.googleapis.com/reservation-name"
      values                   = [google_compute_reservation.specific_reservation.name]
    }
  }

  depends_on = [
    google_compute_reservation.specific_reservation
  ]
}

To learn more about using Terraform, see Terraform support for GKE.

Additional considerations for consuming from a specific reservation

When a node pool is created with specific reservation affinity, including default node pools during cluster creation, its size is limited to the capacity of the specific reservation over the node pool's entire lifetime. This affects the following GKE features:

  • Clusters with multiple zones: In regional or multi-zonal clusters, the nodes of a node pool can span multiple zones. Because reservations are single-zonal, you need multiple reservations. To create a node pool that consumes a specific reservation in these clusters, you must create a specific reservation with exactly the same name and machine properties in each zone of the node pool.
  • Cluster autoscaling and node pool upgrades: If the specific reservation has no extra capacity, node pool upgrades or autoscaling of the node pool might fail because both operations require creating extra instances. To resolve this, you can increase the size of the reservation or free up some of the capacity that's already in use.
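Both considerations can be scripted with the gcloud CLI. The following POSIX shell sketch creates a specific reservation with the same name and machine properties in each listed zone, and resizes a reservation to add headroom for upgrades or autoscaling. The function names are hypothetical, and the sketch assumes that your project has quota for the requested machine type in every zone.

```shell
# create_reservation_in_zones NAME MACHINE_TYPE VM_COUNT ZONE...
# Creates an identically named specific reservation in each zone.
create_reservation_in_zones() {
  name="$1"
  machine_type="$2"
  vm_count="$3"
  shift 3
  for zone in "$@"; do
    gcloud compute reservations create "$name" \
        --machine-type="$machine_type" \
        --vm-count="$vm_count" \
        --zone="$zone" \
        --require-specific-reservation || return 1
  done
}

# resize_reservation NAME ZONE NEW_VM_COUNT
# Changes the number of VMs that the reservation holds.
resize_reservation() {
  gcloud compute reservations update "$1" --zone="$2" --vm-count="$3"
}
```

For example, create_reservation_in_zones RESERVATION_NAME MACHINE_TYPE 3 us-central1-a us-central1-b reserves three VMs in each of the two zones, where the zone names are placeholders for the zones of your node pool.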

Creating nodes without consuming reservations

To explicitly avoid consuming resources from any reservations, set the affinity to --reservation-affinity=none.

  1. Create a cluster that won't consume any reservation:

    gcloud container clusters create CLUSTER_NAME --reservation-affinity=none
    

    Replace CLUSTER_NAME with the name of the cluster to create.

  2. Create a node pool that won't consume any reservation:

    gcloud container node-pools create NODEPOOL_NAME \
        --cluster CLUSTER_NAME \
        --reservation-affinity=none
    

    Replace NODEPOOL_NAME with the name of the node pool to create.

Following available reservations between zones

When you use node pools that run in multiple zones with reservations that aren't equal between zones, you can use the --location-policy=ANY flag. This flag ensures that when new nodes are added to the cluster, they're created in a zone that still has unused reservations.
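As a hedged sketch, the following POSIX shell function creates an autoscaled node pool that uses this location policy. The function name is hypothetical, and the autoscaling flags (--enable-autoscaling, --min-nodes, --max-nodes) and the any reservation affinity are assumptions chosen for illustration, not requirements stated on this page.

```shell
# create_any_zone_node_pool NODEPOOL_NAME CLUSTER_NAME MACHINE_TYPE MAX_NODES
# Creates an autoscaled node pool whose new nodes can be placed in any
# zone of the node pool, so that scale-ups can follow unused reservations.
create_any_zone_node_pool() {
  gcloud container node-pools create "$1" \
      --cluster "$2" \
      --machine-type="$3" \
      --enable-autoscaling --min-nodes=0 --max-nodes="$4" \
      --location-policy=ANY \
      --reservation-affinity=any
}
```

For example, create_any_zone_node_pool NODEPOOL_NAME CLUSTER_NAME MACHINE_TYPE 5 lets the node pool grow to five nodes per zone.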

TPU reservation

TPU reservations differ from reservations for other machine types. Consider the following TPU-specific aspect when you create TPU reservations:

  • When using TPUs in GKE, SPECIFIC is the only supported value for the --reservation-affinity flag of gcloud container node-pools create.

For more information, see TPU reservations.

Cleaning up

To avoid incurring charges to your Cloud Billing account for the resources used on this page:

  1. Delete the clusters you created by running the following command for each of the clusters:

    gcloud container clusters delete CLUSTER_NAME
    
  2. Delete the reservations you created by running the following command for each of the reservations:

    gcloud compute reservations delete RESERVATION_NAME
    

What's next