ComputeClass


ComputeClass is a Kubernetes Custom Resource Definition (CRD) that lets you define configurations and fallback priorities for GKE node scaling decisions. To learn more, see About custom compute classes.

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: my-class
spec:
  activeMigration:
    optimizeRulePriority: false
  autoscalingPolicy:
    consolidationDelayMinutes: 1
    consolidationThreshold: 0
    gpuConsolidationThreshold: 0
  nodePoolAutoCreation:
    enabled: false
  priorities:
  - machineFamily: n2
    maxRunDurationSeconds: 360
    minCores: 16
    minMemoryGb: 64
    reservations:
      affinity: Specific
      specific:
      - name: n2-shared-reservation
        project: reservation-project
    spot: true
    storage:
      bootDiskKMSKey: projects/example/locations/us-central1/keyRings/example/cryptoKeys/key-1
      secondaryBootDisks:
      - diskImageName: pytorch-mnist
        project: k8s-staging-jobset
        mode: CONTAINER_IMAGE_CACHE
  - machineType: n2-standard-32
    spot: true
    reservations:
      affinity: AnyBestEffort
    storage:
      bootDiskSize: 100
      bootDiskType: pd-balanced
      localSSDCount: 1
  - nodepools: ['example-first-nodepool-name', 'example-second-nodepool-name'] 
  - gpu:
      count: 1
      driverVersion: default
      type: nvidia-l4
  - tpu:
      count: 8
      topology: "2x4"
      type: tpu-v5-lite-device
  whenUnsatisfiable: ScaleUpAnyway
status:
  conditions:
  - lastTransitionTime: 2024-10-10T00:00:00Z
    message: example-message
    observedGeneration: 1
    reason: example-reason
    status: "True"
    type: example-type

ComputeClass specification

metadata:
  name: string
spec:
  activeMigration: object(activeMigration)
  autoscalingPolicy: object(autoscalingPolicy)
  nodePoolAutoCreation: object(nodePoolAutoCreation)
  priorities: [
    object(priorities)
  ]
  whenUnsatisfiable: string

Fields

metadata

required

object

A field that identifies the compute class.

metadata.name

optional

string

The name of the compute class.

spec

required

object

The compute class specification, which defines how the compute class works.

spec.activeMigration

optional

object (activeMigration)

A specification that lets you choose whether GKE automatically replaces existing nodes that are lower in a compute class priority list with new nodes that are higher in that priority list.

spec.autoscalingPolicy

optional

object (autoscalingPolicy)

A specification that lets you fine-tune the timing and thresholds that cause GKE to remove underused nodes and consolidate workloads on other nodes.

spec.nodePoolAutoCreation

optional

object(nodePoolAutoCreation)

A specification that lets you choose whether GKE can create and delete node pools in Standard mode clusters based on the compute class priority rules. Requires node auto-provisioning to be enabled on the cluster.

spec.priorities[]

required

object (priorities)

A list of priority rules that defines how GKE configures nodes during scaling operations. When a cluster needs to scale up, GKE tries to create nodes that match the first priority rule in this field. If GKE can't create those nodes, it attempts the next rule. The process repeats until GKE successfully creates nodes or exhausts all of the rules in the list.

spec.whenUnsatisfiable

optional

string

A specification that lets you define what GKE does if none of the rules in the spec.priorities[] field can be met. Supported values are as follows:

  • ScaleUpAnyway: create a new node that uses the default cluster node configuration. In GKE versions earlier than 1.33, this is the default behavior.
  • DoNotScaleUp: leave the Pod in the Pending status until GKE can create a node that meets the criteria of a priority rule. In GKE version 1.33 and later, this is the default behavior.
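As a hedged illustration of the fallback order and the whenUnsatisfiable field, the following sketch lists two priority rules and tells GKE to leave Pods pending if neither rule can be met. The class name is a placeholder, and the machine series are only examples taken from this page.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: fallback-example  # placeholder name
spec:
  priorities:
  # GKE tries this rule first.
  - machineFamily: n2
  # GKE tries this rule only if it can't create nodes for the first rule.
  - machineFamily: c3
  # If no rule can be met, keep Pods in the Pending status instead of
  # creating nodes that use the default cluster node configuration.
  whenUnsatisfiable: DoNotScaleUp
```

Setting the field explicitly keeps the behavior the same on clusters older and newer than GKE version 1.33, where the default differs.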

activeMigration

Choose whether GKE migrates workloads to higher priority nodes for the compute class as resources become available. For details, see Configure active migration to higher priority nodes.

activeMigration:
  optimizeRulePriority: boolean

Fields

optimizeRulePriority

required

boolean

Choose whether GKE migrates workloads to higher priority nodes when resources are available. If you omit this field, the default value is false.
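A minimal sketch of a compute class that turns on active migration might look like the following. The class name is a placeholder and the two priority rules are illustrative assumptions, not recommended values.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: active-migration-example  # placeholder name
spec:
  activeMigration:
    # Replace nodes from lower priority rules with nodes from higher
    # priority rules when resources become available.
    optimizeRulePriority: true
  priorities:
  # Highest priority: Spot VMs in the n2 machine series.
  - machineFamily: n2
    spot: true
  # Lower priority: on-demand nodes in the same machine series.
  - machineFamily: n2
    spot: false
```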

autoscalingPolicy

autoscalingPolicy:
  consolidationDelayMinutes: integer
  consolidationThreshold: integer
  gpuConsolidationThreshold: integer

Fields

consolidationDelayMinutes

optional

integer

The number of minutes after which GKE removes underutilized nodes. The value must be between 1 and 1440.

consolidationThreshold

optional

integer

The CPU and memory utilization threshold as a percentage of the total resources on the node. A node becomes eligible for removal only when the resource utilization is less than this threshold. The value must be between 0 and 100.

gpuConsolidationThreshold

optional

integer

The GPU utilization threshold as a percentage of the total GPU resources on the node. A node becomes eligible for removal only when the resource utilization is less than this threshold. The value must be between 0 and 100.

Consider setting this value to either 0 or 100 so that GKE consolidates nodes that don't use 100% of the attached GPUs.
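To make the three autoscalingPolicy fields concrete, here is a sketch with example values inside the documented ranges. The class name, the specific numbers, and the single priority rule are assumptions chosen only for illustration.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: consolidation-example  # placeholder name
spec:
  autoscalingPolicy:
    # Remove underutilized nodes after 10 minutes (allowed range: 1 to 1440).
    consolidationDelayMinutes: 10
    # A node is eligible for removal only when CPU and memory utilization
    # is less than 50% of the node's total resources.
    consolidationThreshold: 50
    # A node is eligible for removal only when GPU utilization is less
    # than 100% of the attached GPU resources.
    gpuConsolidationThreshold: 100
  priorities:
  - machineFamily: n2
```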

nodePoolAutoCreation

nodePoolAutoCreation:
  enabled: boolean

Fields

enabled

optional

boolean

Choose whether GKE can create and delete node pools in Standard mode clusters based on the compute class priority rules. Requires node auto-provisioning to be enabled on the cluster. If you omit this field, the default value is false.
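The following sketch shows where the enabled field sits in a full manifest. It assumes a Standard mode cluster that already has node auto-provisioning enabled; the class name and the priority rule are placeholders.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: auto-creation-example  # placeholder name
spec:
  nodePoolAutoCreation:
    # Let GKE create and delete node pools that match the priority rules.
    # Assumes node auto-provisioning is enabled on the cluster.
    enabled: true
  priorities:
  - machineFamily: n2
```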

priorities

- gpu: object(gpu)
  spot: boolean
- machineFamily: string
  maxRunDurationSeconds: integer
  minCores: integer
  minMemoryGb: integer
  reservations: object(reservations)
  spot: boolean
  storage: object(storage)
- machineType: string
  maxRunDurationSeconds: integer
  reservations: object(reservations)
  spot: boolean
  storage: object(storage)
- nodepools: []string
- tpu: object(tpu)
  reservations: object(reservations)
  spot: boolean
  storage: object(storage)

Fields

gpu

optional

object(gpu)

The GPU configuration.

machineFamily

optional

string

The Compute Engine machine series to use, such as n2 or c3. If you don't specify a value, GKE uses the default machine series of the cluster.

machineType

optional

string

The predefined Compute Engine machine type to use, such as n2-standard-32.

maxPodsPerNode

optional

integer

The maximum number of Pods that GKE can place on each node. The value must be between 8 and 256.

maxRunDurationSeconds

optional

integer

The maximum duration, in seconds, that the nodes can exist before being shut down. If you omit this field, the nodes can exist indefinitely.

minCores

optional

integer

The minimum number of vCPU cores that each node can have. If you omit this field, the default value is 0.

minMemoryGb

optional

integer

The minimum memory capacity, in GiB, that each node can have. If you omit this field, the default value is 0.
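As a sketch of how the sizing and lifetime fields combine in one rule, the following priority rule asks for n2 nodes with at least 16 vCPUs and 64 GiB of memory that are shut down after at most one day. The class name and all values are illustrative assumptions.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: sizing-example  # placeholder name
spec:
  priorities:
  - machineFamily: n2
    # Each node must have at least 16 vCPUs.
    minCores: 16
    # Each node must have at least 64 GiB of memory.
    minMemoryGb: 64
    # 24 hours x 60 minutes x 60 seconds = 86400 seconds.
    maxRunDurationSeconds: 86400
```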

nodepools

optional

[]string

A list of existing manually created node pools in Standard mode clusters. You must associate these node pools with the compute class by using node labels and node taints. GKE doesn't process the node pools in this list in any order.

Example: nodepools: ['example-first-nodepool-name', 'example-second-nodepool-name']

reservations

optional

object (reservations)

The Compute Engine capacity reservations to consume during node provisioning.

spot

optional

boolean

The Spot VMs configuration. If you set this field to true, GKE uses Spot VMs to create your nodes. If you omit this field, the default value is false.
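One way the spot field might be used is to prefer Spot VMs and fall back to on-demand nodes of the same machine type. The sketch below assumes that pattern; the class name is a placeholder.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: spot-first-example  # placeholder name
spec:
  priorities:
  # First choice: Spot VMs.
  - machineType: n2-standard-32
    spot: true
  # Fallback: on-demand nodes. Omitting spot has the same effect,
  # because the default value is false.
  - machineType: n2-standard-32
    spot: false
```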

storage

optional

object (storage)

The boot disk configuration of each node.

tpu

optional

object (tpu)

The TPU configuration.

gpu

gpu:
  count: integer
  driverVersion: string
  type: string

Fields

count

required

integer

The number of GPUs to attach to each node. The value must be at least 1.

driverVersion

optional

string

Requires GKE version 1.31.1-gke.1858000 or later

The NVIDIA driver version to install. The supported values are as follows:

  • default: install the default driver version for the node GKE version. If you omit this field, this is the default value.
  • latest: install the latest driver version for the node GKE version.

type

required

string

The GPU type to attach to each node, such as nvidia-l4.
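A GPU priority rule could be sketched as follows. The class name is a placeholder, and the driverVersion setting assumes a cluster that runs GKE version 1.31.1-gke.1858000 or later.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-example  # placeholder name
spec:
  priorities:
  - gpu:
      # GPU type to attach to each node.
      type: nvidia-l4
      # Number of GPUs per node; must be at least 1.
      count: 1
      # Install the latest driver version for the node GKE version
      # instead of the default driver version.
      driverVersion: latest
```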

reservations

reservations:
  affinity: string
  specific: [
    object(specific)
  ]

Fields

affinity

required

string

The type of reservation to consume when creating nodes. The following values are supported:

  • Specific: consume only specific named reservations. If the specified reservation doesn't have any capacity, GKE moves on to the next priority rule in the compute class. If you use this value, the specific[] field is required.
  • AnyBestEffort: consume any reservation that matches the requirements of the priority rule. If no matching reservation has available capacity, GKE tries to provision an on-demand node with the priority rule configuration.
  • None: prevent GKE from consuming reservations when it creates nodes for that priority rule.

specific

optional*

[]object(specific)

The parameters for consuming specific reservations. If you set the affinity field to Specific, this field is required. If you set the affinity field to any other value, you can't specify the specific field.

specific

specific:
- name: string
  project: string

Fields

name

required

string

The name of the specific reservation to consume.

project

optional

string

The project ID of the Google Cloud project that contains the specific reservation. To use a shared reservation from a different project, this field is required.
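To show how the affinity and specific fields fit together, the following sketch tries a named shared reservation first and then falls back to any matching reservation. The class name, reservation name, and project ID are hypothetical placeholders.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: reservation-example  # placeholder name
spec:
  priorities:
  # First choice: consume only the named reservation. If it has no
  # capacity, GKE moves on to the next priority rule.
  - machineType: n2-standard-32
    reservations:
      affinity: Specific
      specific:
      - name: example-reservation              # hypothetical reservation
        project: example-reservation-project   # needed for a shared reservation in another project
  # Fallback: consume any matching reservation on a best-effort basis.
  # The specific field must not be set with this affinity value.
  - machineType: n2-standard-32
    reservations:
      affinity: AnyBestEffort
```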

storage

storage:
  bootDiskKMSKey: string
  bootDiskSize: integer
  bootDiskType: string
  localSSDCount: integer
  secondaryBootDisks: [
    object(secondaryBootDisks)
  ]

Fields

bootDiskKMSKey

optional

string

The path to the Cloud KMS key to use to encrypt the boot disk.

bootDiskSize

optional

integer

The size, in GiB, of the boot disk for each node. The minimum value is 10.

bootDiskType

optional

string

The type of disk to attach to the node. The value that you specify must be supported by the machine series or the machine type in your priority rule. The following values are supported:

  • pd-balanced: balanced Persistent Disk.
  • pd-standard: standard Persistent Disk.
  • pd-ssd: performance (SSD) Persistent Disk.
  • hyperdisk-balanced: Hyperdisk Balanced.

For details about the disk types that specific machine series support, see the Machine series comparison table. Filter the table properties for "Hyperdisk" and "PD".

localSSDCount

optional

integer

The number of Local SSDs to attach to each node. If you specify this field, the minimum value is 1.

secondaryBootDisks[]

optional

[]object(secondaryBootDisks)

Requires GKE version 1.31.2-gke.1105000 or later

The configuration of secondary boot disks that are used to preload nodes with data, such as ML models or container images.

secondaryBootDisks

secondaryBootDisks:
- diskImageName: string
  mode: string
  project: string

Fields

diskImageName

required

string

The name of the disk image.

mode

optional

string

The mode in which the secondary boot disk should be used. The following values are supported:

  • CONTAINER_IMAGE_CACHE: use the disk as a container image cache.
  • MODE_UNSPECIFIED: don't use a specific mode. If you omit this field, this is the default value.

project

optional

string

The project ID of the Google Cloud project that the disk image belongs to. If you omit this field, the default value is the project ID of the cluster project.
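The storage fields and a secondary boot disk can be combined in one priority rule, as in this sketch. The class name, disk image name, and image project are hypothetical, the boot disk values are illustrative, and the secondaryBootDisks field assumes GKE version 1.31.2-gke.1105000 or later.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: storage-example  # placeholder name
spec:
  priorities:
  - machineFamily: n2
    storage:
      # Boot disk size in GiB; the minimum value is 10.
      bootDiskSize: 200
      # The disk type must be supported by the machine series in this rule.
      bootDiskType: pd-ssd
      secondaryBootDisks:
      - diskImageName: example-image-cache   # hypothetical disk image
        # Use the disk as a container image cache.
        mode: CONTAINER_IMAGE_CACHE
        # If omitted, GKE uses the project ID of the cluster project.
        project: example-image-project       # hypothetical project ID
```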

tpu

tpu:
  count: integer
  topology: string
  type: string

Fields

count

required

integer

The number of TPUs to attach to the node.

topology

required

string

The TPU topology to use, such as "2x2x1".

type

required

string

The TPU type to use, such as tpu-v6e-slice.
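A TPU priority rule might be sketched like this, reusing the TPU type, count, and topology from the sample manifest at the top of this page. The class name is a placeholder, and the Spot-then-on-demand ordering is an assumption for illustration.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: tpu-example  # placeholder name
spec:
  priorities:
  # First choice: TPU nodes on Spot VMs.
  - tpu:
      type: tpu-v5-lite-device
      count: 8
      # Quote the topology so that YAML parses it as a string.
      topology: "2x4"
    spot: true
  # Fallback: the same TPU configuration on on-demand nodes.
  - tpu:
      type: tpu-v5-lite-device
      count: 8
      topology: "2x4"
```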

ComputeClass status

The status field is a list of status messages. This field is informational and is updated by the Kubernetes API server and the kubelet on each node.

status:
  conditions: [
    object(conditions)
  ]

Fields

conditions[]

object(conditions)

List of status conditions for the ComputeClass object.

conditions

conditions:
- type: string
  status: string
  reason: string
  message: string
  lastTransitionTime: string
  observedGeneration: integer

Fields

type

string

The type of condition, which helps to organize status messages.

status

string

The status of the condition. The value is one of the following:

  • True
  • False
  • Unknown

reason

string

A machine-readable reason why a specific condition type made its most recent transition.

message

string

A human-readable message that provides details about the most recent transition. This field might be empty.

lastTransitionTime

string

The timestamp of the most recent change to the condition.

observedGeneration

integer

A count of how many times the ComputeClass controller observed a change to the ComputeClass object. The controller attempts to reconcile the value in this field with the value in the metadata.generation field, which the Kubernetes API server updates whenever a change is made to the ComputeClass API object.