About built-in compute classes in Autopilot clusters

Autopilot

This page describes the built-in compute classes that are available in Google Kubernetes Engine (GKE) Autopilot clusters for workloads that have specific hardware requirements.

Overview of built-in compute classes in Autopilot clusters

By default, Pods in GKE Autopilot clusters run on a container-optimized compute platform. This platform is ideal for general-purpose workloads such as web servers and medium-intensity batch jobs. The container-optimized compute platform provides a reliable, scalable, cost-optimized hardware configuration that can handle the requirements of most workloads.

If you have workloads that have unique hardware requirements, such as performing machine learning or AI tasks, running real-time high traffic databases, or needing specific CPU platforms and architecture, Autopilot offers compute classes. These compute classes are a curated subset of the Compute Engine machine series, and offer flexibility beyond the default Autopilot compute platform. For example, the Scale-Out compute class uses VMs that turn off simultaneous multi-threading and are optimized for scaling out.

You can request nodes backed by specific compute classes based on the requirements of each of your workloads. Similar to the default general-purpose container-optimized compute platform, Autopilot manages the sizing and resource allocation of your requested compute classes based on your running Pods. You can request compute classes at the Pod-level to optimize cost-efficiency by choosing the best fit for each Pod's needs.

Custom compute classes for additional flexibility

If the built-in compute classes in Autopilot clusters don't meet your workload requirements, you can configure your own compute classes instead. You can deploy ComputeClass Kubernetes CustomResources to your clusters with sets of node attributes that GKE uses to configure new nodes in the cluster. For more information about custom compute classes, see About custom compute classes.

Choose a specific CPU architecture

If your workloads are designed for specific CPU platforms or architectures, you can optionally select those platforms or architectures in your Pod specifications. For example, if you want your Pods to run on nodes that use the Arm architecture, you can choose arm64 within the Scale-Out compute class.

Pricing

GKE Autopilot Pods are priced based on the nodes where the Pods are scheduled. For pricing information for general-purpose workloads and Spot Pods on specific compute classes, and for information on any committed use discounts, refer to Autopilot mode pricing.

Spot Pods on general-purpose or specialized compute classes don't qualify for committed use discounts.

When to use specific compute classes

The following table provides a technical overview of the predefined compute classes that Autopilot supports and example use cases for Pods running on each platform. If you don't request a compute class, Autopilot places your Pods on the general-purpose compute platform, which is designed to run most workloads optimally.

If none of these options meet your requirements, you can define and deploy your own custom compute classes that specify node properties for GKE to use when scaling up your cluster. For details, see About custom compute classes.

Workload requirement	Compute class	Description	Example use cases
Workloads that don't require specific hardware	General-purpose	Autopilot uses the general-purpose compute platform if you don't explicitly request a compute class in your Pod specification. You can't explicitly select the general-purpose platform in your specification. Backed by the E2 machine series. You might occasionally see nodes using the `ek` machine series. These machines specifically optimized for GKE Autopilot. EK machines are exclusively managed by GKE Autopilot and are not available as a selectable machine type on the Compute Engine platform. For this reason, you won't find detailed information about the 'EK' series through the Compute Engine API.	Web servers Small to medium databases Development environments
Workloads that require GPUs	`Accelerator`	Pods can access compute resources at any time No Pod memory or CPU limits Compatible GPU types are the following: `nvidia-b200`: NVIDIA B200 (180GB) `nvidia-h200-141gb`: NVIDIA H200 (141GB) `nvidia-h100-mega-80gb`: NVIDIA H100 Mega (80GB) `nvidia-h100-80gb`: NVIDIA H100 (80GB) `nvidia-a100-80gb`: NVIDIA A100 (80GB) `nvidia-tesla-a100`: NVIDIA A100 (40GB) `nvidia-l4`: NVIDIA L4 `nvidia-tesla-t4`: NVIDIA T4	GPU-centric AI/ML training and inference
CPU or memory requests larger than the general-purpose compute class maximums or specific CPU platforms	`Balanced`	Available CPUs: AMD EPYC Rome, AMD EPYC Milan, Intel Ice Lake, Intel Cascade Lake Available architecture: amd64 Larger supported resource requests than general-purpose Ability to set minimum CPU platforms for Pods, such as "Intel Ice Lake or higher". Backed by the N2 machine series (Intel) or the N2D machine series (AMD).	Web servers Medium to large databases Caching Streaming and media serving Hyperdisk Throughput and Extreme storage
Workloads with requirements for specific machine series that aren't covered by other compute classes	Specific machine series	Available architecture: amd64, arm64 Pods can request specific machine series for their nodes No Pod memory or CPU limits For details, see Optimize Autopilot Pod performance by choosing a machine series.	CPU-centric AI/ML training and inference HPC batch workloads Hyperdisk Balanced, Throughput, and Extreme storage
CPU-intensive workloads like AI/ML training or high performance computing (HPC)	`Performance`	Available architecture: amd64, arm64 Pods can access compute resources at any time No Pod memory or CPU limits One Pod per node For a list of Compute Engine machine series available with the Performance compute class, see Supported machine series.	CPU-centric AI/ML training and inference HPC batch workloads Hyperdisk Balanced, Throughput, and Extreme storage
Single thread-per-core computing and horizontal scaling	`Scale-Out`	Available CPUs: Ampere Altra Arm or AMD EPYC Milan Compute Engine machine family: T2A (Arm), T2D (x86) Available architecture: arm64 or amd64 SMT off. One vCPU is equal to one physical core. Max 3.5GHz clock Backed by the Tau T2A machine series (Arm) or the Tau T2D machine series (x86).	Web servers Containerized microservices Data log processing Large-scale Java apps Hyperdisk Throughput storage

How to select a compute class in Autopilot

For detailed instructions, refer to Choose compute classes for Autopilot Pods.

To tell Autopilot to place your Pods on a specific compute class, specify the cloud.google.com/compute-class label in a nodeSelector or a node affinity rule, such as in the following examples:

nodeSelector

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: hello-app
      template:
        metadata:
          labels:
            app: hello-app
        spec:
          nodeSelector:
            cloud.google.com/compute-class: "COMPUTE_CLASS"
          containers:
          - name: hello-app
            image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
            resources:
              requests:
                cpu: "2000m"
                memory: "2Gi"

Replace COMPUTE_CLASS with the name of the compute class based on your use case, such as Scale-Out. If you select Accelerator, you must also specify a compatible GPU. For instructions, see Deploy GPU workloads in Autopilot. If you select Performance, you can optionally select a Compute Engine machine series in the node selector. If you don't specify a machine series, GKE uses the C4 machine series depending on regional availability. For instructions, see Run CPU-intensive workloads with optimal performance.

nodeAffinity

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: hello-app
      template:
        metadata:
          labels:
            app: hello-app
        spec:
          terminationGracePeriodSeconds: 25
          containers:
          - name: hello-app
            image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
            resources:
              requests:
                cpu: "2000m"
                memory: "2Gi"
                ephemeral-storage: "1Gi"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: cloud.google.com/compute-class
                    operator: In
                    values:
                    - "COMPUTE_CLASS"

When you deploy the workload, Autopilot does the following:

Automatically provisions nodes backed by the specified configuration to run your Pods.
Automatically adds taints to the new nodes to prevent other Pods from scheduling on those nodes. The taints are unique to each compute class. If you also select a CPU architecture, GKE adds a separate taint unique to that architecture.
Automatically adds tolerations corresponding to the applied taints to your deployed Pods, which lets GKE place those Pods on the new nodes.

For example, if you request the Scale-Out compute class for a Pod:

Autopilot adds a taint specific to Scale-Out for those nodes.
Autopilot adds a toleration for that taint to the Scale-Out Pods.

Pods that don't request Scale-Out won't get the toleration. As a result, GKE won't schedule those Pods on the Scale-Out nodes.

If you don't explicitly request a compute class in your workload specification, Autopilot schedules Pods on nodes that use the default general-purpose compute class. Most workloads can run with no issues on the general-purpose compute class.

How to request a CPU architecture

In some cases, your workloads might be built for a specific architecture, such as Arm. Some compute classes, such as Balanced or Scale-Out, support multiple CPU architectures. You can request a specific architecture alongside your compute class request by specifying a label in your node selector or node affinity rule, such as in the following example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-arm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-arm
  template:
    metadata:
      labels:
        app: nginx-arm
    spec:
      nodeSelector:
        cloud.google.com/compute-class: COMPUTE_CLASS
        kubernetes.io/arch: ARCHITECTURE
      containers:
      - name: nginx-arm
        image: nginx
        resources:
          requests:
            cpu: 2000m
            memory: 2Gi

Replace ARCHITECTURE with the CPU architecture that you want, such as arm64 or amd64.

If you don't explicitly request an architecture, Autopilot uses the default architecture of the specified compute class.

Arm architecture on Autopilot

Autopilot supports requests for nodes that use the Arm CPU architecture. Arm nodes are more cost-efficient than similar x86 nodes while delivering performance improvements. For instructions to request Arm nodes, refer to Deploy Autopilot workloads on Arm architecture.

Ensure that you're using the correct images in your deployments. If your Pods use Arm images and you don't request Arm nodes, Autopilot schedules the Pods on x86 nodes and the Pods will crash. Similarly, if you accidentally use x86 images but request Arm nodes for the Pods, the Pods will crash.

Autopilot validations for compute class workloads

Autopilot validates your workload manifests to ensure that the compute class and architecture requests in your node selector or node affinity rules are correctly formatted. The following rules apply:

No more than one compute class.
No unsupported compute classes.
The GKE version must support the compute class.
No more than one selected architecture.
The compute class must support the selected architecture.

If your workload manifest fails any of these validations, Autopilot rejects the workload.

Compute class regional availability

The following table describes the regions in which specific compute classes and CPU architectures are available:

Compute class availability
General-purpose	All regions
`Balanced`	All regions
`Performance`	All regions that contain a supported machine series.
`Scale-Out`	All regions that contain a corresponding Compute Engine machine series. To view specific machine series availability, use the filters in Available regions and zones.

If a compute class is available in a specific region, the hardware is available in at least two zones in that region.

Default, minimum, and maximum resource requests

When choosing a compute class for your Autopilot workloads, make sure that you specify resource requests that meet the minimum and maximum requests for that compute class. For information about the default requests, as well as the minimum and maximum requests for each compute class, refer to Resource requests and limits in GKE Autopilot.