About Balanced and Scale-Out ComputeClasses in Autopilot clusters


You can use the Balanced and Scale-Out ComputeClasses in Google Kubernetes Engine (GKE) Autopilot clusters to run workloads that require extra compute capacity or specialized CPU configurations. This page is intended for cluster administrators who want more flexible compute options than the default Autopilot cluster configuration provides.

Overview of Balanced and Scale-Out ComputeClasses

By default, Pods in GKE Autopilot clusters run on a container-optimized compute platform. This platform is ideal for general-purpose workloads such as web servers and medium-intensity batch jobs. The container-optimized compute platform provides a reliable, scalable, cost-optimized hardware configuration that can handle the requirements of most workloads.

If your workloads have unique hardware requirements, such as machine learning or AI tasks, real-time high-traffic databases, or specific CPU platforms and architectures, you can use ComputeClasses to provision that hardware.

In Autopilot clusters only, GKE provides the following curated ComputeClasses that let you run Pods that need more flexibility than the default container-optimized compute platform:

  • Balanced: provides higher maximum CPU and memory capacity than the container-optimized compute platform.
  • Scale-Out: disables simultaneous multi-threading (SMT) and is optimized for scaling out.

These ComputeClasses are available only in Autopilot clusters. Similar to the default container-optimized compute platform, Autopilot manages node sizing and resource allocation based on your running Pods.

Custom ComputeClasses for additional flexibility

If the Balanced or Scale-Out ComputeClasses in Autopilot clusters don't meet your workload requirements, you can configure your own ComputeClasses. You deploy ComputeClass Kubernetes custom resources to your clusters with sets of node attributes that GKE uses to configure new nodes in the cluster. These custom ComputeClasses can, for example, let you deploy workloads on the same hardware as the Balanced or Scale-Out ComputeClasses in any GKE Autopilot or Standard cluster. For more information, see About Autopilot mode workloads in GKE Standard.
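As a sketch of what a custom ComputeClass looks like, the following manifest defines a class that prefers N2 nodes and falls back to N2D. The name and the priority rules are illustrative examples, and the available fields can vary by GKE version, so check the ComputeClass reference for your cluster:

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: my-balanced-class  # Illustrative name
spec:
  # GKE evaluates these rules in order when provisioning new nodes.
  priorities:
  - machineFamily: n2
  - machineFamily: n2d
  # If no rule can be satisfied, let GKE fall back to any
  # available configuration instead of leaving Pods pending.
  whenUnsatisfiable: ScaleUpAnyway
```

Workloads select this class the same way they select the curated classes, by using a cloud.google.com/compute-class node selector with the class name as the value.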

Pricing

Pods that use the Balanced or Scale-Out ComputeClasses are billed based on the compute resources that the Pods request, by using separate SKUs for these ComputeClasses.

For more information, see GKE pricing.

Balanced and Scale-Out technical details

This section describes the machine types and use cases for the Balanced and Scale-Out classes. If you don't request a ComputeClass in your Pods, Autopilot places the Pods on the container-optimized compute platform by default. You might sometimes see ek as the node machine series in your Autopilot nodes that use the container-optimized compute platform. EK machines are E2 machine types that are exclusive to Autopilot.

The following table provides a technical overview of the Balanced and Scale-Out ComputeClasses.

Balanced and Scale-Out ComputeClasses
Balanced

Provides higher maximum CPU and memory capacity than the container-optimized compute platform. Also provides additional CPU platforms and the ability to set a minimum CPU platform for your Pods, such as Intel Ice Lake or later.

  • Available CPUs: AMD EPYC Rome, AMD EPYC Milan, Intel Ice Lake, Intel Cascade Lake
  • Available architecture: amd64
  • Machine series: N2 (Intel CPUs) or N2D (AMD CPUs).

Use the Balanced class for applications such as the following:

  • Web servers
  • Medium to large databases
  • Caching
  • Streaming and media serving
  • Hyperdisk Throughput and Extreme storage

Scale-Out

Provides single-thread-per-core computing and horizontal scaling.

  • Available CPUs: Ampere Altra Arm or AMD EPYC Milan
  • Available architecture: arm64 or amd64
  • Machine series: T2A (Arm) or T2D (x86).
  • Additional features:
    • SMT is disabled, so one vCPU is equal to one physical core.
    • 3.5 GHz maximum clock speed.

Use the Scale-Out class for applications such as the following:

  • Web servers
  • Containerized microservices
  • Data log processing
  • Large-scale Java apps
  • Hyperdisk Throughput storage
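To illustrate the minimum CPU platform support in the Balanced class, the following Pod template fragment requests Balanced nodes with Intel Ice Lake or later. The cloud.google.com/min-cpu-platform node selector is shown as documented for Autopilot; verify the selector and the supported platform values for your GKE version:

```yaml
# Fragment of a Pod template spec. Values are examples.
nodeSelector:
  cloud.google.com/compute-class: Balanced
  cloud.google.com/min-cpu-platform: "Intel Ice Lake"
```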

ComputeClass selection in workloads

To use a ComputeClass for a GKE workload, you select the ComputeClass in the workload manifest by using a node selector for the cloud.google.com/compute-class label.

The following example Deployment manifest selects a ComputeClass:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      nodeSelector:
        # Replace with the name of a compute class
        cloud.google.com/compute-class: COMPUTE_CLASS 
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "4Gi"

Replace COMPUTE_CLASS with the name of a ComputeClass, such as Balanced or Scale-Out. You can select a maximum of one ComputeClass in a workload.

When you deploy the workload, GKE does the following:

  • Automatically provisions nodes backed by the specified configuration to run your Pods.
  • Automatically adds node labels and taints to the new nodes to prevent other Pods from scheduling on those nodes. The taints are unique to each ComputeClass. If you also select a CPU architecture, GKE adds a separate taint unique to that architecture.
  • Automatically adds tolerations corresponding to the applied taints to your deployed Pods, which lets GKE place those Pods on the new nodes.

For example, if you request the Scale-Out ComputeClass for a Pod:

  1. Autopilot adds a taint specific to Scale-Out for those nodes.
  2. Autopilot adds a toleration for that taint to the Scale-Out Pods.

Pods that don't request Scale-Out won't get the toleration. As a result, GKE won't schedule those Pods on the Scale-Out nodes.
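Conceptually, the node taint and the injected Pod toleration have the following shape. The exact taint key and value are managed by GKE and are shown here for illustration only:

```yaml
# Taint that GKE applies to Scale-Out nodes (illustrative):
taints:
- key: cloud.google.com/compute-class
  value: Scale-Out
  effect: NoSchedule
---
# Matching toleration that GKE injects into Pods that
# request Scale-Out (illustrative):
tolerations:
- key: cloud.google.com/compute-class
  operator: Equal
  value: Scale-Out
  effect: NoSchedule
```

Because only Pods that request the class receive the toleration, the taint keeps all other Pods off those nodes.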

If you don't explicitly request a ComputeClass in your workload specification, Autopilot schedules Pods on nodes that use the default container-optimized compute platform. Most general-purpose workloads can run with no issues on this platform.

How to request a CPU architecture

In some cases, your workloads might be built for a specific architecture, such as Arm. The Scale-Out ComputeClass supports multiple CPU architectures. You can request a specific architecture alongside your ComputeClass request by specifying a label in your node selector or node affinity rule, such as in the following example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-arm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-arm
  template:
    metadata:
      labels:
        app: nginx-arm
    spec:
      nodeSelector:
        cloud.google.com/compute-class: COMPUTE_CLASS
        kubernetes.io/arch: ARCHITECTURE
      containers:
      - name: nginx-arm
        image: nginx
        resources:
          requests:
            cpu: 2000m
            memory: 2Gi

Replace ARCHITECTURE with the CPU architecture that you want, such as arm64 or amd64. You can select a maximum of one architecture in your workload. The ComputeClass that you select must support your specified architecture.

If you don't explicitly request an architecture, Autopilot uses the default architecture of the ComputeClass.

Arm architecture on Autopilot

Autopilot supports requests for nodes that use the Arm CPU architecture. Arm nodes are often more cost-efficient than comparable x86 nodes and can deliver better price-performance for supported workloads. For instructions to request Arm nodes, refer to Deploy Autopilot workloads on Arm architecture.

Ensure that you're using the correct images in your deployments. If your Pods use Arm images and you don't request Arm nodes, Autopilot schedules the Pods on x86 nodes and the Pods will crash. Similarly, if you accidentally use x86 images but request Arm nodes for the Pods, the Pods will crash.

Default, minimum, and maximum resource requests

When choosing a ComputeClass for your Autopilot workloads, make sure that you specify resource requests that meet the minimum and maximum requests for that ComputeClass. For information about the default requests, as well as the minimum and maximum requests for each ComputeClass, refer to Resource requests and limits in GKE Autopilot.

What's next