
Accelerate GKE cluster autoscaling with faster concurrent node pool auto-creation

January 28, 2026
Daniel Kłobuszewski, Senior Software Engineer, GKE
Eyal Yablonka, Product Manager, GKE


We're excited to announce concurrency in Google Kubernetes Engine (GKE) node pool auto-creation, which significantly reduces provisioning latency and improves autoscaling performance. Internal benchmarks show up to an 85% improvement in provisioning speed. This especially benefits heterogeneous workloads, multi-tenant clusters, workloads that use multiple ComputeClass priorities, and large AI training workloads, cutting deployment time and enhancing goodput. The improvements are already under the hood whenever you allow GKE to automatically create node pools for pending Pods.

The problem

GKE node pools group nodes with identical configurations, unifying operations such as resizing and upgrading. Creating a new, empty node pool takes 30-45 seconds. GKE can automate node pool creation based on the resource needs of pending Pods.
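
In GKE Standard mode, for example, you can let GKE create node pools for pending Pods by enabling node auto-provisioning with gcloud. A minimal sketch; the cluster name, location, and resource limits below are placeholders, not values from this post:

    # Enable node auto-provisioning (node pool auto-creation) on an
    # existing Standard cluster. The limits cap the total CPU and
    # memory across all auto-created node pools.
    gcloud container clusters update my-cluster \
        --location=us-central1 \
        --enable-autoprovisioning \
        --min-cpu=1 --max-cpu=200 \
        --min-memory=1 --max-memory=1000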

Compare this to prior versions of GKE node auto-provisioning (NAP), which executed one operation at a time, leading to increased deployment and scaling latencies. This was particularly noticeable in clusters that needed multiple node pools: the 30-45 seconds it took to create each new node pool added up, hurting the cluster's overall autoscaling responsiveness. While one node pool was being created, all other node pool operations had to wait.

GKE node pool auto-creation is core to Autopilot mode, whether you run a dedicated Autopilot cluster or use Autopilot with a Standard cluster; you can also enable it optionally in GKE Standard mode. Any time Autopilot adds a new virtual machine (VM) shape, a node pool is created under the hood.
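
Concretely, auto-creation kicks in when a pending Pod can't fit on any existing node pool. A minimal, hypothetical example; the name, image, and request sizes are illustrative:

    # A Pod whose requests don't fit any existing node pool. With node
    # pool auto-creation enabled, GKE provisions a matching node pool
    # instead of leaving the Pod pending.
    apiVersion: v1
    kind: Pod
    metadata:
      name: large-worker
    spec:
      containers:
      - name: worker
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        resources:
          requests:
            cpu: "16"
            memory: 64Gi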

The solution

Support for node pool concurrency allows the system to handle multiple operations at the same time, so clusters can be deployed and scaled out to different node types much faster. The improvement is available starting from version 1.34.1-gke.1829001. To benefit from it, simply upgrade to the latest version of GKE; no additional configuration is required.
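
For example, you can move an existing cluster's control plane to a qualifying version with gcloud; the cluster name and location are placeholders, and node pools are upgraded separately:

    # Upgrade the control plane to a release that includes concurrent
    # node pool auto-creation (1.34.1-gke.1829001 or later).
    gcloud container clusters upgrade my-cluster \
        --location=us-central1 \
        --master \
        --cluster-version=1.34.1-gke.1829001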

[Image: benchmark results (https://storage.googleapis.com/gweb-cloudblog-publish/images/image2_yI6qepE.max-1400x1400.png)]

To run the benchmark and observe the results firsthand, here is our benchmarking code.

Why node pool concurrency matters

Concurrent node pool auto-creation delivers substantial benefits for a wide range of GKE use cases:

  1. Heterogeneous workloads and multi-tenant clusters - Many workloads, including AI and machine learning, need distinct node pools, and a single cluster often serves multiple tenants. As a result, a single cluster frequently needs multiple, differently configured node pools, all of which must be created and managed quickly and efficiently.

  2. AI workloads and multi-host TPU slices - Workloads that use many multi-host TPU slices need a distinct node pool for each slice, so creating multiple new node pools concurrently helps ensure fast scaling (see the Job sketch after this list). More generally, concurrent node pool auto-creation gives AI workloads improved provisioning performance and better resource utilization (goodput).

  3. Cost optimization with Spot instances and multiple ComputeClass priorities - Preemptible and Spot nodes must be segregated into node pools distinct from their on-demand counterparts, even if their configurations are otherwise identical. More generally, custom ComputeClass priorities are typically backed by separate node pools, so a cluster often has a distinct node pool for each priority level. These scenarios are now handled with parallel operations (see the ComputeClass sketch after this list).
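
To make the multi-host TPU scenario concrete, here is a minimal sketch of a Job that targets a TPU v5e slice; the accelerator type, 4x4 topology, chip counts, and image are illustrative assumptions, and multi-host details such as the headless Service are omitted for brevity:

    # Each multi-host TPU slice gets its own node pool, so a workload
    # spanning several slices creates several node pools, and those
    # creations now run concurrently.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tpu-training
    spec:
      completions: 4        # one Pod per TPU VM in the 4x4 slice
      parallelism: 4
      completionMode: Indexed
      template:
        spec:
          restartPolicy: Never
          nodeSelector:
            cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
            cloud.google.com/gke-tpu-topology: 4x4
          containers:
          - name: trainer
            image: python:3.11-slim
            command: ["python", "-c", "print('training step')"]
            resources:
              limits:
                google.com/tpu: "4"   # TPU chips per Pod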
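
Likewise, for ComputeClass priorities, here is a sketch of a custom ComputeClass whose priorities span Spot and on-demand capacity; the class name and machine family are placeholders:

    # GKE tries each priority in order and auto-creates a separate
    # node pool for each priority it actually provisions.
    apiVersion: cloud.google.com/v1
    kind: ComputeClass
    metadata:
      name: cost-optimized
    spec:
      priorities:
      - machineFamily: n2
        spot: true          # try Spot capacity first
      - machineFamily: n2   # fall back to on-demand
      nodePoolAutoCreation:
        enabled: true
      whenUnsatisfiable: ScaleUpAnyway

Pods opt in with a nodeSelector on cloud.google.com/compute-class: cost-optimized, and GKE creates the backing node pools on demand.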

Faster provisioning and startup times

At Google Cloud, we're dedicated to improving the performance of your GKE environment. Concurrent node pool auto-creation is one way we’re improving provisioning performance. We are also improving node startup latency with fast-starting nodes, container pull latency with image streaming, and Pod scheduling latency with the container-optimized compute platform. To learn more and get started, check out these resources: 
