
GKE under the hood: What’s new with Cluster Autoscaler

June 24, 2024
Daniel Kłobuszewski

Senior Software Engineer, GKE

Roman Arcea

Product Manager, GKE


In the world of cloud infrastructure, sometimes the most impactful features are the ones you never have to think about. When it comes to Google Kubernetes Engine (GKE), we have a long history of quietly innovating behind the scenes, optimizing the invisible gears that keep your clusters running smoothly. These enhancements might not always grab the headlines, but they deliver tangible benefits in the form of improved performance, reduced latency, and a simpler user experience.

Today, we're shining a spotlight on some of these "invisible" GKE advancements, particularly in the realm of infrastructure autoscaling. Let's dive into how recent changes in the Cluster Autoscaler (CA) can significantly enhance your workload performance without requiring any additional configuration on your part.

What’s new with Cluster Autoscaler

The GKE team has been hard at work refining the Cluster Autoscaler, the component responsible for automatically adjusting the size of your node pools based on demand. Here's a breakdown of some key improvements:

  • Target replica count tracking: This feature accelerates scaling when you add several Pods simultaneously (think new deployments or large resizes). It also eliminates a previous 30-second delay that affected GPU autoscaling. This capability is headed to open-source so that the entire community can benefit from improved Kubernetes performance.

  • Fast homogeneous scale-up: If you have numerous identical Pods, this optimization speeds up scaling by bin-packing those Pods onto nodes more efficiently.

  • Less CPU waste: The CA now makes decisions faster, which is especially noticeable when you need multiple scale-ups across different node pools. Additionally, CA is smarter about when to run its control loop, avoiding unnecessary delays.

  • Memory optimization: Although not directly visible to the user, the CA has also undergone memory optimizations that contribute to its overall efficiency.
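To see why identical Pods are cheap to plan for, here is a minimal sketch of the arithmetic behind bin-packing a homogeneous batch. It is an illustration only, not the Cluster Autoscaler's actual implementation: the `Resources` type, the function names, and the two-resource model (CPU and memory) are assumptions made for this example, and real scheduling also accounts for system overhead, affinities, taints, and other constraints.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector: CPU in millicores, memory in MiB."""
    cpu_millis: int
    memory_mib: int


def pods_per_node(pod: Resources, node: Resources) -> int:
    """How many identical Pods fit on one empty node.

    The scarcest resource decides, so take the minimum across dimensions.
    """
    if pod.cpu_millis <= 0 or pod.memory_mib <= 0:
        raise ValueError("Pod requests must be positive")
    return min(node.cpu_millis // pod.cpu_millis,
               node.memory_mib // pod.memory_mib)


def nodes_needed(pending_pods: int, pod: Resources, node: Resources) -> int:
    """Nodes required for a batch of identical pending Pods.

    Because every Pod is the same, one division replaces a per-Pod
    placement simulation.
    """
    per_node = pods_per_node(pod, node)
    if per_node == 0:
        raise ValueError("Pod does not fit on this node shape")
    return math.ceil(pending_pods / per_node)


if __name__ == "__main__":
    pod = Resources(cpu_millis=500, memory_mib=1024)
    node = Resources(cpu_millis=4000, memory_mib=16384)
    print(nodes_needed(5000, pod, node))
```

With the assumed shapes above, CPU is the limiting resource (8 Pods per node), so the whole batch can be sized in one step rather than Pod by Pod.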

Benchmarking results

To demonstrate the real-world impact of these changes, we ran a series of benchmarks on two GKE versions (1.27 and 1.29) across four scenarios:


  • Autopilot generic 5k scaled workload: We deployed a 5,000-replica workload on Autopilot and measured the time it took for all Pods to become ready.

  • Busy batch cluster: We simulated a high-traffic batch cluster by creating 100 node pools and deploying multiple 20-replica jobs at regular intervals. We then measured the scheduling latency.

  • 10-replica GPU test: We deployed a 10-replica GPU workload and measured the time it took for all Pods to become ready.


  • Application end-user latency test: We used a generic web application that, when not under load, responds to an API call with a defined response and predictable latency. Using a standard load testing framework (Locust), we assessed the performance of both GKE versions under a typical traffic pattern that prompts GKE to scale with both the Horizontal Pod Autoscaler (HPA) and node auto-provisioning (NAP). We measured the P50 and P95 end-user latency with the application scaled on CPU with an HPA CPU target of 50%.
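For context on the last scenario, the sketch below shows how an HPA CPU target translates load into a replica count, which in turn is what creates pending Pods for the autoscaler to place. It follows the scaling formula given in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric), with a default tolerance of 0.1). The function name and the example numbers are assumptions for illustration and do not come from the benchmark itself.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float = 50.0,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the documented HPA formula.

    If utilization is within `tolerance` of the target, no change is made.
    """
    if current_replicas <= 0 or target_cpu_percent <= 0:
        raise ValueError("replicas and target must be positive")
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)


if __name__ == "__main__":
    # Assumed example: 10 replicas averaging 90% CPU against a 50% target.
    print(desired_replicas(10, 90.0))
```

Each additional replica that does not fit on existing nodes becomes a pending Pod, so faster node scale-up shortens the window in which the remaining replicas run hot.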

Results highlights (Illustrative) 



| Benchmark | GKE v1.27 | GKE v1.29 |
| --- | --- | --- |
| Autopilot generic 5k replica deployment | 7m 30s | 3m 30s (55% improvement) |
| Busy batch cluster (P99 scheduling latency) | 9m 38s | 7m 31s (20% improvement) |
| 10-replica GPU | 2m 40s | 2m 09s (20% improvement) |
| Application end-user latency (response latency as measured by the end user; P50 and P95 in seconds) | P50: 0.43s, P95: 3.4s | P50: 0.4s, P95: 2.7s (P95: 20% improvement) |

(Note: These results are illustrative and will vary based on your specific workload and configuration).
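If you want to run the same comparison on your own clusters, the improvement figures are simple relative reductions. The sketch below is one way to compute them from durations written in the "7m 30s" style used above; the helper names are assumptions for this example.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '7m 30s', '45s', or '3m' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def improvement_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    results = {
        "Autopilot 5k replicas": ("7m 30s", "3m 30s"),
        "Busy batch cluster P99": ("9m 38s", "7m 31s"),
        "10-replica GPU": ("2m 40s", "2m 09s"),
    }
    for name, (old, new) in results.items():
        pct = improvement_percent(parse_duration(old), parse_duration(new))
        print(f"{name}: {pct:.1f}% faster")
```

Applied to the timings reported above, this yields roughly 53%, 22%, and 19%, so the percentages quoted with the results appear to be rounded.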

Improvements of this size, such as roughly halving the deployment time of 5k Pods or cutting application response latency at the 95th percentile by 20%, typically require intensive optimization work or overprovisioned infrastructure. The new changes to Cluster Autoscaler stand out by delivering these gains without complex configuration, idle resources, or overprovisioning.


At Google Cloud, we’re committed to making your Kubernetes experience not just powerful, but also effortless to manage and use. By optimizing underlying mechanisms such as Cluster Autoscaler, we help GKE administrators focus on their applications and business goals, with the confidence that their clusters are scaling efficiently and reliably.

Each new GKE version brings a number of new capabilities, either visible or invisible, so make sure to stay up to date with the latest releases. And stay tuned for more insights into how we’re continuing to evolve GKE to meet the demands of modern cloud-native applications!
