Containers & Kubernetes

GKE Autopilot mode gets burstable workloads and smaller Pod sizes

April 3, 2024

William Denniss

Group Product Manager, Google Kubernetes Engine

Autopilot mode for Google Kubernetes Engine (GKE) is our fully managed, Pod-based Kubernetes platform. It provides category-defining features with a fully functional Kubernetes API that supports StatefulSet with block storage, GPUs, and other critical functionality that you don’t often find in nodeless/serverless-style Kubernetes offerings, while still offering a Pod-level SLA and a developer-friendly workload-based API.

But until now, Autopilot, like other products in this category, did not offer the ability to temporarily burst CPU and memory resources beyond what was requested by the workload. I’m happy to announce that now, powered by the unique design of GKE Autopilot on Google Cloud, we are able to bring burstable workload support to GKE Autopilot.

Bursting allows your Pod to temporarily utilize resources outside of those resources that it requests and is billed for. How does this work, and how can Autopilot offer burstable support, given the Pod-based model? The key is that in Autopilot mode we still group Pods together on Nodes. This is what powers several unique features of Autopilot such as our flexible Pod sizes. With this change, the capacity of your Pods is pooled, and Pods that set a limit higher than their requests can temporarily burst into this capacity (if it’s available).

With the introduction of burstable support, we’re also introducing another groundbreaking change: 50m CPU Pods — that is, Pods as small as 1/20th of a vCPU. Until now, the smallest Pod we offered was ¼ of a vCPU (250m CPU) — five times bigger. Combined with burst, the door is now open to run high-density-yet-small workloads on Autopilot, without constraining each Pod to its resource requests.

We’re also dropping the 250m CPU resource increment, so you can create any size of Pod you like between the minimum to the maximum size, for example, 59m CPU, 302m, 808m, 7682m etc (the memory to CPU ratio is automatically kept within a supported range, sizing up if needed). With these finer-grained increments, Vertical Pod Autoscaling now works even better, helping you tune your workload sizes automatically. If you want to add a little more automation to figuring out the right resource requirements, give Vertical Pod Autoscaling a try!

Here’s what our customer Ubie had to say:

https://storage.googleapis.com/gweb-cloudblog-publish/images/ubie.max-700x700.jpg

“GKE Autopilot frees us from managing infrastructure so we can focus on developing and deploying our applications. It eliminates the challenges of node optimization, a critical benefit in our fast-paced startup environment. Autopilot's new bursting feature and lowered CPU minimums offer cost optimization and support multi-container architectures such as sidecars. We were already running many workloads in Autopilot mode, and with these new features, we're excited to migrate our remaining clusters to Autopilot mode.” - Jun Sakata, Head of Platform Engineering, Ubie

A new home for high-density workloads

Autopilot is now a great place to run high-density workloads. Let’s say for example you have a multi-tenant architecture with many smaller replicas of a Pod, each running a website that receives relatively little traffic, but has the occasional spike. Combining the new burstable feature and the lower 50m CPU minimum, you can now deploy thousands of these Pods in a cost-effective manner (up to 20 per vCPU), while enabling burst so that when that traffic spike comes in, that Pod can temporarily burst into the pooled capacity of these replicas.

What’s even better is that you don’t need to concern yourself with bin-packing these workloads, or the problem where you may have a large node that’s underutilized (for example, a 32 core node running a single 50m CPU Pod). Autopilot takes care of all of that for you, so you can focus on what’s key to your business: building great services.

Calling all startups, students, and solopreneurs

Everyone needs to start somewhere, and if Autopilot wasn’t the perfect home for your workload before, we think it is now. With the 50m CPU minimum size, you can run individual containers in us-central1 for under $2/month each (50m CPU, 50MiB). And thanks to burst, these containers can use a little extra CPU when traffic spikes, so they’re not completely constrained to that low size. And if this workload isn’t mission-critical, you can even run it in Spot mode, where the price is even lower.

In fact, thanks to GKE’s free tier, your costs in us-central1 can be as low as $30/month for small workloads (including production-grade load balancing) while tapping into the same world-class infrastructure that powers some of the biggest sites on the internet today. Importantly, if your startup grows, you can scale in-place without needing to migrate — since you’re already running on a production-grade Kubernetes cluster. So you can start small, while being confident that there is nothing limiting about your operating environment. We’re counting on your success as well, so good luck!

And if you’re learning GKE or Kubernetes, this is a great platform to learn on. Nothing beats learning on real production systems — after all, that’s what you’ll be using on the job. With one free cluster per account, and all these new features, Autopilot is a fantastic learning environment. Plus, if you delete your Kubernetes workload and networking resources when you’re done for the day, any associated costs cease as well. When you’re ready to resume, just create those resources again, and you’re back at it. You can even persist state between learning sessions by deleting just the compute resources (and keeping the persistent disks). Don’t forget there’s a $300 free trial to cover your initial usage as well!

Next steps:

Learn about Pod bursting in GKE.
Read more about the minimum and maximum resource requests in Autopilot mode.
Learn how to set resource requirements automatically with VPA.
Try GKE’s Autopilot mode for a workload-based API and simpler Day 2 ops with lower TCO.
Going to Next ‘24? Check out session DEV224 to hear Ubie talk about how it uses burstable workloads in GKE.

Posted in

Containers & Kubernetes

Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer

By Peter Schuurman • 4-minute read

Containers & Kubernetes

How Google Does It: Building the largest known Kubernetes cluster, with 130,000 nodes

By Besher Massri • 10-minute read

Containers & Kubernetes

GKE: From containers to agents, the unified platform for every modern workload

By Drew Bradstock • 9-minute read

Containers & Kubernetes

Introducing Agent Sandbox: Strong guardrails for agentic AI on Kubernetes and GKE

By Brandon Royal • 4-minute read

GKE Autopilot mode gets burstable workloads and smaller Pod sizes

William Denniss

A new home for high-density workloads

Calling all startups, students, and solopreneurs

Related articles

Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer

How Google Does It: Building the largest known Kubernetes cluster, with 130,000 nodes

GKE: From containers to agents, the unified platform for every modern workload

Introducing Agent Sandbox: Strong guardrails for agentic AI on Kubernetes and GKE