Containers & Kubernetes

GKE workload rightsizing — from recommendations to action

May 19, 2022

Karol Gołąb

Software Engineer, GKE

Roman Arcea

Product Manager, GKE

Do you know how to rightsize a workload in Kubernetes? If you’re not 100% sure, we have some great news for you! Today, we are launching a fully embedded, out-of-the-box experience to help you with that complex task. When you run your applications on Google Kubernetes Engine (GKE), you now get an end-to-end workflow that helps you discover optimization opportunities, understand workload specific resource request suggestions and, most importantly, act on those recommendations — all in a matter of seconds.

This workload optimization workflow helps rightsize applications by looking at Kubernetes resource requests and limits, which are often one of the largest sources of resource waste. Correctly configuring your resource requests can be the difference between an idle cluster and a cluster that has been downscaled in response to actual resource usage.

If you’re new to GKE, you can save time and money by following the rightsizer’s recommended resource request settings. If you’re already running workloads on GKE, you can also use it to quickly assess optimization opportunities for your existing deployments.

Then, to optimize your workloads even more, combine these new workload rightsizing capabilities with GKE Autopilot, which is priced based on Pod resource requests. With GKE Autopilot, any optimizations you make to your Pod resource requests (assuming they are over the minimum) are directly reflected on your bill.

We’re also introducing a new metric for Cloud Monitoring that provides resource requests suggestions for each individual eligible workload, based on its actual usage over time.

Seamless workload rightsizing with GKE

When you run a workload on GKE, you can use cost optimization insights to discover your cluster and workload rightsizing opportunities right in the console.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_GKE_workload.max-1700x1700.jpg

Here, you can see your workload's actual usage and get signals for potentially undersized workloads that are at risk of either reliability or performance impact because they have low resource requests.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_GKE_workload.max-800x800.jpg

However, taking the next step and correctly rightsizing those applications has always been a challenge — especially at scale. Not anymore with GKE’s new workload rightsizing capability.

Start by picking up the workload you want to optimize. Usually, the best candidates are the ones where there’s a considerable divergence between resource requests and limits and actual usage. In the cost optimization tab of the GKE workloads console, just look for the workloads with a lot of bright green.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_GKE_workload.max-800x800.jpg

Once you pick a workload, go to workload details and choose “Actions” => “Scale” => “Edit resource requests” to get more step-by-step optimization guidance.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_GKE_workload.max-800x800.jpg

The guidance you receive relies heavily on new “Recommended per replica request cores” and “Recommended per replica request bytes” metrics (the same metrics that are available in Cloud Monitoring), which are both based on actual workload usage. You can access this view for every eligible GKE deployment, with no configuration on your part.

https://storage.googleapis.com/gweb-cloudblog-publish/images/5_GKE_workload.max-1500x1500.jpg

Once you confirm the values that are best for your deployment, you can edit the resource requests and limits directly in the GKE console, and they will be directly applied to your workloads.

https://storage.googleapis.com/gweb-cloudblog-publish/images/6_GKE_workload.max-1300x1300.jpg

Note: Suggestions are based on the observed usage patterns of your workloads and might not always be the best fit for your application. Each case might have its corner cases and specific needs. We advise a comprehensive check and understanding of values that are best for your specific workload.

Note: Due to limited visibility into the way Java workloads use memory, we do not support memory recommendations for JVM-based workloads.

Optionally, if you’d rather set the resource requests and limits from outside the GKE console, you can generate a YAML file with the recommended settings that you can use to configure your deployments.

https://storage.googleapis.com/gweb-cloudblog-publish/images/7_GKE_workload.max-1300x1300.jpg

Note: Workloads with horizontal pod autoscaling enabled will not receive suggested values on the same metric for which horizontal pod autoscaling is configured. For instance, if your workload has HPA configured for CPU, only memory suggestions will be displayed.

For more information about specific workload eligibility and compatibility with other scaling mechanisms such as horizontal pod autoscaling, check out the feature documentation here.

Next-level efficiency with GKE Autopilot and workload rightsizing

We’ve talked extensively about GKE Autopilot as one of GKE’s key cost optimization mechanisms. GKE Autopilot provides a fully managed infrastructure offering that eliminates the need for nodepool and VM-level optimization, thus removing the bin-packing optimization challenges related to operating VMs, as well as unnecessary resource waste and day-two operations efforts.

In GKE Autopilot, you pay for the resources you request. Combined with workload rightsizing, which primarily targets resource request optimization, you can easily now address two out of three main issues that lead to optimization gaps: app right-sizing and bin-packing. By running eligible workloads on GKE Autopilot and improving their resource requests, you should start to see a direct, positive impact on your bill right away!

https://storage.googleapis.com/gweb-cloudblog-publish/images/8_GKE_workload.max-1300x1300.jpg

Rightsizing metrics and more resources for optimizing GKE

To support the new optimization workflow we also launched two new metrics called “Recommended per replica request cores” and “Recommended per replica request bytes”. Both are available in the Kubernetes Scale metric group in Cloud Monitoring under “Kubernetes Scale” => “Autoscaler” => “Recommended per replica request”. You can also use these metrics to build your own customization and ranking views and experiences, and export latest optimization opportunities.

https://storage.googleapis.com/gweb-cloudblog-publish/images/9_GKE_workload.max-1600x1600.jpg

Excited about the new optimization opportunities? Ready for a recap of many other things you could do to run GKE more optimally? Check our Best Practices for Running Cost Effective Kubernetes Applications, the Youtube series, and have a look at the GKE best practices to lessen overprovisioning.

Containers & Kubernetes

Google Cloud at KubeCon EU: New projects, updated services, and how to connect

Engage with experts and learn more about Google Kubernetes Engine at KubeCon EU.

By Alexandra Bush • 5-minute read

https://storage.googleapis.com/gweb-cloudblog-publish/images/kubecon.max-900x900.jpg

Posted in

Containers & Kubernetes

Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer

By Peter Schuurman • 4-minute read

Containers & Kubernetes

How Google Does It: Building the largest known Kubernetes cluster, with 130,000 nodes

By Besher Massri • 10-minute read

Containers & Kubernetes

GKE: From containers to agents, the unified platform for every modern workload

By Drew Bradstock • 9-minute read

Containers & Kubernetes

Introducing Agent Sandbox: Strong guardrails for agentic AI on Kubernetes and GKE

By Brandon Royal • 4-minute read

GKE workload rightsizing — from recommendations to action

Karol Gołąb

Roman Arcea

Seamless workload rightsizing with GKE

Next-level efficiency with GKE Autopilot and workload rightsizing

Rightsizing metrics and more resources for optimizing GKE

Google Cloud at KubeCon EU: New projects, updated services, and how to connect

Related articles

Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer

How Google Does It: Building the largest known Kubernetes cluster, with 130,000 nodes

GKE: From containers to agents, the unified platform for every modern workload

Introducing Agent Sandbox: Strong guardrails for agentic AI on Kubernetes and GKE