
Introducing support for GPU workloads and even larger Pods in GKE Autopilot

September 29, 2022
William Denniss

Group Product Manager, Google Kubernetes Engine

Autopilot is a fully managed mode of operation for Google Kubernetes Engine (GKE). But being fully managed doesn’t mean you need to be limited in what you can do with it! Our goal for Autopilot is to support all user workloads (that is, everything other than administrative workloads that require privileged access to nodes) so they can run across the full GKE product.

Many workloads, especially AI/ML training and inference, require GPU hardware. To enable such workloads on Autopilot, we are launching Preview support for NVIDIA T4 and A100 GPUs. Now you can run ML training, inference, video encoding, and any other workload that needs a GPU, with the convenience of Autopilot’s fully managed operational environment.

The great thing about running GPU workloads on Autopilot is that all you need to do is specify your GPU requirements in your Pod configuration, and we take care of the rest. There’s no need to install drivers separately, or to worry about non-GPU Pods landing on your valuable GPU nodes, because Autopilot handles GPU configuration and Pod placement automatically.

You also don’t have to worry about a GPU node costing you money without any currently running workloads, since with Autopilot you are just billed for Pod running time. Once the GPU Pod terminates, so do any associated charges—and you’re not charged for the setup or tear down time of the underlying resource either.

Some of our customers and partners have already been trying it out. Our customer CrowdRiff had the following to say:

"CrowdRiff is an AI-powered visual content marketing platform that provides user-generated content discovery, digital asset management, and seamless content delivery for the travel and tourism industry. As users of Google Kubernetes Engine (GKE) and its support for running GPU-accelerated workloads, we were excited to learn about GKE Autopilot's upcoming support for GPUs. Through our initial testing of the feature we found that we were able to easily take advantage of GPUs for our services without having to manage the underlying infrastructure to support this. Utilizing this functionality we expect to see reduced costs versus using standard GKE clusters and lower operational complexity for our engineers." — Steven Pall, Reliability Engineer, CrowdRiff

And our partner SADA comments:

“Our recommendation to customers is to leverage Autopilot whenever possible because of its ease of use and the reduction of operational burden. The whole GKE node layer is offloaded to Google, and GPU pods for Autopilot enable an entirely new workload type to run using Autopilot. The Autopilot mode is an exciting enhancement for our customers to run their AI/ML jobs.” — Christopher Hendrich, Associate CTO, SADA

Using GPUs with Autopilot

You can request T4 and A100 GPUs in several predefined quantities. You can accept the defaults for CPU and memory, or specify those resources as well, within certain ranges. Listing 1 is an example Pod that requests multiple T4 GPUs.

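Here’s a minimal sketch of such a Pod, assuming the standard cloud.google.com/gke-accelerator node selector and the nvidia.com/gpu resource name; the Pod name, image, and command are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: t4-gpu-pod                 # illustrative name
spec:
  nodeSelector:
    # Autopilot provisions a node with the requested accelerator type
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04   # illustrative image
    command: ["/bin/bash", "-c", "sleep infinity"]
    resources:
      limits:
        # Request two T4 GPUs; CPU and memory defaults apply unless you set them
        nvidia.com/gpu: 2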

Listing 1: Simply specify an nvidia-tesla-t4 node selector and your Pod will run on a GPU node


Those few lines in your Kubernetes configuration are all you need! Just specify your GPU requirements in the PodSpec and create the object via kubectl. Autopilot takes care of tainting GPU nodes to prevent non-GPU Pods from running on them, and of tolerating those taints in your workload specification, all automatically. We will automatically provision a GPU-enabled node matching your requirements, including any required NVIDIA driver setup.

If for some reason your GPU Pod doesn’t become ready, check what’s going on with kubectl get events -w, and double-check that your resource values are within the supported ranges.

Run Large Pods on Autopilot with the Balanced Compute Class

And GPU support isn’t the only thing we’re adding today! Autopilot already supports an industry-leading 28 vCPU maximum Pod size on the default compute platform, and up to 54 vCPU with the Scale-Out compute class, but we wanted to push the limits even higher for those workloads that need a bit extra.

For those times when you need computing resources on the larger end of the spectrum, we’re excited to also introduce the Balanced compute class, supporting Pod resource sizes up to 222 vCPU and 851 GiB of memory! Balanced joins the existing Scale-Out compute class (which focuses on high single-threaded CPU performance) and our generic compute platform (designed for everyday workloads).

To get started with Balanced, simply add a node selector to your Pods. Listing 2 is an example Pod definition; be sure to adjust the resource requests to what you actually need! Refer to the Autopilot pricing page for Balanced Pod pricing information.

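Here’s a minimal sketch, assuming the cloud.google.com/compute-class node selector key; the Pod name, image, and resource figures are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: balanced-pod               # illustrative name
spec:
  nodeSelector:
    # Run this Pod on the Balanced compute class
    cloud.google.com/compute-class: Balanced
  containers:
  - name: my-container
    image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0   # illustrative image
    resources:
      requests:
        # Adjust to what you actually need, up to 222 vCPU and 851 GiB
        cpu: "120"
        memory: 400Gi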

Listing 2: Run large Pods using the Balanced compute class


As with GPU Pods, Autopilot automatically handles the placement of Balanced compute class Pods for you, so you’re charged Balanced compute class prices only for Pods that explicitly select it. By default, Pods without the compute class nodeSelector run on Autopilot’s generic compute platform (where they can request up to 28 vCPUs).

We can’t wait to see what you do with these new capabilities of GKE Autopilot.

View our docs to read more about GPUs on Autopilot and the new Balanced compute class.
