
With MultiKueue, grab GPUs for your GKE cluster, wherever they may be

February 14, 2025
Jean-Baptiste Leroy

Customer Engineer


Artificial Intelligence (AI) and large language models (LLMs) are experiencing explosive growth, powering applications from machine translation to artistic creation. These technologies rely on intensive computations that require specialized hardware resources, like GPUs. But access to GPUs can be challenging, both in terms of availability and cost.

For Google Cloud users, the introduction of Dynamic Workload Scheduler (DWS) transformed how you can access and use GPU resources, particularly within a Google Kubernetes Engine (GKE) cluster. Dynamic Workload Scheduler optimizes AI/ML resource access and spending by simultaneously scheduling necessary accelerators like TPUs and GPUs across various Google Cloud services, improving the performance of training and fine-tuning jobs.

Further, Dynamic Workload Scheduler offers straightforward integration between GKE and Kueue, a cloud-native job scheduler, making it easier to obtain GPUs as quickly as possible, in a given region, for a given GKE cluster.

But what if you want to deploy your workload in any available region, as soon as Dynamic Workload Scheduler can provide the resources your workload needs?

This is where MultiKueue, a Kueue feature, comes into play. With MultiKueue, GKE, and Dynamic Workload Scheduler, you can wait for accelerators in multiple regions. Dynamic Workload Scheduler automatically provisions resources in the best GKE clusters as soon as they are available. By submitting workloads to a global queue, MultiKueue executes them in the region with available GPU resources, helping to optimize global resource usage, lower costs, and speed up processing.

MultiKueue

MultiKueue enables workload distribution across multiple GKE clusters in different regions. By identifying clusters with available resources, MultiKueue simplifies the process of dispatching jobs to the optimal location.

Dynamic Workload Scheduler is supported on GKE Autopilot, our managed Kubernetes service that automatically handles the provisioning, scaling, security, and maintenance of your container infrastructure, starting with version 1.30.3. Let’s take a deeper look at how to set up and manage MultiKueue with Dynamic Workload Scheduler, so you can obtain GPU resources faster.

MultiKueue cluster roles

MultiKueue provides two distinct cluster roles: 

  • Manager cluster - Establishes and maintains the connection with the worker clusters, and creates and monitors remote objects (workloads or jobs) while keeping the local ones in sync.

  • Worker cluster - A simple standalone Kueue cluster that executes the jobs submitted by the manager cluster.

Creating a MultiKueue cluster

In this example, we create four GKE Autopilot clusters:

  • One manager cluster in europe-west4

  • Three worker clusters in 

    • europe-west4

    • us-east4

    • asia-southeast1

Let’s take a look at how this works in the following step-by-step example. You can access the files for this example in this GitHub repository.

1. Clone the GitHub repository

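For example (the URL below is a placeholder; use the repository linked above):

```bash
# Clone the example repository and change into it.
# The URL is a placeholder for the repository linked above.
git clone https://github.com/<org>/<multikueue-dws-example>.git
cd <multikueue-dws-example>
```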

2. Create GKE clusters

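A minimal sketch of the Terraform workflow, assuming the configuration sits at the repository root and accepts a project ID variable (both assumptions; check the repository for the exact variable names):

```bash
# Initialize providers, then create the four GKE Autopilot clusters.
# The "project_id" variable name is an assumption about this example.
terraform init
terraform apply -var="project_id=<your-project-id>"
```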

This Terraform script creates the required GKE clusters and adds four entries to your kubeconfig file:

  • manager-europe-west4

  • worker-us-east4

  • worker-europe-west4

  • worker-asia-southeast1

Then you can switch between kubectl contexts easily:

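```bash
# Switch kubectl to the manager cluster's context (any of the four
# kubeconfig entries listed above works the same way).
kubectl config use-context manager-europe-west4
```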

3. Install and configure MultiKueue

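A sketch of this step; the script name is an assumption, and the loop below shows roughly what installing Kueue into each cluster looks like (the version pin is illustrative):

```bash
# Run the repository's setup script (script name is an assumption).
./install-multikueue.sh

# Conceptually, the per-cluster Kueue install it performs looks like this;
# pick the Kueue release that matches your setup.
for ctx in manager-europe-west4 worker-europe-west4 worker-us-east4 worker-asia-southeast1; do
  kubectl --context "${ctx}" apply --server-side \
    -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.9.1/manifests.yaml
done
```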

This script: 

  • Installs Kueue in the four clusters

  • Enables and configures MultiKueue in the manager cluster

  • Creates a PodMonitoring resource in each cluster so that Kueue metrics are sent to Google Cloud Managed Service for Prometheus (see the example manifest after this list)

  • Configures the connection between the manager cluster and the worker clusters

  • Configures Kueue in the worker clusters
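The PodMonitoring resource might look roughly like the sketch below. The namespace, label selector, and metrics port are assumptions based on a default Kueue installation, so the resource the script creates may differ:

```yaml
# Scrape Kueue controller metrics with Google Cloud Managed Service for
# Prometheus. Namespace, labels, and port are assumptions for a default
# Kueue install; adjust to match your deployment.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kueue-metrics
  namespace: kueue-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
  - port: 8080
    interval: 30s
```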

GKE clusters, Kueue with MultiKueue, and DWS are now configured and ready to use. Once you submit your jobs, the Kueue manager distributes them across the three worker clusters.

In the dws-multi-worker.yaml file, you'll find the Kueue configuration for the worker clusters, including the manager configuration. 

The following manifest shows a basic example of how to set up the MultiKueue AdmissionCheck with three worker clusters.

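A sketch of those objects; the object names and kubeconfig secret names are illustrative, and the API version shown (kueue.x-k8s.io/v1beta1, as in recent Kueue releases) may differ from the repository's dws-multi-worker.yaml:

```yaml
# AdmissionCheck on the manager cluster, delegating admission to MultiKueue
# and pointing at a MultiKueueConfig that lists the three workers.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: dws-multikueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: dws-multikueue
spec:
  clusters:
  - worker-europe-west4
  - worker-us-east4
  - worker-asia-southeast1
---
# One MultiKueueCluster per worker; each points at a secret on the manager
# that holds the kubeconfig used to reach that worker (secret name illustrative).
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: worker-europe-west4
spec:
  kubeConfig:
    locationType: Secret
    location: worker-europe-west4-secret
```

The manager's ClusterQueue then references the AdmissionCheck by name, so every workload submitted to that queue is dispatched through MultiKueue.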

4. Submit jobs

Ensure you're using the manager cluster's kubectl context when submitting jobs.

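A hypothetical job manifest along these lines would work; the queue name, accelerator type, and image are assumptions, so adapt them to the repository's sample job:

```yaml
# job.yaml (hypothetical): a suspended GPU job that Kueue admits and
# MultiKueue dispatches to a worker cluster. Queue name, accelerator,
# and image are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: dws-sample-
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue
spec:
  suspend: true            # Kueue unsuspends the job once it is admitted
  parallelism: 1
  completions: 1
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: cuda-sample
        image: nvidia/cuda:12.4.1-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```

Because the manifest uses generateName, you can submit it repeatedly:

```bash
# Target the manager cluster, then create a new job instance each time.
kubectl config use-context manager-europe-west4
kubectl create -f job.yaml
```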

To observe how the MultiKueue admission check distributes jobs among worker clusters, you can submit the job creation request multiple times.

5. Get jobs status

To check the job status and determine the scheduled region, execute the following commands:

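For example, using the kubeconfig contexts created in step 2:

```bash
# On the manager: list Kueue workloads and their admission state.
kubectl --context manager-europe-west4 get workloads -A

# On each worker: see which region actually received and ran the jobs.
for ctx in worker-europe-west4 worker-us-east4 worker-asia-southeast1; do
  echo "== ${ctx} =="
  kubectl --context "${ctx}" get jobs
done
```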

6. Delete resources

Finally, be sure to delete the four GKE clusters you created to try out this functionality:

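Assuming the clusters were created with the Terraform configuration from step 2:

```bash
# Tear down the four GKE Autopilot clusters created earlier.
terraform destroy
```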

What’s next

So that's how you can leverage MultiKueue, GKE, and DWS to streamline global job execution, optimize speed, and eliminate the need for manual node management! 

This setup also addresses the needs of those with data residency requirements, allowing you to dedicate subsets of clusters for different workloads and ensure compliance.

To further enhance your setup, you can leverage advanced Kueue features like team management with local queues or workload priority classes. Additionally, you can gain valuable insights by creating a Grafana or Cloud Monitoring dashboard that uses Kueue metrics, which are automatically collected by Google Cloud Managed Service for Prometheus via the PodMonitoring resources.
