
With MultiKueue, grab GPUs for your GKE cluster, wherever they may be

February 14, 2025
Jean-Baptiste Leroy

Customer Engineer


Artificial Intelligence (AI) and large language models (LLMs) are experiencing explosive growth, powering applications from machine translation to artistic creation. These technologies rely on intensive computations that require specialized hardware resources, like GPUs. But access to GPUs can be challenging, both in terms of availability and cost.

For Google Cloud users, the introduction of Dynamic Workload Scheduler (DWS) transformed how you can access and use GPU resources, particularly within a Google Kubernetes Engine (GKE) cluster. Dynamic Workload Scheduler optimizes AI/ML resource access and spending by simultaneously scheduling necessary accelerators like TPUs and GPUs across various Google Cloud services, improving the performance of training and fine-tuning jobs.

Further, Dynamic Workload Scheduler offers straightforward integration between GKE and Kueue, a cloud-native job scheduler, making it easier to obtain GPUs as quickly as possible, in a given region, for a given GKE cluster.

But what if you want to deploy your workload in any available region, as soon as Dynamic Workload Scheduler can provide the resources your workload needs?

This is where MultiKueue, a Kueue feature, comes into play. With MultiKueue, GKE, and Dynamic Workload Scheduler, you can wait for accelerators in multiple regions. Dynamic Workload Scheduler automatically provisions resources in the best GKE clusters as soon as they are available. By submitting workloads to a global queue, MultiKueue executes them in the region with available GPU resources, helping to optimize global resource usage, lower costs, and speed up processing.

MultiKueue

MultiKueue enables workload distribution across multiple GKE clusters in different regions. By identifying clusters with available resources, MultiKueue simplifies the process of dispatching jobs to the optimal location.

Dynamic Workload Scheduler is supported on GKE Autopilot, our managed Kubernetes service that automatically handles the provisioning, scaling, security, and maintenance of your container infrastructure, starting with version 1.30.3. Let’s take a deeper look at how to set up and manage MultiKueue with Dynamic Workload Scheduler, so you can obtain GPU resources faster.

MultiKueue cluster roles

MultiKueue provides two distinct cluster roles: 

  • Manager cluster - Establishes and maintains the connection with the worker clusters, and creates and monitors remote objects (workloads or jobs) while keeping the local ones in sync.

  • Worker cluster - A simple standalone Kueue cluster that executes the jobs submitted by the manager cluster.

Creating a MultiKueue cluster

In this example, we create four GKE Autopilot clusters:

  • One manager cluster in europe-west4

  • Three worker clusters in 

    • europe-west4

    • us-east4

    • asia-southeast1

Let’s take a look at how this works in the following step-by-step example. You can access the files for this example in this GitHub repository.

1. Clone the GitHub repository

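For example (the URL below is a placeholder; use the repository linked above):

```bash
# Clone the example repository and change into it.
# The URL is a placeholder for the repository linked above.
git clone https://github.com/<org>/<multikueue-dws-example>.git
cd <multikueue-dws-example>
```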

2. Create GKE clusters

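A minimal sketch of the Terraform workflow, assuming the configuration sits at the repository root and accepts a project ID variable (both assumptions; check the repository for the exact variable names):

```bash
# Initialize providers, then create the four GKE Autopilot clusters.
# The "project_id" variable name is an assumption about this example.
terraform init
terraform apply -var="project_id=<your-project-id>"
```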

This Terraform script creates the required GKE clusters and adds four entries to your kubeconfig file:

  • manager-europe-west4

  • worker-us-east4

  • worker-europe-west4

  • worker-asia-southeast1

Then you can switch between kubectl contexts easily:

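```bash
# Switch kubectl to the manager cluster's context (any of the four
# kubeconfig entries listed above works the same way).
kubectl config use-context manager-europe-west4
```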

3. Install and configure MultiKueue

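A sketch of this step; the script name is an assumption, and the loop below shows roughly what installing Kueue into each cluster looks like (the version pin is illustrative):

```bash
# Run the repository's setup script (script name is an assumption).
./install-multikueue.sh

# Conceptually, the per-cluster Kueue install it performs looks like this;
# pick the Kueue release that matches your setup.
for ctx in manager-europe-west4 worker-europe-west4 worker-us-east4 worker-asia-southeast1; do
  kubectl --context "${ctx}" apply --server-side \
    -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.9.1/manifests.yaml
done
```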

This script: 

  • Installs Kueue in the four clusters

  • Enables and configures MultiKueue in the manager cluster

  • Creates a PodMonitoring resource in each cluster so that Kueue metrics are sent to Google Cloud Managed Service for Prometheus (see the example manifest after this list)

  • Configures the connection between the manager cluster and the worker clusters

  • Configures Kueue in the worker clusters
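The PodMonitoring resource might look roughly like the sketch below. The namespace, label selector, and metrics port are assumptions based on a default Kueue installation, so the resource the script creates may differ:

```yaml
# Scrape Kueue controller metrics with Google Cloud Managed Service for
# Prometheus. Namespace, labels, and port are assumptions for a default
# Kueue install; adjust to match your deployment.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kueue-metrics
  namespace: kueue-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
  - port: 8080
    interval: 30s
```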

GKE clusters, Kueue with MultiKueue, and DWS are now configured and ready to use. Once you submit your jobs, the Kueue manager distributes them across the three worker clusters.

In the dws-multi-worker.yaml file, you'll find the Kueue configuration for the worker clusters, including the manager configuration. 

The following manifest shows a basic example of how to set up the MultiKueue AdmissionCheck with three worker clusters.

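A sketch of those objects; the object names and kubeconfig secret names are illustrative, and the API version shown (kueue.x-k8s.io/v1beta1, as in recent Kueue releases) may differ from the repository's dws-multi-worker.yaml:

```yaml
# AdmissionCheck on the manager cluster, delegating admission to MultiKueue
# and pointing at a MultiKueueConfig that lists the three workers.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: dws-multikueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: dws-multikueue
spec:
  clusters:
  - worker-europe-west4
  - worker-us-east4
  - worker-asia-southeast1
---
# One MultiKueueCluster per worker; each points at a secret on the manager
# that holds the kubeconfig used to reach that worker (secret name illustrative).
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: worker-europe-west4
spec:
  kubeConfig:
    locationType: Secret
    location: worker-europe-west4-secret
```

The manager's ClusterQueue then references the AdmissionCheck by name, so every workload submitted to that queue is dispatched through MultiKueue.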

4. Submit jobs

Ensure you're using the manager cluster's kubectl context when submitting jobs.

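A hypothetical job manifest along these lines would work; the queue name, accelerator type, and image are assumptions, so adapt them to the repository's sample job:

```yaml
# job.yaml (hypothetical): a suspended GPU job that Kueue admits and
# MultiKueue dispatches to a worker cluster. Queue name, accelerator,
# and image are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: dws-sample-
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue
spec:
  suspend: true            # Kueue unsuspends the job once it is admitted
  parallelism: 1
  completions: 1
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: cuda-sample
        image: nvidia/cuda:12.4.1-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```

Because the manifest uses generateName, you can submit it repeatedly:

```bash
# Target the manager cluster, then create a new job instance each time.
kubectl config use-context manager-europe-west4
kubectl create -f job.yaml
```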

To observe how the MultiKueue admission check distributes jobs among worker clusters, you can submit the job creation request multiple times.

5. Get jobs status

To check the job status and determine the scheduled region, execute the following commands:

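For example, using the kubeconfig contexts created in step 2:

```bash
# On the manager: list Kueue workloads and their admission state.
kubectl --context manager-europe-west4 get workloads -A

# On each worker: see which region actually received and ran the jobs.
for ctx in worker-europe-west4 worker-us-east4 worker-asia-southeast1; do
  echo "== ${ctx} =="
  kubectl --context "${ctx}" get jobs
done
```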

6. Delete resources

Finally, be sure to delete the four GKE clusters you created to try out this functionality:

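Assuming the clusters were created with the Terraform configuration from step 2:

```bash
# Tear down the four GKE Autopilot clusters created earlier.
terraform destroy
```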

What’s next

So that's how you can leverage MultiKueue, GKE, and DWS to streamline global job execution, optimize speed, and eliminate the need for manual node management! 

This setup also addresses the needs of those with data residency requirements, allowing you to dedicate subsets of clusters for different workloads and ensure compliance.

To further enhance your setup, you can leverage advanced Kueue features like team management with local queues or workload priority classes. Additionally, you can gain valuable insights by creating a Grafana or Cloud Monitoring dashboard that uses Kueue metrics, which are automatically collected by Google Cloud Managed Service for Prometheus via the PodMonitoring resources.
