This tutorial shows you how to optimize available resources by scheduling Jobs on Google Kubernetes Engine (GKE) with Kueue. In this tutorial, you learn to use Kueue to effectively manage and schedule batch jobs, improve resource utilization, and simplify workload management. You set up a shared cluster for two tenant teams where each team has its own namespace and each team creates Jobs that share global resources. You also configure Kueue to schedule the Jobs based on resource quotas that you define.
This tutorial is for Cloud architects and Platform engineers who are interested in implementing a batch system using GKE. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.
Before reading this page, ensure that you're familiar with the following:
Background
Jobs are applications that run to completion, such as machine learning, rendering, simulation, analytics, CI/CD, and similar workloads.
Kueue is a cloud-native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
Kueue has the following characteristics:
- It is optimized for cloud architectures, where resources are heterogeneous, interchangeable, and scalable.
- It provides a set of APIs to manage elastic quotas and Job queueing.
- It does not re-implement existing capabilities such as autoscaling, pod scheduling, or Job lifecycle management.
- It has built-in support for the Kubernetes `batch/v1.Job` API.
- It can integrate with other job APIs.
Kueue refers to jobs defined with any API as Workloads, to avoid confusion with the specific Kubernetes Job API.
Create the ResourceFlavor
A ResourceFlavor is an object that represents the variations in the nodes available in your cluster by associating them with node labels and taints. For example, you can use ResourceFlavors to represent VMs with different provisioning guarantees (for example, Spot versus on-demand), architectures (for example, x86 versus ARM CPUs), and brands and models (for example, Nvidia A100 versus T4 GPUs).
In this tutorial, the `kueue-autopilot` cluster has homogeneous resources. As a result, create a single ResourceFlavor for CPU, memory, ephemeral storage, and GPUs, with no labels or taints.
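Because the cluster's nodes are homogeneous, a single empty flavor is enough. A minimal `flavors.yaml` sketch (the flavor name `default-flavor` is an assumption; any name works as long as the ClusterQueue references it):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor  # assumed name; no nodeLabels or taints needed for homogeneous nodes
```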
kubectl apply -f flavors.yaml
Create the ClusterQueue
A ClusterQueue is a cluster-scoped object that manages a pool of resources such as CPU, memory, and GPUs. It manages the ResourceFlavors, limits usage, and dictates the order in which workloads are admitted.
Deploy the ClusterQueue:
kubectl apply -f cluster-queue.yaml
The order of consumption is determined by `.spec.queueingStrategy`, which has two configurations:

- `BestEffortFIFO`
  - The default queueing strategy configuration.
  - Workload admission follows the first-in, first-out (FIFO) rule, but if there is not enough quota to admit the workload at the head of the queue, the next one in line is tried.
- `StrictFIFO`
  - Guarantees FIFO semantics.
  - The workload at the head of the queue can block queueing until it can be admitted.
In `cluster-queue.yaml`, you create a new ClusterQueue called `cluster-queue`. This ClusterQueue manages four resources, `cpu`, `memory`, `nvidia.com/gpu`, and `ephemeral-storage`, with the flavor created in `flavors.yaml`. The quota is consumed by the requests in the workload Pod specs.

Each flavor includes usage limits represented as `.spec.resourceGroups[].flavors[].resources[].nominalQuota`. In this case, the ClusterQueue admits workloads if and only if:

- The sum of the CPU requests is less than or equal to 10.
- The sum of the memory requests is less than or equal to 10Gi.
- The sum of the GPU requests is less than or equal to 10.
- The sum of the ephemeral storage requests is less than or equal to 10Gi.
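Putting those quotas together, a `cluster-queue.yaml` sketch consistent with the limits above might look like the following (the flavor name `default-flavor` is an assumption and must match the ResourceFlavor you created):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}            # admit workloads from all namespaces
  queueingStrategy: BestEffortFIFO # the default strategy
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "ephemeral-storage"]
    flavors:
    - name: default-flavor         # assumed flavor name from flavors.yaml
      resources:
      - name: "cpu"
        nominalQuota: 10
      - name: "memory"
        nominalQuota: 10Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 10
      - name: "ephemeral-storage"
        nominalQuota: 10Gi
```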
Create the LocalQueue
A LocalQueue is a namespaced object that accepts workloads from users in the namespace.
LocalQueues from different namespaces can point to the same ClusterQueue, where they share the resource quota. In this case, the LocalQueues from namespaces `team-a` and `team-b` point to the same ClusterQueue, `cluster-queue`, under `.spec.clusterQueue`.
Each team sends its workloads to the LocalQueue in its own namespace, and the ClusterQueue then allocates resources to those workloads.
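A `local-queue.yaml` sketch for the two namespaces might look like the following (the name `lq-team-a` appears later in this tutorial; `lq-team-b` is assumed by analogy):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a
  name: lq-team-a
spec:
  clusterQueue: cluster-queue  # both LocalQueues share this ClusterQueue's quota
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-b
  name: lq-team-b              # assumed name, by analogy with lq-team-a
spec:
  clusterQueue: cluster-queue
```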
Deploy the LocalQueues:
kubectl apply -f local-queue.yaml
Create Jobs and observe the admitted workloads
In this section, you create Kubernetes Jobs in the namespace `team-a`. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.
The Job in the namespace `team-a` has the following attributes:

- It points to the `lq-team-a` LocalQueue.
- It requests GPU resources by setting the `nodeSelector` field to `nvidia-tesla-t4`.
- It is composed of three Pods that sleep for 10 seconds in parallel. Jobs are cleaned up after 60 seconds, according to the value defined in the `ttlSecondsAfterFinished` field.
- It requires 1,500 milliCPU, 1,536 Mi of memory, 1,536 Mi of ephemeral storage, and three GPUs, because there are three Pods.
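The attributes above can be sketched as a manifest like the following (the container name, image, and per-Pod request values are assumptions; each of the three Pods requests one third of the totals listed above):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a
  generateName: sample-job-team-a-
  labels:
    kueue.x-k8s.io/queue-name: lq-team-a  # points the Job at the LocalQueue
spec:
  ttlSecondsAfterFinished: 60  # Job is cleaned up 60 seconds after finishing
  parallelism: 3
  completions: 3
  suspend: true                # created suspended; Kueue unsuspends it on admission
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
      containers:
      - name: dummy-job        # assumed container name and image
        image: registry.k8s.io/e2e-test-images/busybox:1.29-4
        command: ["sleep", "10"]
        resources:
          requests:
            cpu: "500m"                 # 3 Pods x 500m  = 1,500 milliCPU total
            memory: "512Mi"             # 3 Pods x 512Mi = 1,536 Mi total
            ephemeral-storage: "512Mi"  # 3 Pods x 512Mi = 1,536 Mi total
            nvidia.com/gpu: "1"         # 3 Pods x 1 GPU = 3 GPUs total
          limits:
            nvidia.com/gpu: "1"
      restartPolicy: Never
```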
Jobs are also created from the file `job-team-b.yaml`, whose namespace is `team-b`, with requests that represent different teams with different needs.
To learn more, see deploying GPU workloads in Autopilot.
In a new terminal, observe the status of the ClusterQueue that refreshes every two seconds:
watch -n 2 kubectl get clusterqueue cluster-queue -o wide
In a new terminal, observe the status of the nodes:
watch -n 2 kubectl get nodes -o wide
In a new terminal, create Jobs in the LocalQueues of namespaces `team-a` and `team-b` every 10 seconds:

./create_jobs.sh job-team-a.yaml job-team-b.yaml 10
Observe the Jobs being queued up, admitted in the ClusterQueue, and nodes being brought up with GKE Autopilot.
Obtain a Job from namespace `team-a`:

kubectl -n team-a get jobs
The output is similar to the following:
NAME                      COMPLETIONS   DURATION   AGE
sample-job-team-b-t6jnr   3/3           21s        3m27s
sample-job-team-a-tm7kc   0/3                      2m27s
sample-job-team-a-vjtnw   3/3           30s        3m50s
sample-job-team-b-vn6rp   0/3                      40s
sample-job-team-a-z86h2   0/3                      2m15s
sample-job-team-b-zfwj8   0/3                      28s
sample-job-team-a-zjkbj   0/3                      4s
sample-job-team-a-zzvjg   3/3           83s        4m50s
Copy a Job name from the previous step and observe the admission status and events for a Job through the Workloads API:
kubectl -n team-a describe workload JOB_NAME
When the number of pending Jobs in the ClusterQueue starts increasing, end the script by pressing CTRL + C.

Once all Jobs are completed, notice the nodes being scaled down.