AI/ML orchestration on GKE documentation
Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. With GKE, you can implement a robust, production-ready AI/ML platform with all the benefits of managed Kubernetes and these capabilities:
- Infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale (see the sketch after this list).
- Flexible integration with distributed computing and data processing frameworks.
- Support for multiple teams on the same infrastructure to maximize resource utilization.
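As a minimal sketch of the first capability, the manifest below shows how a workload requests a GPU on GKE. It assumes a node pool with NVIDIA T4 accelerators already exists; the Pod name and container image are placeholders:

```yaml
# Minimal sketch: a Pod that asks GKE to schedule it onto a GPU node.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job            # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4    # assumes a T4 node pool exists
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/ml/trainer:latest   # placeholder training image
    resources:
      limits:
        nvidia.com/gpu: "1"         # GKE's device plugin exposes GPUs as this resource
  restartPolicy: Never
```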
Documentation resources
Serve open models on GKE
- NEW! Serve LLMs like Deepseek-R1 671B or Llama 3.1 405B on GKE
- NEW! Serve an LLM using TPUs on GKE with KubeRay
- Tutorial: Serve an LLM using TPU Trillium on GKE with vLLM
- Tutorial: Quickstart: Serve an LLM using a single GPU on GKE
- Tutorial: Serve Gemma using GPUs on GKE with Hugging Face TGI
- Tutorial: Serve Gemma using GPUs on GKE with vLLM
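To ground these tutorials, here is a minimal, hypothetical sketch of the pattern they share: a Deployment that runs the vLLM OpenAI-compatible server on a GPU node. The model ID, accelerator type, Secret name, and image tag are illustrative assumptions, not values from any specific tutorial:

```yaml
# Minimal sketch: serve an open model with vLLM on a GKE GPU node (values are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-gemma
  template:
    metadata:
      labels:
        app: vllm-gemma
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4    # assumes an L4 node pool exists
      containers:
      - name: inference-server
        image: vllm/vllm-openai:latest                 # public vLLM serving image
        args: ["--model=google/gemma-2b-it"]           # illustrative Hugging Face model ID
        ports:
        - containerPort: 8000                          # vLLM's default HTTP port
        env:
        - name: HUGGING_FACE_HUB_TOKEN                 # assumes a Secret with a HF token exists
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        resources:
          limits:
            nvidia.com/gpu: "1"
```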
Orchestrate TPUs and GPUs at large scale
- NEW! Optimize GKE resource utilization for mixed AI/ML training and inference workloads
- Video: Introduction to Cloud TPUs for machine learning
- Video: Build large-scale machine learning on Cloud TPUs with GKE
- Video: Serving Large Language Models with KubeRay on TPUs
- Blog: Machine learning with JAX on Kubernetes with NVIDIA GPUs
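As a rough sketch of how TPU scheduling looks on GKE, the manifest below requests a single-host TPU v5e slice. The accelerator type, topology, and image are assumptions; the chip count requested must match the chosen topology:

```yaml
# Minimal sketch: a Pod that requests all chips of a single-host TPU v5e slice.
apiVersion: v1
kind: Pod
metadata:
  name: tpu-training-job            # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice   # assumes a TPU v5e node pool
    cloud.google.com/gke-tpu-topology: 2x4                        # 8-chip single-host topology
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/ml/jax-trainer:latest     # placeholder JAX training image
    resources:
      limits:
        google.com/tpu: "8"         # must equal the number of chips in the slice
  restartPolicy: Never
```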
Cost optimization and job orchestration
- NEW! Reference architecture for a batch processing platform on GKE
- Blog: High performance AI/ML storage through Local SSD support on GKE
- Blog: Simplifying MLOps using Weights & Biases with Google Kubernetes Engine
- Best practice: Best practices for running batch workloads on GKE
- Best practice: Run cost-optimized Kubernetes applications on GKE
- Best practice: Improving launch time of Stable Diffusion on GKE by 4x
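One common cost-optimization pattern from these guides is running fault-tolerant batch work on Spot VMs. Below is a minimal sketch, assuming a Spot node pool exists; the Job name, image, and command are placeholders:

```yaml
# Minimal sketch: a batch Job scheduled onto GKE Spot VMs to reduce cost.
apiVersion: batch/v1
kind: Job
metadata:
  name: preprocess-dataset          # hypothetical name
spec:
  backoffLimit: 3                   # retry if a Spot node is reclaimed mid-run
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-spot: "true"          # assumes a Spot node pool exists
      tolerations:
      - key: cloud.google.com/gke-spot             # Spot nodes carry this taint on GKE
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: preprocess
        image: us-docker.pkg.dev/my-project/ml/preprocess:latest   # placeholder image
        command: ["python", "preprocess.py"]                       # placeholder entrypoint
```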
Related resources
Related videos
Why GKE is perfect for running batch workloads
Kubernetes is the top container orchestration platform for batch workloads like data processing, machine learning, and scientific simulations. In this video, Mofi Rahman, Cloud Advocate at Google, discusses why Google Kubernetes Engine (GKE) is a strong fit for running these batch workloads.
Designing Google Kubernetes clusters for massive scale and performance
Google is known for being an industry leader in running container workloads at massive scale. Learn from us and our customers the key best practices for designing clusters and workloads for large scale and performance on GKE.