AI/ML orchestration on GKE documentation
Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. With GKE, you can implement a robust, production-ready AI/ML platform with all the benefits of managed Kubernetes and these capabilities:
- Infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale (see the GPU sketch after this list).
- Flexible integration with distributed computing and data processing frameworks.
- Support for multiple teams on the same infrastructure to maximize resource utilization.
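For a taste of what GPU orchestration looks like in practice, the sketch below shows a Pod that requests a single GPU through standard Kubernetes resource requests. The accelerator type, image, and command are illustrative assumptions, not prescriptions from this page; the tutorials linked below cover supported configurations in detail.

```yaml
# Minimal sketch: a Pod that requests one NVIDIA T4 GPU on GKE.
# Accelerator type, image, and command are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4  # GKE node label that pins the GPU type
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]                            # prints the attached GPU, then exits
    resources:
      limits:
        nvidia.com/gpu: 1                              # number of GPUs exposed to the container
  restartPolicy: Never
```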
Documentation resources
Serve open models on GKE
- NEW! Serve open source models using TPUs on GKE with Optimum TPU
- Tutorial: Serve Gemma using GPUs on GKE with Hugging Face TGI
- Tutorial: Serve Gemma using GPUs on GKE with vLLM
- Tutorial: Serve Gemma using GPUs on GKE with NVIDIA Triton and TensorRT-LLM
- Tutorial: Serve Gemma using TPUs on GKE with JetStream
- Quickstart: Serve a model with a single GPU in GKE Autopilot
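The tutorials above share a common shape: a Deployment that runs a model server on accelerator-backed nodes. The following is a hedged sketch only; the image tag, model ID, and `hf-secret` Secret are assumptions for illustration, and the linked vLLM tutorial is the authoritative reference.

```yaml
# Sketch: a vLLM server for an open model on a single GPU.
# Image, model ID, and the hf-secret Secret are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-gemma
  template:
    metadata:
      labels:
        app: vllm-gemma
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest          # assumed public vLLM image
        args: ["--model=google/gemma-2b", "--port=8000"]
        env:
        - name: HUGGING_FACE_HUB_TOKEN          # Gemma is gated; Secret assumed to exist
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1
```

A ClusterIP Service in front of this Deployment would expose the server's OpenAI-compatible API inside the cluster.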
Orchestrate TPUs and GPUs at large scale
- Video: Introduction to Cloud TPUs for machine learning
- Video: Build large-scale machine learning on Cloud TPUs with GKE
- Video: Serving Large Language Models with KubeRay on TPUs
- Blog: Machine learning with JAX on Kubernetes with NVIDIA GPUs
- Blog: Build a machine learning (ML) platform with Kubeflow and Ray on GKE
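For orientation on how TPUs surface in Kubernetes, GKE schedules TPU workloads through dedicated node selectors and the `google.com/tpu` resource. The sketch below assumes a single-host TPU v5e slice; the accelerator type, topology, and image are illustrative, and the resources above walk through real training and serving setups.

```yaml
# Sketch: a Pod requesting a single-host TPU v5e slice (2x2 topology = 4 chips).
# Accelerator type, topology, and image are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: tpu-example
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice  # TPU v5e slice nodes
    cloud.google.com/gke-tpu-topology: 2x2                      # chip topology of the slice
  containers:
  - name: worker
    image: python:3.10                   # assumed base image; a real job would install JAX
    command: ["python", "-c", "print('scheduled on a TPU node')"]
    resources:
      limits:
        google.com/tpu: 4                # must equal the chip count of the topology
  restartPolicy: Never
```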
Cost optimization and job orchestration
- NEW! Reference architecture for a batch processing platform on GKE
- Blog: High performance AI/ML storage through Local SSD support on GKE
- Blog: Simplifying MLOps using Weights & Biases with Google Kubernetes Engine
- Best practice: Best practices for running batch workloads on GKE
- Best practice: Run cost-optimized Kubernetes applications on GKE
- Best practice: Improving launch time of Stable Diffusion on GKE by 4x
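A recurring cost pattern across these resources is placing fault-tolerant batch work on Spot VMs, which are deeply discounted but can be reclaimed. A minimal sketch, assuming a placeholder busybox workload:

```yaml
# Sketch: a batch Job pinned to Spot VM nodes to cut cost.
# Suitable only for fault-tolerant work; Spot nodes can be preempted at any time.
apiVersion: batch/v1
kind: Job
metadata:
  name: spot-batch-example
spec:
  backoffLimit: 4                           # retry if a Spot node is reclaimed mid-run
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # GKE node label selecting Spot VMs
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: busybox:1.36                 # assumed placeholder workload
        command: ["sh", "-c", "echo processing shard && sleep 30"]
```

For queueing and fair sharing of capacity across teams, the batch best practices above pair placement patterns like this with a job queueing layer.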