AI/ML orchestration on GKE documentation
Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. With GKE, you can implement a robust, production-ready AI/ML platform with all the benefits of managed Kubernetes and these capabilities:
- Infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale.
- Flexible integration with distributed computing and data processing frameworks.
- Support for multiple teams on the same infrastructure to maximize utilization of resources.
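As a minimal sketch of the first capability, a workload on GKE typically asks for accelerators through a GPU resource limit plus a node selector for the accelerator type. The Python below only builds and prints such a Pod manifest as JSON. The function name `gpu_pod_manifest`, the Pod name, the image path, and the choice of `nvidia-l4` are illustrative assumptions, not taken from this page.

```python
import json


def gpu_pod_manifest(name, image, gpu_type="nvidia-l4", gpu_count=1):
    """Build a Pod manifest (as a dict) that requests GPUs on GKE.

    Hypothetical helper for illustration only; all argument values
    are placeholders chosen by the caller.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Schedule onto nodes that carry the requested accelerator type.
            "nodeSelector": {"cloud.google.com/gke-accelerator": gpu_type},
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested through resource limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Assumed placeholder image; replace with a real training image.
    manifest = gpu_pod_manifest("trainer", "example.com/my-team/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, assuming the cluster has a node pool with the matching accelerator.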
Documentation resources
Serve open models on GKE
- New: Serve LLMs like DeepSeek-R1 671B or Llama 3.1 405B on GKE
- New: Serve an LLM using TPUs on GKE with KubeRay
- Tutorial: Serve an LLM using TPU Trillium on GKE with vLLM
- Tutorial: Quickstart: Serve an LLM using a single GPU on GKE
- Tutorial: Serve Gemma using GPUs on GKE with Hugging Face TGI
- Tutorial: Serve Gemma using GPUs on GKE with vLLM
Orchestrate TPUs and GPUs at large scale
- New: Optimize GKE resource utilization for mixed AI/ML training and inference workloads
- Video: Introduction to Cloud TPUs for machine learning
- Video: Build large-scale machine learning on Cloud TPUs with GKE
- Video: Serving Large Language Models with KubeRay on TPUs
- Blog: Machine learning with JAX on Kubernetes with NVIDIA GPUs
Cost optimization and job orchestration
- New: Reference architecture for a batch processing platform on GKE
- Blog: High performance AI/ML storage through Local SSD support on GKE
- Blog: Simplifying MLOps using Weights & Biases with Google Kubernetes Engine
- Best practice: Best practices for running batch workloads on GKE
- Best practice: Run cost-optimized Kubernetes applications on GKE
- Best practice: Improving launch time of Stable Diffusion on GKE by 4x
Related resources
Related videos
Filestore and NetApp Volumes: The future for modern and virtualized workloads
As enterprises deploy more modern applications in Google Cloud, they are presented with new ways of using and optimizing storage. We’ll showcase our newest addition, Google Cloud NetApp Volumes, a fully Google-managed file storage service built for
The future of modern enterprise applications with Google Kubernetes Engine
Google Kubernetes Engine (GKE) provides enterprises the most fully managed Kubernetes experience on the planet. Learn how Autopilot streamlines Day 2 operations around node provisioning and bin packing while providing a sensible default security
Mastering stateful workloads in GKE
Unlock the power of running stateful workloads in GKE! Dive into tools, backup solutions, and upgrade functionalities that make Kubernetes applications seamless and secure. Ready to elevate your app management? Visit the Google Cloud Console now!
Database deployment options in GKE
Connect to Cloud SQL from GKE → https://goo.gle/Connect_GKE_Cloud_SQL Deploy an app using GKE and Cloud Spanner → https://goo.gle/Connect_GKE_Cloud_Spanner This video describes your database options when deploying stateful applications on Google
What are stateful workloads?
When an application is stateful, it can be essential to have greater control over the underlying infrastructure. In this episode of GKE Essentials, Developer Advocate Kaslin Fields shares tools for running stateful workloads in GKE. Watch to learn
Scaling stateful applications on GKE with ease
Learn how customers are increasingly deploying stateful applications on Google Kubernetes Engine (GKE) to benefit from portability, economies of scale, and built-in orchestration capabilities. Also learn from MariaDB, a database software as a service
Storage best practices for data analytics, GKE, and critical applications
Learn best storage practices for data analytics, such as Cloud Storage, Filestore, Persistent Disk, Google Kubernetes Engine stateful workloads, critical applications like SAP, and high-performance computing applications. Wayfair: Creating engaging,
CSPStorage: Seamless Data Management for Kubernetes (Cloud Next '19)
Enterprise-class stateful workloads require advanced storage management support. In this session we will be discussing new storage management facilities for CSP that extend GKE data management capabilities and provide integration with GCP &
Deploying Unconventional Web Apps (Cloud Next '19)
Many modern apps diverge from the "traditional" web application patterns in order to support such capabilities as highly interactive experiences and mass-distributed collaboration. Whether they use new languages and frameworks, clustering, stateful
Deploy Your Next Application to Google Kubernetes Engine (Cloud Next '19)
Deploying your application on Google Kubernetes Engine sets you up for success. Whether you’re starting small with a single VM or deploying a large existing app, you can take advantage of a comprehensive set of workload primitives to run whatever you
Building Stateful Applications With Kubernetes and Cloud SQL (Cloud Next '19)
Learn how to build and scale applications, leveraging Kubernetes Engine and Cloud SQL for PostgreSQL. Kubernetes Engine enables rapid application development and iteration by making it easy to deploy, update, and manage your applications and
High Availability for Stateful GKE Workloads
Kubernetes is a great tool to host your highly available applications but what happens when you have to work with stateful workloads? Join Anthony and Mark as they showcase how to use GKE and host your applications with state! Check out this tutorial