Large Language Models
-
JetStream MaxText inference on v6e
A guide to set up and use JetStream with MaxText for inference on v6e.
-
JetStream PyTorch inference on v6e
A guide to set up and use JetStream with PyTorch for inference on v6e.
-
vLLM inference on v6e
A guide to set up and use vLLM for inference on v6e.
-
Serve an LLM using TPUs on GKE with vLLM
A guide to serving large language models (LLMs) with vLLM on Tensor Processing Units (TPUs) in Google Kubernetes Engine (GKE).