Use Hugging Face text generation models

Hugging Face provides pre-trained models, fine-tuning scripts, and development APIs that make it easier to create and discover LLMs. Model Garden supports all Hugging Face models that are supported by Text Generation Inference (TGI).

Deployment options

You can deploy models that are supported by Text Generation Inference in either Vertex AI or Google Kubernetes Engine (GKE). To deploy a Hugging Face text generation model, go to Model Garden and click Deploy from Hugging Face.

Deploy in Vertex AI

Vertex AI offers a managed platform for building and scaling machine learning projects without in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Hugging Face models. We recommend using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.
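As an alternative to the console flow, a deployment can be scripted with the Vertex AI SDK for Python. The following is a minimal sketch, not an official recipe: the container image URI, the serving port, the machine type, and the accelerator choice are all assumptions that you must replace with values valid for your project, and the model ID shown in the usage note is hypothetical. `MODEL_ID`, `NUM_SHARD`, and `HUGGING_FACE_HUB_TOKEN` are environment variables read by the Text Generation Inference launcher.

```python
def tgi_container_env(model_id, num_shard=1, hf_token=None):
    """Build environment variables for a Text Generation Inference container."""
    env = {
        "MODEL_ID": model_id,          # Hugging Face Hub model to serve
        "NUM_SHARD": str(num_shard),   # number of GPU shards for the model
    }
    if hf_token:
        # Only needed for gated or private models.
        env["HUGGING_FACE_HUB_TOKEN"] = hf_token
    return env


def deploy_to_vertex_ai(project, location, image_uri, model_id, port=8080):
    """Upload a TGI serving container as a Vertex AI model and deploy it.

    Assumptions: `image_uri` points at a TGI-compatible serving image that you
    supply, and the container listens on `port`. The machine and accelerator
    types below are illustrative only.
    """
    # Imported lazily so the helper above works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model.upload(
        display_name=model_id.replace("/", "--"),
        serving_container_image_uri=image_uri,
        serving_container_environment_variables=tgi_container_env(model_id),
        serving_container_ports=[port],
    )
    # deploy() creates an endpoint when none is passed and returns it.
    return model.deploy(
        machine_type="g2-standard-8",
        accelerator_type="NVIDIA_L4",
        accelerator_count=1,
    )
```

Calling `deploy_to_vertex_ai("my-project", "us-central1", image_uri, "example-org/example-model")` requires the `google-cloud-aiplatform` package and authenticated credentials. The environment helper has no such dependency and can be checked on its own.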

To get started, see the following examples:

Deploy in GKE

Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. We recommend this option if you have existing Kubernetes investments, if your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements.
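To illustrate the kind of control that GKE gives you, the following sketch generates a Kubernetes Deployment manifest for a Text Generation Inference container. It is an illustration under stated assumptions rather than the manifest that Model Garden produces: the function name, the Deployment name, the image URI, and the model ID are hypothetical, and the cluster is assumed to have GPU nodes that expose the `nvidia.com/gpu` resource.

```python
import json


def tgi_deployment_manifest(name, image_uri, model_id, gpu_count=1):
    """Return a Kubernetes Deployment manifest (as a dict) for a TGI container."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": "tgi",
                            "image": image_uri,  # assumption: supplied by you
                            "env": [
                                {"name": "MODEL_ID", "value": model_id},
                                # One shard per requested GPU.
                                {"name": "NUM_SHARD", "value": str(gpu_count)},
                            ],
                            "resources": {
                                "limits": {"nvidia.com/gpu": str(gpu_count)}
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = tgi_deployment_manifest(
        name="tgi-example",                            # hypothetical name
        image_uri="REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG",  # placeholder
        model_id="example-org/example-model",          # hypothetical model
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running `kubectl apply -f` on it would create the Deployment. A Service and any node selectors for a specific GPU type are left out of this sketch.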

To get started, see the following examples: