This document describes how to deploy supported Hugging Face models on Vertex AI and Google Kubernetes Engine (GKE).

Hugging Face provides pre-trained models, fine-tuning scripts, and development APIs that make it easier to create and discover LLMs. Model Garden can serve Text Embeddings, Text To Image, Text Generation, and Image Text To Text models from Hugging Face.

The following diagram summarizes the workflow for deploying a Hugging Face model from Model Garden:
You can deploy supported Hugging Face models in Vertex AI or Google Kubernetes Engine (GKE). Your choice of deployment option depends on the model you're using and the level of control you require for your workloads.

Vertex AI offers a managed platform for building and scaling machine learning projects without requiring in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Hugging Face models. Consider using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.

To deploy a supported Hugging Face model in Vertex AI:

1. Go to Model Garden.
2. In the Open models on Hugging Face section, click Show more.
3. Find and select a model to deploy.
4. Optional: For Deployment environment, select Vertex AI.
5. Optional: Specify the deployment details.
6. Click Deploy.

To get started, see the following examples:

Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. This option is recommended if you have existing Kubernetes investments, your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements.

To deploy a supported Hugging Face model in GKE:

1. Go to Model Garden.
2. In the Open models on Hugging Face section, click Show more.
3. Find and select a model to deploy.
4. For Deployment environment, select GKE.
5. Follow the deployment instructions.

To get started, see the following examples:

The latest, most popular Hugging Face models are automatically added to Model Garden. This process includes the automatic generation of a deployment configuration for each model. To address concerns about vulnerabilities and malicious code, Vertex AI uses the Hugging Face Malware Scanner to assess the safety of files within each Hugging Face model repository daily.
If a model repository is flagged as containing malware, Vertex AI immediately removes the model from the Hugging Face gallery page. While a model designated as supported by Vertex AI has undergone testing and is deployable on Vertex AI, this designation doesn't guarantee the absence of vulnerabilities or malicious code. Before you deploy any model in your production environment, conduct your own security verifications.

The default deployment configuration provided with the one-click deployment option might not satisfy every requirement, due to the diverse range of use cases and varying priorities for latency, throughput, cost, and accuracy. You can experiment with the one-click deployment to establish a baseline and then fine-tune the deployment configurations by using a Colab notebook or the Python SDK. This iterative approach allows you to tailor the deployment to your precise needs and achieve the best possible performance for your specific application.

For more information, see the following notebooks:

If a model you need isn't listed in Model Garden, it means the model is not directly supported. This section describes why a model might not be listed and what you can do.

A model might not be in Model Garden for the following reasons:

If a model isn't available in Model Garden, you have the following options:
Deployment options for Hugging Face models
| Deployment option | Description | Best for |
| --- | --- | --- |
| Vertex AI | A managed, serverless platform for building and scaling machine learning projects. | Teams that want end-to-end MLOps capabilities and a streamlined development experience without requiring in-house MLOps expertise. |
| GKE | A managed Kubernetes service that provides scalability, security, and resilience. | Organizations with existing Kubernetes investments, in-house MLOps expertise, or those needing granular control over complex AI/ML workloads. |
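
The "best for" criteria above can be read as a simple rule of thumb. The following is a minimal sketch of that rule, not an official selection tool: the `TeamProfile` class, its field names, and `recommend_environment` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TeamProfile:
    """Hypothetical description of a team; field names are illustrative only."""
    has_kubernetes_investment: bool = False
    has_inhouse_mlops: bool = False
    needs_granular_control: bool = False


def recommend_environment(profile: TeamProfile) -> str:
    """Return "GKE" if any GKE criterion applies, otherwise "Vertex AI"."""
    if (
        profile.has_kubernetes_investment
        or profile.has_inhouse_mlops
        or profile.needs_granular_control
    ):
        return "GKE"
    # No Kubernetes-specific requirement: the managed platform is the simpler fit.
    return "Vertex AI"
```

In practice the choice also depends on the model itself, so treat the result as a starting point for discussion.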
Deploy in Vertex AI
Deploy in GKE
What "Supported by Vertex AI" means
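
As one example of a verification you might run yourself, the sketch below lists files in a locally downloaded model repository whose extensions are commonly used for pickle-based checkpoints. Python's pickle format can execute arbitrary code when loaded, so such files deserve a closer look. This assumes the repository is already on local disk; the extension list and the function name `find_pickle_based_files` are assumptions of this sketch, and the check is neither exhaustive nor a substitute for a real scanner.

```python
from pathlib import Path
from typing import List

# Extensions often used for pickle-based checkpoints (assumed, not exhaustive).
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".pickle", ".ckpt"}


def find_pickle_based_files(model_dir: str) -> List[str]:
    """Return sorted relative paths of files that may be pickle-based."""
    root = Path(model_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in PICKLE_SUFFIXES
    )
```

An empty result only means that no file matched by extension; it says nothing about other kinds of malicious content.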
Tune deployment configurations for specific use cases
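
To establish a baseline before changing a deployment configuration, it helps to measure latency the same way each time. The following is a minimal sketch of such a measurement. `measure_latency` is a hypothetical helper, and `send_request` stands in for whatever call you use to query your deployed endpoint; neither is part of the Vertex AI SDK.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(send_request: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Call send_request `runs` times and summarize wall-clock latency in seconds."""
    if runs < 2:
        raise ValueError("runs must be at least 2 to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        samples.append(time.perf_counter() - start)
    # n=20 yields 19 cut points; the last one (index 18) is the 95th percentile.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return {
        "p50": statistics.median(samples),
        "p95": cuts[18],
        "mean": statistics.fmean(samples),
    }
```

Running the same measurement against the one-click deployment and against each tuned configuration gives comparable numbers for the latency side of the trade-off; throughput, cost, and accuracy need their own measurements.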
What to do if a model isn't in Model Garden
Why a model might not be listed
text-generation and image-text-to-text models.
text-generation, text2text-generation, text-to-image, feature-extraction, sentence-similarity, and image-text-to-text.
What you can do
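
One practical check when a model is missing from Model Garden is to compare its Hugging Face pipeline task against the task names listed above. The sketch below assumes that those names are the pipeline tasks Model Garden supports, which is consistent with the model types named at the top of this page. `SUPPORTED_TASKS` and `is_supported_task` are hypothetical names used only for illustration.

```python
from typing import Optional

# Task names copied from the list above; treating them as the supported set
# is an assumption of this sketch.
SUPPORTED_TASKS = frozenset({
    "text-generation",
    "text2text-generation",
    "text-to-image",
    "feature-extraction",
    "sentence-similarity",
    "image-text-to-text",
})


def is_supported_task(pipeline_tag: Optional[str]) -> bool:
    """Return True if the given pipeline tag is in the assumed supported set."""
    if not pipeline_tag:
        return False
    return pipeline_tag.strip().lower() in SUPPORTED_TASKS
```

The pipeline tag is shown on the model's page on Hugging Face. If you use the `huggingface_hub` library, it is also available as the `pipeline_tag` attribute of the object returned by `model_info`.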
Use Hugging Face Models
Last updated 2025-08-18 UTC.