Use Hugging Face Models

Hugging Face provides pre-trained models, fine-tuning scripts, and development APIs that make the process of creating and discovering LLMs easier. Model Garden can serve Text Embeddings, Text To Image, Text Generation, and Image Text To Text models in HuggingFace.

Deployment options for Hugging Face models

You can deploy supported Hugging Face models in Vertex AI or Google Kubernetes Engine (GKE). The deployment option you choose can depend on the model you're using and how much control you want over your workloads.

Deploy in Vertex AI

Vertex AI offers a managed platform for building and scaling machine learning projects without in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Hugging Face models. We recommend using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.

To deploy a supported Hugging Face model in Vertex AI, go to Model Garden.

Go to Model Garden
Go to the Open models on Hugging Face section and click Show more.
Find and select a model to deploy.
Optional: For the Deployment environment, select Vertex AI.
Optional: Specify the deployment details.
Click Deploy.

To get started, see the following examples:

Some models have detailed model cards and the deployment settings are verified by Google, such as google/gemma-3-27b-it, meta-llama/Llama-4-Scout-17B-16E-Instruct, Qwen/QwQ-32B, BAAI/bge-m3, intfloat/multilingual-e5-large-instruct, black-forest-labs/FLUX.1-dev, and HuggingFaceFW/fineweb-edu-classifier.
Some models have the deployment settings verified by Google but no detailed model cards, such as NousResearch/Genstruct-7B.
Some models have deployment settings generated automatically.
Some models have automatically generated deployment settings that are based on model metadata, such as some latest trending models in text generation, text embeddings, text to image generation, and image text to text.

Deploy in GKE

Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. We recommend this option if you have existing Kubernetes investments, your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements.

To deploy a supported Hugging Face model in GKE, go to Model Garden.

Go to Model Garden
Go to the Open models on Hugging Face section and click Show more.
Find and select a model to deploy.
For the Deployment environment, select GKE.
Follow the deployment instructions.

To get started, see the following examples:

Some models have detailed model cards and verified deployment settings, such as google/gemma-3-27b-it, meta-llama/Llama-4-Scout-17B-16E-Instruct, and Qwen/QwQ-32B.
Some models have verified deployment settings, but no detailed model cards, such as NousResearch/Genstruct-7B.

What does "Supported by Vertex AI" mean?

We automatically add the latest, most popular Hugging Face models to Model Garden. This process includes the automatic generation of a deployment configuration for each model.

To address concerns regarding vulnerabilities and malicious code, we use the Hugging Face Malware Scanner to assess the safety of files within each Hugging Face model repository on a daily basis. If a model repository is flagged as containing malware, we immediately remove the model from the Hugging Face gallery page.

While a model being designated as supported by Vertex AI signifies that it has undergone testing and is deployable on Vertex AI, we don't guarantee the absence of vulnerabilities or malicious code. We recommend that you conduct your own security verifications before deploying any model in your production environment.

Tune deployment configurations for specific use cases

The default deployment configuration that is provided with the one-click deployment option can't satisfy every requirement given the diverse range of use cases and varying priorities with latency, throughput, cost, and accuracy.

Therefore, you can initially experiment with the one-click deployment to establish a baseline, and then fine-tune the deployment configurations by using the Colab notebook (vLLM, TGI, TEI, HF pytorch inference) or the Python SDK. This iterative approach lets you to tailor the deployment to your precise needs to get the best possible performance for your specific application.

What should you do if the model you want isn't listed in Model Garden

If you're looking for a specific model that's not listed in Model Garden, the model is not supported by Vertex AI. The following sections describe the reasoning and what you can do.

Why isn't the model listed?

The following reasons explain why a model might not be in Model Garden:

It's not a top trending model: We often prioritize models that are widely popular and have strong community interest.
It's not yet compatible: The model might not work with a supported serving container. For example, the vLLM container for text-generation and image-text-to-text models.
Unsupported pipeline tasks: The model has a task which we don't yet fully support at the moment. We support the following tasks: text-generation, text2text-generation, text-to-image, feature-extraction, sentence-similarity, and image-text-to-text.

What are your options?

You can still work with models that available in Model Garden:

Deploy it yourself using the Colab Notebook: We have the following Colab Notebooks: (vLLM, TGI, TEI, HF pytorch inference), which provide the flexibility to deploy models with custom configurations. This gives you complete control over the process.
Submit a Feature Request: work with your support engineer and submit a feature request through the Model Garden, or refer to Vertex Generative AI support for additional help.
Keep an eye on updates: We regularly add new models to Model Garden. The model you're looking for might become available in the future, so check back periodically!