Model Garden in the Google Cloud console is an ML model library that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets.
The following topics introduce you to the AI models available in Model Garden and how to use them.
Explore models
To view the list of available Vertex AI and open source foundation, tunable, and task-specific models, go to the Model Garden page in the Google Cloud console.
The model categories available in Model Garden are:
Category | Description |
---|---|
Foundation models | Pretrained multitask large models that can be tuned or customized for specific tasks using Vertex AI Studio, Vertex AI API, and the Vertex AI SDK for Python. |
Fine-tunable models | Models that you can fine-tune using a custom notebook or pipeline. |
Task-specific solutions | Most of these prebuilt models are ready to use. Many can be customized using your own data. |
To filter models in the filter pane, specify the following:
- Modalities: Click the modalities (data types) that you want in the model.
- Tasks: Click the task that you want the model to perform.
- Features: Click the features that you want in the model.
To learn more about each model, click its model card.
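The filter pane's behavior can be sketched in a few lines of Python. This is only an illustration: `ModelEntry`, `filter_models`, and the task labels are hypothetical names, not part of any Vertex AI API. It also assumes that a model matches only when it has every selected modality, task, and feature.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalog record; not a Vertex AI SDK type."""
    name: str
    modalities: frozenset
    tasks: frozenset = field(default_factory=frozenset)
    features: frozenset = field(default_factory=frozenset)


def filter_models(catalog, modalities=(), tasks=(), features=()):
    """Return entries that have every selected modality, task, and feature."""
    wanted_modalities = set(modalities)
    wanted_tasks = set(tasks)
    wanted_features = set(features)
    return [
        m for m in catalog
        if wanted_modalities <= m.modalities
        and wanted_tasks <= m.tasks
        and wanted_features <= m.features
    ]


# Names and modalities come from the tables on this page;
# the task labels are illustrative assumptions.
CATALOG = [
    ModelEntry("Gemini 1.5 Flash",
               frozenset({"Language", "Audio", "Vision"}),
               frozenset({"Generation"})),
    ModelEntry("Chirp", frozenset({"Speech"}), frozenset({"Transcription"})),
    ModelEntry("Flux", frozenset({"Vision"}), frozenset({"Generation"})),
]

vision_models = filter_models(CATALOG, modalities=["Vision"])
print([m.name for m in vision_models])
```

An empty selection leaves that dimension unfiltered, so calling `filter_models(CATALOG)` returns every entry.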
Models available in Model Garden
You can find Google's first-party models and select open source models in Model Garden.
List of Google's first-party models
The following table lists Google's first-party models that are available in Model Garden:
Model name | Modality | Description | Quickstarts |
---|---|---|---|
Gemini 1.5 Flash | Language, audio, vision | The fastest, most cost-effective Gemini multimodal model. It's built for high volume tasks and latency-sensitive, affordable applications. Because of how responsive Gemini 1.5 Flash is, it's a good option to create chat assistants and on-demand content generation applications. | Model card |
Gemini 1.5 Pro | Language, audio, vision | Multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. | Model card |
Gemini 1.0 Pro | Language | Designed to handle natural language tasks, multiturn text and code chat, and code generation. | Model card |
Gemini 1.0 Pro Vision | Language, vision | Multimodal model that supports adding image, video, and PDF files in text or chat prompts for a text or code response. | Model card |
PaLM 2 for Text | Language | Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks. | Model card |
PaLM 2 for Chat | Language | Fine-tuned to conduct natural conversation. Use this model to build and customize your own chatbot application. | Model card |
Codey for Code Completion | Language | Generates code based on code prompts. Good for code suggestions and minimizing bugs in code. | Model card |
Codey for Code Generation | Language | Generates code based on natural language input. Good for writing functions, classes, unit tests, and more. | Model card |
Codey for Code Chat | Language | Get code-related assistance through natural conversation. Good for questions about an API, syntax in a supported language, and more. | Model card |
Embeddings for Text | Language | Converts textual data into numerical vectors that can be processed by machine learning algorithms, especially large models. | Model card |
Imagen for Image Generation | Vision | Create or edit studio-grade images at scale using text prompts. | Model card |
Vertex Image Segmentation (Preview) | Vision | Use text prompts or draw scribbles to segment an image. Image segmentation lets you, for example, detect objects, remove the background of an image, or segment the foreground of an image. | Model card |
Imagen for Captioning & VQA | Language | Generates a relevant description for a given image. | Model card |
Embeddings for Multimodal | Vision | Generates vectors based on images, which can be used for downstream tasks like image classification and image search. | Model card |
Chirp | Speech | A version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. | Model card |
List of models with open source tuning or serving recipes in Model Garden
The following table lists the OSS models that support open source tuning or serving recipes in Model Garden:
Model name | Modality | Description | Quickstart |
---|---|---|---|
Flux | Vision | A 12 billion parameter rectified flow transformer model that generates high-quality images from text descriptions. | Model card |
Prompt Guard | Language | Guardrail LLM inputs against jailbreaking techniques and indirect injections. | Model card |
Llama 3.2 | Language | A collection of multilingual large language models that are pretrained and instruction-tuned generative models in 1B and 3B sizes. | Model card |
Llama 3.2-Vision | Language, Vision | A collection of multimodal large language models that are pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes. These models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. | Model card |
Llama Guard 3 | Language | A Llama-3.1-8B pretrained model that has been fine-tuned for content safety classification. | Model card |
Qwen2 | Language | Deploy Qwen2, a foundation large language model series. | Colab Model card |
Phi-3 | Language | Deploy Phi-3, a foundation large language model series. | Colab Model card |
E5 | Language | Deploy E5, a text embedding model series. | Colab Model card |
Instant ID | Language, Vision | Deploy Instant ID, an identity preserving, text-to-image generation model. | Colab Model card |
Llama 3 | Language | Explore and build with Meta's Llama 3 models (8B, 70B, 405B) on Vertex AI. | Model card |
Gemma 2 | Language | Open weight models (9B, 27B) that are built from the same research and technology used to create Google's Gemini models. | Model card |
Gemma | Language | Open weight models (2B, 7B) that are built from the same research and technology used to create Google's Gemini models. | Model card |
CodeGemma | Language | Open weight models (2B, 7B) designed for code generation and code completion that are built from the same research and technology used to create Google's Gemini models. | Model card |
PaliGemma | Language | Open weight 3B model designed for image captioning and visual question answering tasks that's built from the same research and technology used to create Google's Gemini models. | Model card |
Vicuna v1.5 | Language | Deploy Vicuna v1.5 series models, which are foundation models fine-tuned from Llama 2 for text generation. | Model card |
NLLB | Language | Deploy NLLB series models for multi-language translation. | Model card Colab |
Mistral-7B | Language | Deploy Mistral-7B, a foundational model for text generation. | Model card |
BioGPT | Language | Deploy BioGPT, a text generative model for the biomedical domain. | Model card Colab |
BiomedCLIP | Language, Vision | Deploy BiomedCLIP, a multimodal foundation model for the biomedical domain. | Model card Colab |
ImageBind | Language, Vision, Audio | Deploy ImageBind, a foundational model for multimodal embedding. | Model card Colab |
DITO | Language, Vision | Finetune and deploy DITO, a multimodal foundation model for open vocabulary object detection tasks. | Model card Colab |
OWL-ViT v2 | Language, Vision | Deploy OWL-ViT v2, a multimodal foundation model for open vocabulary object detection tasks. | Model card Colab |
FaceStylizer (Mediapipe) | Vision | A generative pipeline to transform human face images to a new style. | Model card Colab |
Llama 2 | Language | Finetune and deploy Meta's Llama 2 foundation models (7B, 13B, 70B) on Vertex AI. | Model card |
Code Llama | Language | Deploy Meta's Code Llama foundation models (7B, 13B, 34B) on Vertex AI. | Model card |
Falcon-instruct | Language | Finetune and deploy Falcon-instruct models (7B, 40B) by using PEFT. | Colab Model card |
OpenLLaMA | Language | Finetune and deploy OpenLLaMA models (3B, 7B, 13B) by using PEFT. | Colab Model card |
T5-FLAN | Language | Finetune and deploy T5-FLAN (base, small, large). | Model card (fine-tuning pipeline included) |
BERT | Language | Finetune and deploy BERT by using PEFT. | Colab Model card |
BART-large-cnn | Language | Deploy BART, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. | Colab Model card |
RoBERTa-large | Language | Finetune and deploy RoBERTa-large by using PEFT. | Colab Model card |
XLM-RoBERTa-large | Language | Finetune and deploy XLM-RoBERTa-large (a multilingual version of RoBERTa) by using PEFT. | Colab Model card |
Dolly-v2-7b | Language | Deploy Dolly-v2-7b, an instruction-following large language model with 6.9 billion parameters. | Colab Model card |
Stable Diffusion XL v1.0 | Language, Vision | Deploy Stable Diffusion XL v1.0, which supports text-to-image generation. | Colab Model card |
Stable Diffusion XL Lightning | Language, Vision | Deploy Stable Diffusion XL Lightning, a text-to-image generation model. | Colab Model card |
Stable Diffusion v2.1 | Language, Vision | Finetune and deploy Stable Diffusion v2.1 (supports text-to-image generation) by using Dreambooth. | Colab Model card |
Stable Diffusion 4x upscaler | Language, Vision | Deploy Stable Diffusion 4x upscaler, which supports text conditioned image superresolution. | Colab Model card |
InstructPix2Pix | Language, Vision | Deploy InstructPix2Pix, which supports image editing by using a text prompt. | Colab Model card |
Stable Diffusion Inpainting | Language, Vision | Finetune and deploy Stable Diffusion Inpainting, which supports inpainting a masked image by using a text prompt. | Colab Model card |
SAM | Language, Vision | Deploy Segment Anything, which supports zero-shot image segmentation. | Colab Model card |
Text-to-video (ModelScope) | Language, Vision | Deploy ModelScope text-to-video, which supports text-to-video generation. | Colab Model card |
Pic2Word Composed Image Retrieval | Language, Vision | Deploy Pic2Word, which supports multi-modal composed image retrieval. | Colab Model card |
BLIP2 | Language, Vision | Deploy BLIP2, which supports image captioning and visual-question-answering. | Colab Model card |
Open-CLIP | Language, Vision | Finetune and deploy Open-CLIP, which supports zero-shot classification. | Colab Model card |
F-VLM | Language, Vision | Deploy F-VLM, which supports open vocabulary image object detection. | Colab Model card |
tfhub/EfficientNetV2 | Vision | Finetune and deploy the TensorFlow Vision implementation of the EfficientNetV2 image classification model. | Colab Model card |
EfficientNetV2 (TIMM) | Vision | Finetune and deploy the PyTorch implementation of the EfficientNetV2 image classification model. | Colab Model card |
Proprietary/EfficientNetV2 | Vision | Finetune and deploy the Google proprietary checkpoint of the EfficientNetV2 image classification model. | Colab Model card |
EfficientNetLite (MediaPipe) | Vision | Finetune the EfficientNetLite image classification model by using MediaPipe model maker. | Colab Model card |
tfvision/vit | Vision | Finetune and deploy the TensorFlow Vision implementation of the ViT image classification model. | Colab Model card |
ViT (TIMM) | Vision | Finetune and deploy the PyTorch implementation of the ViT image classification model. | Colab Model card |
Proprietary/ViT | Vision | Finetune and deploy the Google proprietary checkpoint of the ViT image classification model. | Colab Model card |
Proprietary/MaxViT | Vision | Finetune and deploy the Google proprietary checkpoint of the MaxViT hybrid (CNN + ViT) image classification model. | Colab Model card |
ViT (JAX) | Vision | Finetune and deploy the JAX implementation of the ViT image classification model. | Colab Model card |
tfvision/SpineNet | Vision | Finetune and deploy the TensorFlow Vision implementation of the SpineNet object detection model. | Colab Model card |
Proprietary/Spinenet | Vision | Finetune and deploy the Google proprietary checkpoint of the SpineNet object detection model. | Colab Model card |
tfvision/YOLO | Vision | Finetune and deploy the TensorFlow Vision implementation of the YOLO one-stage object detection model. | Colab Model card |
Proprietary/YOLO | Vision | Finetune and deploy the Google proprietary checkpoint of the YOLO one-stage object detection model. | Colab Model card |
YOLOv8 (Keras) | Vision | Finetune and deploy the Keras implementation of the YOLOv8 model for object detection. | Colab Model card |
tfvision/YOLOv7 | Vision | Finetune and deploy the YOLOv7 model for object detection. | Colab Model card |
ByteTrack Video Object Tracking | Vision | Run batch prediction for video object tracking by using the ByteTrack tracker. | Colab Model card |
ResNeSt (TIMM) | Vision | Finetune and deploy the PyTorch implementation of the ResNeSt image classification model. | Colab Model card |
ConvNeXt (TIMM) | Vision | Finetune and deploy ConvNeXt, a pure convolutional model for image classification inspired by the design of Vision Transformers. | Colab Model card |
CspNet (TIMM) | Vision | Finetune and deploy the CSPNet (Cross Stage Partial Network) image classification model. | Colab Model card |
Inception (TIMM) | Vision | Finetune and deploy the Inception image classification model. | Colab Model card |
DeepLabv3+ (with checkpoint) | Vision | Finetune and deploy the DeepLab-v3 Plus model for semantic image segmentation. | Colab Model card |
Faster R-CNN (Detectron2) | Vision | Finetune and deploy the Detectron2 implementation of the Faster R-CNN model for image object detection. | Colab Model card |
RetinaNet (Detectron2) | Vision | Finetune and deploy the Detectron2 implementation of the RetinaNet model for image object detection. | Colab Model card |
Mask R-CNN (Detectron2) | Vision | Finetune and deploy the Detectron2 implementation of the Mask R-CNN model for image object detection and segmentation. | Colab Model card |
ControlNet | Vision | Finetune and deploy the ControlNet text-to-image generation model. | Colab Model card |
MobileNet (TIMM) | Vision | Finetune and deploy the PyTorch implementation of the MobileNet image classification model. | Colab Model card |
MobileNetV2 (MediaPipe) Image Classification | Vision | Finetune the MobileNetV2 image classification model by using MediaPipe model maker. | Colab Model card |
MobileNetV2 (MediaPipe) Object Detection | Vision | Finetune the MobileNetV2 object detection model by using MediaPipe model maker. | Colab Model card |
MobileNet-MultiHW-AVG (MediaPipe) | Vision | Finetune the MobileNet-MultiHW-AVG object detection model by using MediaPipe model maker. | Colab Model card |
DeiT | Vision | Finetune and deploy the DeiT (Data-efficient Image Transformers) model for image classification. | Colab Model card |
BEiT | Vision | Finetune and deploy the BEiT (Bidirectional Encoder representation from Image Transformers) model for image classification. | Colab Model card |
Hand Gesture Recognition (MediaPipe) | Vision | Finetune the Hand Gesture Recognition models and deploy them on-device by using MediaPipe. | Colab Model card |
Average Word Embedding Classifier (MediaPipe) | Vision | Finetune the Average Word Embedding Classifier models and deploy them on-device by using MediaPipe. | Colab Model card |
MobileBERT Classifier (MediaPipe) | Vision | Finetune the MobileBERT Classifier models and deploy them on-device by using MediaPipe. | Colab Model card |
MoViNet Video Clip Classification | Video | Finetune and deploy MoViNet video clip classification models. | Colab Model card |
MoViNet Video Action Recognition | Video | Finetune and deploy MoViNet models for action recognition inference. | Colab Model card |
Stable Diffusion XL LCM | Vision | Deploy this model, which uses the Latent Consistency Model (LCM) to enhance text-to-image generation in Latent Diffusion Models by enabling faster, high-quality image creation with fewer steps. | Colab Model card |
LLaVA 1.5 | Vision, Language | Deploy LLaVA 1.5 models. | Colab Model card |
Pytorch-ZipNeRF | Vision, Video | Train the Pytorch-ZipNeRF model, a state-of-the-art implementation of the ZipNeRF algorithm in the PyTorch framework that's designed for efficient and accurate 3D reconstruction from 2D images. | Colab Model card |
Mixtral | Language | Deploy the Mixtral model, which is a Mixture of Experts (MoE) large language model (LLM) developed by Mistral AI. | Model card |
Llama 2 (Quantized) | Language | Finetune and deploy a quantized version of Meta's Llama 2 models. | Colab Model card |
LaMa (Large Mask Inpainting) | Vision | Deploy LaMa, which uses fast Fourier convolutions (FFCs), a high receptive field perceptual loss, and large training masks to enable resolution-robust image inpainting. | Colab Model card |
AutoGluon | Tabular | With AutoGluon you can train and deploy high-accuracy machine learning and deep learning models for tabular data. | Colab Model card |
MaMMUT | Language, Vision | A vision-encoder and text-decoder architecture for multimodal tasks such as visual question answering, image-text retrieval, text-image retrieval, and generation of multimodal embeddings. | Colab Model card |
List of partner models available in Model Garden
Some partner models are offered as managed APIs on Vertex AI Model Garden (also known as model as a service). The following table lists the models that are available from Google partners in Model Garden:
Model name | Modality | Description | Quickstart |
---|---|---|---|
Anthropic's Claude 3.5 Sonnet v2 | Language | The upgraded Claude 3.5 Sonnet is a state-of-the-art model for real-world software engineering tasks and agentic capabilities. Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor. | Model card |
Anthropic's Claude 3.5 Haiku | Language | Claude 3.5 Haiku, the next generation of Anthropic's fastest and most cost-effective model, is optimal for use cases where speed and affordability matter. | Model card |
Anthropic's Claude 3 Opus | Language | A powerful AI model, with top-level performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. | Model card |
Anthropic's Claude 3 Haiku | Language | Anthropic's fastest vision and text model for near-instant responses to simple queries, meant for seamless AI experiences mimicking human interactions. | Model card |
Anthropic's Claude 3.5 Sonnet | Language | Claude 3.5 Sonnet outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet. | Model card |
Anthropic's Claude 3 Sonnet | Language | A vision and text model that balances performance and speed to process enterprise workloads. It's engineered for low-cost, scaled AI deployments. | Model card |
Jamba 1.5 Large (Preview) | Language | AI21 Labs's Jamba 1.5 Large is designed for superior quality responses, high throughput, and competitive pricing compared to other models in its size class. | Model card |
Jamba 1.5 Mini (Preview) | Language | AI21 Labs's Jamba 1.5 Mini is well balanced across quality, throughput, and low cost. | Model card |
Llama 3.2 (Preview) | Language, Vision | A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis as well as image captioning. | Model card |
Llama 3.1 (Preview) | Language | A collection of multilingual LLMs that are optimized for multilingual dialogue use cases and that outperform many of the available open source and closed chat models on common industry benchmarks. | Model card |
Mistral Large (24.11) | Language | Mistral Large (24.11) is the next version of the Mistral Large (2407) model, now with improved reasoning and function calling capabilities. | Model card |
Mistral Large (2407) | Language | Mistral Large (2407) is Mistral AI's flagship model for text generation. It reaches top-tier reasoning capabilities and can be used for complex multilingual tasks, including text understanding, transformation, and code generation. | Model card |
Mistral Nemo | Language | Mistral AI's most cost efficient proprietary model. Use Mistral Nemo for low-latency workloads and simple tasks that can be done in bulk, such as classification, customer support, and text generation. | Model card |
Codestral | Code | A generative model that is specifically designed and optimized for code generation. You can use Codestral to design advanced AI applications. | Model card |
Model testing and security
Google does thorough testing and benchmarking on the serving and tuning containers that we provide. Active vulnerability scanning is also applied to container artifacts.
Third-party models from featured partners undergo model checkpoint scans to ensure authenticity. Third-party models from HuggingFace Hub are scanned directly by HuggingFace, at each commit or when a repository page is visited, for malware, pickle files, and secrets. Models with findings from these scans are flagged by HuggingFace. We recommend you perform a thorough review of any flagged model before deploying it within Model Garden.
How to use model cards
Click a model card to use the model associated with it. For example, you can click a model card to test prompts, tune a model, create applications, and view code samples.
To learn how to use models associated with model cards, click one of the following tabs:
Test prompts
Use the Vertex AI PaLM API model card to test prompts.
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to test and click View details.
Click Open prompt design.
You're taken to the Prompt design page.
In Prompt, enter the prompt that you want to test.
Optional: Configure the model parameters.
Click Submit.
Tune a model
To tune supported models, use a Vertex AI pipeline or a notebook.
Tune using a pipeline
The BERT and T5-FLAN models support model tuning using a pipeline.
In the Google Cloud console, go to the Model Garden page.
In Search models, enter BERT or T5-FLAN, then click the magnifying glass to search.
Click View details on the T5-FLAN or the BERT model card.
Click Open fine-tuning pipeline.
You're taken to the Vertex AI pipelines page.
To start tuning, click Create run.
Tune in a notebook
The model cards for most open source foundation models and fine-tunable models support tuning in a notebook.
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to tune and click View details.
Click Open notebook.
Deploy a model
You can deploy a model, such as Stable Diffusion, from its model card. When deploying a model, you can choose to use a Compute Engine reservation. For more information, see Use reservations with prediction.
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to deploy, and click its model card.
Click Deploy to open the Deploy model pane.
In the Deploy model pane, specify details for your deployment.
- Use or modify the generated model and endpoint names.
- Select a location to create your model endpoint in.
- Select a machine type to use for each node of your deployment.
To use a Compute Engine reservation, under the Deployment settings section, select Advanced.
For the Reservation type field, select a reservation type. The reservation must match your specified machine specs.
- Automatically use created reservation: Vertex AI automatically selects an allowed reservation with matching properties. If there's no capacity in the automatically selected reservation, Vertex AI uses the general Google Cloud resource pool.
- Select specific reservations: Vertex AI uses a specific reservation. If there's no capacity for your selected reservation, an error is thrown.
- Don't use (default): Vertex AI uses the general Google Cloud resource pool. This value has the same effect as not specifying a reservation.
Click Deploy.
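The three reservation types differ mainly in what happens when the reservation has no capacity. The following Python sketch models only that decision logic as described in the steps above. `ReservationType`, `NoReservationCapacityError`, and `choose_capacity_source` are hypothetical names for illustration, not Vertex AI SDK symbols.

```python
from enum import Enum


class ReservationType(Enum):
    """Mirrors the three console options; the enum itself is hypothetical."""
    AUTOMATIC = "Automatically use created reservation"
    SPECIFIC = "Select specific reservations"
    NONE = "Don't use"


class NoReservationCapacityError(RuntimeError):
    """Hypothetical error for a specific reservation without capacity."""


def choose_capacity_source(reservation_type, reservation_has_capacity):
    """Return where the deployment's machines come from.

    reservation_has_capacity: whether the matching (or selected) reservation
    can currently supply the requested machine specs.
    """
    if reservation_type is ReservationType.NONE:
        # Default: reservations are ignored entirely.
        return "general resource pool"
    if reservation_has_capacity:
        return "reservation"
    if reservation_type is ReservationType.AUTOMATIC:
        # Automatic mode falls back instead of failing.
        return "general resource pool"
    # Specific mode never falls back to the general pool.
    raise NoReservationCapacityError("selected reservation has no capacity")


print(choose_capacity_source(ReservationType.AUTOMATIC, False))
```

In this model, only the specific-reservation option can fail outright, which is why it suits deployments that must run on reserved capacity or not at all.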
View code samples
Most of the model cards for task-specific solutions models contain code samples that you can copy and test.
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to view code samples for and click the Documentation tab.
The page scrolls to the documentation section with sample code embedded in place.
Create a vision app
The model cards for applicable computer vision models support creating a vision application.
In the Google Cloud console, go to the Model Garden page.
Find a vision model in the Task specific solutions section that you want to use to create a vision application and click View details.
Click Build app.
You're taken to Vertex AI Vision.
In Application name, enter a name for your application and click Continue.
Select a billing plan and click Create.
You're taken to Vertex AI Vision Studio where you can continue creating your computer vision application.
Pricing
For the open source models in Model Garden, you are charged for use of the following on Vertex AI:
- Model tuning: You are charged for the compute resources used at the same rate as custom training. See custom training pricing.
- Model deployment: You are charged for the compute resources used to deploy the model to an endpoint. See predictions pricing.
- Colab Enterprise: See Colab Enterprise pricing.
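Because tuning and deployment are both billed on compute resources, a rough estimate is the hourly rate multiplied by the node count and the running time. The sketch below shows that arithmetic only. The function name is hypothetical and the rates are placeholders, not real prices, so take actual rates from the pricing pages listed above.

```python
def estimate_compute_cost(hourly_rate_usd, node_count, hours):
    """Estimate compute cost as rate x nodes x hours (a simplification)."""
    if min(hourly_rate_usd, node_count, hours) < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_rate_usd * node_count * hours


# Placeholder rates for illustration only; these are NOT real prices.
tuning = estimate_compute_cost(hourly_rate_usd=2.0, node_count=1, hours=3)
serving = estimate_compute_cost(hourly_rate_usd=1.5, node_count=2, hours=24)
print(f"tuning: ${tuning:.2f}, serving: ${serving:.2f}")
```

The estimate ignores anything billed separately from compute, such as Colab Enterprise usage.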
Control access to specific models
You can set a Model Garden organization policy at the organization, folder, or project level to control access to specific models in Model Garden. For example, you can allow access to specific models that you've vetted and deny access to all others.
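The allow-and-deny behavior can be pictured as an allowlist that is resolved from the most specific level that sets one. The Python sketch below illustrates that idea only and is not how Organization Policy is implemented. The function names are hypothetical, the model names stand in for real policy values, and it assumes that a policy set at a lower level fully replaces the one above it and that an unset policy allows every model.

```python
def effective_allowlist(organization=None, folder=None, project=None):
    """Return the allowlist from the most specific level that sets one.

    Each argument is an iterable of model names, or None when that level
    sets no policy. Returns None when no level sets a policy.
    """
    for policy in (project, folder, organization):
        if policy is not None:
            return frozenset(policy)
    return None


def is_model_allowed(model_name, allowlist):
    """Allow vetted models only; None (no policy) allows everything."""
    if allowlist is None:
        return True
    return model_name in allowlist


# The organization vets two models; one project narrows the list further.
org_policy = ["Gemini 1.5 Flash", "Llama 3.2"]
project_policy = ["Gemini 1.5 Flash"]

allowlist = effective_allowlist(organization=org_policy, project=project_policy)
print(is_model_allowed("Gemini 1.5 Flash", allowlist))
print(is_model_allowed("Llama 3.2", allowlist))
```

Here the project-level list wins, so the second check denies a model that the organization-level list alone would have allowed.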
What's next
- Learn about responsible AI best practices and Vertex AI's safety filters.
- Learn about Generative AI on Vertex AI.
- Learn how to tune foundation models.