Google models

Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your AI-based applications. Foundation models are fine-tuned for specific use cases and offered at different price points. This page summarizes the models that are available in the various APIs and gives you guidance on which models to choose by use case.

For more information about all AI models and APIs on Vertex AI, see Explore AI models in Model Garden.

Gemini models

The following table summarizes the models available in the Gemini API. For more information about API details, see the Gemini API reference.

To explore a model in the Google Cloud console, select its model card in the Model Garden.

Gemini 1.5 Flash
Inputs: Text, code, images, audio, video, video with audio, PDF
Outputs: Text
Use case: Provides speed and efficiency for high-volume, quality, cost-effective apps.
Try Gemini 1.5 Flash

Gemini 1.5 Pro
Inputs: Text, code, images, audio, video, video with audio, PDF
Outputs: Text
Use case: Supports text or chat prompts for a text or code response. Supports long-context understanding up to the maximum input token limit.
Try Gemini 1.5 Pro

Gemini 1.0 Pro
Inputs: Text
Outputs: Text
Use case: The best performing model for a wide range of text-only tasks.
Try Gemini 1.0 Pro

Gemini 1.0 Pro Vision
Inputs: Text, images, audio, video, video with audio, PDF
Outputs: Text
Use case: The best performing image and video understanding model to handle a broad range of applications.
Try Gemini 1.0 Pro Vision

Gemini 1.0 Ultra
Inputs: Text
Outputs: Text
Use case: The most capable text model, optimized for complex tasks, including instruction, code, and reasoning.
Try Gemini 1.0 Ultra

Gemini 1.0 Ultra Vision
Inputs: Text, code, images, audio, video, video with audio, PDF
Outputs: Text
Use case: The most capable multimodal vision model. Optimized to support joint text, images, and video inputs.
Try Gemini 1.0 Ultra Vision
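
As a rough illustration of choosing a model by input type, the summary above can be encoded as data and filtered. This is a minimal sketch, not part of any Vertex AI SDK: the GeminiModel class and the models_accepting helper are hypothetical names, and the input sets are copied from the summary as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiModel:
    """Hypothetical record for one row of the summary above."""
    name: str
    inputs: frozenset


# Input types as listed in the summary (lowercased for matching).
_ALL = frozenset(
    {"text", "code", "images", "audio", "video", "video with audio", "pdf"}
)
_TEXT_ONLY = frozenset({"text"})

MODELS = [
    GeminiModel("Gemini 1.5 Flash", _ALL),
    GeminiModel("Gemini 1.5 Pro", _ALL),
    GeminiModel("Gemini 1.0 Pro", _TEXT_ONLY),
    # The summary does not list "code" for Gemini 1.0 Pro Vision.
    GeminiModel("Gemini 1.0 Pro Vision", _ALL - {"code"}),
    GeminiModel("Gemini 1.0 Ultra", _TEXT_ONLY),
    GeminiModel("Gemini 1.0 Ultra Vision", _ALL),
]


def models_accepting(required_inputs):
    """Return names of models whose listed inputs cover every required type."""
    needed = frozenset(kind.lower() for kind in required_inputs)
    return [model.name for model in MODELS if needed <= model.inputs]


if __name__ == "__main__":
    print(models_accepting({"images", "PDF"}))
```

A filter like this only narrows the candidates by input type; the use-case descriptions and per-model limits still decide the final choice.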

The following information provides details for each Gemini model.

Gemini 1.5 Flash

Description

A multimodal model designed for high-volume, cost-effective applications. It delivers the speed and efficiency needed to build fast, lower-cost applications without compromising on quality.

Capabilities

Capability Availability
Grounding Yes (text input only)
Tuning No
System instruction Yes. See Use system instructions.
JSON support Yes

Specifications

Specification
Max input tokens: 1,048,576
Max output tokens: 8,192
Max raw image size: 20 MB
Max base64 encoded image size: 7 MB
Max images per prompt: 3,000
Max video length: 1 hour
Max videos per prompt: 10
Max audio length: approximately 8.4 hours
Max audio per prompt: 1
Max PDF size: 30 MB
Training data: Up to May 2024
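
Before sending a large prompt, it can help to compare it against the limits listed above. The following is a minimal client-side sketch under stated assumptions: PromptStats and limit_violations are hypothetical names that do not come from any Google SDK, and the caller is assumed to have counted tokens and files already. Only a subset of the listed limits is checked.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PromptStats:
    """Hypothetical summary of a prompt; field names are illustrative."""
    input_tokens: int = 0
    images: int = 0
    videos: int = 0
    longest_video_minutes: float = 0.0
    audio_files: int = 0
    largest_pdf_mb: float = 0.0


# Limits copied from the Gemini 1.5 Flash specifications above.
FLASH_LIMITS = PromptStats(
    input_tokens=1_048_576,
    images=3_000,
    videos=10,
    longest_video_minutes=60.0,  # "Max video length: 1 hour"
    audio_files=1,
    largest_pdf_mb=30.0,
)


def limit_violations(stats, limits=FLASH_LIMITS):
    """Return the name of every field in `stats` that exceeds `limits`."""
    return [
        field.name
        for field in fields(stats)
        if getattr(stats, field.name) > getattr(limits, field.name)
    ]


if __name__ == "__main__":
    print(limit_violations(PromptStats(input_tokens=2_000_000, videos=3)))
```

The same function could be reused for another model by passing a different PromptStats instance as the limits argument.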

Model versions

For more information about model versions, see Model versions.

Stable versions

gemini-1.5-flash-001
Release date: May 24, 2024
Discontinuation date: May 24, 2025

Preview versions

Gemini 1.5 Flash (Preview)
Preview name: gemini-1.5-flash-preview-0514
Discontinuation date: June 24, 2024

Gemini 1.5 Pro

Description

A multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. This model supports long-context understanding up to the maximum input token limit.

Capabilities

Capability Availability
Grounding Yes (text input only)
Tuning No
System instruction Yes. See Use system instructions.
JSON support Yes

Specifications

Specification
Max input tokens: 2,097,152
Max output tokens: 8,192
Max images per prompt: 3,000
Max video length (frames only): approximately one hour
Max video length (frame and audio): approximately 45 minutes
Max videos per prompt: 10
Max audio length: approximately 8.4 hours
Max audio per prompt: 1
Max PDF size: 30 MB
Training data: Up to May 2024

Model versions

For more information about model versions, see Model versions.

Stable versions

gemini-1.5-pro-001
Release date: May 24, 2024
Discontinuation date: May 24, 2025

Preview versions

Gemini 1.5 Pro (Preview)
Model ID: gemini-1.5-pro-preview-0514
Discontinuation date: June 24, 2024

Gemini 1.5 Pro (Preview)
Model ID: gemini-1.5-pro-preview-0409 (points to and uses gemini-1.5-pro-preview-0514)
Discontinuation date: June 14, 2024
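
Because preview versions are discontinued on fixed dates, an application that pins a model ID may want to check the date before calling it. The sketch below hard-codes the Gemini 1.5 Pro dates listed above; is_discontinued and serving_model are hypothetical helpers, and treating a model as unavailable on the discontinuation date itself is an assumption, not something this page states.

```python
from datetime import date

# Discontinuation dates copied from the Gemini 1.5 Pro version lists above.
DISCONTINUATION_DATES = {
    "gemini-1.5-pro-001": date(2025, 5, 24),
    "gemini-1.5-pro-preview-0514": date(2024, 6, 24),
    "gemini-1.5-pro-preview-0409": date(2024, 6, 14),
}

# The 0409 preview "points to and uses" the 0514 preview.
POINTS_TO = {"gemini-1.5-pro-preview-0409": "gemini-1.5-pro-preview-0514"}


def is_discontinued(model_id, today):
    """True once `today` reaches the listed discontinuation date.

    Assumption: the model counts as discontinued on the date itself.
    """
    return today >= DISCONTINUATION_DATES[model_id]


def serving_model(model_id):
    """Return the model ID that actually serves requests for `model_id`."""
    return POINTS_TO.get(model_id, model_id)


if __name__ == "__main__":
    check_day = date(2024, 6, 20)
    for model_id in DISCONTINUATION_DATES:
        print(model_id, is_discontinued(model_id, check_day))
```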

Gemini 1.0 Pro

Description

The best performing model with features for a wide range of text-only tasks. This model supports only text as input.

Capabilities

Capability Availability
Grounding Yes (text input only)
Tuning Yes. Supervised tuning is supported by gemini-1.0-pro-002.
System instruction Yes. Supported by gemini-1.0-pro-002. See Use system instructions.
JSON support Yes

Specifications

Specification
Max input tokens: 32,760
Max output tokens: 8,192
Training data: Up to February 2023

Model versions

For more information about model versions, see Model versions.

Stable versions

gemini-1.0-pro-001
Release date: February 15, 2024
Discontinuation date: February 15, 2025

gemini-1.0-pro-002
Release date: April 9, 2024
Discontinuation date: April 9, 2025

Auto-updated versions

Gemini 1.0 Pro
Auto-updated name: gemini-1.0-pro
Referenced stable version: gemini-1.0-pro-002
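
An auto-updated name resolves to a stable version, and for Gemini 1.0 Pro that matters because system instructions and supervised tuning are listed only for gemini-1.0-pro-002. The following sketch models that lookup locally. The mapping reflects this page at the time of writing and would change when the auto-updated name moves; resolve_version and supports_system_instruction are hypothetical helper names.

```python
# Auto-updated name -> referenced stable version, as listed above.
AUTO_UPDATED = {"gemini-1.0-pro": "gemini-1.0-pro-002"}

# Stable versions that this page lists as supporting system instructions.
SYSTEM_INSTRUCTION_VERSIONS = frozenset({"gemini-1.0-pro-002"})


def resolve_version(model_name):
    """Map an auto-updated name to its stable version; pass others through."""
    return AUTO_UPDATED.get(model_name, model_name)


def supports_system_instruction(model_name):
    """Check the capability against the resolved stable version."""
    return resolve_version(model_name) in SYSTEM_INSTRUCTION_VERSIONS


if __name__ == "__main__":
    for name in ("gemini-1.0-pro", "gemini-1.0-pro-001", "gemini-1.0-pro-002"):
        print(name, resolve_version(name), supports_system_instruction(name))
```

Pinning a stable version instead of the auto-updated name keeps behavior fixed until that version's discontinuation date.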

Gemini 1.0 Pro Vision

Description

The best performing image and video understanding model to handle a broad range of applications. Gemini 1.0 Pro Vision supports text, image, and video as inputs.

Capabilities

Capability Availability
Grounding No
Tuning No
System instruction No
JSON support No

Specifications

Specification
Max input tokens: 16,384
Max output tokens: 2,048
Max images per prompt: 16
Max video length: 2 minutes
Max videos per prompt: 1
Training data: Up to February 2023

Model versions

For more information about model versions, see Model versions.

Stable versions

gemini-1.0-pro-vision-001
Release date: February 15, 2024
Discontinuation date: February 15, 2025

Auto-updated aliases

Gemini 1.0 Pro Vision
Auto-updated name: gemini-1.0-pro-vision
Referenced stable version: gemini-1.0-pro-vision-001

Gemini 1.0 Ultra

Description

Google's most capable text model, optimized for complex tasks, including instruction, code, and reasoning. Gemini 1.0 Ultra supports only text as input.

Capabilities

Capability Availability
Grounding No
Tuning No
System instruction No
JSON support No

Specifications

Specification
Max input tokens: 8,192
Max output tokens: 2,048

Model versions

To use this model, you must contact Sales to be added to an allowlist. For more information about model versions, see Model versions.

Gemini 1.0 Ultra Vision

Description

Google's most capable multimodal vision model, optimized to support joint text, images, and video inputs.

Capabilities

Capability Availability
Grounding No
Tuning No
System instruction No
JSON support No

Specifications

Specification
Max input tokens: 8,192
Max output tokens: 2,048

Model versions

To use this model, you must contact Sales to be added to an allowlist. For more information about model versions, see Model versions.

Gemini language support

Gemini models support the following languages:

Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi).
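
An application that accepts user-selected locales might check them against this list before sending a prompt. The sketch below is illustrative only: is_supported_language is a hypothetical helper, and reducing a tag such as pt-BR to its primary subtag is an assumption about how a caller would want to match. The list above uses iw for Hebrew; the mapping from the newer ISO 639-1 code he to iw is a convenience added here, not something this page documents.

```python
# Language codes copied from the list above (38 entries).
SUPPORTED_LANGUAGES = frozenset(
    "ar bn bg zh hr cs da nl en et fi fr de el iw hi hu id it ja ko lv lt "
    "no pl pt ro ru sr sk sl es sw sv th tr uk vi".split()
)

# Assumption: treat the newer Hebrew code "he" as the listed code "iw".
CODE_ALIASES = {"he": "iw"}


def is_supported_language(tag):
    """Check the primary subtag of a tag like "pt-BR" against the list."""
    primary = tag.replace("_", "-").split("-")[0].lower()
    primary = CODE_ALIASES.get(primary, primary)
    return primary in SUPPORTED_LANGUAGES


if __name__ == "__main__":
    for tag in ("pt-BR", "zh_TW", "he", "ta"):
        print(tag, is_supported_language(tag))
```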

Gemma models

The following table summarizes Gemma models.

Gemma
Model details
Inputs: Text
Outputs: Text
Use case: A small-sized, lightweight open text model supporting text generation, summarization, and extraction. Deployable in environments with limited resources.
Try Gemma

CodeGemma
Model details
Inputs: Text, Code, PDF
Outputs: Text
Use case: A collection of lightweight open code models built on top of Gemma. Best for code generation and completion.
Try CodeGemma

PaliGemma
Model details
Inputs: Text, Images
Outputs: Text
Use case: A lightweight vision-language model (VLM). Best for image captioning tasks and visual question and answering tasks.
Try PaliGemma

Gemma language support

Gemma supports only the English language.

Embeddings models

The following table summarizes the models available in the Embeddings API.

Embeddings for text (textembedding-gecko@001, textembedding-gecko@002, textembedding-gecko@003, text-embedding-004)
Model details
Description: Returns embeddings for English text inputs. Supports supervised tuning of Embeddings for text models, English only.
Max token input: 3,072 (textembedding-gecko@001); 2,048 for the others.
Embedding dimensions: <=768 for text-embedding-004; 768 for the others.
Try Embeddings for text

Embeddings for text multilingual (textembedding-gecko-multilingual@001, text-multilingual-embedding-002)
Model details
Description: Returns embeddings for text inputs in over 100 languages. Supports supervised tuning of the text-multilingual-embedding-002 model. Supports 100 languages.
Max token input: 2,048.
Embedding dimensions: <=768 for text-multilingual-embedding-002; 768 for the others.
Try Embeddings for text multilingual

Embeddings for multimodal (multimodalembedding)
Model details
Description: Returns embeddings for text, image, and video inputs, to compare content across different models. Converts text, image, and video into the same vector space. Video only supports 1408 dimensions. English only.
Max token input: 32.
Max image size: 20 MB.
Max video length: Two minutes.
Embedding dimensions: 128, 256, 512, or 1408 for text+image input; 1408 for video input.
Try Embeddings for multimodal
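
The multimodal embedding model accepts different dimensions depending on the input type, so a request can be checked locally before it is sent. This is a minimal sketch based only on the values listed above; validate_multimodal_request and its parameters are hypothetical and are not part of the Embeddings API.

```python
# Values copied from the Embeddings for multimodal specifications above.
MAX_TEXT_TOKENS = 32
_TEXT_IMAGE_DIMENSIONS = frozenset({128, 256, 512, 1408})
DIMENSIONS_BY_INPUT = {
    "text": _TEXT_IMAGE_DIMENSIONS,
    "image": _TEXT_IMAGE_DIMENSIONS,
    "video": frozenset({1408}),  # video supports only 1408 dimensions
}


def validate_multimodal_request(input_kind, dimension, text_tokens=0):
    """Return a list of human-readable problems; empty means it looks valid."""
    problems = []
    allowed = DIMENSIONS_BY_INPUT.get(input_kind)
    if allowed is None:
        problems.append(f"unsupported input kind: {input_kind!r}")
    elif dimension not in allowed:
        problems.append(
            f"dimension {dimension} not allowed for {input_kind}; "
            f"choose one of {sorted(allowed)}"
        )
    if text_tokens > MAX_TEXT_TOKENS:
        problems.append(
            f"text has {text_tokens} tokens; the limit is {MAX_TEXT_TOKENS}"
        )
    return problems


if __name__ == "__main__":
    print(validate_multimodal_request("video", 512))
```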

Embeddings language support

Text multilingual embedding models support the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

Imagen model

The following table summarizes the models available in the Imagen API:

Imagen 2 (imagegeneration@006)
Model details
Inputs: Text (generation), Images (editing)
Outputs: Images
Use case: This model supports image generation and editing to create high quality images in seconds. The editing feature supports object removal and insertion, outpainting, and product editing.
Try Imagen 2

Imagen 2 language support

Imagen 2 supports the following languages:
English, Chinese, Hindi, Japanese, Korean, Portuguese, and Spanish.

Code completion model

The following table summarizes the models available in the Codey APIs:

Codey for Code Completion (code-gecko)
Model details
Inputs: Code in supported languages
Outputs: Code in supported languages
Use case: A model fine-tuned to suggest code completion based on the context in code that's written.
Try Codey for Code Completion

Code completion model language support

The Code completion model supports the English language.

MedLM models

The following table summarizes the models available in the MedLM API:

MedLM-medium (medlm-medium)
Model details
Description: A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. This model helps healthcare practitioners with medical question and answer tasks, and summarization tasks for healthcare and medical documents. Provides better throughput and includes more recent data than the medlm-large model.
Max tokens (input + output): 32,768.
Max output tokens: 8,192.
Try MedLM-medium

MedLM-large (medlm-large)
Model details
Description: A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. This model helps healthcare practitioners with medical question and answer tasks, and summarization tasks for healthcare and medical documents.
Max input tokens: 8,192.
Max output tokens: 1,024.
Try MedLM-large
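
The two MedLM models express their limits differently: medlm-medium shares one budget between input and output, while medlm-large has separate input and output limits. The sketch below shows one way to compute how many output tokens remain for a given input size. The max_output_for helper is hypothetical, and raising ValueError for oversized input is a design choice made here, not documented API behavior.

```python
# Limits copied from the MedLM specifications above.
MEDIUM_TOTAL_TOKENS = 32_768  # input + output combined
MEDIUM_MAX_OUTPUT = 8_192
LARGE_MAX_INPUT = 8_192
LARGE_MAX_OUTPUT = 1_024


def max_output_for(model, input_tokens):
    """Return the largest output token count the listed limits allow."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if model == "medlm-medium":
        if input_tokens > MEDIUM_TOTAL_TOKENS:
            raise ValueError("input exceeds the combined token limit")
        # Output is capped both by its own limit and by the shared budget.
        return min(MEDIUM_MAX_OUTPUT, MEDIUM_TOTAL_TOKENS - input_tokens)
    if model == "medlm-large":
        if input_tokens > LARGE_MAX_INPUT:
            raise ValueError("input exceeds the input token limit")
        return LARGE_MAX_OUTPUT
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    print(max_output_for("medlm-medium", 30_000))
    print(max_output_for("medlm-large", 4_000))
```

With medlm-medium, the full 8,192 output tokens are available only while the input stays at or below 24,576 tokens.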

MedLM language support

MedLM models support the English language.

Model versions

To learn about model versions, see Model versions.

Explore all models in Model Garden

Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.


To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.

What's next