Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your AI-based applications. Foundation models are fine-tuned for specific use cases and offered at different price points. This page summarizes the models that are available in the various APIs and gives you guidance on which models to choose by use case.
For more information about all AI models and APIs on Vertex AI, see Explore AI models in Model Garden.
Gemini models
The following table summarizes the models available in the Gemini API. For API details, see the Gemini API reference.
To explore a model in the Google Cloud console, select its model card in the Model Garden.
Model | Inputs | Outputs | Use case | Try the model |
---|---|---|---|---|
Gemini 2.0 Flash | Text, code, images, audio, video, video with audio, PDF | Text, audio, images | Provides next generation features, superior speed, native tool use, and multimodal generation. | Try the Gemini 2.0 Flash model |
Gemini 1.5 Flash | Text, code, images, audio, video, video with audio, PDF | Text | Provides speed and efficiency for high-volume, quality, cost-effective apps. | Try the Gemini 1.5 Flash model |
Gemini 1.5 Pro | Text, code, images, audio, video, video with audio, PDF | Text | Supports text or chat prompts for a text or code response. Supports long-context understanding up to the maximum input token limit. | Try the Gemini 1.5 Pro model |
Gemini 1.0 Pro | Text | Text | The best performing model for a wide range of text-only tasks. | Go to the Gemini 1.0 Pro model card |
Gemini 1.0 Pro Vision | Text, images, audio, video, video with audio, PDF | Text | The best performing image and video understanding model to handle a broad range of applications. | Try the Gemini 1.0 Pro Vision model |
The following information provides details for each Gemini model.
Gemini 2.0 Flash
Description
The next generation of our Gemini Flash models. Gemini 2.0 Flash delivers superior speed compared to the 1.5 models and adds support for an expanded range of features, such as bidirectional streaming with the Multimodal Live API, multimodal response generation, and native tool use.
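The following is a minimal sketch of calling a Gemini model with the Vertex AI SDK for Python. The project ID, location, and exact model ID are placeholder assumptions; use the values and model version that apply to your environment.

```python
# Minimal sketch: generate text with a Gemini model through the Vertex AI SDK.
# The project ID, location, and model ID below are placeholder assumptions.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

# Assumed model ID; check Model Garden for the version available to you.
model = GenerativeModel("gemini-2.0-flash-001")
response = model.generate_content(
    "Summarize the main differences between batch and streaming data pipelines."
)
print(response.text)
```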
Capabilities
Capability | Availability |
---|---|
Grounding | Yes (text input only) |
Tuning | Yes |
System instruction | Yes. See Use system instructions. |
JSON support | Yes |
Provisioned Throughput | No. See Supported models. |
Specifications
Specification |
---|
Max input tokens: 1,048,576 |
Max output tokens: 8,192 |
Training data: Up to May 2024 |
Gemini 1.5 Flash
Description
A multimodal model designed for high-volume, cost-effective applications, delivering the speed and efficiency to build fast, lower-cost applications that don't compromise on quality.
Capabilities
Capability | Availability |
---|---|
Grounding | Yes (text input only) |
Tuning | Yes |
System instruction | Yes. See Use system instructions. |
JSON support | Yes |
Provisioned Throughput | Yes. See Supported models. |
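The JSON support listed above refers to requesting structured output. As a hedged sketch (assuming Vertex AI has already been initialized for your project), you can ask Gemini 1.5 Flash to return JSON by setting the response MIME type in the generation configuration:

```python
# Sketch: request JSON output from Gemini 1.5 Flash via the response MIME type.
# Assumes vertexai.init(project=..., location=...) has already been called.
from vertexai.generative_models import GenerativeModel, GenerationConfig

model = GenerativeModel("gemini-1.5-flash-002")
response = model.generate_content(
    "List three cloud regions as a JSON array of objects with 'name' and 'continent'.",
    generation_config=GenerationConfig(response_mime_type="application/json"),
)
print(response.text)  # the model returns a JSON string
```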
Specifications
Specification |
---|
Max input tokens: 1,048,576 |
Max output tokens: 8,192 |
Max raw image size: 20 MB |
Max base64 encoded image size: 7 MB |
Max images per prompt: 3,000 |
Max video length: 1 hour |
Max videos per prompt: 10 |
Max audio length: approximately 8.4 hours |
Max audio per prompt: 1 |
Max PDF size: 30 MB |
Training data: Up to May 2024 |
Model versions
For more information about model versions, see Model versions.
Stable versions
Gemini 1.5 Flash model | Release date | Discontinuation date | Model version highlights |
---|---|---|---|
gemini-1.5-flash-002 | September 24, 2024 | September 24, 2025 | Improved general model quality, with significant gains across several categories. Gemini 1.5 Flash 002 uses dynamic shared quota. Sometimes gemini-1.5-flash-002 can respond in your local language, even if the prompt is written in another language. This issue applies only to non-English prompts. To mitigate it, we recommend adding a system instruction that tells the model to respond in the same language as the prompt (see the sketch after this table). |
gemini-1.5-flash-001 | May 24, 2024 | May 24, 2025 | Initial version of Gemini 1.5 Flash. |
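The mitigation mentioned in the table can be applied by attaching a system instruction when you create the model. The instruction wording, project ID, and location below are illustrative assumptions, not the exact text from the official guidance:

```python
# Sketch: use a system instruction so gemini-1.5-flash-002 answers in the
# prompt's language. The instruction wording is an illustrative example only.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel(
    "gemini-1.5-flash-002",
    system_instruction=["Always respond in the same language as the user's prompt."],
)
response = model.generate_content("¿Cuáles son las fases de la luna?")
print(response.text)
```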
Preview versions
Model name | Preview name | Discontinuation date |
---|---|---|
Gemini 1.5 Flash (Preview) | gemini-1.5-flash-preview-0514 | June 24, 2024 |
Gemini 1.5 Pro
Description
A multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. This model supports long-context understanding up to the maximum input token limit.
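For example, you can attach a PDF stored in Cloud Storage to a prompt as a file part. The following sketch assumes an initialized Vertex AI environment and uses a hypothetical gs:// URI:

```python
# Sketch: send a PDF from Cloud Storage to Gemini 1.5 Pro with a text question.
# Assumes vertexai.init(...) has been called; the gs:// URI is a placeholder.
from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-1.5-pro-002")
pdf_part = Part.from_uri(
    "gs://your-bucket/contracts/agreement.pdf", mime_type="application/pdf"
)
response = model.generate_content(
    [pdf_part, "Summarize the key obligations in this document."]
)
print(response.text)
```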
Capabilities
Capability | Availability |
---|---|
Grounding | Yes (text input only) |
Tuning | Yes |
System instruction | Yes. See Use system instructions. |
JSON support | Yes |
Provisioned Throughput | Yes. See Supported models. |
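The Grounding capability refers to grounding responses in external sources such as Google Search. A hedged sketch of enabling it through the SDK's grounding tool (assuming an initialized Vertex AI environment) might look like the following:

```python
# Sketch: ground a Gemini 1.5 Pro text prompt with Google Search results.
# Assumes vertexai.init(...) has already been called.
from vertexai.generative_models import GenerativeModel, Tool, grounding

search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())
model = GenerativeModel("gemini-1.5-pro-002")
response = model.generate_content(
    "What were the key announcements at the most recent Google Cloud Next?",
    tools=[search_tool],
)
print(response.text)
```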
Specifications
Specification |
---|
Max input tokens: 2,097,152 |
Max output tokens: 8,192 |
Max images per prompt: 3,000 |
Max video length (frames only): approximately one hour |
Max video length (frames and audio): approximately 45 minutes |
Max videos per prompt: 10 |
Max audio length: approximately 8.4 hours |
Max audio per prompt: 1 |
Max PDF size: 30 MB |
Training data: Up to May 2024 |
Model versions
For more information about model versions, see Model versions.
Stable versions
Gemini 1.5 Pro model | Release date | Discontinuation date | Model version highlights |
---|---|---|---|
gemini-1.5-pro-002 | September 24, 2024 | September 24, 2025 | Improved general model quality, with significant gains across several categories. Gemini 1.5 Pro 002 uses dynamic shared quota. Sometimes gemini-1.5-pro-002 can respond in your local language, even if the prompt is written in another language. This issue applies only to non-English prompts. To mitigate it, we recommend adding a system instruction that tells the model to respond in the same language as the prompt, as shown in the Gemini 1.5 Flash section. |
gemini-1.5-pro-001 | May 24, 2024 | May 24, 2025 | Initial version of Gemini 1.5 Pro. |
Preview versions
Model name | Model ID | Discontinuation date |
---|---|---|
Gemini 1.5 Pro (Preview) | gemini-1.5-pro-preview-0514 | June 24, 2024 |
Gemini 1.5 Pro (Preview) | gemini-1.5-pro-preview-0409 (points to and uses gemini-1.5-pro-preview-0514) | June 14, 2024 |
Gemini 1.0 Pro
Description
The best performing model for a wide range of text-only tasks. This model supports only text as input.
Capabilities
Capability | Availability |
---|---|
Grounding | Yes (text input only) |
Tuning | Yes. Supervised tuning is supported by gemini-1.0-pro-002. |
System instruction | Yes. Supported by gemini-1.0-pro-002. See Use system instructions. |
JSON support | Yes |
Provisioned Throughput | Yes. See Supported models. |
Specifications
Specification |
---|
Max input tokens: 32,760 |
Max output tokens: 8,192 |
Training data: Up to February 2023 |
Model versions
For more information about model versions, see Model versions.
Stable versions
Gemini 1.0 Pro model | Release date | Discontinuation date |
---|---|---|
gemini-1.0-pro-001 | February 15, 2024 | April 9, 2025 |
gemini-1.0-pro-002 | April 9, 2024 | April 9, 2025 |
Auto-updated versions
Model name | Auto-updated name | Referenced stable version |
---|---|---|
Gemini 1.0 Pro | gemini-1.0-pro | gemini-1.0-pro-002 |
Gemini 1.0 Pro Vision
Description
The best performing image and video understanding model to handle a broad range of applications. Gemini 1.0 Pro Vision supports text, image, and video as inputs.
Capabilities
Capability | Availability |
---|---|
Grounding | No |
Tuning | No |
System instruction | No |
JSON support | No |
Provisioned Throughput | Yes. See Supported models. |
Specifications
Specification |
---|
Max input tokens: 16,384 |
Max output tokens: 2,048 |
Max images per prompt: 16 |
Max video length: 2 minutes |
Max videos per prompt: 1 |
Training data: Up to February 2023 |
Model versions
For more information about model versions, see Model versions.
Stable versions
Gemini 1.0 Pro Vision model | Release date | Discontinuation date |
---|---|---|
gemini-1.0-pro-vision-001 | February 15, 2024 | April 9, 2025 |
Auto-updated aliases
Model name | Auto-updated name | Referenced stable version |
---|---|---|
Gemini 1.0 Pro Vision | gemini-1.0-pro-vision | gemini-1.0-pro-vision-001 |
Gemini 1.0 Ultra
Description
Google's most capable text model, optimized for complex tasks, including instruction following, code, and reasoning. Gemini 1.0 Ultra supports only text as input.
Capabilities
Capability | Availability |
---|---|
Grounding | No |
Tuning | No |
System instruction | No |
JSON support | No |
Provisioned Throughput | Yes. See Supported models. |
Specifications
Specification |
---|
Max input tokens: 8,192 |
Max output tokens: 2,048 |
Model versions
For more information about model versions, see Model versions.
Gemini 1.0 Ultra Vision
Description
Google's most capable multimodal vision model, optimized to support joint text, images, and video inputs.
Capabilities
Capability | Availability |
---|---|
Grounding | No |
Tuning | No |
System instruction | No |
JSON support | No |
Provisioned Throughput | Yes. See Supported models. |
Specifications
Specification |
---|
Max input tokens: 8,192 |
Max output tokens: 2,048 |
Model versions
For more information about model versions, see Model versions.
Gemini language support
All the Gemini models can understand and respond in the following languages:
Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Gemini 1.5 Pro and Gemini 1.5 Flash models can understand and respond in the following additional languages:
Afrikaans (af), Amharic (am), Assamese (as), Azerbaijani (az), Belarusian (be), Bosnian (bs), Catalan (ca), Cebuano (ceb), Corsican (co), Welsh (cy), Dhivehi (dv), Esperanto (eo), Basque (eu), Persian (fa), Filipino (Tagalog) (fil), Frisian (fy), Irish (ga), Scots Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hawaiian (haw), Hmong (hmn), Haitian Creole (ht), Armenian (hy), Igbo (ig), Icelandic (is), Javanese (jv), Georgian (ka), Kazakh (kk), Khmer (km), Kannada (kn), Krio (kri), Kurdish (ku), Kyrgyz (ky), Latin (la), Luxembourgish (lb), Lao (lo), Malagasy (mg), Maori (mi), Macedonian (mk), Malayalam (ml), Mongolian (mn), Meiteilon (Manipuri) (mni-Mtei), Marathi (mr), Malay (ms), Maltese (mt), Myanmar (Burmese) (my), Nepali (ne), Nyanja (Chichewa) (ny), Odia (Oriya) (or), Punjabi (pa), Pashto (ps), Sindhi (sd), Sinhala (Sinhalese) (si), Samoan (sm), Shona (sn), Somali (so), Albanian (sq), Sesotho (st), Sundanese (su), Tamil (ta), Telugu (te), Tajik (tg), Uyghur (ug), Urdu (ur), Uzbek (uz), Xhosa (xh), Yiddish (yi), Yoruba (yo), Zulu (zu)
Gemma models
The following table summarizes Gemma models.
Model | Inputs | Outputs | Use case | Try the model |
---|---|---|---|---|
Gemma Model details | Text | Text | A small, lightweight open text model supporting text generation, summarization, and extraction. Deployable in environments with limited resources. | Try Gemma |
CodeGemma Model details | Text, code, PDF | Text | A collection of lightweight open code models built on top of Gemma. Best for code generation and completion. | Try CodeGemma |
PaliGemma Model details | Text, images | Text | A lightweight vision-language model (VLM). Best for image captioning and visual question answering tasks. | Try PaliGemma |
Gemma language support
Gemma supports only the English language.
Embeddings models
The following table summarizes the models available in the Embeddings API.
Model name | Description | Specifications | Try the model |
---|---|---|---|
Embeddings for text (textembedding-gecko@001) Model details | Returns embeddings for English text inputs. Supports supervised tuning of Embeddings for text models, English only. | Max input tokens: 3,072 (textembedding-gecko@001); others: 2,048. Embedding dimensions: text-embedding-004: <=768; others: 768. | Try Embeddings for text |
Embeddings for text multilingual (textembedding-gecko-multilingual@001, text-multilingual-embedding-002) Model details | Returns embeddings for text inputs in over 100 languages. Supports supervised tuning of the text-multilingual-embedding-002 model. Supports 100 languages. | Max input tokens: 2,048. Embedding dimensions: text-multilingual-embedding-002: <=768; others: 768. | Try Embeddings for text multilingual |
Embeddings for multimodal (multimodalembedding) Model details | Returns embeddings for text, image, and video inputs so that you can compare content across modalities. Converts text, image, and video into the same vector space. Video supports only the 1408 dimension. English only. | Max input tokens: 32. Max image size: 20 MB. Max video length: two minutes. Embedding dimensions: 128, 256, 512, or 1408 for text+image input; 1408 for video input. | Try Embeddings for multimodal |
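To make the table concrete, the following is a hedged sketch of requesting text and multimodal embeddings with the Vertex AI SDK for Python. The model IDs match the table above; the project, location, and image path are placeholder assumptions.

```python
# Sketch: request text and multimodal embeddings. Project, location, and the
# local image path are placeholder assumptions.
import vertexai
from vertexai.language_models import TextEmbeddingModel
from vertexai.vision_models import Image, MultiModalEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")

# Text embeddings with the multilingual model from the table above.
text_model = TextEmbeddingModel.from_pretrained("text-multilingual-embedding-002")
text_embeddings = text_model.get_embeddings(["Bonjour le monde", "Hello world"])
print(len(text_embeddings[0].values))  # embedding dimensionality, for example 768

# Multimodal embeddings for an image plus contextual text.
mm_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
mm_embeddings = mm_model.get_embeddings(
    image=Image.load_from_file("local/photo.png"),
    contextual_text="a product photo of a red bicycle",
    dimension=1408,
)
print(len(mm_embeddings.image_embedding))
```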
Embeddings language support
Text multilingual embedding models support the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque,
Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese,
Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino,
Finnish, French, Galician, Georgian, German, Greek, Gujarati,
Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian,
Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada,
Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian,
Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori,
Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish,
Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic,
Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho,
Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai,
Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian,
Xhosa, Yiddish, Yoruba, Zulu.
Imagen model
The following table summarizes the models available in the Imagen API:
Model | Inputs | Outputs | Use case | Try the model |
---|---|---|---|---|
Imagen (imagen-3.0-generate-001, imagen-3.0-fast-generate-001, imagen-3.0-capability-001, imagegeneration@006, imagegeneration@005, imagegeneration@002) Model details | Text (generation), images (editing) | Images | This model supports image generation and editing to create high-quality images in seconds. This includes image generation using either zero-shot or few-shot learning. The editing feature supports object removal and insertion, outpainting, and product editing. | Try Imagen |
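As a hedged sketch of the generation use case (assuming an initialized Vertex AI environment), you can call Imagen through the SDK's image generation class. The prompt and output file name are illustrative:

```python
# Sketch: generate an image with Imagen 3 and save it locally.
# Assumes vertexai.init(...) has already been called; the prompt and file name
# are illustrative placeholders.
from vertexai.preview.vision_models import ImageGenerationModel

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")
images = model.generate_images(
    prompt="A watercolor painting of a lighthouse at sunrise",
    number_of_images=1,
)
images[0].save("lighthouse.png")
```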
Imagen 3 language support
Imagen 3 supports the following languages:
English, Chinese, Hindi, Japanese, Korean, Portuguese, and Spanish.
Code completion model
The following table summarizes the models available in the Codey APIs:
Model | Inputs | Outputs | Use case | Try the model |
---|---|---|---|---|
Codey for Code Completion (code-gecko) Model details | Code in supported languages | Code in supported languages | A model fine-tuned to suggest code completions based on the context of the code that's already written. | Try Codey for Code Completion |
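As a hedged illustration of the completion workflow (assuming an initialized Vertex AI environment), you can send code-gecko a code prefix and print the suggested completion:

```python
# Sketch: get a code completion suggestion from code-gecko.
# Assumes vertexai.init(...) has already been called.
from vertexai.language_models import CodeGenerationModel

model = CodeGenerationModel.from_pretrained("code-gecko")
completion = model.predict(
    prefix="def reverse_words(sentence: str) -> str:\n    ",
    max_output_tokens=64,
)
print(completion.text)
```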
Code completion model language support
The Code completion model supports the English language.
MedLM models
The following table summarizes the models available in the MedLM API:
Model name | Description | Specifications | Try the model |
---|---|---|---|
MedLM-medium (medlm-medium) Model details | A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. This model helps healthcare practitioners with medical question and answer tasks, and summarization tasks for healthcare and medical documents. Provides better throughput and includes more recent data than the medlm-large model. | Max tokens (input + output): 32,768. Max output tokens: 8,192. | Try MedLM-medium |
MedLM-large (medlm-large) Model details | A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. This model helps healthcare practitioners with medical question and answer tasks, and summarization tasks for healthcare and medical documents. | Max input tokens: 8,192. Max output tokens: 1,024. | Try MedLM-large |
MedLM Provisioned Throughput support
MedLM-medium and MedLM-large support Provisioned Throughput. See Supported models.
MedLM language support
The MedLM model supports the English language.
Locations
For a list of locations where these models are available, see Generative AI on Vertex AI locations.
Model versions
To learn about model versions, see Model versions.
Explore all models in Model Garden
Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.
To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.
What's next
- Try a quickstart tutorial using Vertex AI Studio or the Vertex AI API.
- Learn how to test text prompts.
- Learn how to test chat prompts.
- Explore pretrained models in Model Garden.
- Learn how to tune a foundation model.
- Learn about responsible AI best practices and Vertex AI's safety filters.
- Learn how to control access to specific models in Model Garden by using a Model Garden organization policy.