Google models

Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your AI-based applications. Foundation models are fine-tuned for specific use cases and offered at different price points. This page summarizes the models that are available in the various APIs and gives you guidance on which models to choose by use case.

For more information about all AI models and APIs on Vertex AI, see Explore AI models in Model Garden.

Gemini models

The following table summarizes the models available in the Gemini API. For more information about API details, see the Gemini API reference.

To explore a model in the Google Cloud console, select its model card in the Model Garden.

Model	Inputs	Outputs	Use case	Try the model
Gemini 2.0 Flash `gemini-2.0-flash-001`	Text, Code, Images, Audio, Video, Video with Audio, PDF	Text, Audio (private preview), Images (private preview)	Workhorse model for all daily tasks. Strong overall performance and supports real-time streaming Live API.	Try Gemini 2.0 Flash
Gemini 2.0 Pro `gemini-2.0-pro-exp-02-05`	Text, Images, Video, Audio, PDF	Text	Strongest model quality, especially for code & world knowledge; 2M long context.	Try Gemini 2.0 Pro
Gemini 2.0 Flash-Lite `gemini-2.0-flash-lite-preview-02-05`	Text, Images, Video, Audio, PDF	Text	Our cost effective offering to support high throughput.	Try Gemini 2.0 Flash-Lite
Gemini 2.0 Flash Thinking `gemini-2.0-flash-thinking-exp-01-21`	Text, Images	Text	Provides stronger reasoning capabilities and includes the thinking process in responses.	Try Gemini 2.0 Flash Thinking
Gemini 1.5 Flash `gemini-1.5-flash`	Text, Code, Images, Audio, Video, Video with Audio, PDF	Text	Provides speed and efficiency for high-volume, quality, cost-effective apps.	Try Gemini 1.5 Flash
Gemini 1.5 Pro `gemini-1.5-pro`	Text, Code, Images, Audio, Video, Video with Audio, PDF	Text	Supports text or chat prompts for a text or code response. Supports long-context understanding up to the maximum input token limit.	Try Gemini 1.5 Pro
Gemini 1.0 Pro `gemini-1.0-pro`	Text	Text	The best performing model for a wide range of text-only tasks.	Try Gemini 1.0 Pro
Gemini 1.0 Pro Vision `gemini-1.0-pro-vision`	Text, Images, Audio, Video, Video with Audio, PDF	Text	The best performing image and video understanding model to handle a broad range of applications.	Try Gemini 1.0 Pro Vision

The following information provides details for each Gemini model.

Gemini 2.0 Flash

The next generation of our Gemini Flash models. Gemini 2.0 Flash delivers superior speed to our 1.5 models and support for an expanded range of features like bidirectional streaming with our Multimodal Live API, multimodal response generation, and built-in tool use.

Capabilities

Capability	Availability
Grounding with Google Search
Code execution
Tuning
System instruction	See Use system instructions.
Controlled Generation
Provisioned Throughput	See Supported models.

Specifications

Specification	Value
Max input tokens	1,048,576
Max output tokens	8,192
Training data	Up to June 2024

Gemini 2.0 Pro

Gemini 2.0 Pro is our strongest model for coding and world knowledge and features a 2M long context window. Gemini 2.0 Pro is available as an experimental model in Vertex AI and is an upgrade path for 1.5 Pro users who want better quality, or who are particularly invested in long context and code.

Capabilities

Capability	Availability
Grounding with Google Search
Code execution
Tuning
System instruction	See Use system instructions.
JSON support
Provisioned Throughput	See Supported models.

Specifications

Specification	Value
Max input tokens	2,097,152
Max output tokens	8,192
Training data	Up to June 2024

Gemini 2.0 Flash-Lite

Gemini 2.0 Flash-Lite is our fastest and most cost efficient Flash model. It's an upgrade path for 1.5 Flash users who want better quality for the same price and speed.

Capabilities

Capability	Availability
Grounding with Google Search
Code execution
Tuning
System instruction	See Use system instructions.
JSON support
Provisioned Throughput	See Supported models.

Specifications

Specification	Value
Max input tokens	1,048,576
Max output tokens	8,192
Training data	Up to June 2024

Gemini 2.0 Flash Thinking

Gemini 2.0 Flash Thinking is an experimental test-time compute model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Flash Thinking is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model. For more information, see the Gemini 2.0 Flash Thinking documentation

Capabilities

Capability	Availability
Grounding
Tuning
System instruction	See Use system instructions.
JSON support
Provisioned Throughput	See Supported models.

Specifications

Specification	Value
Max input tokens	1,048,576
Max output tokens	65,536
Training data	Up to May 2024

Gemini 1.5 Flash

A multimodal model that is designed for high-volume, cost-effective applications, and which delivers speed and efficiency to build fast, lower-cost applications that don't compromise on quality.

Capabilities

Capability	Availability
Grounding	Text input only
Tuning
System instruction	See Use system instructions.
Controlled Generation
Provisioned Throughput	See Supported models.

Specifications

Specification	Value
Max input tokens	1,048,576
Max output tokens	8,192
Max raw image size	20 MB
Max base64 encoded image size	7 MB
Max images per prompt	3,000
Max video length	1 hour
Max videos per prompt	10
Max audio length	approximately 8.4 hours
Max audio per prompt	1
Max PDF size	30 MB
Training data	Up to May 2024

Gemini 1.5 Pro

A multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. This model supports long-context understanding up to the maximum input token limit.

Capabilities

Capability	Availability
Grounding	Yes (text input only)
Tuning
System instruction	Yes. See Use system instructions.
JSON support
Provisioned Throughput	Yes. See Supported models.

Specifications

Specification	Value
Max input tokens	2,097,152
Max output tokens	8,192
Max images per prompt	3,000
Max video length (frames only)	approximately one hour
Max video length (frame and audio)	approximately 45 minutes
Max videos per prompt	10
Max audio length	approximately 8.4 hours
Max audio per prompt	1
Max PDF size	30 MB
Training data	Up to May 2024

Gemini 1.0 Pro

The best performing model with features for a wide range of text-only tasks. This model supports only text as input.

Capabilities

Capability	Availability
Grounding	Yes (text input only)
Tuning	Yes. Supervised tuning is supported by gemini-1.0-pro-002.
System instruction	Yes. Supported by gemini-1.0-pro-002. See Use system instructions.
JSON support
Provisioned Throughput	Yes. See Supported models.

Specifications

Specification	Value
Max input tokens	32,760
Max output tokens	8,192
Training data	Up to February 2023

Gemini 1.0 Pro Vision

The best performing image and video understanding model to handle a broad range of applications. Gemini 1.0 Pro Vision supports text, image, and video as inputs.

Capabilities

Capability	Availability
Grounding
Tuning
System instruction
JSON support
Provisioned Throughput	Yes. See Supported models.

Specifications

Specification	Value
Max input tokens	16,384
Max output tokens	2,048
Max images per prompt	16
Max video length	2 minutes
Max videos per prompt	1
Training data	Up to February 2023

Gemini 1.0 Ultra

Google's most capable text model, optimized for complex tasks, including instruction, code, and reasoning. Gemini 1.0 Ultra supports only text as input.

Capabilities

Capability	Availability
Grounding
Tuning
System instruction
JSON support
Provisioned Throughput	Yes. See Supported models.

Specifications

Specification	Value
Max input tokens	8,192
Max output tokens	2,048

Gemini 1.0 Ultra Vision

Google's most capable multimodal vision model, optimized to support joint text, images, and video inputs.

Capabilities

Capability	Availability
Grounding
Tuning
System instruction
JSON support
Provisioned Throughput	See Supported models.

Specifications

Specification	Value
Max input tokens	8,192
Max output tokens	2,048

Gemini language support

All the Gemini models can understand and respond in the following languages:

Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Gemini 1.5 Pro and Gemini 1.5 Flash models can understand and respond in the following additional languages:

Afrikaans (af), Amharic (am), Assamese (as), Azerbaijani (az), Belarusian (be), Bosnian (bs), Catalan (ca), Cebuano (ceb), Corsican (co), Welsh (cy), Dhivehi (dv), Esperanto (eo), Basque (eu), Persian (fa), Filipino (Tagalog) (fil), Frisian (fy), Irish (ga), Scots Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hawaiian (haw), Hmong (hmn), Haitian Creole (ht), Armenian (hy), Igbo (ig), Icelandic (is), Javanese (jv), Georgian (ka), Kazakh (kk), Khmer (km), Kannada (kn), Krio (kri), Kurdish (ku), Kyrgyz (ky), Latin (la), Luxembourgish (lb), Lao (lo), Malagasy (mg), Maori (mi), Macedonian (mk), Malayalam (ml), Mongolian (mn), Meiteilon (Manipuri) (mni-Mtei), Marathi (mr), Malay (ms), Maltese (mt), Myanmar (Burmese) (my), Nepali (ne), Nyanja (Chichewa) (ny), Odia (Oriya) (or), Punjabi (pa), Pashto (ps), Sindhi (sd), Sinhala (Sinhalese) (si), Samoan (sm), Shona (sn), Somali (so), Albanian (sq), Sesotho (st), Sundanese (su), Tamil (ta), Telugu (te), Tajik (tg), Uyghur (ug), Urdu (ur), Uzbek (uz), Xhosa (xh), Yiddish (yi), Yoruba (yo), Zulu (zu)

Gemma models

The following table summarizes Gemma models.

Model	Inputs	Outputs	Use case	Try the model
Gemma Model details	Text	Text	A small-sized, lightweight open text model supporting text generation, summarization, and extraction. Deployable in environments with limited resources.	Try Gemma
CodeGemma Model details	Text, Code, PDF	Text	A collection of lightweight open code models built on top of Gemma. Best for code generation and completion.	Try CodeGemma
PaliGemma Model details	Text, Images	Text	A lightweight vision-language model (VLM). Best for image captioning tasks and visual question and answering tasks.	Try PaliGemma

Gemma language support

Gemma supports only the English language.

Embeddings models

The following table summarizes the models available in the Embeddings API.

Model name	Description	Specifications	Try the model
Embeddings for text (`textembedding-gecko@001, textembedding-gecko@002, textembedding-gecko@003, text-embedding-004`) Model details	Returns embeddings for English text inputs. Supports supervised tuning of Embeddings for text models, English only.	Max token input: 3,072 (`textembedding-gecko@001`). Others: 2,048. Embedding dimensions: `text-embedding-004`: <=768. Others: 768.	Try Embeddings for text
Embeddings for text multilingual (`textembedding-gecko-multilingual@001`, `text-multilingual-embedding-002`) Model details	Returns embeddings for text inputs of over 100 languages Supports supervised tuning of the `text-multilingual-embedding-002` model. Supports 100 languages.	Max token input: 2,048. Embedding dimensions: `text-multilingual-embedding-002`: <=768. Others: 768.	Try Embeddings for text multilingual
Embeddings for multimodal `(multimodalembedding)` Model details	Returns embedding for text, image, and video inputs, to compare content across different models. Converts text, image, and video into the same vector space. Video only supports 1408 dimensions. English only	Max token input: 32. Max image size: 20 MB. Max video length: Two minutes. Embedding dimensions: 128, 256, 512, or 1408 for text+image input, 1408 for video input.	Try Embeddings for multimodal

Embeddings language support

Text multilingual embedding models support the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

Imagen model

The following table summarizes the models available in the Imagen API:

Model	Inputs	Outputs	Use case	Try the model
Imagen 3 (`imagen-3.0-generate-001`, `imagen-3.0-fast-generate-001`) Imagen 2 (`imagegeneration@006`, `imagegeneration@005`) Imagen (`imagegeneration@002`) Model details	Text	Images	This model supports image generation and editing to create high quality images in seconds. This includes image generation using zero-shot learning.	Try Imagen for image generation
Imagen 3 (Editing and customization) (`imagen-3.0-capability-001`) Imagen 2 (Editing) (`imagegeneration@006`) Imagen (Editing) `imagegeneration@002`) Model details	Text and images	Images	This model supports image editing and customized (few-shot) image generation to create high quality images in seconds. The editing feature supports inpainting (object removal or insertion), outpainting, and product image editing. Customization supports few-shot learning, letting you provide reference images to guide generation of output images. This model supports the following types of customization: subject (product, person, and animal companion), style, controlled customization (scribble or canny edge), and instruct customization (style transfer).	Try Imagen for editing and customization

Imagen 3 language support

Imagen 3 supports the following languages:
English, Chinese, Hindi, Japanese, Korean, Portuguese, and Spanish.

Code completion model

The following table summarizes the models available in the Codey APIs:

Model	Inputs	Outputs	Use case	Try the model
Codey for Code Completion (`code-gecko`) Model details	Code in supported languages	Code in supported languages	A model fine-tuned to suggest code completion based on the context in code that's written.	Try Codey for Code Completion

Code completion model language support

The Code completion model supports the English language.

MedLM models

The following table summarizes the models available in the MedLM API:

Model name	Description	Specifications	Try the model
MedLM-medium (`medlm-medium`) Model details	A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. This model helps healthcare practitioners with medical question and answer tasks, and summarization tasks for healthcare and medical documents. Provides better throughput and includes more recent data than the `medlm-large` model.	Max tokens (input + output): 32,768. Max output tokens: 8,192.	Try MedLM-medium
MedLM-large (`medlm-large`) Model details	A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. This model helps healthcare practitioners with medical question and answer tasks, and summarization tasks for healthcare and medical documents.	Max input tokens: 8,192. Max output tokens: 1,024.	Try MedLM-large

MedLM Provisioned Throughput support

MedLM-medium and MedLM-large support Provisioned Throughput. See Supported models.

MedLM language support

The MedLM model supports the English language.

Locations

For a list of locations where these models are available, see Generative AI on Vertex AI locations.

Model versions

To learn about model versions, see Model versions.

Explore all models in Model Garden

Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.

Go to Model Garden

To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.

What's next

Try a quickstart tutorial using Vertex AI Studio or the Vertex AI API.
Learn how to test text prompts.
Learn how to test chat prompts.
Explore pretrained models in Model Garden.
Learn how to tune a foundation model.
Learn about responsible AI best practices and Vertex AI's safety filters.
Learn how to control access to specific models in Model Garden by using a Model Garden organization policy.