Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your AI-based applications. Foundation models are fine-tuned for specific use cases and offered at different price points. This page summarizes the models that are available in the various APIs and gives you guidance on which models to choose by use case.
To learn more about all AI models and APIs on Vertex AI, see Explore AI models and APIs.
Foundation model APIs
Vertex AI has the following foundation model APIs:
- Gemini API (Multimodal: text, image, audio, video, PDF, code, and chat)
- PaLM API (Text, chat, and embeddings)
- Codey APIs (Code generation, code chat, and code completion)
- Imagen API (Image generation, image editing, image captioning, visual question answering, and multimodal embedding)
Gemini API models
The following table summarizes the models available in the Gemini API:
Model name | Description | Model properties | Tuning support |
---|---|---|---|
Gemini 1.5 Pro (Preview) (gemini-1.5-pro) | Multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. Gemini 1.5 Pro supports long-context understanding with up to 1 million tokens. | Max total tokens (input and output): 1M; Max output tokens: 8,192; Max raw image size: 20 MB; Max base64-encoded image size: 7 MB; Max images per prompt: 3,000; Max video length: 1 hour; Max videos per prompt: 10; Max audio length: approximately 8.4 hours; Max audio files per prompt: 1; Max PDF size: 50 MB; Training data: Up to April 2024 | Supervised: No; RLHF: No; Distillation: No |
Gemini 1.0 Pro (gemini-1.0-pro) | Designed to handle natural language tasks, multiturn text and code chat, and code generation. Use Gemini 1.0 Pro for prompts that contain only text. | Max total tokens (input and output): 32,760; Max output tokens: 8,192; Training data: Up to Feb 2023 | Supervised: Yes; RLHF: No; Distillation: No |
Gemini 1.0 Pro Vision (gemini-1.0-pro-vision) | Multimodal model that supports adding image, PDF, and video in text or chat prompts for a text or code response. Use Gemini 1.0 Pro Vision for multimodal prompts. | Max total tokens (input and output): 16,384; Max output tokens: 2,048; Max image size: No limit; Max images per prompt: 16; Max video length: 2 minutes; Max videos per prompt: 1; Training data: Up to Feb 2023 | Supervised: No; RLHF: No; Distillation: No |
Gemini 1.0 Ultra (GA with allow list) | Google's most capable multimodal model, optimized for complex tasks including instruction, code, and reasoning, with support for multiple languages. Gemini 1.0 Ultra is generally available (GA) to a select set of customers. | Max input tokens: 8,192; Max output tokens: 2,048 | Supervised: No; RLHF: No; Distillation: No |
Gemini 1.0 Ultra Vision (GA with allow list) | Google's most capable multimodal vision model, optimized to support text, images, videos, and multi-turn chat. Gemini 1.0 Ultra Vision is generally available (GA) to a select set of customers. | Max input tokens: 8,192; Max output tokens: 2,048 | Supervised: No; RLHF: No; Distillation: No |
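The token limits above lend themselves to a simple client-side budget check before you send a request. A minimal sketch in Python; the `GEMINI_LIMITS` table and `fits_budget` helper are illustrative names for this example, not part of the Vertex AI SDK:

```python
# Client-side token-budget check. The limit values come from the table
# above; the names below are illustrative, not part of the Vertex AI SDK.

GEMINI_LIMITS = {
    "gemini-1.5-pro": {"max_total_tokens": 1_000_000, "max_output_tokens": 8_192},
    "gemini-1.0-pro": {"max_total_tokens": 32_760, "max_output_tokens": 8_192},
    "gemini-1.0-pro-vision": {"max_total_tokens": 16_384, "max_output_tokens": 2_048},
}

def fits_budget(model: str, input_tokens: int, output_tokens: int) -> bool:
    """Check a planned request against the published token limits."""
    limits = GEMINI_LIMITS[model]
    return (
        output_tokens <= limits["max_output_tokens"]
        and input_tokens + output_tokens <= limits["max_total_tokens"]
    )

print(fits_budget("gemini-1.0-pro", 30_000, 2_000))  # True: within both limits
print(fits_budget("gemini-1.0-pro", 30_000, 4_000))  # False: exceeds the 32,760 total
```

A check like this only catches obvious over-budget requests; for exact counts, use the token-counting support in the SDK or API before sending the prompt.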
PaLM API models
The following table summarizes the models available in the PaLM API:
Model name | Description | Model properties | Tuning support |
---|---|---|---|
PaLM 2 for Text (text-bison) | Fine-tuned to follow natural language instructions and suitable for a variety of language tasks, such as classification, summarization, and extraction. | Max input tokens: 8,192; Max output tokens: 1,024; Training data: Up to Feb 2023 | Supervised: Yes; RLHF: Yes (Preview); Distillation: No |
PaLM 2 for Text (text-unicorn) | The most advanced text model in the PaLM family of models, for use with complex natural language tasks. | Max input tokens: 8,192; Max output tokens: 1,024; Training data: Up to Feb 2023 | Supervised: No; RLHF: No; Distillation: Yes (Preview) |
PaLM 2 for Text 32k (text-bison-32k) | Fine-tuned to follow natural language instructions and suitable for a variety of language tasks. | Max tokens (input + output): 32,768; Max output tokens: 8,192; Training data: Up to Aug 2023 | Supervised: Yes; RLHF: No; Distillation: No |
PaLM 2 for Chat (chat-bison) | Fine-tuned for multi-turn conversation use cases. | Max input tokens: 8,192; Max output tokens: 2,048; Max turns: 2,500; Training data: Up to Feb 2023 | Supervised: Yes; RLHF: No; Distillation: No |
PaLM 2 for Chat 32k (chat-bison-32k) | Fine-tuned for multi-turn conversation use cases. | Max tokens (input + output): 32,768; Max output tokens: 8,192; Max turns: 2,500; Training data: Up to Aug 2023 | Supervised: Yes; RLHF: No; Distillation: No |
Embeddings for Text (textembedding-gecko) | Returns model embeddings for text inputs. | Max input tokens: 3,072; Output: 768-dimensional vector embeddings | Supervised: Yes; RLHF: No; Distillation: No |
Embeddings for Text multilingual (textembedding-gecko-multilingual) | Returns model embeddings for text inputs, with support for over 100 languages. | Max input tokens: 3,072; Output: 768-dimensional vector embeddings | Supervised: Yes (Preview); RLHF: No; Distillation: No |
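The textembedding-gecko models return 768-dimensional vectors, which are typically compared with cosine similarity for semantic search or clustering. A minimal sketch of that comparison in plain Python; the vectors here are synthetic stand-ins, not real API responses:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Synthetic stand-ins for 768-dimensional textembedding-gecko outputs.
v1 = [0.1] * 768
v2 = [0.1] * 384 + [-0.1] * 384

print(round(cosine_similarity(v1, v1), 3))  # identical vectors -> 1.0
print(round(cosine_similarity(v1, v2), 3))  # orthogonal halves -> 0.0
```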
Codey APIs models
The following table summarizes the models available in the Codey APIs:
Model name | Description | Model properties | Tuning support |
---|---|---|---|
Codey for Code Generation (code-bison) | A model fine-tuned to generate code based on a natural language description of the desired code. For example, it can generate a unit test for a function. | Max input tokens: 6,144; Max output tokens: 1,024 | Supervised: Yes; RLHF: No; Distillation: No |
Codey for Code Generation 32k (code-bison-32k) | A model fine-tuned to generate code based on a natural language description of the desired code. For example, it can generate a unit test for a function. | Max tokens (input + output): 32,768; Max output tokens: 8,192 | Supervised: Yes; RLHF: No; Distillation: No |
Codey for Code Chat (codechat-bison) | A model fine-tuned for chatbot conversations that help with code-related questions. | Max input tokens: 6,144; Max output tokens: 1,024 | Supervised: Yes; RLHF: No; Distillation: No |
Codey for Code Chat 32k (codechat-bison-32k) | A model fine-tuned for chatbot conversations that help with code-related questions. | Max tokens (input + output): 32,768; Max output tokens: 8,192 | Supervised: Yes; RLHF: No; Distillation: No |
Codey for Code Completion (code-gecko) | A model fine-tuned to suggest code completions based on the context of code that's already written. | Max input tokens: 2,048; Max output tokens: 64 | Supervised: No; RLHF: No; Distillation: No |
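Codey models accept the PaLM-era predict request shape, with the prompt under `instances` and generation settings under `parameters`; for code-gecko, the code context goes in a `prefix` field. A minimal sketch of building such a request body; the helper name and parameter values are illustrative:

```python
# Illustrative request body for a code-gecko completion request
# (instances/parameters shape). The helper name is not part of the SDK.

def build_completion_request(prefix: str, max_output_tokens: int = 64) -> dict:
    """Build a predict request body for the code-gecko completion model."""
    # code-gecko caps output at 64 tokens (see the table above).
    return {
        "instances": [{"prefix": prefix}],
        "parameters": {
            "temperature": 0.2,
            "maxOutputTokens": min(max_output_tokens, 64),
        },
    }

body = build_completion_request("def reverse_string(s):")
print(body["parameters"]["maxOutputTokens"])  # 64
```

A body like this is POSTed to the model's `:predict` endpoint under your project and region; see the Codey API reference for the exact URL and response format.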
Imagen API models
The following table summarizes the models available in the Imagen API:
Model name | Description | Model properties | Tuning support |
---|---|---|---|
Imagen for Image Generation (imagegeneration) | Supports image generation and can create high-quality visual assets in seconds. | Max requests per minute per project: 100; Max images generated per request: 8; Max base image size (editing/upscaling): 10 MB; Generated image resolution: 1024x1024 pixels | Supervised: No; RLHF: No |
Embeddings for Multimodal (multimodalembedding) | Generates vectors based on the input you provide, which can include a combination of image and text. | Max requests per minute per project: 120; Max text length: 32 tokens; Language: English; Max image size: 20 MB | Supervised: No; RLHF: No |
Image captioning (imagetext) | Supports image captioning: generates a caption from an image you provide, in the language that you specify. | Max requests per minute per project: 500; Languages: English, French, German, Italian, Spanish; Max image size: 10 MB; Max number of captions: 3 | Supervised: No; RLHF: No |
Visual Question Answering - VQA (imagetext) | Supports visual question answering: answers a natural language question about an image you provide. | Max requests per minute per project: 500; Languages: English; Max image size: 10 MB; Max number of answers: 3 | Supervised: No; RLHF: No |
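Imagen requests follow the same `instances`/`parameters` shape: for imagegeneration, the text prompt goes in the instance and `sampleCount` controls how many images are returned, up to the limit of 8 per request noted above. A minimal sketch; the helper name and values are illustrative:

```python
# Illustrative request body for the imagegeneration model. The helper
# name is not part of the SDK; sampleCount is the image-count parameter.

def build_image_request(prompt: str, sample_count: int = 4) -> dict:
    """Build a predict request body for the imagegeneration model."""
    # The model returns at most 8 images per request (see the table above).
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": min(sample_count, 8)},
    }

body = build_image_request("A watercolor painting of a lighthouse at dawn", 12)
print(body["parameters"]["sampleCount"])  # 8
```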
MedLM API models
The following table summarizes the models available in the MedLM API:
Model name | Description | Model properties | Tuning support |
---|---|---|---|
MedLM-medium (medlm-medium) | A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. These models help healthcare practitioners with medical question answering (Q&A) and with summarizing healthcare and medical documents. | Max tokens (input + output): 32,768; Max output tokens: 8,192; Languages: English | Supervised: No; RLHF: No |
MedLM-large (medlm-large) | A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. These models help healthcare practitioners with medical question answering (Q&A) and with summarizing healthcare and medical documents. | Max input tokens: 8,192; Max output tokens: 1,024; Languages: English | Supervised: No; RLHF: No |
Language support
Vertex AI PaLM API and Vertex AI Gemini API are Generally Available (GA) for the following languages:
- Arabic (ar)
- Bengali (bn)
- Bulgarian (bg)
- Chinese simplified and traditional (zh)
- Croatian (hr)
- Czech (cs)
- Danish (da)
- Dutch (nl)
- English (en)
- Estonian (et)
- Finnish (fi)
- French (fr)
- German (de)
- Greek (el)
- Hebrew (iw)
- Hindi (hi)
- Hungarian (hu)
- Indonesian (id)
- Italian (it)
- Japanese (ja)
- Korean (ko)
- Latvian (lv)
- Lithuanian (lt)
- Norwegian (no)
- Polish (pl)
- Portuguese (pt)
- Romanian (ro)
- Russian (ru)
- Serbian (sr)
- Slovak (sk)
- Slovenian (sl)
- Spanish (es)
- Swahili (sw)
- Swedish (sv)
- Thai (th)
- Turkish (tr)
- Ukrainian (uk)
- Vietnamese (vi)
For access to other languages, contact your Google Cloud representative.
Explore all models in Model Garden
Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.
To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.
What's next
- Try a quickstart tutorial using Vertex AI Studio or the Vertex AI API.
- Learn how to test text prompts.
- Learn how to test chat prompts.
- Explore pretrained models in Model Garden.
- Learn how to tune a foundation model.
- Learn about responsible AI best practices and Vertex AI's safety filters.