Model information

Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your AI-based applications. Foundation models are fine-tuned for specific use cases and offered at different price points. This page summarizes the models that are available in the various APIs and gives you guidance on which models to choose by use case.

To learn more about all AI models and APIs on Vertex AI, see Explore AI models and APIs.

Foundation model APIs

Vertex AI has the following foundation model APIs:

  • Gemini API (Multimodal text, image, audio, video, PDF, code, and chat)
  • PaLM API (Text, chat, and embeddings)
  • Codey APIs (Code generation, code chat, and code completion)
  • Imagen API (Image generation, image editing, image captioning, visual question answering, and multimodal embedding)
  • MedLM API (Medical question answering and summarization)

Gemini API models

The following summarizes the models available in the Gemini API:

Gemini 1.5 Pro (Preview)
(gemini-1.5-pro)
Description: Multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. Gemini 1.5 Pro supports long-context understanding with up to 1 million tokens.
Model properties:
  • Max total tokens (input and output): 1M
  • Max output tokens: 8,192
  • Max raw image size: 20 MB
  • Max base64-encoded image size: 7 MB
  • Max images per prompt: 3,000
  • Max video length: 1 hour
  • Max videos per prompt: 10
  • Max audio length: approximately 8.4 hours
  • Max audio files per prompt: 1
  • Max PDF size: 50 MB
  • Training data: Up to April 2024
Tuning support:
  • Supervised: No
  • RLHF: No
  • Distillation: No

Gemini 1.0 Pro
(gemini-1.0-pro)
Description: Designed to handle natural language tasks, multiturn text and code chat, and code generation. Use Gemini 1.0 Pro for prompts that only contain text.
Model properties:
  • Max total tokens (input and output): 32,760
  • Max output tokens: 8,192
  • Training data: Up to Feb 2023
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Gemini 1.0 Pro Vision
(gemini-1.0-pro-vision)
Description: Multimodal model that supports adding image, PDF, and video files in text or chat prompts for a text or code response. Use Gemini 1.0 Pro Vision for multimodal prompts.
Model properties:
  • Max total tokens (input and output): 16,384
  • Max output tokens: 2,048
  • Max image size: No limit
  • Max images per prompt: 16
  • Max video length: 2 minutes
  • Max videos per prompt: 1
  • Training data: Up to Feb 2023
Tuning support:
  • Supervised: No
  • RLHF: No
  • Distillation: No

Gemini 1.0 Ultra (GA with allow list)
Description: Google's most capable multimodal model, optimized for complex tasks including instruction, code, and reasoning, with support for multiple languages. Gemini 1.0 Ultra is generally available (GA) to a select set of customers.
Model properties:
  • Max input tokens: 8,192
  • Max output tokens: 2,048
Tuning support:
  • Supervised: No
  • RLHF: No
  • Distillation: No

Gemini 1.0 Ultra Vision (GA with allow list)
Description: Google's most capable multimodal vision model, optimized to support text, images, videos, and multi-turn chat. Gemini 1.0 Ultra Vision is generally available (GA) to a select set of customers.
Model properties:
  • Max input tokens: 8,192
  • Max output tokens: 2,048
Tuning support:
  • Supervised: No
  • RLHF: No
  • Distillation: No
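As a sketch of how the limits above might drive client-side model selection, the helper below routes a request to a Gemini model ID based on its context window and modality needs. The model IDs are the real ones from this page; the routing function itself is a hypothetical illustration, not part of any Vertex AI SDK.

```python
# Hypothetical helper: choose a Gemini model ID from the limits listed
# on this page. The model IDs are real; the routing logic is illustrative.

def choose_gemini_model(total_tokens: int, needs_media: bool) -> str:
    """Pick the Gemini model whose documented limits fit the request."""
    if needs_media:
        # Gemini 1.0 Pro Vision caps at 16,384 total tokens; larger
        # multimodal requests need Gemini 1.5 Pro (up to 1M tokens).
        return "gemini-1.0-pro-vision" if total_tokens <= 16_384 else "gemini-1.5-pro"
    # Text-only prompts: this page recommends Gemini 1.0 Pro,
    # which supports 32,760 total tokens.
    return "gemini-1.0-pro" if total_tokens <= 32_760 else "gemini-1.5-pro"

print(choose_gemini_model(8_000, needs_media=False))   # gemini-1.0-pro
print(choose_gemini_model(100_000, needs_media=True))  # gemini-1.5-pro
```

Routing on documented limits like this keeps oversized requests from failing server-side, but the numbers must be kept in sync with the published limits if they change.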

PaLM API models

The following summarizes the models available in the PaLM API:

PaLM 2 for Text
(text-bison)
Description: Fine-tuned to follow natural language instructions and suitable for a variety of language tasks, such as classification, summarization, and extraction.
Model properties:
  • Maximum input tokens: 8,192
  • Maximum output tokens: 1,024
  • Training data: Up to Feb 2023
Tuning support:
  • Supervised: Yes
  • RLHF: Yes (Preview)
  • Distillation: No

PaLM 2 for Text
(text-unicorn)
Description: The most advanced text model in the PaLM family of models, for use with complex natural language tasks.
Model properties:
  • Maximum input tokens: 8,192
  • Maximum output tokens: 1,024
  • Training data: Up to Feb 2023
Tuning support:
  • Supervised: No
  • RLHF: No
  • Distillation: Yes (Preview)

PaLM 2 for Text 32k
(text-bison-32k)
Description: Fine-tuned to follow natural language instructions and suitable for a variety of language tasks.
Model properties:
  • Max tokens (input + output): 32,768
  • Max output tokens: 8,192
  • Training data: Up to Aug 2023
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

PaLM 2 for Chat
(chat-bison)
Description: Fine-tuned for multi-turn conversation use cases.
Model properties:
  • Maximum input tokens: 8,192
  • Maximum output tokens: 2,048
  • Training data: Up to Feb 2023
  • Maximum turns: 2,500
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

PaLM 2 for Chat 32k
(chat-bison-32k)
Description: Fine-tuned for multi-turn conversation use cases.
Model properties:
  • Max tokens (input + output): 32,768
  • Max output tokens: 8,192
  • Training data: Up to Aug 2023
  • Max turns: 2,500
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Embeddings for Text
(textembedding-gecko)
Description: Returns model embeddings for text inputs.
Model properties:
  • Maximum input tokens: 3,072
  • Embedding dimensions: 768
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Embeddings for Text multilingual
(textembedding-gecko-multilingual)
Description: Returns model embeddings for text inputs, with support for over 100 languages.
Model properties:
  • Maximum input tokens: 3,072
  • Embedding dimensions: 768
Tuning support:
  • Supervised: Yes (Preview)
  • RLHF: No
  • Distillation: No
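To make the token limits concrete, here is a sketch of building a predict request body for text-bison that clamps the output-token parameter to the model's 1,024-token cap. The `instances`/`parameters` shape follows the PaLM text API's predict method, but treat the exact field names as assumptions to verify against the current REST reference.

```python
import json

# Sketch (not an official client): build a text-bison predict request body,
# clamping maxOutputTokens to the 1,024-token output cap listed above.
TEXT_BISON_MAX_OUTPUT = 1024

def build_text_bison_request(prompt: str, max_output_tokens: int = 1024) -> str:
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": 0.2,
            # A value above the cap would be rejected, so clamp client-side.
            "maxOutputTokens": min(max_output_tokens, TEXT_BISON_MAX_OUTPUT),
        },
    }
    return json.dumps(body)

print(build_text_bison_request("Summarize this release note.", 4096))
```

The returned JSON string would be POSTed to the model's `predict` endpoint; authentication and the endpoint URL are omitted here.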

Codey APIs models

The following summarizes the models available in the Codey APIs:

Codey for Code Generation
(code-bison)
Description: A model fine-tuned to generate code based on a natural language description of the desired code. For example, it can generate a unit test for a function.
Model properties:
  • Maximum input tokens: 6,144
  • Maximum output tokens: 1,024
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Codey for Code Generation 32k
(code-bison-32k)
Description: A model fine-tuned to generate code based on a natural language description of the desired code. For example, it can generate a unit test for a function.
Model properties:
  • Max tokens (input + output): 32,768
  • Max output tokens: 8,192
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Codey for Code Chat
(codechat-bison)
Description: A model fine-tuned for chatbot conversations that help with code-related questions.
Model properties:
  • Maximum input tokens: 6,144
  • Maximum output tokens: 1,024
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Codey for Code Chat 32k
(codechat-bison-32k)
Description: A model fine-tuned for chatbot conversations that help with code-related questions.
Model properties:
  • Max tokens (input + output): 32,768
  • Max output tokens: 8,192
Tuning support:
  • Supervised: Yes
  • RLHF: No
  • Distillation: No

Codey for Code Completion
(code-gecko)
Description: A model fine-tuned to suggest code completions based on the context of the code that's already written.
Model properties:
  • Maximum input tokens: 2,048
  • Maximum output tokens: 64
Tuning support:
  • Supervised: No
  • RLHF: No
  • Distillation: No
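The standard and 32k Codey generation models differ only in their documented token limits, so a client can route between them mechanically. The helper below is a hypothetical illustration of that routing, not part of any SDK; the limits are the ones listed above.

```python
# Hypothetical helper: route a code-generation request to code-bison or
# code-bison-32k based on the documented limits. Illustrative only.

def choose_code_generation_model(input_tokens: int, output_tokens: int = 1024) -> str:
    """Pick the Codey generation model whose documented limits fit the request."""
    # code-bison: 6,144 input tokens, 1,024 output tokens.
    if input_tokens <= 6_144 and output_tokens <= 1_024:
        return "code-bison"
    # code-bison-32k: 32,768 combined tokens, 8,192 output tokens.
    if input_tokens + output_tokens <= 32_768 and output_tokens <= 8_192:
        return "code-bison-32k"
    raise ValueError("request exceeds the limits of both Codey generation models")

print(choose_code_generation_model(2_000))          # code-bison
print(choose_code_generation_model(10_000, 2_000))  # code-bison-32k
```

The same pattern applies to codechat-bison versus codechat-bison-32k, which share these limits.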

Imagen API models

The following summarizes the models available in the Imagen API:

Imagen for Image Generation
(imagegeneration)
Description: This model supports image generation and can create high-quality visual assets in seconds.
Model properties:
  • Maximum requests per minute per project: 100
  • Maximum images generated per request: 8
  • Maximum base image size (editing/upscaling): 10 MB
  • Generated image resolution: 1024x1024 pixels
Tuning support:
  • Supervised: No
  • RLHF: No

Embeddings for Multimodal
(multimodalembedding)
Description: This model generates vectors based on the input you provide, which can include a combination of image and text.
Model properties:
  • Maximum requests per minute per project: 120
  • Maximum text length: 32 tokens
  • Language: English
  • Maximum image size: 20 MB
Tuning support:
  • Supervised: No
  • RLHF: No

Image captioning
(imagetext)
Description: A model that supports image captioning. It generates a caption from an image you provide, based on the language that you specify.
Model properties:
  • Maximum requests per minute per project: 500
  • Languages: English, French, German, Italian, Spanish
  • Maximum image size: 10 MB
  • Maximum number of captions: 3
Tuning support:
  • Supervised: No
  • RLHF: No

Visual Question Answering - VQA
(imagetext)
Description: A model that supports image question answering.
Model properties:
  • Maximum requests per minute per project: 500
  • Languages: English
  • Maximum image size: 10 MB
  • Maximum number of answers: 3
Tuning support:
  • Supervised: No
  • RLHF: No
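A client can enforce the imagegeneration limits above before sending a request. The pre-flight check below is a hypothetical sketch (not part of any SDK) using the image-count and base-image-size limits listed for the model.

```python
# Hypothetical pre-flight check: validate an imagegeneration request
# against the documented limits before sending it. Illustrative only.
MAX_IMAGES_PER_REQUEST = 8            # "Maximum images generated per request: 8"
MAX_BASE_IMAGE_BYTES = 10 * 1024**2   # 10 MB cap for editing/upscaling inputs

def validate_imagen_request(sample_count: int, base_image_bytes: int = 0) -> None:
    """Raise ValueError if the request would exceed a documented limit."""
    if not 1 <= sample_count <= MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"sample_count must be between 1 and {MAX_IMAGES_PER_REQUEST}")
    if base_image_bytes > MAX_BASE_IMAGE_BYTES:
        raise ValueError("base image for editing/upscaling must be at most 10 MB")

validate_imagen_request(4)                            # OK
validate_imagen_request(2, base_image_bytes=5 * 1024**2)  # OK
```

Note that the per-minute request quotas (100 for imagegeneration) are enforced server-side and would need rate limiting rather than a per-request check.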

MedLM API models

The following summarizes the models available in the MedLM API:

MedLM-medium
(medlm-medium)
Description: A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. These models help healthcare practitioners with medical question answering (Q&A) and summarizing healthcare and medical documents.
Model properties:
  • Max tokens (input + output): 32,768
  • Max output tokens: 8,192
  • Languages: English
Tuning support:
  • Supervised: No
  • RLHF: No

MedLM-large
(medlm-large)
Description: A HIPAA-compliant suite of medically tuned models and APIs powered by Google Research. These models help healthcare practitioners with medical question answering (Q&A) and summarizing healthcare and medical documents.
Model properties:
  • Max input tokens: 8,192
  • Max output tokens: 1,024
  • Languages: English
Tuning support:
  • Supervised: No
  • RLHF: No
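Note that medlm-medium has the larger context window (32,768 combined tokens) while medlm-large has the smaller one (8,192 input, 1,024 output), so a long document may fit only the medium model. The hypothetical helper below, not part of any SDK, reports which MedLM models can serve a request based solely on the token limits above.

```python
# Hypothetical helper: report which MedLM models fit a request,
# based only on the token limits listed above. Illustrative only.

def medlm_models_that_fit(input_tokens: int, output_tokens: int) -> list:
    fits = []
    # medlm-large: 8,192 input tokens, 1,024 output tokens.
    if input_tokens <= 8_192 and output_tokens <= 1_024:
        fits.append("medlm-large")
    # medlm-medium: 32,768 combined tokens, 8,192 output tokens.
    if input_tokens + output_tokens <= 32_768 and output_tokens <= 8_192:
        fits.append("medlm-medium")
    return fits

print(medlm_models_that_fit(20_000, 4_000))  # ['medlm-medium']
```

Which model to prefer when both fit depends on quality and cost trade-offs this page does not specify, so the helper only filters by limits.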

Language support

Vertex AI PaLM API and Vertex AI Gemini API are Generally Available (GA) for the following languages:

  • Arabic (ar)
  • Bengali (bn)
  • Bulgarian (bg)
  • Chinese simplified and traditional (zh)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • Estonian (et)
  • Finnish (fi)
  • French (fr)
  • German (de)
  • Greek (el)
  • Hebrew (iw)
  • Hindi (hi)
  • Hungarian (hu)
  • Indonesian (id)
  • Italian (it)
  • Japanese (ja)
  • Korean (ko)
  • Latvian (lv)
  • Lithuanian (lt)
  • Norwegian (no)
  • Polish (pl)
  • Portuguese (pt)
  • Romanian (ro)
  • Russian (ru)
  • Serbian (sr)
  • Slovak (sk)
  • Slovenian (sl)
  • Spanish (es)
  • Swahili (sw)
  • Swedish (sv)
  • Thai (th)
  • Turkish (tr)
  • Ukrainian (uk)
  • Vietnamese (vi)

For access to other languages, contact your Google Cloud representative.
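For clients that want to fail fast on unsupported languages, the GA language codes above can be kept as a simple lookup set. This is a hypothetical convenience, not an API; note the page lists Hebrew under the legacy code `iw`.

```python
# The 38 GA language codes from the list above, as a lookup set.
# Hypothetical helper, not part of any Vertex AI SDK.
GA_LANGUAGE_CODES = {
    "ar", "bn", "bg", "zh", "hr", "cs", "da", "nl", "en", "et",
    "fi", "fr", "de", "el", "iw", "hi", "hu", "id", "it", "ja",
    "ko", "lv", "lt", "no", "pl", "pt", "ro", "ru", "sr", "sk",
    "sl", "es", "sw", "sv", "th", "tr", "uk", "vi",
}

def is_ga_language(code: str) -> bool:
    """Return True if the language code is on the GA list (case-insensitive)."""
    return code.lower() in GA_LANGUAGE_CODES

print(is_ga_language("EN"))  # True
print(is_ga_language("fa"))  # False
```

A lookup like this is only a client-side guard; the models may still produce output in other languages, and the GA list can grow over time.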

Explore all models in Model Garden

Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.

To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.

What's next