Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your applications. Foundation models are fine-tuned for specific use cases and offered at different price points. This page summarizes the models that are available and gives you guidance on which models to use.
To learn more about all AI models and APIs on Vertex AI, see Explore AI models and APIs.
Model naming scheme
Foundation model names have two components: use case and model size. The naming convention is in the format `<use case>-<model size>`. For example, `text-bison` represents the Bison text model.
The model sizes are:
- Unicorn: The largest model in the PaLM family. Unicorn models excel at complex tasks, such as coding and chain-of-thought (CoT) reasoning, due to the extensive knowledge embedded in the model and its reasoning capabilities.
- Bison: The best-value PaLM model, which handles a wide range of language tasks, such as classification and summarization. It is optimized for accuracy and latency at a reasonable cost. The text, chat, code, and codechat interfaces simplify deployment and integration into your application.
- Gecko: The smallest and lowest-cost model, for simple tasks.
You can use the stable or the latest version of a model. For more information, see Model versions and lifecycle.
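As an illustration of the naming convention above, the following sketch splits a model ID into its use-case and size components. The `split_model_name` helper is hypothetical, not part of any Vertex AI SDK:

```python
def split_model_name(model_name: str) -> tuple[str, str]:
    """Split a foundation model ID of the form <use case>-<model size>.

    Handles optional suffixes such as a version tag ("text-bison@001")
    or a context-window variant ("text-bison-32k").
    """
    # Drop a version suffix like "@001" if present.
    base = model_name.split("@")[0]
    use_case, _, rest = base.partition("-")
    if not rest:
        raise ValueError(f"unexpected model name: {model_name!r}")
    # The size is the first token after the use case; anything further
    # (for example "-32k") describes a variant.
    size = rest.split("-")[0]
    return use_case, size

print(split_model_name("text-bison"))      # ('text', 'bison')
print(split_model_name("code-gecko@001"))  # ('code', 'gecko')
```

The same logic applies to variants: `chat-bison-32k` still parses as the Bison chat model.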
Foundation models
The following table gives you an overview of the foundation models that are available in Vertex AI, including whether the model can be tuned using supervised tuning or reinforcement learning from human feedback (RLHF) tuning.
| Model name | Description | Model properties | Model tuning supported |
|---|---|---|---|
| PaLM 2 for Text (`text-bison`) | Fine-tuned to follow natural language instructions and suitable for a variety of language tasks, such as classification, summarization, and extraction. | Maximum input tokens: 8,192; maximum output tokens: 1,024; training data: up to Feb 2023 | |
| PaLM 2 for Text (`text-unicorn`) | The most advanced text model in the PaLM family of models, for use with complex natural language tasks. | Maximum input tokens: 8,192; maximum output tokens: 1,024; training data: up to Feb 2023 | |
| Embeddings for Text (`textembedding-gecko`) | Returns model embeddings for text inputs. | Accepts up to 3,072 input tokens and outputs 768-dimensional vector embeddings. | |
| Embeddings for Text multilingual (`textembedding-gecko-multilingual`) | Returns model embeddings for text inputs in over 100 supported languages. | Accepts up to 3,072 input tokens and outputs 768-dimensional vector embeddings. | |
| PaLM 2 for Chat (`chat-bison`) | Fine-tuned for multi-turn conversation use cases. | Maximum input tokens: 8,192; maximum output tokens: 1,024; training data: up to Feb 2023; maximum turns: 2,500 | |
| Codey for Code Generation (`code-bison`) | Fine-tuned to generate code based on a natural language description of the desired code. For example, it can generate a unit test for a function. | Maximum input tokens: 6,144; maximum output tokens: 1,024 | |
| Codey for Code Chat (`codechat-bison`) | Fine-tuned for chatbot conversations that help with code-related questions. | Maximum input tokens: 6,144; maximum output tokens: 1,024 | |
| Codey for Code Completion (`code-gecko`) | Fine-tuned to suggest code completions based on the context of the code that's already written. | Maximum input tokens: 2,048; maximum output tokens: 64 | Not supported |
| Imagen for Image Generation (`imagegeneration`) | Supports image generation and can create high-quality visual assets in seconds. | Maximum requests per minute per project: 100; maximum images generated: 8; maximum base image size (editing/upscaling): 10 MB; generated image resolution: 1024x1024 pixels | |
| Embeddings for Multimodal (`multimodalembedding`) | Generates vectors based on the input you provide, which can include a combination of image and text. | Maximum requests per minute per project: 120; maximum text length: 32 tokens; language: English; maximum image size: 20 MB | |
| Image captioning (`imagetext`) | Generates a caption from an image you provide, in the language that you specify. | Maximum requests per minute per project: 500; languages: English, French, German, Italian, Spanish; maximum image size: 10 MB; maximum number of captions: 3 | |
| Visual Question Answering - VQA (`imagetext`) | Supports image question answering. | Maximum requests per minute per project: 500; language: English; maximum image size: 10 MB; maximum number of answers: 3 | |
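The token limits in the table above are useful to check before sending a request. The following sketch records a few of them in a lookup table; `MODEL_LIMITS` and `fits_limits` are illustrative helpers, not part of any SDK:

```python
# Token limits copied from the foundation models table above.
MODEL_LIMITS = {
    "text-bison": {"max_input_tokens": 8192, "max_output_tokens": 1024},
    "text-unicorn": {"max_input_tokens": 8192, "max_output_tokens": 1024},
    "chat-bison": {"max_input_tokens": 8192, "max_output_tokens": 1024},
    "code-bison": {"max_input_tokens": 6144, "max_output_tokens": 1024},
    "codechat-bison": {"max_input_tokens": 6144, "max_output_tokens": 1024},
    "code-gecko": {"max_input_tokens": 2048, "max_output_tokens": 64},
}

def fits_limits(model: str, input_tokens: int, output_tokens: int) -> bool:
    """Return True if a request fits within the model's published limits."""
    limits = MODEL_LIMITS[model]
    return (input_tokens <= limits["max_input_tokens"]
            and output_tokens <= limits["max_output_tokens"])

print(fits_limits("code-gecko", 1500, 64))   # True
print(fits_limits("code-gecko", 1500, 256))  # False: output cap is 64
```

A pre-flight check like this avoids a round trip to the API for requests that would be rejected anyway.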
32k models
You can access models that support up to 32k tokens per request. The combined total of input and output tokens can add up to 32k, and the maximum number of output tokens is 8,192. For example, if the input is 28k tokens, the output can be up to 4k tokens.
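The budget arithmetic above can be sketched as follows. The helper is hypothetical, and it assumes 32k means 32,768 tokens (consistent with the 8,192-token output cap):

```python
MAX_COMBINED_TOKENS = 32_768  # 32k combined input + output
MAX_OUTPUT_TOKENS = 8_192     # output cap for 32k models

def max_output_for_input(input_tokens: int) -> int:
    """Largest output budget left for a 32k model, given the input size.

    Illustrative helper, not an SDK function; it applies the two limits
    described above.
    """
    if input_tokens >= MAX_COMBINED_TOKENS:
        raise ValueError("input alone exceeds the 32k combined limit")
    return min(MAX_OUTPUT_TOKENS, MAX_COMBINED_TOKENS - input_tokens)

# A 28k-token input leaves 4k tokens for the output.
print(max_output_for_input(28_672))  # 4096
# A short input is still capped at 8,192 output tokens.
print(max_output_for_input(1_000))   # 8192
```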
To use a 32k model, specify the fully qualified model name. For example, `text-bison-32k`.
The following table lists the available 32k models:
| Model name | Description | Model properties | Model tuning supported |
|---|---|---|---|
| `text-bison-32k` | Fine-tuned to follow natural language instructions and suitable for a variety of language tasks. | Maximum input and output tokens combined: 32k; training data: up to Aug 2023 | Not supported |
| `chat-bison-32k` | Fine-tuned for multi-turn conversation use cases. | Maximum input and output tokens combined: 32k; training data: up to Aug 2023; maximum turns: 2,500 | Not supported |
| `code-bison-32k` | Fine-tuned to generate code based on a natural language description of the desired code. For example, it can generate a unit test for a function. | Maximum input and output tokens combined: 32k | Not supported |
| `codechat-bison-32k` | Fine-tuned for chatbot conversations that help with code-related questions. | Maximum input and output tokens combined: 32k | Not supported |
Language support
The Vertex AI PaLM API is Generally Available (GA) for the following languages:
- Arabic (`ar`)
- Bengali (`bn`)
- Bulgarian (`bg`)
- Chinese simplified and traditional (`zh`)
- Croatian (`hr`)
- Czech (`cs`)
- Danish (`da`)
- Dutch (`nl`)
- English (`en`)
- Estonian (`et`)
- Finnish (`fi`)
- French (`fr`)
- German (`de`)
- Greek (`el`)
- Hebrew (`iw`)
- Hindi (`hi`)
- Hungarian (`hu`)
- Indonesian (`id`)
- Italian (`it`)
- Japanese (`ja`)
- Korean (`ko`)
- Latvian (`lv`)
- Lithuanian (`lt`)
- Norwegian (`no`)
- Polish (`pl`)
- Portuguese (`pt`)
- Romanian (`ro`)
- Russian (`ru`)
- Serbian (`sr`)
- Slovak (`sk`)
- Slovenian (`sl`)
- Spanish (`es`)
- Swahili (`sw`)
- Swedish (`sv`)
- Thai (`th`)
- Turkish (`tr`)
- Ukrainian (`uk`)
- Vietnamese (`vi`)
For access to other languages, contact your Google Cloud representative.
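If you route requests by language, a membership check against the GA list can catch unsupported codes early. The set below is copied from the list above; note that Hebrew uses the legacy code `iw` rather than `he`. The `is_ga_language` helper is illustrative only:

```python
# Language codes from the GA list above. Hebrew is listed under the
# legacy code "iw", not the modern ISO 639-1 code "he".
GA_LANGUAGES = {
    "ar", "bn", "bg", "zh", "hr", "cs", "da", "nl", "en", "et", "fi",
    "fr", "de", "el", "iw", "hi", "hu", "id", "it", "ja", "ko", "lv",
    "lt", "no", "pl", "pt", "ro", "ru", "sr", "sk", "sl", "es", "sw",
    "sv", "th", "tr", "uk", "vi",
}

def is_ga_language(code: str) -> bool:
    """Return True if the language code appears in the GA list."""
    return code.lower() in GA_LANGUAGES

print(is_ga_language("ja"))  # True
print(is_ga_language("he"))  # False: only the legacy "iw" is listed
```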
Explore all pretrained models in Model Garden
Model Garden is a platform that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. To explore the generative AI models and APIs that are available on Vertex AI, go to Model Garden in the Google Cloud console.
To learn more about Model Garden, including available models and capabilities, see Explore AI models in Model Garden.
What's next
- Try a quickstart tutorial using Generative AI Studio or the Vertex AI API.
- Learn how to test text prompts.
- Learn how to test chat prompts.
- Explore pretrained models in Model Garden.
- Learn how to tune a foundation model.
- Learn about responsible AI best practices and Vertex AI's safety filters.