Vertex AI release notes

This page documents production updates to Vertex AI. Check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly.

January 31, 2025

You can now monitor usage, throughput, and latency and troubleshoot 429 errors on Vertex AI foundation models, like Google Gemini and Anthropic Claude, by using a predefined dashboard. After querying a model from the Vertex AI Model Garden, you can find the name of the model you queried in the Vertex AI Dashboard page under the "Model observability" heading.

To customize the dashboard and explore relevant metrics in Cloud Monitoring, click Show All Metrics. For information about using dashboards in Cloud Monitoring, see View and customize Google Cloud dashboards.

January 30, 2025

Mistral Large (24.07) and Codestral (24.05) that are offered as a Model as a Service (MaaS) models in Model Garden are deprecated. For details, see Generative AI on Vertex AI deprecations.

January 29, 2025

New Imagen 3 image generation model available to users

A newer improved Imagen 3 image generation model is now available to all users:

imagen-3.0-generate-002

This image generation model supports the following additional features:

Prompt enhancement - The LLM-based prompt rewriter tool adds additional details and descriptive language to the prompt you provide, generally resulting in higher quality generated images. This feature is configurable and is enabled by default.

For more information, see Imagen on Vertex AI model versions and lifecycle and Generate images using text prompts.

January 22, 2025

LangChain on Vertex AI

Billing for LangChain on Vertex AI will start on March 4, 2025.

The pricing structure is based on vCPU hours and GiB hours used. This means that you will be charged for both the compute (vCPU) and memory resources consumed by your LangChain on Vertex AI workloads.

You can review the pricing details in the table below.

Product	SKU ID	Price
ReasoningEngine vCPU	8A55-0B95-B7DC	$0.0994/vCPU-Hr
ReasoningEngine Memory	0B45-6103-6EC1	$0.0105/GiB-Hr

January 21, 2025

Anthropic's Claude 3 Sonnet that is offered as a Model as a Service (MaaS) model in Model Garden is deprecated. For details, see Generative AI on Vertex AI deprecations.

January 17, 2025

Agent evaluation using the Gen AI evaluation service is available in Preview.

December 20, 2024

RAG Engine is generally available (GA).

The supported models include the following:

Google Gemini
Google embedding and OSS E5 embedding models
Model Garden self-deployed OSS LLMs
Model as a service (MaaS) Llama models

The supported features include the following:

Data connectors: Google Cloud Storage, Google Drive, Slack, Jira, and SharePoint
Document types: Google Workspace documents, HTML, JSON, Markdown, PDF, and text files
Transformations: fixed-size chunking and chunk overlap
Vector databases: Vertex AI Vector Search and Pinecone

December 18, 2024

Hex-LLM: High-Efficiency Large Language Model Serving is available in General Availability (GA).

This launch adds support for the following models:

Llama 3.1
Llama 3.2
Phi-3
Qwen2 and Qwen2.5

Additional supported features:

Multi-host serving.
Disaggregated serving (experimental).
Prefix caching.
AWQ quantization.

December 17, 2024

You can copy tuned Gemini 1.5 Pro 002 and Gemini 1.5 Flash 002 adapter models across projects. For details, see Copy a model in Vertex AI Model Registry.

December 11, 2024

The Gemini 2.0 Flash (gemini-2.0-flash-exp) model is Generally available for grounded answer generation with RAG. This model is tuned to address context-based question and answering tasks. For more information, see Ground responses for Gemini models.

December 10, 2024

Imagen 3 image generation models Generally Available to all users

Imagen 3 image generation models are now available to all users without requiring prior approval. These include the following image generation models:

imagen-3.0-generate-001
imagen-3.0-fast-generate-001 (low latency model)

Prior image generation models (imagegeneration@006, imagegeneration@005, imagegeneration@002) still require approval to use.

For more information, see Imagen on Vertex AI model versions and lifecycle and Generate images using text prompts.

Imagen 3 Customization model Generally Available to approved users

Imagen 3 Customization model is now available to approved users. This includes the following model:

imagen-3.0-capability

Imagen 3 Customization lets you guide image generation by providing reference images (few-shot learning). Imagen 3 Customization lets you customize generated images for the following feature categories:

Subject Customization (product, person, and animal companion)
Style Customization
Controlled Customization (canny edge and scribble)
Instruct Customization (Style transfer)

Imagen 3 editing model Generally Available to approved users

The Imagen 3 Editing model is now available to approved users. This includes the following model:

imagen-3.0-capability

This model offers the following additional features:

Inpainting - Add or remove content from a masked area of an image
Outpainting - Expand a masked area of an image
Product image editing - Identify and maintain a primary product while changing the background or product position

For more information, see Model versions.

December 06, 2024

A vulnerability was discovered in the Vertex AI API serving Gemini multimodal requests, allowing bypass of VPC Service Controls. For details, see the Security bulletins page.

November 21, 2024

Mistral Large (24.11) is Generally Available on Vertex AI as a managed model. To learn more, view the Mistral Large (24.11) model card in Model Garden.

The Gen AI evaluation service can now help you evaluate your translation models using MetricX, COMET, and BLEU metrics. To learn more about evaluating your translation models, see Evaluate translation models.

November 08, 2024

Batch predictions for Llama models on Vertex AI (MaaS) is available in Preview.

Batch prediction support for Gemini

Batch prediction is available for Gemini in General Availability (GA). Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, see Get batch predictions for Gemini.

November 05, 2024

We are extending the availability of Gemini 1.0 Pro 001 and Gemini 1.0 Pro Vision 001 from February 15, 2025 to April 9, 2025. For details, see the Deprecations.

November 04, 2024

The translation LLM now supports Polish, Turkish, Indonesian, Dutch, Vietnamese, Thai and Czech. For the full list of supported languages, see the Translate text page.

The Anthropic Claude Haiku 3.5 is Generally Available on Vertex AI. To learn more, view the Claude Haiku 3.5 model card in Model Garden.

October 28, 2024

You can now fine-tune the following models from the Cloud console:

The Whisper large v3 and Whisper large v3 turbo models have been added to Model Garden.

Updated the fine-tuning notebooks for Gemma 2, Llama 3.1, Mistral, and Mixtral with the following enhancements:

The notebooks use an updated high-performance container for single host multi-GPU LoRA fine-tuning.
- Better throughput and GPU utilization with well-tested max-sequence-lengths.
- Support for input token masking.
- No out of memory (OOM) error during fine-tuning.
Added a custom dataset example that uses a template and format validation.
Support for a default accelerator pool with quota checks.
Improved documentation.

October 22, 2024

The Anthropic Claude Sonnet 3.5 v2 is Generally Available. To learn more, view the Claude Sonnet 3.5 v2 model card in Model Garden.

October 18, 2024

The Llama 3.1 405B model that is managed on Vertex AI is now Generally Available.

October 09, 2024

The Vertex AI Gemini API SDK supports tokenization capabilities for local token counting and computation. This is a streamlined way to compute tokens locally, ensuring compatibility across different Gemini models and their tokenizers. Supported models include gemini-1.5-flash and gemini-1.5-pro . To learn more, see Count tokens.

October 04, 2024

The AI assistant in Vertex AI Studio can help you refine and generate prompts. This feature is in Preview. To learn more, see Use AI-powered prompt writing tools.

Prompt Guard and Flux were added to Model Garden.

You can deploy Hugging Face models on Google Cloud that have text embedding inference enabled or pytorch inference enabled. For more information, see the Hugging Face model deployment in the console.

Added multiple deployment settings (with A100-80G and H100) and sample requests for some popular models, including Llama 3.1, Gemma 2, and Mixtral.

Added dynamic LoRA serving for Llama 3.1 and Stable Diffusion XL.

October 01, 2024

Grounding: Dynamic retrieval for grounded results (GA)

Dynamic retrieval lets you choose when to turn off grounding with Google Search. This is useful when a prompt doesn't require an answer grounded in Google Search, and the supported models can provide an answer based on their knowledge without grounding. Dynamic retrieval helps you manage latency, quality, and cost more effectively.

This feature is Generally Available. For more information, see Dynamic retrieval.

September 30, 2024

Prompt templates let you to test how different prompt formats perform with different sets of prompt data. This feature is in Preview. To learn more, see Use prompt templates.

September 25, 2024

The Llama 3.2 90B model is available in Preview on Vertex AI. Llama 3.2 90B enables developers to build and deploy the latest generative AI models and applications that use Llama's capabilities, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications. For more information, see Llama models.

September 24, 2024

New stable versions of Gemini 1.5 Pro (gemini-1.5-pro-002) and Gemini 1.5 Flash (gemini-1.5-flash-002) are Generally Available. These models introduce broad quality improvements over the previous 001 versions, with significant gains in the following categories:

Factuality and reduce model hallucinations
Openbook Q&A for RAG use cases
Instruction following
Multilingual understanding in 102 languages, especially in Korean, French, German, Spanish, Japanese, Russian, and Chinese.
SQL generation
Audio understanding
Document understanding
Long context
Math and reasoning

For more information about differences with the previous model versions, see Model versions and lifecycle.

The 2M context window with Gemini 1.5 Pro is now in Generally Available, which opens up long-form multimodal use cases that only Gemini can support.

Use Gemini to directly analyze YouTube videos and publicly available media (such as images, audio, and video) by using a link. This feature is in Public Preview.

The new API parameters audioTimestamp, responseLogprob, and logprobs are in Public Preview. For more information, see API reference.

Gemini 1.5 Pro and Gemini 1.5 Flash now support multimodal input with function calling. This feature is in Preview.

The Vertex AI prompt optimizer adapts your prompts using the optimal instructions and examples to elicit the best performance from your chosen model. This feature is available in Preview. To learn more, see Optimize prompts.

Gemini 1.5 Pro and Gemini 1.5 Flash Tuning is now available in GA. Tune Gemini with text, image, audio, and document data types using the latest models:

gemini-1.5-pro-002
gemini-1.5-flash-002

Gemini 1.0 tuning remains in preview.

For more information on tuning Gemini, see Tune Gemini models by using supervised fine-tuning.

The latest versions of Gemini 1.5 Flash (gemini-1.5-flash-002) and Gemini 1.5 Pro (gemini-1.5-pro-002) use dynamic shared quota, which distributes on-demand capacity among all queries being processed. Dynamic shared quota is Generally Available.

Controlled generation is now Generally Available.

September 20, 2024

Add label metadata to generateContent and streamGenerateContent API calls. For details, see Add labels to API calls.

September 18, 2024

Model Garden supports an organization policy so that administrators can limit access to certain models and capabilities. For more information, see Control access to Model Garden models

September 03, 2024

Gemini 1.5 Flash (gemini-1.5-flash) supports controlled generation.

August 30, 2024

Gen AI Evaluation Service is Generally Available. To learn more, see the Gen AI Evaluation Service overview.

August 26, 2024

For controlled generation, you can have the model respond with an enum value in plain text, as defined in your response schema. Set the responseMimeType to text/x.enum. For more information, see Control generated output.

August 22, 2024

AI21 Labs

Managed models from AI21 Labs are available on Vertex AI. To use a AI21 Labs model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, see AI21 models.

August 09, 2024

Gemini on Vertex AI supports multiple response candidates. For details, see Generate content with the Gemini API.

August 05, 2024

The translation LLM now supports Arabic, Hindi, and Russian. For the full list of supported languages, see the Translate text page.

August 02, 2024

Vertex AI SDK for Python supports token listing and counting for prompts without the need to make API calls. This feature is available in (Preview). For details, see List and count tokens.

July 31, 2024

New Imagen on Vertex AI image generation model and features

The Imagen 3 image generation models (imagen-3.0-generate-001 and the low-latency version imagen-3.0-fast-generate-001) are Generally Available to approved users. These models offer the following additional features:

Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
Digital watermark (SynthID) enabled by default
Watermark verification
User-configurable safety features (safety setting, person/face setting)

For more information, see Model versions and Generate images using text prompts.

Gemma 2 2B is available in Model Garden. For details, see Use Gemma open models.

The following models have been added to Model Garden:

Gemma 2 2B: A foundation LLM by Google Deepmind.
Qwen2: An LLM series by Alibaba Cloud.
Phi-3: An LLM series by Microsoft.

Resource and deployment settings were made to the following models:

Added GPU inferences for gemma2-27b and gemma2-27b-it with verified performances.
Added verified deployment settings for Mistral AI models that are deployed from Huggingface, including mistralai/mistral-nemo-instruct-2407, mistralai/mistral-nemo-base-2407, mistralai/mistral-large-instruct-2407, and mistralai/codestral-22b-v0.1.
Added multiple deployment settings with A100 (40G), A100 (80G) and H100 (80G) for select models, such as llama3.1, llama3, gemma2, gemma, and mistral-7b.

July 30, 2024

See the Gemini Online Inference on Vertex AI Service Level Agreement (SLA).

July 24, 2024

Mistral AI

Managed models from Mistral AI are available on Vertex AI. To use a Mistral AI model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, see Mistral AI models.

July 23, 2024

Llama 3.1

The Llama 3.1 405B model is available in Preview on Vertex AI. Llama 3.1 405B provides capabilities from synthetic data generation to model distillation, steerability, math, tool use, multilingual translation, and more. For more information, see Llama models.

July 02, 2024

Google's open weight Gemma 2 model is available in Model Garden. For details, see Use Gemma open models.

MaMMUT is now available in Model Garden. MaMMUT is a vision-encoder and text-decoder model for multimodal tasks such as visual question answering, image-text retrieval, text-image retrieval, and generation of multimodal embeddings.

June 28, 2024

The following models have been added to Model Garden:

36 Hugging Face embedding models with verified deployment settings such as BAAI/bge-m3 and intfloat/multilingual-e5-large-instruct.
35 Hugging Face PyTorch models with verified deployment settings such as stabilityai/stable-diffusion-2-1.

For more information, see the Hugging Face model deployment in the console.

Launched Hex-LLM for high-efficiency large language model serving. This performant TPU serving solution is based on XLA and optimized kernels to achieve high throughput and low latency.

Hex-LLM uses several parallelism strategies for multiple TPU chips, quantizations, dynamic LoRA, and more. Hex-LLM supports the following dense and sparse LLMs:

Gemma 2B and 7B
Gemma 2 9B and 27B
Llama 2 7B, 13B and 70B
Llama 3 8B and 70B
Mistral 7B and Mixtral 8x7B

Updated Docker images in Llama 3 notebooks that are more efficient at tuning.
A notebook-based interactive workshop UI was added in Model Garden for image generative models such as stable-diffusion-xl-base, image inpainting, controlnet. You can find these models from the Open Notebook list.
Colab Notebooks for frequently used models in Model Garden have been revised with no-code or low-code implementations to improve accessibility and user experience.

June 27, 2024

Context caching is available for Gemini 1.5 Pro. Use context caching to reduce the cost of requests that contain repeat content with high input token counts. For more information, see Context caching overview.

June 25, 2024

Controlled generation is available on Gemini 1.5 Pro and supports the JSON schema. For more information, see Control generated output.

June 20, 2024

The Anthropic Claude Sonnet 3.5 is Generally Available. To learn more, view the Claude Sonnet 3.5 model card in Model Garden.

June 17, 2024

Increased the input token limit for Gemini 1.5 Pro from 1M to 2M. For more information, see Google models.

June 11, 2024

Upload media from Google Drive

You can upload media, such as PDF, MP4, WAV, and JPG files from Google Drive, when you send image, video, audio, and document prompt requests.

June 10, 2024

Experiment in the Vertex AI Studio login-free

The Vertex AI Studio multi-model prompt designer can be accessed login-free. With this feature, prospective customers can use the Vertex AI Studio to test queries before deciding to sign up and create an account. To learn more about this experience, see Vertex AI Studio console experiences or to access the console directly go to Vertex AI Studio.

May 31, 2024

Anthropic Claude 3.0 Opus model

The Anthropic Claude 3.0 Opus model is Generally Available. To learn more, see its model card in Model Garden.

Generative AI on Vertex AI Regional APIs

Generative AI on Vertex AI regional APIs are available in the following three regions:

us-east5
me-central1
me-central2

May 28, 2024

Gemini models support the frequencyPenalty and presencePenalty parameters. Use frequencyPenalty to control the probability of repeated text in a response. Use presencePenalty to control the probability of generating more diverse content. For more information, see Gemini model parameters.

May 24, 2024

The Gemini 1.5 Pro (gemini-1.5-pro-001) and Gemini 1.5 Flash (gemini-1.5-flash-001) models are Generally Available. For more information, see Google models, Overview of the Gemini API, and Send multimodal prompt requests.

May 20, 2024

The following models have been added to Model Garden:

E5: A text embedding model series that can be served with a GPU or CPU.
Instant ID: An identity preserving text-to-image generation model.
Stable Diffusion XL lightning: A text-to-image generation model that is based on SDXL but requires fewer inference iterations.

To see a list of all available models, see Explore models in Model Garden.

May 14, 2024

Gemini 1.5 Flash (Preview)

Gemini 1.5 Flash (gemini-1.5-flash-preview-0514) is available in Preview. Gemini 1.5 Flash is a multimodal model designed for fast, high volume, cost-effective text generation and chat applications. It can analyze text, code, audio, PDF, video, and video with audio.

Grounding Gemini with Google Search is GA

The Gemini API Grounding with Google Search feature is available in GA. This is available for Gemini 1.0 Pro models. To learn more about model grounding, see Grounding with Google Search.

Batch prediction support for Gemini

Batch prediction is available for Gemini in preview. Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, see Get batch predictions for Gemini.

PaliGemma model

The PaliGemma model is available. PaliGemma is a lightweight open model that's part of the Google Gemma model family. It's the Gemma model family's best model option for image captioning tasks and visual question and answering tasks. Gemma models are based on Gemini models and intended to be extended by customers.

New stable text embedding models

The following text embedding models are available GA:

text-embedding-004
text-multilingual-embedding-002

For details on how to use these models, see Get text embeddings.

April 18, 2024

Meta's open weight Llama 3 model is available in the Vertex AI Model Garden.

April 11, 2024

Anthropic Claude 3.0 Opus model

The Anthropic Claude 3.0 Opus model is available in Preview. The Claude 3.0 Opus model is an Anthropic partner model that you can use with Vertex AI. It's the most capable of the Anthropic models at performing complex tasks quickly. To learn more, see its model card in Model Garden.

April 09, 2024

New Imagen on Vertex AI image generation model and features

The 006 version of the Imagen 2 image generation model (imagegeneration@006) is now available. This model offers the following additional features:

Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
Digital watermark (SynthID) enabled by default
Watermark verification*
New user-configurable safety features (safety setting, person/face setting)

For more information, see Model versions and Generate images using text prompts.

* The seed field can't be used while digital watermark is enabled.

New Imagen on Vertex AI image editing model and features

The 006 version of the Imagen 2 image editing model (imagegeneration@006) is now available. This model offers the following additional features:

Inpainting - Add or remove content from a masked area of an image
Outpainting - Expand a masked area of an image
Product image editing - Identify and maintain a primary product while changing the background or product position

For more information, see Model versions.

Change in Imagen image generation version 006 (imagegeneration@006) seed field behavior

For the new Imagen image generation model version 006 (imagegeneration@006) the seed field behavior has changed. For the v.006 model a digital watermark is enabled by default for image generation. To be able to use a seed value to get deterministic output you must disable digital watermark generation by setting the following parameter: "addWatermark": false.

For more information, see the Imagen for image generation and editing API reference.

CodeGemma model

The CodeGemma model is available. CodeGemma is a lightweight open model that's part of the Google Gemma model family. CodeGemma is the Gemma model family's code generation and code completion offering. Gemma models are based on Gemini models and intended to be extended by customers.

Grounding Gemini and Grounding with Google Search

The Gemini API now supports Grounding with Google Search in Preview. Currently available for Gemini 1.0 Pro models.

Regional APIs

Regional APIs are available in 11 new countries for Gemini, Imagen, and embeddings.
US and EU have machine-learning processing boundaries for the gemini-1.0-pro-001, gemini-1.0-pro-002, gemini-1.0-pro-vision-001, and imagegeneration@005 models.

Generative AI on Vertex AI security control update

Security controls are available for the online prediction feature for Gemini 1.0 Pro and Gemini 1.0 Pro Vision.

Gemini 1.5 Pro (Preview)

Gemini 1.5 Pro (gemini-1.5-pro-preview-0409) is available in Preview. Gemini 1.5 Pro is a multimodal model that analyzes text, code, audio, PDF, video, and video with audio.

New text embedding models

The following text embedding models are now in Preview.

text-embedding-preview-0409
text-multilingual-embedding-preview-0409

When evaluated using the MTEB benchmarks, these models produce better embeddings compared to previous versions. The new models also offer dynamic embedding sizes, which you can use to output smaller embedding dimensions, with minor performance loss, to save on computing and storage costs.

For details on how to use these models, refer to the public documentation and try out our Colab.

System instructions

System instructions are supported in Preview by the Gemini 1.0 Pro (stable version gemini-1.0-pro-002 only) and Gemini 1.5 Pro (Preview) multimodal models. Use system instructions to guide model behavior based on your specific needs and use cases. For more information, see System instructions examples.

Supervised Tuning for Gemini

Supervised tuning is available for the gemini-1.0-pro-002 model.

Online Evaluation Service

Generative AI evaluation supports online evaluation in addition to pipeline evaluation. The list of supported evaluation metrics has also expanded. See API reference and SDK reference.

Generative AI Knowledge Base

The Jump Start Solution: Generative AI Knowledge Base demonstrates how to build a simple chatbot with business- and domain-specific knowledge.

Text translation

Translate text in Vertex AI Studio is available in Preview.

Gemini 1.0 Pro stable version 002

The 002 version of the Gemini 1.0 Pro multimodal model (gemini-1.0-pro-002) is available. For more information about stable versions of Gemini models, see Gemini model versions and lifecycle.

Vertex AI Studio features and updates

The Vertex AI Studio supports side-by-side comparison to allow users to compare up to 3 prompts in a side-by-side view.
The Vertex AI Studio supports rapid evaluation in console and the ability to upload a ground truth response (or a model response to try to emulate).

To learn more, see Try your prompts in Vertex AI Studio

April 02, 2024

Model Garden supports all Text Generation Inference supported models in HuggingFace:

Verified deployment settings for about 400 Hugging Face text generation models (including google/gemma-7b-it, meta-llama/Llama-2-7b-chat-hf, and mistralai/Mistral-7B-v0.1).
Other Hugging Face text generation models have unverified deployment settings that are auto generated.

March 29, 2024

The MedLM-large model infrastructure has been upgraded to improve latency and stability. Responses from the model might be slightly different.