Vertex AI release notes

This page documents production updates to Vertex AI. Check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly.

October 18, 2024

The Llama 3.1 405B model that is managed on Vertex AI is now Generally Available.

October 09, 2024

The Vertex AI Gemini API SDK supports tokenization capabilities for local token counting and computation. This is a streamlined way to compute tokens locally, ensuring compatibility across different Gemini models and their tokenizers. Supported models include gemini-1.5-flash and gemini-1.5-pro. To learn more, see Count tokens.

October 04, 2024

The AI assistant in Vertex AI Studio can help you refine and generate prompts. This feature is in Preview. To learn more, see Use AI-powered prompt writing tools.

Prompt Guard and Flux were added to Model Garden.

You can deploy Hugging Face models on Google Cloud that have text embedding inference or PyTorch inference enabled. For more information, see the Hugging Face model deployment in the console.

Added multiple deployment settings (with A100-80G and H100) and sample requests for some popular models, including Llama 3.1, Gemma 2, and Mixtral.

Added dynamic LoRA serving for Llama 3.1 and Stable Diffusion XL.

October 01, 2024

Grounding: Dynamic retrieval for grounded results (GA)

Dynamic retrieval lets you choose when to turn off grounding with Google Search. This is useful when a prompt doesn't require an answer grounded in Google Search, and the supported models can provide an answer based on their knowledge without grounding. Dynamic retrieval helps you manage latency, quality, and cost more effectively.

This feature is Generally Available. For more information, see Dynamic retrieval.
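As an illustrative sketch, a generateContent request that enables dynamic retrieval might carry a payload like the following. The field spellings (googleSearchRetrieval, dynamicRetrievalConfig, dynamicThreshold) and the threshold value are assumptions based on Gemini API REST conventions; verify them against the Dynamic retrieval reference before use.

```python
# Hypothetical generateContent request body enabling dynamic retrieval.
# Field names are assumptions; check the Dynamic retrieval docs.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Who won the most recent Tour de France?"}]}
    ],
    "tools": [
        {
            "googleSearchRetrieval": {
                "dynamicRetrievalConfig": {
                    # MODE_DYNAMIC lets the model decide per prompt whether
                    # grounding with Google Search is worth the latency/cost.
                    "mode": "MODE_DYNAMIC",
                    # Ground only when the model's predicted need for search
                    # exceeds this threshold.
                    "dynamicThreshold": 0.7,
                }
            }
        }
    ],
}
```

Prompts that fall below the threshold are answered from the model's own knowledge, which is how dynamic retrieval saves latency and cost.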

September 30, 2024

Prompt templates let you test how different prompt formats perform with different sets of prompt data. This feature is in Preview. To learn more, see Use prompt templates.

September 25, 2024

The Llama 3.2 90B model is available in Preview on Vertex AI. Llama 3.2 90B enables developers to build and deploy the latest generative AI models and applications that use Llama's capabilities, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications. For more information, see Llama models.

September 24, 2024

New stable versions of Gemini 1.5 Pro (gemini-1.5-pro-002) and Gemini 1.5 Flash (gemini-1.5-flash-002) are Generally Available. These models introduce broad quality improvements over the previous 001 versions, with significant gains in the following categories:

  • Factuality and reduced model hallucinations
  • Open-book Q&A for RAG use cases
  • Instruction following
  • Multilingual understanding in 102 languages, especially Korean, French, German, Spanish, Japanese, Russian, and Chinese
  • SQL generation
  • Audio understanding
  • Document understanding
  • Long context
  • Math and reasoning

For more information about differences with the previous model versions, see Model versions and lifecycle.

The 2M context window with Gemini 1.5 Pro is now Generally Available, which opens up long-form multimodal use cases that only Gemini can support.

Use Gemini to directly analyze YouTube videos and publicly available media (such as images, audio, and video) by using a link. This feature is in Public Preview.

The new API parameters audioTimestamp, responseLogprobs, and logprobs are in Public Preview. For more information, see the API reference.
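As a sketch, these parameters sit in the request's generationConfig. The exact field spellings and value ranges below are assumptions to verify against the API reference.

```python
# Hypothetical request body exercising the new Preview parameters;
# field names and allowed values are assumptions to verify.
request_body = {
    "contents": [{"role": "user", "parts": [{"text": "Summarize this clip."}]}],
    "generationConfig": {
        "audioTimestamp": True,    # request timestamp understanding for audio input
        "responseLogprobs": True,  # return log probabilities of the chosen tokens
        "logprobs": 5,             # also return top-5 candidate tokens per step
    },
}
```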

Gemini 1.5 Pro and Gemini 1.5 Flash now support multimodal input with function calling. This feature is in Preview.

The Vertex AI prompt optimizer adapts your prompts using the optimal instructions and examples to elicit the best performance from your chosen model. This feature is available in Preview. To learn more, see Optimize prompts.

Gemini 1.5 Pro and Gemini 1.5 Flash tuning is now Generally Available. Tune Gemini with text, image, audio, and document data types by using the latest models:

  • gemini-1.5-pro-002
  • gemini-1.5-flash-002

Gemini 1.0 tuning remains in Preview.

For more information on tuning Gemini, see Tune Gemini models by using supervised fine-tuning.

The latest versions of Gemini 1.5 Flash (gemini-1.5-flash-002) and Gemini 1.5 Pro (gemini-1.5-pro-002) use dynamic shared quota, which distributes on-demand capacity among all queries being processed. Dynamic shared quota is Generally Available.

September 20, 2024

Add label metadata to generateContent and streamGenerateContent API calls. For details, see Add labels to API calls.
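For example, a generateContent request body with labels attached might look like this sketch; the label keys and values here are made-up, user-defined strings.

```python
# Hypothetical request body with user-defined labels; labels are free-form
# key/value strings, useful for cost and usage attribution.
request_body = {
    "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
    "labels": {"team": "docs-demo", "environment": "test"},
}
```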

September 18, 2024

Model Garden supports an organization policy so that administrators can limit access to certain models and capabilities. For more information, see Control access to Model Garden models.

September 03, 2024

Gemini 1.5 Flash (gemini-1.5-flash) supports controlled generation.

August 30, 2024

Gen AI Evaluation Service is Generally Available. To learn more, see the Gen AI Evaluation Service overview.

August 26, 2024

For controlled generation, you can have the model respond with an enum value in plain text, as defined in your response schema. Set the responseMimeType to text/x.enum. For more information, see Control generated output.
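For instance, to have the model classify sentiment as exactly one of three values, the generationConfig could look like the following sketch; the enum values are illustrative, and the schema shape should be checked against the Control generated output documentation.

```python
# Sketch: constrain output to one of a fixed set of enum values,
# returned as plain text per the text/x.enum MIME type.
generation_config = {
    "responseMimeType": "text/x.enum",
    "responseSchema": {
        "type": "STRING",
        "enum": ["positive", "neutral", "negative"],  # illustrative values
    },
}
```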

August 22, 2024

AI21 Labs

Managed models from AI21 Labs are available on Vertex AI. To use an AI21 Labs model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, see AI21 models.

August 09, 2024

Gemini on Vertex AI supports multiple response candidates. For details, see Generate content with the Gemini API.

August 05, 2024

The translation LLM now supports Arabic, Hindi, and Russian. For the full list of supported languages, see the Translate text page.

August 02, 2024

The Vertex AI SDK for Python supports token listing and counting for prompts without the need to make API calls. This feature is available in Preview. For details, see List and count tokens.

July 31, 2024

New Imagen on Vertex AI image generation model and features

The Imagen 3 image generation models (imagen-3.0-generate-001 and the low-latency version imagen-3.0-fast-generate-001) are Generally Available to approved users. These models offer the following additional features:

  • Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
  • Digital watermark (SynthID) enabled by default
  • Watermark verification
  • User-configurable safety features (safety setting, person/face setting)

For more information, see Model versions and Generate images using text prompts.

Gemma 2 2B is available in Model Garden. For details, see Use Gemma open models.

The following models have been added to Model Garden:

  • Gemma 2 2B: A foundation LLM by Google DeepMind.
  • Qwen2: An LLM series by Alibaba Cloud.
  • Phi-3: An LLM series by Microsoft.

Resource requirements and deployment settings were updated for several models.

July 30, 2024

July 24, 2024

Mistral AI

Managed models from Mistral AI are available on Vertex AI. To use a Mistral AI model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, see Mistral AI models.

July 23, 2024

Llama 3.1

The Llama 3.1 405B model is available in Preview on Vertex AI. Llama 3.1 405B provides capabilities from synthetic data generation to model distillation, steerability, math, tool use, multilingual translation, and more. For more information, see Llama models.

July 02, 2024

Google's open weight Gemma 2 model is available in Model Garden. For details, see Use Gemma open models.

MaMMUT is now available in Model Garden. MaMMUT is a vision-encoder and text-decoder model for multimodal tasks such as visual question answering, image-text retrieval, text-image retrieval, and generation of multimodal embeddings.

June 28, 2024

New models have been added to Model Garden. For more information, see the Hugging Face model deployment page in the console.

Launched Hex-LLM for high-efficiency large language model serving. This performant TPU serving solution is based on XLA and optimized kernels to achieve high throughput and low latency.

Hex-LLM uses several parallelism strategies for multiple TPU chips, quantizations, dynamic LoRA, and more. Hex-LLM supports the following dense and sparse LLMs:

  • Gemma 2B and 7B
  • Gemma 2 9B and 27B
  • Llama 2 7B, 13B and 70B
  • Llama 3 8B and 70B
  • Mistral 7B and Mixtral 8x7B

Other Model Garden updates:

  • Updated Docker images in Llama 3 notebooks to make tuning more efficient.
  • Added a notebook-based interactive workshop UI in Model Garden for image generation models such as stable-diffusion-xl-base, image inpainting, and ControlNet. You can find these models in the Open Notebook list.
  • Revised the Colab notebooks for frequently used models in Model Garden with no-code or low-code implementations to improve accessibility and the user experience.

June 27, 2024

Context caching is available for Gemini 1.5 Pro. Use context caching to reduce the cost of requests that contain repeat content with high input token counts. For more information, see Context caching overview.

June 25, 2024

Controlled generation is available on Gemini 1.5 Pro and supports the JSON schema. For more information, see Control generated output.
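As a sketch, a response schema for controlled JSON generation might be declared as follows; the recipe fields are illustrative, and the OpenAPI-style type names are assumptions to confirm in the Control generated output documentation.

```python
# Sketch: ask the model for JSON output that conforms to a declared schema.
generation_config = {
    "responseMimeType": "application/json",
    "responseSchema": {
        "type": "OBJECT",
        "properties": {
            "name": {"type": "STRING"},
            "prep_minutes": {"type": "INTEGER"},
            "ingredients": {"type": "ARRAY", "items": {"type": "STRING"}},
        },
        "required": ["name", "ingredients"],  # fields the model must emit
    },
}
```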

June 20, 2024

The Anthropic Claude 3.5 Sonnet model is Generally Available. To learn more, view the Claude 3.5 Sonnet model card in Model Garden.

June 17, 2024

Increased the input token limit for Gemini 1.5 Pro from 1M to 2M. For more information, see Google models.

June 11, 2024

Upload media from Google Drive

You can upload media, such as PDF, MP4, WAV, and JPG files, from Google Drive when you send image, video, audio, and document prompt requests.

June 10, 2024

Experiment in the Vertex AI Studio login-free

The Vertex AI Studio multi-model prompt designer can be accessed login-free. With this feature, prospective customers can use the Vertex AI Studio to test queries before deciding to sign up and create an account. To learn more about this experience, see Vertex AI Studio console experiences, or to access the console directly, go to Vertex AI Studio.

May 31, 2024

Anthropic Claude 3.0 Opus model

The Anthropic Claude 3.0 Opus model is Generally Available. To learn more, see its model card in Model Garden.

Generative AI on Vertex AI Regional APIs

Generative AI on Vertex AI regional APIs are available in the following three regions:

  • us-east5
  • me-central1
  • me-central2

May 28, 2024

Gemini models support the frequencyPenalty and presencePenalty parameters. Use frequencyPenalty to control the probability of repeated text in a response. Use presencePenalty to control the probability of generating more diverse content. For more information, see Gemini model parameters.
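As an illustrative sketch, both penalties are set in generationConfig; positive values push the model away from repetition, and the exact accepted range should be checked in the Gemini model parameters reference.

```python
# Sketch: repetition-control parameters in generationConfig.
# A positive frequencyPenalty lowers a token's probability in proportion to
# how often it has already appeared; a positive presencePenalty lowers the
# probability of any token that has appeared at all, encouraging variety.
generation_config = {
    "temperature": 0.9,
    "frequencyPenalty": 0.5,
    "presencePenalty": 0.3,
}
```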

May 24, 2024

The Gemini 1.5 Pro (gemini-1.5-pro-001) and Gemini 1.5 Flash (gemini-1.5-flash-001) models are Generally Available. For more information, see Google models, Overview of the Gemini API, and Send multimodal prompt requests.

May 20, 2024

The following models have been added to Model Garden:

  • E5: A text embedding model series that can be served with a GPU or CPU.
  • Instant ID: An identity preserving text-to-image generation model.
  • Stable Diffusion XL lightning: A text-to-image generation model that is based on SDXL but requires fewer inference iterations.

To see a list of all available models, see Explore models in Model Garden.

May 14, 2024

Gemini 1.5 Flash (Preview)

Gemini 1.5 Flash (gemini-1.5-flash-preview-0514) is available in Preview. Gemini 1.5 Flash is a multimodal model designed for fast, high-volume, cost-effective text generation and chat applications. It can analyze text, code, audio, PDF, video, and video with audio.

Grounding Gemini with Google Search is GA

The Gemini API Grounding with Google Search feature is available in GA. This is available for Gemini 1.0 Pro models. To learn more about model grounding, see Grounding with Google Search.

Batch prediction support for Gemini

Batch prediction is available for Gemini in Preview. Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, see Get batch predictions for Gemini.

PaliGemma model

The PaliGemma model is available. PaliGemma is a lightweight open model that's part of the Google Gemma model family. It's the Gemma model family's best model option for image captioning and visual question-answering tasks. Gemma models are based on Gemini models and intended to be extended by customers.

New stable text embedding models

The following text embedding models are Generally Available:

  • text-embedding-004
  • text-multilingual-embedding-002

For details on how to use these models, see Get text embeddings.

April 18, 2024

Meta's open weight Llama 3 model is available in the Vertex AI Model Garden.

April 11, 2024

Anthropic Claude 3.0 Opus model

The Anthropic Claude 3.0 Opus model is available in Preview. The Claude 3.0 Opus model is an Anthropic partner model that you can use with Vertex AI. It's the most capable of the Anthropic models at performing complex tasks quickly. To learn more, see its model card in Model Garden.

April 09, 2024

New Imagen on Vertex AI image generation model and features

The 006 version of the Imagen 2 image generation model (imagegeneration@006) is now available. This model offers the following additional features:

  • Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
  • Digital watermark (SynthID) enabled by default
  • Watermark verification*
  • New user-configurable safety features (safety setting, person/face setting)

For more information, see Model versions and Generate images using text prompts.

* The seed field can't be used while digital watermark is enabled.

New Imagen on Vertex AI image editing model and features

The 006 version of the Imagen 2 image editing model (imagegeneration@006) is now available. This model offers the following additional features:

  • Inpainting - Add or remove content from a masked area of an image
  • Outpainting - Expand a masked area of an image
  • Product image editing - Identify and maintain a primary product while changing the background or product position

For more information, see Model versions.

Change in Imagen image generation version 006 (imagegeneration@006) seed field behavior

For the new Imagen image generation model version 006 (imagegeneration@006), the seed field behavior has changed. For the version 006 model, a digital watermark is enabled by default for image generation. To use a seed value to get deterministic output, you must disable digital watermark generation by setting the following parameter: "addWatermark": false.

For more information, see the Imagen for image generation and editing API reference.
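For example, a deterministic Imagen generation request pairs seed with "addWatermark": false, along the lines of this sketch; the prompt and values are illustrative, and the surrounding instances/parameters shape should be confirmed in the API reference.

```python
# Sketch of an Imagen predict request body: the seed only takes effect when
# the default digital watermark (SynthID) is explicitly disabled.
request_body = {
    "instances": [{"prompt": "a watercolor painting of a lighthouse"}],
    "parameters": {
        "sampleCount": 1,
        "seed": 12345,          # deterministic output for a fixed prompt + seed
        "addWatermark": False,  # required for seed to be honored on version 006
    },
}
```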

CodeGemma model

The CodeGemma model is available. CodeGemma is a lightweight open model that's part of the Google Gemma model family. CodeGemma is the Gemma model family's code generation and code completion offering. Gemma models are based on Gemini models and intended to be extended by customers.

Grounding Gemini and Grounding with Google Search

The Gemini API now supports Grounding with Google Search in Preview. Currently available for Gemini 1.0 Pro models.

Regional APIs

  • Regional APIs are available in 11 new countries for Gemini, Imagen, and embeddings.
  • US and EU have machine-learning processing boundaries for the gemini-1.0-pro-001, gemini-1.0-pro-002, gemini-1.0-pro-vision-001, and imagegeneration@005 models.

Generative AI on Vertex AI security control update

Security controls are available for the online prediction feature for Gemini 1.0 Pro and Gemini 1.0 Pro Vision.

Gemini 1.5 Pro (Preview)

Gemini 1.5 Pro (gemini-1.5-pro-preview-0409) is available in Preview. Gemini 1.5 Pro is a multimodal model that analyzes text, code, audio, PDF, video, and video with audio.

New text embedding models

The following text embedding models are now in Preview.

  • text-embedding-preview-0409
  • text-multilingual-embedding-preview-0409

When evaluated using the MTEB benchmarks, these models produce better embeddings compared to previous versions. The new models also offer dynamic embedding sizes, which you can use to output smaller embedding dimensions, with minor performance loss, to save on computing and storage costs.

For details on how to use these models, refer to the public documentation and try out our Colab.

System instructions

System instructions are supported in Preview by the Gemini 1.0 Pro (stable version gemini-1.0-pro-002 only) and Gemini 1.5 Pro (Preview) multimodal models. Use system instructions to guide model behavior based on your specific needs and use cases. For more information, see System instructions examples.
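As a sketch, a system instruction rides alongside contents in the request body; the wording of the instruction is illustrative, and the systemInstruction field shape is an assumption to confirm against the System instructions examples.

```python
# Sketch: a generateContent request body with a system instruction that
# steers model behavior across the whole conversation.
request_body = {
    "systemInstruction": {
        "parts": [{"text": "You are a concise assistant. Answer in one sentence."}]
    },
    "contents": [{"role": "user", "parts": [{"text": "What is Vertex AI?"}]}],
}
```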

Supervised Tuning for Gemini

Supervised tuning is available for the gemini-1.0-pro-002 model.

Online Evaluation Service

Generative AI evaluation supports online evaluation in addition to pipeline evaluation. The list of supported evaluation metrics has also expanded. See API reference and SDK reference.

Generative AI Knowledge Base

The Jump Start Solution: Generative AI Knowledge Base demonstrates how to build a simple chatbot with business- and domain-specific knowledge.

Text translation

Translate text in Vertex AI Studio is available in Preview.

Gemini 1.0 Pro stable version 002

The 002 version of the Gemini 1.0 Pro multimodal model (gemini-1.0-pro-002) is available. For more information about stable versions of Gemini models, see Gemini model versions and lifecycle.

Vertex AI Studio features and updates

  • The Vertex AI Studio supports side-by-side comparison to allow users to compare up to 3 prompts in a side-by-side view.
  • The Vertex AI Studio supports rapid evaluation in console and the ability to upload a ground truth response (or a model response to try to emulate).

To learn more, see Try your prompts in Vertex AI Studio.

April 02, 2024

Model Garden supports all models in Hugging Face that are supported by Text Generation Inference.

March 29, 2024

The MedLM-large model infrastructure has been upgraded to improve latency and stability. Responses from the model might be slightly different.