You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.
To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly.
December 18, 2024
Hex-LLM: High-Efficiency Large Language Model Serving is available in General Availability (GA).
This launch adds support for the following models:
- Llama 3.1
- Llama 3.2
- Phi-3
- Qwen2 and Qwen2.5
Additional supported features:
- Multi-host serving.
- Disaggregated serving (experimental).
- Prefix caching.
- AWQ quantization.
December 17, 2024
You can copy tuned Gemini 1.5 Pro 002 and Gemini 1.5 Flash 002 adapter models across projects. For details, see Copy a model in Vertex AI Model Registry.
December 11, 2024
The Gemini 2.0 Flash (gemini-2.0-flash-exp
) model is Generally available for grounded answer generation with RAG. This model is tuned to address context-based question and answering tasks. For more information, see Ground responses for Gemini models.
December 10, 2024
Imagen 3 image generation models Generally Available to all users
Imagen 3 image generation models are now available to all users without requiring prior approval. These include the following image generation models:
imagen-3.0-generate-001
imagen-3.0-fast-generate-001
(low latency model)
Prior image generation models (imagegeneration@006
, imagegeneration@005
, imagegeneration@002
) still require approval to use.
For more information, see Imagen on Vertex AI model versions and lifecycle and Generate images using text prompts.
Imagen 3 Customization model Generally Available to approved users
Imagen 3 Customization model is now available to approved users. This includes the following model:
imagen-3.0-capability
Imagen 3 Customization lets you guide image generation by providing reference images (few-shot learning). Imagen 3 Customization lets you customize generated images for the following feature categories:
- Subject Customization (product, person, and animal companion)
- Style Customization
- Controlled Customization (canny edge and scribble)
- Instruct Customization (Style transfer)
Imagen 3 editing model Generally Available to approved users
The Imagen 3 Editing model is now available to approved users. This includes the following model:
imagen-3.0-capability
This model offers the following additional features:
- Inpainting - Add or remove content from a masked area of an image
- Outpainting - Expand a masked area of an image
- Product image editing - Identify and maintain a primary product while changing the background or product position
For more information, see Model versions.
December 06, 2024
A vulnerability was discovered in the Vertex AI API serving Gemini multimodal requests, allowing bypass of VPC Service Controls. For details, see the Security bulletins page.
November 21, 2024
Mistral Large (24.11) is Generally Available on Vertex AI as a managed model. To learn more, view the Mistral Large (24.11) model card in Model Garden.
The Gen AI evaluation service can now help you evaluate your translation models using MetricX, COMET, and BLEU metrics. To learn more about evaluating your translation models, see Evaluate translation models.
November 08, 2024
Batch predictions for Llama models on Vertex AI (MaaS) is available in Preview.
Batch prediction support for Gemini
Batch prediction is available for Gemini in General Availability (GA). Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, see Get batch predictions for Gemini.
November 05, 2024
We are extending the availability of Gemini 1.0 Pro 001 and Gemini 1.0 Pro Vision 001 from February 15, 2025 to April 9, 2025. For details, see the Deprecations.
November 04, 2024
The translation LLM now supports Polish, Turkish, Indonesian, Dutch, Vietnamese, Thai and Czech. For the full list of supported languages, see the Translate text page.
The Anthropic Claude Haiku 3.5 is Generally Available on Vertex AI. To learn more, view the Claude Haiku 3.5 model card in Model Garden.
October 28, 2024
You can now fine-tune the following models from the Cloud console:
The Whisper large v3 and Whisper large v3 turbo models have been added to Model Garden.
Updated the fine-tuning notebooks for Gemma 2, Llama 3.1, Mistral, and Mixtral with the following enhancements:
- The notebooks use an updated high-performance container for single host multi-GPU LoRA fine-tuning.
- Better throughput and GPU utilization with well-tested max-sequence-lengths.
- Support for input token masking.
- No out of memory (OOM) error during fine-tuning.
- Added a custom dataset example that uses a template and format validation.
- Support for a default accelerator pool with quota checks.
- Improved documentation.
October 22, 2024
The Anthropic Claude Sonnet 3.5 v2 is Generally Available. To learn more, view the Claude Sonnet 3.5 v2 model card in Model Garden.
October 18, 2024
The Llama 3.1 405B model that is managed on Vertex AI is now Generally Available.
October 09, 2024
The Vertex AI Gemini API SDK supports tokenization capabilities for local token counting and computation. This is a streamlined way to compute tokens locally, ensuring compatibility across different Gemini models and their tokenizers. Supported models include gemini-1.5-flash
and gemini-1.5-pro
. To learn more, see Count tokens.
October 04, 2024
The AI assistant in Vertex AI Studio can help you refine and generate prompts. This feature is in Preview. To learn more, see Use AI-powered prompt writing tools.
Prompt Guard and Flux were added to Model Garden.
You can deploy Hugging Face models on Google Cloud that have text embedding inference enabled or pytorch inference enabled. For more information, see the Hugging Face model deployment in the console.
Added multiple deployment settings (with A100-80G and H100) and sample requests for some popular models, including Llama 3.1, Gemma 2, and Mixtral.
Added dynamic LoRA serving for Llama 3.1 and Stable Diffusion XL.
October 01, 2024
Grounding: Dynamic retrieval for grounded results (GA)
Dynamic retrieval lets you choose when to turn off grounding with Google Search. This is useful when a prompt doesn't require an answer grounded in Google Search, and the supported models can provide an answer based on their knowledge without grounding. Dynamic retrieval helps you manage latency, quality, and cost more effectively.
This feature is Generally Available. For more information, see Dynamic retrieval.
September 30, 2024
Prompt templates let you to test how different prompt formats perform with different sets of prompt data. This feature is in Preview. To learn more, see Use prompt templates.
September 25, 2024
The Llama 3.2 90B model is available in Preview on Vertex AI. Llama 3.2 90B enables developers to build and deploy the latest generative AI models and applications that use Llama's capabilities, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications. For more information, see Llama models.
September 24, 2024
New stable versions of Gemini 1.5 Pro (gemini-1.5-pro-002
) and Gemini 1.5 Flash (gemini-1.5-flash-002
)
are Generally Available. These models introduce broad quality improvements over the previous 001
versions, with significant gains in the following categories:
- Factuality and reduce model hallucinations
- Openbook Q&A for RAG use cases
- Instruction following
- Multilingual understanding in 102 languages, especially in Korean, French, German, Spanish, Japanese, Russian, and Chinese.
- SQL generation
- Audio understanding
- Document understanding
- Long context
- Math and reasoning
For more information about differences with the previous model versions, see Model versions and lifecycle.
The 2M context window with Gemini 1.5 Pro is now in Generally Available, which opens up long-form multimodal use cases that only Gemini can support.
Use Gemini to directly analyze YouTube videos and publicly available media (such as images, audio, and video) by using a link. This feature is in Public Preview.
The new API parameters audioTimestamp
, responseLogprob
, and logprobs
are in Public Preview. For more information, see API reference.
Gemini 1.5 Pro and Gemini 1.5 Flash now support multimodal input with function calling. This feature is in Preview.
The Vertex AI prompt optimizer adapts your prompts using the optimal instructions and examples to elicit the best performance from your chosen model. This feature is available in Preview. To learn more, see Optimize prompts.
Gemini 1.5 Pro and Gemini 1.5 Flash Tuning is now available in GA. Tune Gemini with text, image, audio, and document data types using the latest models:
gemini-1.5-pro-002
gemini-1.5-flash-002
Gemini 1.0 tuning remains in preview.
For more information on tuning Gemini, see Tune Gemini models by using supervised fine-tuning.
The latest versions of Gemini 1.5 Flash (gemini-1.5-flash-002
) and Gemini 1.5 Pro (gemini-1.5-pro-002
) use dynamic shared quota, which distributes on-demand capacity among all queries being processed. Dynamic shared quota is Generally Available.
September 20, 2024
Add label metadata to generateContent
and streamGenerateContent
API calls. For details, see Add labels to API calls.
September 18, 2024
Model Garden supports an organization policy so that administrators can limit access to certain models and capabilities. For more information, see Control access to Model Garden models
September 03, 2024
Gemini 1.5 Flash (gemini-1.5-flash
) supports controlled generation.
August 30, 2024
Gen AI Evaluation Service is Generally Available. To learn more, see the Gen AI Evaluation Service overview.
August 26, 2024
For controlled generation, you can have the model respond with an enum value in plain text, as defined in your response schema. Set the responseMimeType
to text/x.enum
. For more information, see Control generated output.
August 22, 2024
AI21 Labs
Managed models from AI21 Labs are available on Vertex AI. To use a AI21 Labs model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, see AI21 models.
August 09, 2024
Gemini on Vertex AI supports multiple response candidates. For details, see Generate content with the Gemini API.
August 05, 2024
The translation LLM now supports Arabic, Hindi, and Russian. For the full list of supported languages, see the Translate text page.
August 02, 2024
Vertex AI SDK for Python supports token listing and counting for prompts without the need to make API calls. This feature is available in (Preview). For details, see List and count tokens.
July 31, 2024
New Imagen on Vertex AI image generation model and features
The Imagen 3 image generation models (imagen-3.0-generate-001
and the low-latency version imagen-3.0-fast-generate-001
) are Generally Available to approved users. These models offer the following additional features:
- Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
- Digital watermark (SynthID) enabled by default
- Watermark verification
- User-configurable safety features (safety setting, person/face setting)
For more information, see Model versions and Generate images using text prompts.
Gemma 2 2B is available in Model Garden. For details, see Use Gemma open models.
The following models have been added to Model Garden:
- Gemma 2 2B: A foundation LLM by Google Deepmind.
- Qwen2: An LLM series by Alibaba Cloud.
- Phi-3: An LLM series by Microsoft.
Resource and deployment settings were made to the following models:
- Added GPU inferences for gemma2-27b and gemma2-27b-it with verified performances.
- Added verified deployment settings for Mistral AI models that are deployed from Huggingface, including mistralai/mistral-nemo-instruct-2407, mistralai/mistral-nemo-base-2407, mistralai/mistral-large-instruct-2407, and mistralai/codestral-22b-v0.1.
- Added multiple deployment settings with A100 (40G), A100 (80G) and H100 (80G) for select models, such as llama3.1, llama3, gemma2, gemma, and mistral-7b.
July 30, 2024
July 24, 2024
Mistral AI
Managed models from Mistral AI are available on Vertex AI. To use a Mistral AI model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, see Mistral AI models.
July 23, 2024
Llama 3.1
The Llama 3.1 405B model is available in Preview on Vertex AI. Llama 3.1 405B provides capabilities from synthetic data generation to model distillation, steerability, math, tool use, multilingual translation, and more. For more information, see Llama models.
July 02, 2024
Google's open weight Gemma 2 model is available in Model Garden. For details, see Use Gemma open models.
MaMMUT is now available in Model Garden. MaMMUT is a vision-encoder and text-decoder model for multimodal tasks such as visual question answering, image-text retrieval, text-image retrieval, and generation of multimodal embeddings.
June 28, 2024
The following models have been added to Model Garden:
- 36 Hugging Face embedding models with verified deployment settings such as BAAI/bge-m3 and intfloat/multilingual-e5-large-instruct.
- 35 Hugging Face PyTorch models with verified deployment settings such as stabilityai/stable-diffusion-2-1.
For more information, see the Hugging Face model deployment in the console.
Launched Hex-LLM for high-efficiency large language model serving. This performant TPU serving solution is based on XLA and optimized kernels to achieve high throughput and low latency.
Hex-LLM uses several parallelism strategies for multiple TPU chips, quantizations, dynamic LoRA, and more. Hex-LLM supports the following dense and sparse LLMs:
- Gemma 2B and 7B
- Gemma 2 9B and 27B
- Llama 2 7B, 13B and 70B
- Llama 3 8B and 70B
- Mistral 7B and Mixtral 8x7B
- Updated Docker images in Llama 3 notebooks that are more efficient at tuning.
- A notebook-based interactive workshop UI was added in Model Garden for image generative models such as stable-diffusion-xl-base, image inpainting, controlnet. You can find these models from the Open Notebook list.
- Colab Notebooks for frequently used models in Model Garden have been revised with no-code or low-code implementations to improve accessibility and user experience.
June 27, 2024
Context caching is available for Gemini 1.5 Pro. Use context caching to reduce the cost of requests that contain repeat content with high input token counts. For more information, see Context caching overview.
June 25, 2024
Controlled generation is available on Gemini 1.5 Pro and supports the JSON schema. For more information, see Control generated output.
June 20, 2024
The Anthropic Claude Sonnet 3.5 is Generally Available. To learn more, view the Claude Sonnet 3.5 model card in Model Garden.
June 17, 2024
Increased the input token limit for Gemini 1.5 Pro from 1M to 2M. For more information, see Google models.
June 11, 2024
Upload media from Google Drive
You can upload media, such as PDF, MP4, WAV, and JPG files from Google Drive, when you send image, video, audio, and document prompt requests.
June 10, 2024
Experiment in the Vertex AI Studio login-free
The Vertex AI Studio multi-model prompt designer can be accessed login-free. With this feature, prospective customers can use the Vertex AI Studio to test queries before deciding to sign up and create an account. To learn more about this experience, see Vertex AI Studio console experiences or to access the console directly go to Vertex AI Studio.
May 31, 2024
Anthropic Claude 3.0 Opus model
The Anthropic Claude 3.0 Opus model is Generally Available. To learn more, see its model card in Model Garden.
Generative AI on Vertex AI Regional APIs
Generative AI on Vertex AI regional APIs are available in the following three regions:
us-east5
me-central1
me-central2
May 28, 2024
Gemini models support the frequencyPenalty
and presencePenalty
parameters. Use frequencyPenalty
to control the probability of repeated text in a response. Use presencePenalty
to control the probability of generating more diverse content. For more information, see Gemini model parameters.
May 24, 2024
The Gemini 1.5 Pro (gemini-1.5-pro-001
) and Gemini 1.5 Flash (gemini-1.5-flash-001
) models are Generally Available. For more information, see Google models, Overview of the Gemini API, and Send multimodal prompt requests.
May 20, 2024
The following models have been added to Model Garden:
- E5: A text embedding model series that can be served with a GPU or CPU.
- Instant ID: An identity preserving text-to-image generation model.
- Stable Diffusion XL lightning: A text-to-image generation model that is based on SDXL but requires fewer inference iterations.
To see a list of all available models, see Explore models in Model Garden.
May 14, 2024
Gemini 1.5 Flash (Preview)
Gemini 1.5 Flash (gemini-1.5-flash-preview-0514
) is available in Preview. Gemini 1.5 Flash is a multimodal model designed for fast, high volume, cost-effective text generation and chat applications. It can analyze text, code, audio, PDF, video, and video with audio.
Grounding Gemini with Google Search is GA
The Gemini API Grounding with Google Search feature is available in GA. This is available for Gemini 1.0 Pro models. To learn more about model grounding, see Grounding with Google Search.
Batch prediction support for Gemini
Batch prediction is available for Gemini in preview. Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, see Get batch predictions for Gemini.
PaliGemma model
The PaliGemma model is available. PaliGemma is a lightweight open model that's part of the Google Gemma model family. It's the Gemma model family's best model option for image captioning tasks and visual question and answering tasks. Gemma models are based on Gemini models and intended to be extended by customers.
New stable text embedding models
The following text embedding models are available GA:
text-embedding-004
text-multilingual-embedding-002
For details on how to use these models, see Get text embeddings.
April 18, 2024
Meta's open weight Llama 3 model is available in the Vertex AI Model Garden.
April 11, 2024
Anthropic Claude 3.0 Opus model
The Anthropic Claude 3.0 Opus model is available in Preview. The Claude 3.0 Opus model is an Anthropic partner model that you can use with Vertex AI. It's the most capable of the Anthropic models at performing complex tasks quickly. To learn more, see its model card in Model Garden.
April 09, 2024
New Imagen on Vertex AI image generation model and features
The 006 version of the Imagen 2 image generation model (imagegeneration@006
) is now available. This model offers the following additional features:
- Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
- Digital watermark (SynthID) enabled by default
- Watermark verification*
- New user-configurable safety features (safety setting, person/face setting)
For more information, see Model versions and Generate images using text prompts.
* The seed
field can't be used while digital watermark is enabled.
New Imagen on Vertex AI image editing model and features
The 006 version of the Imagen 2 image editing model (imagegeneration@006
) is now available. This model offers the following additional features:
- Inpainting - Add or remove content from a masked area of an image
- Outpainting - Expand a masked area of an image
- Product image editing - Identify and maintain a primary product while changing the background or product position
For more information, see Model versions.
Change in Imagen image generation version 006 (imagegeneration@006
) seed
field behavior
For the new Imagen image generation model version 006 (imagegeneration@006
) the seed
field behavior has changed. For the v.006 model a digital watermark is enabled by default for image generation. To be able to use a seed
value to get deterministic output you must disable digital watermark generation by setting the following parameter
: "addWatermark": false
.
For more information, see the Imagen for image generation and editing API reference.
CodeGemma model
The CodeGemma model is available. CodeGemma is a lightweight open model that's part of the Google Gemma model family. CodeGemma is the Gemma model family's code generation and code completion offering. Gemma models are based on Gemini models and intended to be extended by customers.
Grounding Gemini and Grounding with Google Search
The Gemini API now supports Grounding with Google Search in Preview. Currently available for Gemini 1.0 Pro models.
Regional APIs
- Regional APIs are available in 11 new countries for Gemini, Imagen, and embeddings.
- US and EU have machine-learning processing boundaries for the
gemini-1.0-pro-001
,gemini-1.0-pro-002
,gemini-1.0-pro-vision-001
, andimagegeneration@005
models.
Generative AI on Vertex AI security control update
Security controls are available for the online prediction feature for Gemini 1.0 Pro and Gemini 1.0 Pro Vision.
Gemini 1.5 Pro (Preview)
Gemini 1.5 Pro (gemini-1.5-pro-preview-0409
) is available in Preview. Gemini 1.5 Pro is a multimodal model that analyzes text, code, audio, PDF, video, and video with audio.
New text embedding models
The following text embedding models are now in Preview.
text-embedding-preview-0409
text-multilingual-embedding-preview-0409
When evaluated using the MTEB benchmarks, these models produce better embeddings compared to previous versions. The new models also offer dynamic embedding sizes, which you can use to output smaller embedding dimensions, with minor performance loss, to save on computing and storage costs.
For details on how to use these models, refer to the public documentation and try out our Colab.
System instructions
System instructions are supported in Preview by the Gemini 1.0 Pro (stable version gemini-1.0-pro-002
only) and Gemini 1.5 Pro (Preview) multimodal models. Use system instructions to guide model behavior based on your specific needs and use cases. For more information, see System instructions examples.
Supervised Tuning for Gemini
Supervised tuning is available for the gemini-1.0-pro-002 model
.
Online Evaluation Service
Generative AI evaluation supports online evaluation in addition to pipeline evaluation. The list of supported evaluation metrics has also expanded. See API reference and SDK reference.
Generative AI Knowledge Base
The Jump Start Solution: Generative AI Knowledge Base demonstrates how to build a simple chatbot with business- and domain-specific knowledge.
Text translation
Translate text in Vertex AI Studio is available in Preview.
Gemini 1.0 Pro stable version 002
The 002 version of the Gemini 1.0 Pro multimodal model (gemini-1.0-pro-002
) is available. For more information about stable versions of Gemini models, see Gemini model versions and lifecycle.
Vertex AI Studio features and updates
- The Vertex AI Studio supports side-by-side comparison to allow users to compare up to 3 prompts in a side-by-side view.
- The Vertex AI Studio supports rapid evaluation in console and the ability to upload a ground truth response (or a model response to try to emulate).
To learn more, see Try your prompts in Vertex AI Studio
April 02, 2024
Model Garden supports all Text Generation Inference supported models in HuggingFace:
- Verified deployment settings for about 400 Hugging Face text generation models (including google/gemma-7b-it, meta-llama/Llama-2-7b-chat-hf, and mistralai/Mistral-7B-v0.1).
- Other Hugging Face text generation models have unverified deployment settings that are auto generated.
March 29, 2024
The MedLM-large model infrastructure has been upgraded to improve latency and stability. Responses from the model might be slightly different.