The Gemini 2.0 models are the latest Google models supported in Vertex AI. This page covers the following models:
- 2.0 Flash
- 2.0 Flash-Lite
- 2.0 Pro
If you're looking for information on our Gemini 2.0 Flash Thinking model, visit our Gemini 2.0 Flash Thinking documentation.
This page also includes information on the new Gen AI SDK, which supports migration between the Gemini Developer API and the Gemini API on Vertex AI.
2.0 models
2.0 Flash
Gemini 2.0 Flash is our latest generally available model in the Gemini family. It's our workhorse model for everyday tasks, with enhanced performance and support for the real-time Multimodal Live API. 2.0 Flash is an upgrade path for 1.5 Flash users who want a slightly slower model with significantly better quality, or for 1.5 Pro users who want slightly better quality and real-time latency for less.
Gemini 2.0 Flash introduces the following new and enhanced features:
- Multimodal Live API: This new API enables low-latency bidirectional voice and video interactions with Gemini (see the sketch after this list).
- Quality: Enhanced performance compared to Gemini 1.5 Pro across most quality benchmarks.
- Improved agentic capabilities: 2.0 Flash delivers improvements to multimodal understanding, coding, complex instruction following, and function calling. These improvements work together to support better agentic experiences.
- New modalities: 2.0 Flash introduces built-in image generation and controllable text-to-speech capabilities, enabling image editing, localized artwork creation, and expressive storytelling.
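The following is a minimal text-in, text-out sketch of the Multimodal Live API through the Gen AI SDK's `live` module. The `gemini-2.0-flash-exp` model ID and the config keys are assumptions based on the preview release; voice and video interactions use the same session shape with different response modalities.

```python
import asyncio
from google import genai

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1'
)

async def main():
    # Open a low-latency, bidirectional session with the model. The model ID
    # and config below are preview-era assumptions, not a stable contract.
    async with client.aio.live.connect(
        model='gemini-2.0-flash-exp',
        config={'response_modalities': ['TEXT']},
    ) as session:
        await session.send(input='Hello, Gemini!', end_of_turn=True)
        # Stream the model's reply as it arrives.
        async for message in session.receive():
            if message.text:
                print(message.text, end='')

asyncio.run(main())
```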
Gemini 2.0 Flash features:
- Multimodal input
- Text output (generally available) / multimodal output (private preview)
- Prompt optimizers
- Controlled generation
- Function calling
- Grounding with Google Search (see the example after this list)
- Code execution
- Count tokens
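As an example of these capabilities, Grounding with Google Search is enabled by passing the `google_search` tool in the request configuration. A minimal sketch (the prompt is a placeholder):

```python
from google import genai
from google.genai import types

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1'
)

# Attach the Google Search tool so the response is grounded in web results.
response = client.models.generate_content(
    model='gemini-2.0-flash-001',
    contents='Who won the most recent Super Bowl?',
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    ),
)
print(response.text)
```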
Feature availability
The following features are available for Gemini 2.0 Flash:
| Feature | Availability level |
|---|---|
| Text generation | Generally available |
| Grounding with Google Search | Generally available |
| Gen AI SDK | Generally available |
| Multimodal Live API | Public preview |
| Bounding box detection | Public preview |
| Image generation | Private preview |
| Speech generation | Private preview |
- Generally available: This feature is available publicly and supported for use in production-level code.
- Public preview: This feature is available publicly in a reduced capacity. Don't use features that are released as a public preview in production code, because the support level and functionality of that feature can change without warning.
- Private preview: This feature is only available to users listed on an approved allow-list. Don't use features that are released as a private preview in production code, because the support level and functionality of that feature can change without warning.
Pricing
Information on the pricing for Gemini 2.0 Flash is available on our Pricing page.
Quotas and limitations
GA features in Gemini 2.0 Flash use dynamic shared quota and are subject to tokens-per-minute (TPM) rate limiting.
Grounding with Google Search in Gemini 2.0 Flash is subject to rate limiting.
2.0 Flash-Lite
Gemini 2.0 Flash-Lite is our fastest and most cost-efficient Flash model. It's an upgrade path for 1.5 Flash users who want better quality at the same price and speed.
Gemini 2.0 Flash-Lite includes:
- Multimodal input, text output
- 1M token input context window
- 8k token output context window
2.0 Flash-Lite does not include the following 2.0 Flash features:
- Multimodal output generation
- Integration with Multimodal Live API
- Thinking mode
- Built-in tool usage
You can use 2.0 Flash-Lite with our new Gen AI SDK:
```python
from google import genai

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1'
)

response = client.models.generate_content(
    model='gemini-2.0-flash-lite-preview-02-05', contents='How does AI work?'
)
print(response.text)
```
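Given the 1M-token input window and 8k-token output limit noted above, it can be useful to count tokens before sending a large prompt. A minimal sketch, assuming token counting is available for this preview model ID:

```python
from google import genai

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1'
)

# Count the tokens in a prompt before sending it to the model.
response = client.models.count_tokens(
    model='gemini-2.0-flash-lite-preview-02-05',
    contents='A long document to summarize...',
)
print(response.total_tokens)
```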
Quotas and limitations
Gemini 2.0 Flash-Lite is rate limited to 60 queries per minute (QPM) while in public preview.
Gemini 2.0 Flash-Lite is only available in the `us-central1` region in Vertex AI.
2.0 Pro
Gemini 2.0 Pro is our strongest model for coding and world knowledge, and it features a 2M-token context window. Gemini 2.0 Pro is available as an experimental model in Vertex AI and is an upgrade path for 1.5 Pro users who want better quality, or who are particularly invested in long context and code.
Gemini 2.0 Pro features:
- Multimodal input
- Text output
- Prompt optimizers
- Controlled generation
- Function calling (excluding compositional function calling)
- Grounding with Google Search
- Code execution
- Count tokens
You can use Gemini 2.0 Pro with our new Gen AI SDK:
```python
from google import genai

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1'
)

response = client.models.generate_content(
    model='gemini-2.0-pro-exp-02-05', contents='How does AI work?'
)
print(response.text)
```
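Gemini 2.0 Pro also supports function calling, as listed above. With the Gen AI SDK, you can pass a plain Python function as a tool, and the SDK will call it automatically when the model requests it. A minimal sketch; the `get_current_weather` function is a hypothetical stub:

```python
from google import genai

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1'
)

def get_current_weather(location: str) -> str:
    """Returns the current weather in the given location."""
    # A real implementation would call a weather service; this is a stub.
    return 'sunny'

# The SDK inspects the function signature, offers it to the model as a tool,
# and executes it automatically if the model issues a function call.
response = client.models.generate_content(
    model='gemini-2.0-pro-exp-02-05',
    contents='What is the weather like in Boston?',
    config={'tools': [get_current_weather]},
)
print(response.text)
```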
Quotas and limitations
Gemini 2.0 Pro is rate limited to 10 queries per minute (QPM) while in the experimental stage.
Grounding with Google Search in Gemini 2.0 Pro is subject to rate limiting.
Google Gen AI SDK
To support the new 2.0 models, we've released an all-new SDK that supports migration between the Gemini Developer API and the Gemini API on Vertex AI.
The new Google Gen AI SDK provides a unified interface to Gemini 2.0 through both the Gemini Developer API and the Gemini API on Vertex AI. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.
The Gen AI SDK also supports the Gemini 1.5 models.
The new SDK is generally available in Python. Support for Go is in Preview, and Java and JavaScript support is coming soon.
You can start using the SDK as shown below.
- Install the new SDK:

```bash
pip install google-genai
```

- Import the `genai` library, initialize a client, and generate content:

```python
from google import genai

# Replace the `project` and `location` values with appropriate values for
# your project.
client = genai.Client(
    vertexai=True, project='YOUR_CLOUD_PROJECT', location='us-central1',
    http_options={'api_version': 'v1'}
)

response = client.models.generate_content(
    model='gemini-2.0-flash-001', contents='How does AI work?'
)
print(response.text)
```
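Because the SDK is a unified interface, migrating between platforms mostly means changing how the client is constructed. For example, the same `generate_content` call runs against the Gemini Developer API when the client is initialized with an API key instead (the key value is a placeholder):

```python
from google import genai

# Initialize a client for the Gemini Developer API instead of Vertex AI.
client = genai.Client(api_key='YOUR_API_KEY')

response = client.models.generate_content(
    model='gemini-2.0-flash-001', contents='How does AI work?'
)
print(response.text)
```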
(Optional) Set environment variables
Alternatively, you can initialize the client using environment variables. First set the appropriate values and export the variables:
```bash
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=YOUR_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
```

Then you can initialize the client without any arguments:

```python
from google import genai

client = genai.Client()
```