This guide answers frequently asked questions about Gemini models, monitoring, and billing. Last updated: June 25, 2025.
If your application recently started showing errors related to an unavailable
PaLM 2, Gemini 1.0, or Gemini 1.5 (-001 version) model, this section
explains how to transition to a supported model.
Google regularly releases new and improved AI models. To make way for these
advancements, older models are retired (deprecated). We provide notice when
deprecating a model and a transition window before access to the model is
terminated, but we understand this can still cause interruptions.

There are two options for updating your model: upgrade to one of the
Gemini 2 models, or migrate your workload to Vertex AI. The Gemini 2 models
include a number of upgrades over the 1.5 models; the table later in this
guide compares the Gemini 2 models.
To see all benchmark capabilities for Gemini 2, visit the
Google DeepMind
documentation.
Google Cloud's Vertex AI platform offers a suite of MLOps tools that
streamline the usage, deployment, and monitoring of AI models for efficiency
and reliability. To migrate your work to Vertex AI, import and upload your
existing data to Vertex AI Studio and use the Gemini API with Vertex AI.
For more information, see Migrate from
Gemini on Google AI to Vertex AI.
While the experimental version of Gemini 2.0 Flash supports image
generation, the generally available Gemini 2 models do not. The experimental
version of Gemini 2.0 Flash is not recommended for production code.
If you need image generation in production code, use
Imagen 3. This powerful model offers
high-quality images, low-latency generation, and flexible editing options.
Compositional function calling
is only available in Google AI Studio.
For the full list of locations that are supported for Gemini 2
models, see Locations.
Gemini models on Vertex AI use a Dynamic Shared
Quota (DSQ) system. This approach automatically manages
capacity across all users in a region, which provides optimal performance without
the need for manual quota adjustments or requests. As a result, you won't
see traditional quota usage displayed in the Quotas & System Limits
tab. Your project automatically receives the necessary resources based
on real-time availability.
To monitor usage, use the Vertex AI Model Garden (Monitoring)
dashboard.
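As one way to approximate what that dashboard charts, you can also build a chart in Metrics Explorer filtered to the Vertex AI publisher-model metrics. The resource and metric types below are assumptions based on the `aiplatform.googleapis.com` metric namespace; verify them against the current Cloud Monitoring metrics list for your project:

```
resource.type = "aiplatform.googleapis.com/PublisherModel"
metric.type = "aiplatform.googleapis.com/publisher/online_serving/token_count"
```

Summing this metric per minute gives a view of token consumption even though no traditional quota usage is displayed.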
For generative AI applications in production that require consistent
throughput, we recommend using Provisioned Throughput (PT). PT provides a
predictable and consistent user experience, which is critical for time-sensitive
workloads. Additionally, it provides deterministic monthly or weekly cost
structures, enabling accurate budget planning.
For more information, see Provisioned Throughput overview.
The list of models supported for Provisioned Throughput, including
throughput, purchase increment, and burndown rate, is available on the
Supported
models page.
To purchase Provisioned Throughput for partner models (such as
Anthropic's Claude models), you must contact Google; you can't order through
the Google Cloud console. For more information, see
Partner models.
There are three ways to measure your Provisioned Throughput usage:
When using the built-in monitoring metrics or HTTP response headers, you
can create a chart in the
Metrics Explorer to monitor usage.
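The HTTP-header approach can be sketched as follows. This is a minimal illustration, assuming responses expose a traffic-type header (the header name `X-Vertex-AI-LLM-Request-Type` and the values `dedicated`/`shared` are assumptions; check the current Vertex AI documentation for the exact names):

```python
from collections import Counter

# Assumed header name -- verify against the current Vertex AI docs.
TRAFFIC_HEADER = "X-Vertex-AI-LLM-Request-Type"

def tally_traffic(response_headers):
    """Count responses per traffic type.

    Assumes 'dedicated' marks Provisioned Throughput traffic and
    'shared' marks pay-as-you-go traffic.
    """
    return Counter(h.get(TRAFFIC_HEADER, "unknown") for h in response_headers)

headers = [
    {TRAFFIC_HEADER: "dedicated"},
    {TRAFFIC_HEADER: "dedicated"},
    {TRAFFIC_HEADER: "shared"},
]
print(tally_traffic(headers))  # dedicated: 2, shared: 1
```

A tally like this tells you what fraction of your requests consumed Provisioned Throughput versus overflowed to pay-as-you-go.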
To buy, manage, and use Provisioned Throughput, you need the same permissions as pay-as-you-go usage. For detailed instructions, see Purchase Provisioned Throughput.
If you have issues placing an order, you might need one of the following roles:
A generative AI scale unit (GSU) is an abstract measure of capacity for throughput provisioning
that is fixed and standard across all Google models that support Provisioned
Throughput. A GSU has a fixed price and capacity, but the throughput can vary
between models. This is because different models might require different
amounts of capacity to deliver the same throughput.
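The relationship between GSUs and throughput can be sketched numerically. The per-GSU throughput value in the example below is a hypothetical placeholder; each model's real number is listed on the supported models page:

```python
import math

def gsus_needed(required_throughput, throughput_per_gsu):
    """GSUs to purchase: required throughput / per-GSU throughput, rounded up.

    throughput_per_gsu varies by model; the value used in the example
    below is hypothetical.
    """
    return math.ceil(required_throughput / throughput_per_gsu)

# E.g., needing 3,000 chars/sec on a model rated at a hypothetical
# 1,000 chars/sec per GSU:
print(gsus_needed(3000, 1000))  # 3
print(gsus_needed(3001, 1000))  # 4 -- partial GSUs round up
```

Because the price per GSU is fixed, multiplying the result by the current GSU price gives a deterministic cost for budget planning.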
To estimate your Provisioned Throughput needs, follow these steps:
You are invoiced at the end of each month for the Provisioned Throughput
charges that you incurred during that month.
While a dedicated test environment is not available, a 1-week order with a
limited number of GSUs is a cost-effective way to try Provisioned Throughput
and assess whether it suits your requirements.
For more information, see
Purchase Provisioned Throughput.
Gemini 2 general FAQ
What should I do if the model I'm using is no longer available?
How do the Gemini 2 models compare to the 1.5 generation?
| Model name | Description | Upgrade path for |
| --- | --- | --- |
| Gemini 2.5 Pro | Strongest model quality (especially for code and world knowledge), with a 1M-token long context window | Gemini 1.5 Pro users who want better quality, or who are particularly invested in long context and code |
| Gemini 2.0 Flash | Workhorse model for all daily tasks; features enhanced performance and supports the real-time Live API | |
| Gemini 2.0 Flash-Lite | Our most cost-effective offering to support high throughput | |
How do I migrate Gemini on Google AI Studio to Vertex AI Studio?
How does Gemini 2 image generation compare to Imagen 3?
Does Gemini 2 in Vertex AI support compositional function calling?
What locations are supported for Gemini 2?
What are the default quotas for Gemini 2?
Monitoring
Why does the API dashboard show 0% quota usage?
Provisioned Throughput
When should I use Provisioned Throughput?
What models are supported for Provisioned Throughput?
How can I monitor my Provisioned Throughput usage?
What permissions are required to purchase and use Provisioned Throughput?
What is a GSU?
How can I estimate my GSU needs for Provisioned Throughput?
How often am I billed for Provisioned Throughput?
How long does it take to activate my Provisioned Throughput order?
Can I test Provisioned Throughput before placing an order?
Frequently asked questions
Throughput per second, which you use when estimating how many GSUs to purchase, is calculated as:

$$
\begin{aligned}
\text{Throughput per sec} = {} & (\text{Inputs per query converted to input chars} \\
& \quad + \text{Outputs per query converted to input chars}) \\
& \quad \times \text{QPS}
\end{aligned}
$$
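The formula above can be sketched in code. Both inputs and outputs are assumed to be already converted to input characters (for outputs, that conversion applies the model's burndown rate); the example figures are illustrative:

```python
def throughput_per_sec(input_chars_per_query, output_chars_per_query, qps):
    """Throughput per second = (input chars + output chars per query) x QPS.

    Both arguments are expressed in input characters; output characters
    are assumed to have been converted using the model's burndown rate.
    """
    return (input_chars_per_query + output_chars_per_query) * qps

# E.g., 1,000 input chars and 500 (converted) output chars per query at 2 QPS:
print(throughput_per_sec(1000, 500, 2))  # 3000
```

Dividing this result by a model's per-GSU throughput (rounded up) gives the number of GSUs to order.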