Frequently asked questions

This guide answers frequently asked questions about Gemini models, monitoring, and billing, covering the following topics:

  • Gemini 2 general FAQ: Answers common questions about Gemini 2 models, including model comparisons, migration, and quotas.
  • Monitoring: Explains how to monitor quota and model usage.
  • Provisioned Throughput: Provides details on when to use, how to monitor, and how to purchase Provisioned Throughput.

Last updated: June 25, 2025

Gemini 2 general FAQ

What should I do if the model I'm using is no longer available?

If your application recently started returning errors because a PaLM 2, Gemini 1.0, or Gemini 1.5-001 model is unavailable, this section explains how to transition to a supported model.

Google regularly releases new and improved AI models. To make way for these advancements, older models are retired (deprecated). We provide notice when deprecating a model and a transition window before access to the model is terminated, but we understand that this can still cause interruptions.

Here are two options for updating your model:

  • Switch to a supported model: To resolve the errors, update your application code to point to a newer, supported model. Before you launch the change, test all critical features to verify that everything works as expected.
  • Follow the step-by-step migration process: If you have more time, this guided process helps you upgrade to the latest Gemini SDK and adopt best practices. This approach minimizes risks and helps you use the new model optimally. For more information, see the migration guide.
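The first option can be sketched as a small routing layer in application code. The mapping below is illustrative only; confirm the recommended upgrade path for your specific model in the official deprecation notices before relying on any of these IDs:

```python
# Sketch: route requests from a retired model ID to a supported one.
# ASSUMPTION: the ID pairs below are illustrative examples, not an
# official mapping; check the deprecation notices for your model.
DEPRECATED_TO_SUPPORTED = {
    "gemini-1.0-pro": "gemini-2.0-flash",
    "gemini-1.5-flash-001": "gemini-2.0-flash",
    "gemini-1.5-pro-001": "gemini-2.5-pro",
}

def resolve_model(model_id: str) -> str:
    """Return a supported model ID, upgrading retired ones in place."""
    return DEPRECATED_TO_SUPPORTED.get(model_id, model_id)

print(resolve_model("gemini-1.5-pro-001"))  # upgraded to a supported ID
print(resolve_model("gemini-2.0-flash"))    # already supported, unchanged
```

Centralizing the model name behind one function like this also makes the "test all critical features" step easier, because the switch happens in exactly one place.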

How do the Gemini 2 models compare to the 1.5 generation?

The Gemini 2 models include the following upgrades over the 1.5 models:

  • Improved multilingual capabilities: Gemini 2 models show strong advancements in multilingual understanding, with increased scores in the Global MMLU (Lite) benchmark.
  • Significant gains in reasoning and knowledge factuality: Gemini 2.5 Pro shows substantial improvements in GPQA (domain expert reasoning) and SimpleQA (world knowledge factuality), which indicates an enhanced ability to understand and provide accurate information.
  • Enhanced mathematical problem-solving: Both Gemini 2.0 Flash and Gemini 2.5 Pro demonstrate notable progress in handling complex mathematical problems, as evidenced by the MATH and HiddenMath benchmarks.

The following list compares the Gemini 2 models and their upgrade paths:

  • Gemini 2.5 Pro: Strongest model quality (especially for code and world knowledge), with a 1M-token long context window. Upgrade path for Gemini 1.5 Pro users who want better quality, or who are particularly invested in long context and code.
  • Gemini 2.0 Flash: Workhorse model for all daily tasks, with enhanced performance and support for the real-time Live API. Upgrade path for:
      • Gemini 1.5 Flash users who want a model with significantly better quality that's slightly slower
      • Gemini 1.5 Pro users who want slightly better quality and real-time latency
  • Gemini 2.0 Flash-Lite: Our most cost-effective offering, built to support high throughput. Upgrade path for:
      • Gemini 1.5 Flash users who want better quality for the same price and speed
      • Customers looking for the fastest model in the Gemini 2 family

To see all benchmark capabilities for Gemini 2, visit the Google DeepMind documentation.

How do I migrate Gemini on Google AI Studio to Vertex AI Studio?

Migrating to Google Cloud's Vertex AI platform offers a suite of MLOps tools that streamline the usage, deployment, and monitoring of AI models for efficiency and reliability. To migrate your work to Vertex AI, import and upload your existing data to Vertex AI Studio and use the Gemini API with Vertex AI.

For more information, see Migrate from Gemini on Google AI to Vertex AI.

How does Gemini 2 image generation compare to Imagen 3?

The experimental version of Gemini 2.0 Flash supports image generation, but the generally available Gemini 2 models do not. Because it is experimental, Gemini 2.0 Flash image generation is not recommended for production code.

If you need image generation in production code, use Imagen 3. This powerful model offers high-quality images, low-latency generation, and flexible editing options.

Does Gemini 2 in Vertex AI support compositional function calling?

Compositional function calling is only available in Google AI Studio.

What locations are supported for Gemini 2?

For the full list of locations that are supported for Gemini 2 models, see Locations.

What are the default quotas for Gemini 2?

  • Gemini 2.0 Flash and Gemini 2.0 Flash-Lite: Use dynamic shared quota and have no default quota.
  • Gemini 2.5 Pro: This is an experimental model and has a 10 queries per minute (QPM) limit.

Monitoring

Why does the API dashboard show 0% quota usage?

Gemini models on Vertex AI use a Dynamic Shared Quota (DSQ) system. This approach automatically manages capacity across all users in a region, which provides optimal performance without the need for manual quota adjustments or requests. As a result, you won't see traditional quota usage displayed in the Quotas & System Limits tab. Your project automatically receives the necessary resources based on real-time availability.

To monitor usage, use the Vertex AI Model Garden (Monitoring) dashboard.

Provisioned Throughput

When should I use Provisioned Throughput?

For generative AI applications in production that require consistent throughput, we recommend using Provisioned Throughput (PT). PT provides a predictable and consistent user experience, which is critical for time-sensitive workloads. Additionally, it provides deterministic monthly or weekly cost structures, enabling accurate budget planning.

For more information, see Provisioned Throughput overview.

What models are supported for Provisioned Throughput?

The list of models supported for Provisioned Throughput, including throughput, purchase increment, and burndown rate, is available on the Supported models page.

To purchase Provisioned Throughput for partner models (such as Anthropic's Claude models), you must contact Google; you can't order through the Google Cloud console. For more information, see Partner models.

How can I monitor my Provisioned Throughput usage?

There are three ways to measure your Provisioned Throughput usage:

  • Model Garden monitoring dashboard: A pre-built dashboard in the Google Cloud console that provides a quick, high-level visual overview of usage. For more information, see the Model Garden monitoring dashboard.
  • Built-in monitoring metrics: Programmatic access to metrics that you can use to create custom, detailed monitoring charts and alerts in Metrics Explorer. For more information, see monitoring metrics.
  • HTTP response headers: Real-time, per-request usage data returned in the headers of each API response. For more information, see HTTP response headers.

When using the built-in monitoring metrics or HTTP response headers, you can create a chart in the Metrics Explorer to monitor usage.
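As a sketch of the header-based approach, the helper below tallies per-request usage reported in response headers. The header name used here is a placeholder, not the documented one; look up the actual header names in the Vertex AI HTTP response headers reference:

```python
# Sketch: aggregate Provisioned Throughput usage from per-request
# response headers. "x-vertex-ai-pt-tokens" is a HYPOTHETICAL header
# name used for illustration only; the real names are listed in the
# HTTP response headers documentation.
from typing import Iterable, Mapping

def total_pt_tokens(responses: Iterable[Mapping[str, str]],
                    header: str = "x-vertex-ai-pt-tokens") -> int:
    """Sum the usage reported in each response's headers.

    Responses that omit the header (for example, requests that were
    served as pay-as-you-go overflow) count as zero.
    """
    return sum(int(h.get(header, 0)) for h in responses)

headers_seen = [
    {"x-vertex-ai-pt-tokens": "120"},
    {"x-vertex-ai-pt-tokens": "80"},
    {},  # e.g. a request that spilled over to pay-as-you-go
]
print(total_pt_tokens(headers_seen))  # 200
```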

What permissions are required to purchase and use Provisioned Throughput?

To buy, manage, and use Provisioned Throughput, you need the same permissions as for pay-as-you-go usage. For detailed instructions, see Purchase Provisioned Throughput.

If you have issues placing an order, you might need one of the following roles:

  • Vertex AI Administrator
  • Vertex AI Platform Provisioned Throughput Admin

What is a GSU?

A generative AI scale unit (GSU) is an abstract measure of capacity for throughput provisioning that is fixed and standard across all Google models that support Provisioned Throughput. A GSU has a fixed price and capacity, but the throughput can vary between models. This is because different models might require different amounts of capacity to deliver the same throughput.

How can I estimate my GSU needs for Provisioned Throughput?

To estimate your Provisioned Throughput needs, follow these steps:

  1. Gather your requirements: Determine your expected usage patterns, including inputs per query, outputs per query, and queries per second (QPS).
  2. Calculate your throughput: Use the following formula to calculate your required throughput per second.
    $$ \text{Throughput per sec} = (\text{Inputs per query, in input chars} + \text{Outputs per query, converted to input chars}) \times \text{QPS} $$
  3. Calculate your GSUs: Use the estimation tool in the purchasing console to calculate the number of GSUs needed to cover your usage for the selected model.
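The steps above can be sketched numerically. The per-GSU throughput figure used below is a made-up placeholder: the real characters-per-second delivered by one GSU varies by model, so use the estimation tool in the purchasing console for actual values:

```python
import math

def required_throughput_chars_per_sec(input_chars_per_query: float,
                                      output_chars_per_query: float,
                                      qps: float) -> float:
    """Step 2: (input chars + output chars, both in input-char
    equivalents) multiplied by queries per second."""
    return (input_chars_per_query + output_chars_per_query) * qps

def gsus_needed(throughput_chars_per_sec: float,
                chars_per_sec_per_gsu: float) -> int:
    """Step 3: round up to a whole number of GSUs."""
    return math.ceil(throughput_chars_per_sec / chars_per_sec_per_gsu)

# Example: 4,000 input chars and 2,000 output-char equivalents per
# query at 5 QPS gives 30,000 chars/sec of required throughput.
throughput = required_throughput_chars_per_sec(4000, 2000, 5)
print(throughput)  # 30000.0

# ASSUMPTION: 3,360 chars/sec per GSU is a placeholder figure, not a
# published rate; the console estimation tool gives real per-model values.
print(gsus_needed(throughput, 3360))  # 9
```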

How often am I billed for Provisioned Throughput?

You are invoiced at the end of each month for the Provisioned Throughput charges that you incurred during that month.

How long does it take to activate my Provisioned Throughput order?

  • For small orders or small incremental increases, the order is auto-approved and ready within minutes if capacity is available.
  • Larger orders or increases can take longer and might require us to contact you directly to prepare capacity. We aim to approve or deny each request within one week of order submission.

Can I test Provisioned Throughput before placing an order?

While a direct test environment is not available, a 1-week order with a limited number of GSUs provides a cost-effective way to experience its benefits and assess its suitability for your requirements.

For more information, see Purchase Provisioned Throughput.