Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.

Provisioned Throughput overview

This page explains what Provisioned Throughput is and when to use it.

Introduction to Provisioned Throughput

Provisioned Throughput is a fixed-cost, fixed-term subscription, available in several term lengths, that reserves throughput for supported generative AI models on Vertex AI. To reserve throughput, you must specify the model and the available locations in which it runs.

When to use Provisioned Throughput

Consider using Provisioned Throughput if any of the following apply to your use case:

You are building real-time generative AI production applications, such as chatbots and agents.
Your critical workloads consistently require high throughput. Throughput measurement depends on the model.
You want to provide a consistent and predictable experience for users of your applications.
You want deterministic generative AI costs by paying a fixed monthly or weekly price with control of overages.

Provisioned Throughput is one of two ways to consume your generative AI models. The second way is pay-as-you-go, which is also referred to as on-demand.
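
For the cost consideration only, the trade-off between a fixed subscription price and pay-as-you-go can be sketched as a simple break-even calculation. The following is a minimal illustration, not a pricing tool: the function names, the per-request billing unit, and all numbers are hypothetical assumptions, and the sketch ignores overages and the fact that throughput measurement depends on the model.

```python
def break_even_requests(fixed_monthly_cost: float,
                        on_demand_cost_per_request: float) -> float:
    """Return the monthly request volume at which both options cost the same.

    Both arguments are hypothetical inputs; real prices and billing units
    depend on the model and are not modeled here.
    """
    if on_demand_cost_per_request <= 0:
        raise ValueError("on_demand_cost_per_request must be positive")
    return fixed_monthly_cost / on_demand_cost_per_request


def cheaper_option(monthly_requests: float,
                   fixed_monthly_cost: float,
                   on_demand_cost_per_request: float) -> str:
    """Compare total monthly cost of the two consumption options."""
    on_demand_total = monthly_requests * on_demand_cost_per_request
    if fixed_monthly_cost < on_demand_total:
        return "provisioned"
    return "pay-as-you-go"


if __name__ == "__main__":
    # Hypothetical figures chosen only to make the arithmetic easy to follow.
    print(break_even_requests(1000.0, 0.5))
    print(cheaper_option(3000, 1000.0, 0.5))
```

Cost is only one of the considerations listed above. A workload below the break-even volume might still warrant Provisioned Throughput if it needs reserved capacity or a consistent user experience.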

What's next

Supported models using Provisioned Throughput.
