Provisioned Throughput overview

This page explains what Provisioned Throughput is and when to use it.

Provisioned Throughput is a fixed-cost monthly or weekly subscription that reserves throughput for supported generative AI models on Vertex AI. To reserve throughput, you must specify the model and the available locations in which that model runs.

When to use Provisioned Throughput

If any of the following considerations apply to your use case, consider using Provisioned Throughput:

Your critical workloads consistently require high throughput. Throughput measurement depends on the model.
You are building real-time generative AI production applications, such as chatbots and agents.
Your throughput needs exceed 20,000 characters per second.
You want to provide a consistent and predictable experience for users of your applications.
You want deterministic generative AI costs by paying a fixed monthly or weekly price with control of overages.
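The checklist above reduces to a simple rule: if any one consideration applies, Provisioned Throughput is worth evaluating. The following is a minimal Python sketch of that rule, not part of any Vertex AI API. The `Workload` type, its fields, and the function names are hypothetical, and the only number used is the 20,000 characters-per-second figure from the checklist. Because throughput is measured differently depending on the model, treat characters per second as just one possible measure.

```python
from dataclasses import dataclass

# Threshold from the checklist above. Throughput measurement depends on the
# model, so characters per second is only one illustrative unit.
CHARS_PER_SECOND_THRESHOLD = 20_000


@dataclass
class Workload:
    """Hypothetical description of a use case; field names are illustrative."""
    critical_high_throughput: bool = False
    realtime_production_app: bool = False
    chars_per_second: float = 0.0
    needs_consistent_experience: bool = False
    wants_fixed_cost: bool = False


def matching_reasons(w: Workload) -> list[str]:
    """Return the checklist items that apply to this workload, in checklist order."""
    reasons = []
    if w.critical_high_throughput:
        reasons.append("critical workloads consistently need high throughput")
    if w.realtime_production_app:
        reasons.append("real-time production application")
    # "Exceed" is read as strictly greater than the threshold.
    if w.chars_per_second > CHARS_PER_SECOND_THRESHOLD:
        reasons.append("throughput exceeds 20,000 characters per second")
    if w.needs_consistent_experience:
        reasons.append("consistent, predictable user experience")
    if w.wants_fixed_cost:
        reasons.append("fixed monthly or weekly cost")
    return reasons


def should_consider_provisioned_throughput(w: Workload) -> bool:
    # If ANY consideration applies, consider Provisioned Throughput.
    return bool(matching_reasons(w))


if __name__ == "__main__":
    chatbot = Workload(realtime_production_app=True, chars_per_second=25_000)
    print(should_consider_provisioned_throughput(chatbot))
    print(matching_reasons(chatbot))
```

This sketch covers only the decision checklist. It does not model pricing, quotas, or how a specific model measures throughput.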

Provisioned Throughput is one of two ways to consume your generative AI models. The other is pay-as-you-go, also referred to as on-demand.

What's next

Supported models using Provisioned Throughput.
Overview of Dynamic shared quota.