Provisioned Throughput overview

This page explains what Provisioned Throughput is and when to use it.

Introduction to Provisioned Throughput

Provisioned Throughput is a fixed-cost, fixed-term subscription, available in several term lengths, that reserves throughput for supported generative AI models on Vertex AI. To reserve throughput, you must specify the model and the locations, from among those where the model is available, in which it runs.

When to use Provisioned Throughput

Consider using Provisioned Throughput if any of the following apply to your use case:

You are building real-time generative AI production applications, such as chatbots and agents.
Your critical workloads consistently require high throughput. How throughput is measured depends on the model.
You want to provide a consistent and predictable experience for users of your applications.
You want deterministic generative AI costs: a fixed weekly or monthly price, with control over overages.

Provisioned Throughput is one of two ways to consume generative AI models on Vertex AI. The other is pay-as-you-go, also referred to as on-demand.

What's next

Learn which models support Provisioned Throughput.