This page explains what Provisioned Throughput is and when to use it.

Provisioned Throughput is a fixed-cost, fixed-term subscription, available in several term lengths, that reserves throughput for supported generative AI models on Vertex AI. To reserve your throughput, you must specify the model and the available locations in which the model runs.

Provisioned Throughput is one of two ways to consume your generative AI models. The second way is pay-as-you-go, which is also referred to as on-demand.

If any of the following considerations apply to your use case, consider using Provisioned Throughput:
Last updated 2025-08-18 UTC.