Purchase Provisioned Throughput

This page provides details to consider before subscribing to Provisioned Throughput, the permissions you must have to place or to view a Provisioned Throughput order, and the instructions for placing and viewing your orders.

What to consider before subscribing

To help you decide whether you want to subscribe to Provisioned Throughput, review this list of details about the subscription:

  • You can't cancel your order.

    Your Provisioned Throughput purchase is a commitment, which means that you can't cancel the order. However, you can increase the number of purchased GSUs. If you accidentally purchase a commitment or there's a problem with your configuration, contact your Google Cloud account representative for assistance.

  • You can auto-renew your subscription.

    When you submit your order, you can choose to auto-renew your subscription at the end of its term, or let the subscription expire. You can cancel the auto-renew process. To cancel your subscription before it auto renews, cancel the auto renewal 30 days prior to the start of the next term.

    You can configure monthly subscriptions to renew automatically each month. Weekly terms don't support automatic renewal.

    If you need assistance with this process, contact your Google Cloud account representative.

  • You can change your model version or region with notice.

    Provisioned Throughput is enabled after you've chosen your project, region, model, and version. You can change your model version within the same model publisher or region with a 10-business-day notice by contacting your Google Cloud account representative for assistance. For example, you can switch between Google's models. You can switch between partner A's models. You can switch between partner B's models. You can't switch between Google, partner A, and partner B's models.

  • There is no downtime when you switch to Provisioned Throughput from pay-as-you-go.

    There is no downtime when you switch between models for a Provisioned Throughput order. However, the lead time to acquire throughput is required.

  • By default, the overage is billed as pay-as-you-go.

    If your throughput exceeds your Provisioned Throughput order amount, overages are processed and billed as pay-as-you-go. You can control overages on a per-request basis. For more information, see Use the REST API.

  • Requests are prioritized.

    Requests from Provisioned Throughput customers are prioritized and serviced first before on-demand requests.

  • You must commit to a minimum usage and payment.

    Minimum usage is dependent on the generative AI model that you select. Any usage beyond the purchased throughput rate isn't assured and is serviced on a reasonable-efforts basis.

  • Throughput doesn't accumulate.

    Any unused throughput doesn't accumulate or carry over to the next month.

  • Provisioned Throughput is measured on characters or tokens per second.

    Provisioned Throughput is measured on characters or tokens per second, not on queries per minute (QPM). As a result, measuring Provisioned Throughput depends on your use case's query size and QPM.

  • Provisioned Throughput checks your quota.

    Your Provisioned Throughput quota is checked each time you make a request within your quota window. For gemini-1.5-flash-002 and gemini-1.5-pro-002 models, the quota window is 30 seconds. This means that you might temporarily experience prioritized traffic that exceeds your quota amount on a per-second basis in some cases, but you shouldn't exceed your quota on a 30-second basis. The quota window for other models is one minute.

  • Supervised fine-tuned model endpoints and their corresponding base model count towards the same Provisioned Throughput quota. This is a Preview feature. Fill out and submit the Provisioned Throughput access control form.

    For example, Provisioned Throughput purchased for gemini-1.5-pro-002 for a specific project prioritizes requests made from supervised fine-tuned versions of gemini-1.5-pro-002 created within that project. Use the appropriate header to control traffic behavior.

Permissions

To subscribe to Provisioned Throughput, you must have one of the following permissions assigned to your project, which lets you list and place new orders.

  • aiplatform.googleapis.com/provisionedThroughputAdmin: Specific to Provisioned Throughput.
  • aiplatform.googleapis.com/admin: Gives administrative rights to every resource in Vertex AI.

This role lets you only list your orders:

  • aiplatform.googleapis.com/viewer

Place a Provisioned Throughput order

Before you place your order to use Imagen models, submit the Request to grant permissions form to be granted permissions.

Before you place an order to use MedLM-large-1.5, contact your Google Cloud account representative to request access. If you expect your QPM to exceed 30,000, then to maximize your Provisioned Throughput order, request an increase to your default Vertex AI system quota using the following information:

  • Service: The Vertex AI API.
  • Name: Online prediction requests per minute per region
  • Service type: A quota.
  • Dimensions: The region where you ordered Provisioned Throughput.
  • Value: This is your chosen online-prediction traffic limit.

Follow these steps to purchase Provisioned Throughput:

Console

  1. In the Google Cloud console, go to the Provisioned Throughput page.

    Go to Provisioned Throughput

  2. To start a new order, click Create.
  3. Enter an Order name.
  4. Select the Model.
  5. Select the Region.
  6. Enter the Number of generative AI scale units (GSUs) that you must purchase. If you must estimate the number of GSUs, click the Estimation tool.
    1. Select your Model.
    2. Enter the number of Queries per second.
    3. Enter the number of Input characters per query.
    4. Enter the number of Input images per query.
    5. Enter the number of Video seconds per query.
    6. Enter the number of Audio seconds per query.
    7. Enter the number of Output characters per query.
    8. If you want to use the values that you entered into the estimation tool, click Use calculated.
  7. Select your Term.

    If you choose one week, you have the option to provide a start date and time within two weeks into the future of placing an order. If you provide no start date and time, we process the order as soon as we can ensure that the capacity is available. Requested start dates and times are processed on a best-effort basis, and orders aren't guaranteed to be fulfilled by these dates until the order status is set to Approved.

    If your requested start date is too close to the current date, your order might be approved and activated after your requested start date, which means that your end date remains seven days from the activation date.

  8. Select your Renewal option.
  9. Click Continue.
  10. In the Summary section, review the price and throughput estimates for your order. Read the terms listed and linked in the form.
  11. To finalize your order, click Confirm.

Check order status

After you submit your Provisioned Throughput order, the order status might appear as one of the following:

  • Pending review: You placed your order. Because approval depends on available capacity to provision your order, your order is waiting for review and approval. For more information about the status of your pending order, contact your Google Cloud account representative.
  • Approved: Google has approved your order.
  • Active: Google has activated your order, and then billing starts.
  • Expired: Your order has expired.

View Provisioned Throughput orders

Follow these steps to view your Provisioned Throughput orders:

Console

  1. In the Google Cloud console, go to the Provisioned Throughput page.

    Go to Provisioned Throughput

  2. Select the Region. Your list of orders appears.

What's next