Quotas

This document lists the quotas that apply to Cloud TPU. For information about Cloud TPU pricing, see Cloud TPU pricing.

A quota restricts how much of a shared Google Cloud resource your Google Cloud project can use, including hardware, software, and network components. Quotas are part of a system that does the following:

  • Monitors your use or consumption of Google Cloud products and services.
  • Restricts your consumption of those resources, for reasons that include ensuring fairness and reducing spikes in usage.
  • Maintains configurations that automatically enforce prescribed restrictions.
  • Provides a means to request or make changes to the quota.

When a quota is exceeded, the system usually blocks access to the relevant Google Cloud resource immediately, and the task that you're trying to perform fails. Quotas generally apply to each Google Cloud project and are shared across all applications and IP addresses that use that project.

Quota allocation

Quota is granted differently based on the TPU version you are using.

TPU v4 and v5p

For TPU v4 and v5p, quota can be specified in terms of Cloud TPU chips or TensorCores. All v4 and v5p TPUs are treated as slices, so there is no concept of a single TPU device as there is with v2 and v3 TPUs. You can use your quota in any combination of slices. For example, if you have quota for a v4-32 slice, you can use this quota to create four v4-8 slices.
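The arithmetic behind the v4-32 example can be sketched in a few lines of Python. This is only an illustration, not part of any Cloud TPU tooling: the helper names tensorcores and fits_in_quota are hypothetical, and the sketch assumes that the numeric suffix of a v4 or v5p accelerator type (the 32 in v4-32) is its TensorCore count.

```python
import re


def tensorcores(accelerator_type: str) -> int:
    """Return the TensorCore count encoded in a v4 or v5p accelerator type.

    Assumption: the numeric suffix (the 32 in "v4-32") counts TensorCores.
    """
    match = re.fullmatch(r"(v4|v5p)-(\d+)", accelerator_type)
    if match is None:
        raise ValueError(f"not a v4 or v5p accelerator type: {accelerator_type!r}")
    return int(match.group(2))


def fits_in_quota(quota_tensorcores: int, slices: list[str]) -> bool:
    """Check whether the requested slices together stay within the quota."""
    return sum(tensorcores(s) for s in slices) <= quota_tensorcores


quota = tensorcores("v4-32")                # quota sized for one v4-32 slice
print(fits_in_quota(quota, ["v4-8"] * 4))   # four v4-8 slices: True
print(fits_in_quota(quota, ["v4-8"] * 5))   # five v4-8 slices: False
```

Any other combination of slice sizes can be checked the same way, as long as the total stays within the quota.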

TPU v5e (training and inference)

TPU v5e supports both training and inference. Creating a v5e instance for inference (v5litepod-1, v5litepod-4, or v5litepod-8) requires the serving quota type that matches how the TPU is provisioned: tpu-v5s-litepod-serving for on-demand TPUs, tpu-v5s-litepod-serving-preemptible for preemptible TPUs, and tpu-v5s-litepod-serving-reserved for reserved TPUs.
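As a quick reference, the mapping from provisioning model to serving quota type can be written as a lookup. The quota names and accelerator types below are taken from the paragraph above; the function name v5e_serving_quota and the provisioning labels used as dictionary keys are hypothetical and exist only for this sketch.

```python
# Accelerator types that the text lists for v5e inference.
V5E_SERVING_TYPES = {"v5litepod-1", "v5litepod-4", "v5litepod-8"}

# Serving quota type for each provisioning model (keys are illustrative labels).
V5E_SERVING_QUOTA = {
    "on-demand": "tpu-v5s-litepod-serving",
    "preemptible": "tpu-v5s-litepod-serving-preemptible",
    "reserved": "tpu-v5s-litepod-serving-reserved",
}


def v5e_serving_quota(accelerator_type: str, provisioning: str) -> str:
    """Return the serving quota type needed for a v5e inference instance."""
    if accelerator_type not in V5E_SERVING_TYPES:
        raise ValueError(f"not a v5e serving type: {accelerator_type!r}")
    if provisioning not in V5E_SERVING_QUOTA:
        raise ValueError(f"unknown provisioning model: {provisioning!r}")
    return V5E_SERVING_QUOTA[provisioning]


print(v5e_serving_quota("v5litepod-4", "preemptible"))
```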

TPU v2 and v3

For TPU v2 and v3, quota is specified in terms of TensorCores. A single Cloud TPU device comprises four TPU chips and eight TensorCores, two TensorCores per TPU chip. TPU v2 and v3 have separate quotas for single devices and for TPU Pods. You cannot use a v2 or v3 TPU Pod quota for v2-8 or v3-8 TPUs. For example, if you have quota for a v3-32 slice, you cannot use it to create four v3-8 TPUs.
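The separation between single-device and Pod quota can be illustrated with a small check. This is a sketch under stated assumptions, not an official rule engine: the helper names quota_pool and same_pool are hypothetical, and the code assumes that the numeric suffix of a v2 or v3 accelerator type is its TensorCore count, so a suffix of 8 denotes a single device and anything larger denotes a Pod slice.

```python
import re

CORES_PER_DEVICE = 8  # four TPU chips x two TensorCores per chip


def quota_pool(accelerator_type: str) -> tuple[str, str]:
    """Return (version, "device" or "pod") for a v2 or v3 accelerator type."""
    match = re.fullmatch(r"(v2|v3)-(\d+)", accelerator_type)
    if match is None:
        raise ValueError(f"not a v2 or v3 accelerator type: {accelerator_type!r}")
    version, cores = match.group(1), int(match.group(2))
    # Assumption: exactly eight TensorCores means a single Cloud TPU device.
    kind = "device" if cores == CORES_PER_DEVICE else "pod"
    return version, kind


def same_pool(quota_for: str, requested: str) -> bool:
    """True if the request draws on the same quota pool as the granted quota."""
    return quota_pool(quota_for) == quota_pool(requested)


print(same_pool("v3-32", "v3-8"))   # Pod quota, single-device request: False
print(same_pool("v3-32", "v3-32"))  # Pod quota, Pod request: True
```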

For more information about TPU chips and TensorCores, see TPU System architecture.

Quota types

There are separate quotas for reserved, on-demand, and preemptible Cloud TPU resources. The following table compares the features of each type of quota.

| Quota type | Description | Default value | How to request | Flags for TPU creation |
| --- | --- | --- | --- | --- |
| Reserved | Quota for reserved TPUs. A reservation provides a high level of assurance in obtaining Cloud TPU capacity. Reserved instances are protected from stockouts but are subject to interruptions. You must have a committed use discount (CUD) to access reserved resources. | 0 | To request a reservation, fill out the Cloud TPU sign up form. | Use the --reserved flag. |
| On-demand | Quota for TPUs that are not reserved and won't be preempted. You can request up to your quota limit of Cloud TPU resources, but availability of resources is not guaranteed. | v3-8 and v2-8: 16 TensorCores. All others: 0 | See Request additional quota. | No flags needed, selected by default. |
| Preemptible | Quota for preemptible TPUs. The Cloud TPU service might shut down these TPUs at any time if it requires additional resources for higher priority jobs. Availability of resources is not guaranteed. For more information, see Preemptible TPUs. | v3-8 and v2-8: 48 TensorCores. All others: 0 | See Request additional quota. | Use the --preemptible flag or the --best-effort flag for a queued resource request. |
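The flag column can be summarized as a small helper that returns the extra command-line flags for each quota type. The flags themselves (--reserved, --preemptible, --best-effort) come from the table; the function name creation_flags, the quota type labels, and the queued_resource parameter are hypothetical names chosen for this sketch.

```python
def creation_flags(quota_type: str, queued_resource: bool = False) -> list[str]:
    """Return the extra flags to pass when creating a TPU against a quota type."""
    if quota_type == "on-demand":
        return []  # on-demand is selected by default, so no flag is needed
    if quota_type == "reserved":
        return ["--reserved"]
    if quota_type == "preemptible":
        # Queued resource requests use --best-effort instead of --preemptible.
        return ["--best-effort"] if queued_resource else ["--preemptible"]
    raise ValueError(f"unknown quota type: {quota_type!r}")


print(creation_flags("preemptible"))
print(creation_flags("preemptible", queued_resource=True))
```

The returned list can be appended to whatever TPU creation command you already use; the rest of that command is outside the scope of this sketch.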

View and request additional quota

You can view the quota allocated for your Google Cloud project on the Quotas page in the Google Cloud console. If you need additional Cloud TPU quota, you can request it from the Quotas page. For more information, see Request a higher quota limit.