Quotas
This document lists the quotas that apply to Cloud TPU. For information about Cloud TPU pricing, see Cloud TPU pricing.
A quota restricts how much of a particular shared Google Cloud resource your Google Cloud project can use, including hardware, software, and network components.
Quotas are part of a system that does the following:
- Monitors your use or consumption of Google Cloud products and services.
- Restricts your consumption of those resources, for example to ensure fairness and reduce spikes in usage.
- Maintains configurations that automatically enforce prescribed restrictions.
- Provides a means to make or request changes to the quota.
When you exceed a quota, the system usually blocks access to the relevant Google Cloud resource immediately, and the task that you're trying to perform fails. Quotas generally apply to each Google Cloud project and are shared across all applications and IP addresses that use that project.
Quota allocation
Quota is granted differently depending on the TPU version you use. For TPU v4 or later, quota is specified in terms of Cloud TPU chips or TensorCores. All TPU v4s are treated as slices, so there is no concept of a single TPU device. You can use your v4 quota in any combination of slices. For example, if you have quota for a v4-32 slice, you can use this quota to create four v4-8 slices, because 4 × 8 = 32.
For TPU v2 and v3, quota is specified in terms of TensorCores. A single Cloud TPU device comprises four TPU chips and eight TensorCores, two TensorCores per TPU chip. TPU v2 and v3 have separate quotas for single devices and for TPU Pods. You cannot use a v2 or v3 TPU Pod quota for v2-8 or v3-8 TPUs. For example, if you have quota for a v3-32 slice, you cannot use it to create four v3-8 TPUs.
For more information about TPU chips and TensorCores, see TPU System architecture.
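The allocation rules above can be sketched as a small check. This is a minimal illustration, not a Cloud TPU API: the pool names (`v4-slice`, `v3-device`, `v3-pod`) and the helper functions are hypothetical, and the sketch assumes that the numeric suffix of a v2, v3, or v4 type such as `v4-32` is the number of quota units it consumes.

```python
import re


def parse_type(accelerator_type):
    """Split a type such as 'v4-32' into (version, size)."""
    match = re.fullmatch(r"v(\d+)-(\d+)", accelerator_type)
    if match is None:
        raise ValueError(f"unrecognized accelerator type: {accelerator_type!r}")
    return int(match.group(1)), int(match.group(2))


def quota_pool(accelerator_type):
    """Return the (hypothetical) quota pool that a type draws from.

    v4 and later: one pool per version, shared by all slice sizes.
    v2 and v3: separate pools for single devices (size 8) and Pods.
    """
    version, size = parse_type(accelerator_type)
    if version >= 4:
        return f"v{version}-slice"
    return f"v{version}-device" if size == 8 else f"v{version}-pod"


def fits_quota(quota, requests):
    """Check whether all requested TPUs fit within the per-pool quota.

    quota: dict mapping pool name to available units.
    requests: list of accelerator types to create.
    """
    used = {}
    for accelerator_type in requests:
        _, size = parse_type(accelerator_type)
        pool = quota_pool(accelerator_type)
        used[pool] = used.get(pool, 0) + size
    return all(total <= quota.get(pool, 0) for pool, total in used.items())


# v4 quota can be split across any combination of slices.
print(fits_quota({"v4-slice": 32}, ["v4-8"] * 4))  # True
# v3 Pod quota cannot be spent on single v3-8 devices.
print(fits_quota({"v3-pod": 32}, ["v3-8"] * 4))    # False
```

Keeping one pool per category mirrors the text: v4 requests only need their total to stay within quota, while v2 and v3 requests are counted against whichever pool matches their size.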
Quota types
There are separate quotas for reserved, on-demand, and preemptible Cloud TPU resources. The following table compares the features of each type of quota.
Quota type | Description | Default value | How to request | Flags for TPU creation |
---|---|---|---|---|
Reserved | Quota for reserved TPUs. A reservation provides a high level of assurance in obtaining Cloud TPU capacity. Reserved instances are protected from stockouts but are subject to interruptions. You must have a committed use discount (CUD) to access reserved resources. | 0 | To request a reservation, fill out the Cloud TPU sign up form. | Use the --reserved flag. |
On-demand | Quota for TPUs that are not reserved and will not be preempted. You can request up to your quota limit of Cloud TPU resources, but availability of resources is not guaranteed. | v3-8 and v2-8: 16 TensorCores. All others: 0 | See Request additional quota. | No flags needed, selected by default. |
Preemptible | Quota for preemptible TPUs. The Cloud TPU service might shut down these TPUs at any time if it requires additional resources for higher priority jobs. Availability of resources is not guaranteed. For more information, see Preemptible TPUs. | v3-8 and v2-8: 48 TensorCores. All others: 0 | See Request additional quota. | Use the --preemptible flag or the --best-effort flag for a queued resource request. |
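As an illustration of how the flags in the table map to quota types, the following sketch assembles a `gcloud compute tpus tpu-vm create` command line without running it. The helper name, the TPU name, the zone, and the runtime version shown are assumptions chosen for the example. The sketch does not cover queued resource requests, which the table says use the --best-effort flag.

```python
import shlex

# Flag per quota type, taken from the table above.
# On-demand is the default and needs no flag.
QUOTA_TYPE_FLAGS = {
    "reserved": ["--reserved"],
    "on-demand": [],
    "preemptible": ["--preemptible"],
}


def tpu_create_command(name, zone, accelerator_type, runtime_version,
                       quota_type="on-demand"):
    """Build (but do not execute) a TPU VM create command as an argument list."""
    if quota_type not in QUOTA_TYPE_FLAGS:
        raise ValueError(f"unknown quota type: {quota_type!r}")
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "create", name,
        f"--zone={zone}",
        f"--accelerator-type={accelerator_type}",
        f"--version={runtime_version}",
        *QUOTA_TYPE_FLAGS[quota_type],
    ]


# Example values are placeholders; substitute your own name, zone, and version.
command = tpu_create_command(
    "my-tpu", "us-central1-a", "v3-8", "tpu-ubuntu2204-base",
    quota_type="preemptible",
)
print(shlex.join(command))
```

Returning an argument list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.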
View and request additional quota
You can view the quota allocated for your Google Cloud project on the Quotas page in the Google Cloud console. If you need additional Cloud TPU quota, you can request it from the Quotas page. For more information, see Request a higher quota limit.