Quotas

This document lists the quotas that apply to Cloud TPU. For information about Cloud TPU pricing, see Cloud TPU pricing.

A quota restricts how much of a shared Google Cloud resource your Google Cloud project can use, including hardware, software, and network components. Quotas are part of a system that does the following:

  • Monitors your use or consumption of Google Cloud products and services.
  • Restricts your consumption of those resources, for reasons that include ensuring fairness and reducing spikes in usage.
  • Maintains configurations that automatically enforce prescribed restrictions.
  • Provides a means to request or make changes to the quota.

In most cases, when a quota is exceeded, the system immediately blocks access to the relevant Google resource, and the task that you're trying to perform fails. Quotas generally apply to each Google Cloud project and are shared across all applications and IP addresses that use that project.

Quota types

If you are using GKE, see Ensure sufficient quota for more information on GKE quota. When you are using Cloud TPU API quota, there are separate quotas for reserved, on-demand, and preemptible Cloud TPU resources. The following table compares each type of quota.

Quota type: Reserved
  • Description: The number of Cloud TPU resources for which you have guaranteed access. You must have a reservation agreement to access reserved resources. Reserved resources are protected from stockouts but are subject to interruptions.
  • Default value: 0
  • How to request: To request reserved quota, contact your Google Cloud account representative.
  • Flags for TPU creation: Use the --reserved flag.

Quota type: On-demand
  • Description: The number of on-demand resources for which you have access. On-demand resources won't be preempted, but on-demand quota does not guarantee there will be enough available Cloud TPU resources to satisfy your request.
  • Default value: v3-8 and v2-8: 16 TensorCores. All others: 0.
  • How to request: See Request additional quota.
  • Flags for TPU creation: No flags needed, selected by default.

Quota type: Preemptible
  • Description: The number of preemptible Cloud TPU resources for which you have access. Preemptible resources may be preempted to make room for higher priority jobs. Preemptible quota does not guarantee there will be enough available Cloud TPU resources to satisfy your request. For more information, see Preemptible TPUs.
  • Default value: v3-8 and v2-8: 48 TensorCores. All others: 0.
  • How to request: See Request additional quota.
  • Flags for TPU creation: Use the --preemptible flag or the --best-effort flag for a queued resource request.
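
The flag column of the comparison above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name creation_flags and the queued_resource parameter are hypothetical helpers, not part of any Google Cloud library, and the flag strings are taken from the comparison above.

```python
from typing import List


def creation_flags(quota_type: str, queued_resource: bool = False) -> List[str]:
    """Return the creation flags that select a given quota type.

    Hypothetical helper: it only restates the flag column of the
    quota comparison and does not call any Google Cloud API.
    """
    if quota_type == "reserved":
        return ["--reserved"]
    if quota_type == "on-demand":
        # On-demand is selected by default, so no flag is needed.
        return []
    if quota_type == "preemptible":
        # Queued resource requests use --best-effort instead of --preemptible.
        return ["--best-effort"] if queued_resource else ["--preemptible"]
    raise ValueError(f"unknown quota type: {quota_type!r}")


print(creation_flags("on-demand"))                          # []
print(creation_flags("preemptible"))                        # ['--preemptible']
print(creation_flags("preemptible", queued_resource=True))  # ['--best-effort']
```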

Quota allocation

Cloud TPU quota is granted differently based on the version of TPUs you're using.

TPU v4 and v5p

For TPU v4 and v5p, quota can be specified in terms of TPU chips or TensorCores. You can use your quota in any combination of slices. For example, if you have quota for a v4-32 slice, you can use this quota to create four v4-8 slices.
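
The slice arithmetic in the example above can be checked with a short Python sketch. It assumes that the number in a v4 or v5p accelerator type (for example, the 32 in v4-32) is a TensorCore count, and that quota is consumed additively across slices. The helper names tensorcores and fits_in_quota are hypothetical.

```python
import re
from typing import Iterable


def tensorcores(accelerator_type: str) -> int:
    """Return the numeric suffix of a v4 or v5p type such as 'v4-8'.

    Assumption: for these generations the suffix counts TensorCores.
    """
    match = re.fullmatch(r"(v4|v5p)-(\d+)", accelerator_type)
    if match is None:
        raise ValueError(f"not a v4 or v5p accelerator type: {accelerator_type!r}")
    return int(match.group(2))


def fits_in_quota(quota_cores: int, requested_types: Iterable[str]) -> bool:
    """Check whether the requested slices together stay within the quota."""
    return sum(tensorcores(t) for t in requested_types) <= quota_cores


quota = tensorcores("v4-32")                   # quota sized for one v4-32 slice
print(fits_in_quota(quota, ["v4-8"] * 4))      # True: four v4-8 slices use 32
print(fits_in_quota(quota, ["v4-8"] * 5))      # False: five would need 40
print(fits_in_quota(quota, ["v4-16", "v4-8"])) # True: any combination up to 32
```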

TPU v5e (training and inference)

v5e supports both training and inference. v5e slices used for inference include TPUs with accelerator types v5litepod-1, v5litepod-4, or v5litepod-8. To use these accelerator types, you need tpu-v5s-litepod-serving quota for on-demand Cloud TPU, tpu-v5s-litepod-serving-preemptible quota for preemptible Cloud TPU, and tpu-v5s-litepod-serving-reserved quota for reserved Cloud TPU.
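
As a quick reference, the mapping from quota type to v5e serving quota name can be written as a lookup table. This Python sketch only restates the names given above. The function serving_quota_name is a hypothetical helper and does not query any Google Cloud API.

```python
# Accelerator types the text lists for v5e inference slices.
V5E_INFERENCE_TYPES = {"v5litepod-1", "v5litepod-4", "v5litepod-8"}

# Quota names as given in the text, keyed by quota type.
V5E_SERVING_QUOTAS = {
    "on-demand": "tpu-v5s-litepod-serving",
    "preemptible": "tpu-v5s-litepod-serving-preemptible",
    "reserved": "tpu-v5s-litepod-serving-reserved",
}


def serving_quota_name(accelerator_type: str, quota_type: str) -> str:
    """Return the serving quota name for a v5e inference slice."""
    if accelerator_type not in V5E_INFERENCE_TYPES:
        raise ValueError(f"not a v5e inference type: {accelerator_type!r}")
    if quota_type not in V5E_SERVING_QUOTAS:
        raise ValueError(f"unknown quota type: {quota_type!r}")
    return V5E_SERVING_QUOTAS[quota_type]


print(serving_quota_name("v5litepod-4", "preemptible"))
# tpu-v5s-litepod-serving-preemptible
```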

TPU v2 and v3

v2 and v3 TPU quota is specified in terms of TensorCores. A single TPU device contains four TPU chips and eight TensorCores (two TensorCores per chip). v2 and v3 TPUs have separate quotas for single TPU devices and TPU Pods. You cannot use v2 or v3 TPU Pod quota for v2-8 or v3-8 TPUs. For example, if you have quota for a v3-32 slice, you cannot use it to create four v3-8 TPUs.
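
The v2 and v3 rule above can be sketched in Python as follows. The pool labels "single-device" and "pod" and the helper names are hypothetical, chosen only for illustration. The sketch assumes that v2-8 and v3-8 are the single-device types and that any larger v2 or v3 type draws on Pod quota.

```python
def quota_pool(accelerator_type: str) -> str:
    """Classify a v2 or v3 type as drawing on single-device or Pod quota."""
    generation, _, cores = accelerator_type.partition("-")
    if generation not in ("v2", "v3") or not cores.isdigit():
        raise ValueError(f"not a v2 or v3 accelerator type: {accelerator_type!r}")
    # A single device has eight TensorCores; larger types are Pod slices.
    return "single-device" if int(cores) == 8 else "pod"


def chip_count(accelerator_type: str) -> int:
    """Convert the TensorCore suffix to chips (two TensorCores per chip)."""
    quota_pool(accelerator_type)  # validates the type string
    return int(accelerator_type.split("-")[1]) // 2


def quota_covers(quota_for: str, requested: str) -> bool:
    """Pod quota and single-device quota are not interchangeable."""
    return quota_pool(quota_for) == quota_pool(requested)


print(quota_pool("v3-8"))             # single-device
print(quota_pool("v3-32"))            # pod
print(chip_count("v3-8"))             # 4
print(quota_covers("v3-32", "v3-8"))  # False: Pod quota cannot create v3-8 TPUs
```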

For more information about TPU chips and TensorCores, see TPU System architecture.

View and request additional quota

You can view the quota allocated for your Google Cloud project on the Quotas page in the Google Cloud console. If you need additional Cloud TPU quota, you can request it from the Quotas page. For more information, see Request a higher quota limit.