Cloud TPU quotas

This document lists the quotas that apply to Cloud TPU. For information about Cloud TPU pricing, see Cloud TPU pricing.

Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.

The Cloud Quotas system does the following:

Monitors your consumption of Google Cloud products and services
Restricts your consumption of those resources
Provides a way to request changes to the quota value

In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.

Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.

TPU quota

There are separate quotas for each TPU version. For example, TPU v2 and TPU v3 have different quotas, and so on. For each TPU version there are two types of quota: on-demand and preemptible (Spot VMs). The following table describes both types.

Quota type	Description	Default value	How to request	Flags for TPU creation
On-demand	The number of on-demand Cloud TPU resources you have access to. On-demand resources won't be preempted. However, on-demand quota doesn't guarantee that enough Cloud TPU resources will be available to satisfy your request.	v2-8 and v3-8: 16 TensorCores; all others: 0	See Request additional quota.	No flag needed; on-demand is the default.
Preemptible	The number of preemptible Cloud TPU resources you have access to. This quota applies to both preemptible TPUs and TPU Spot VMs. Preemptible resources may be preempted to make room for higher-priority jobs. Preemptible quota doesn't guarantee that enough Cloud TPU resources will be available to satisfy your request. For more information, see Preemptible TPUs and Manage TPU Spot VMs.	v2-8 and v3-8: 48 TensorCores; all others: 0	See Request additional quota.	Spot VMs: use the `--spot` flag. Preemptible TPUs: use the `--preemptible` flag. The `--preemptible` flag is not supported for queued resources; use `--spot` instead.
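The "Flags for TPU creation" column can be summarized in a small helper. The following is a minimal sketch. The function `creation_flags`, its `quota_type` values, and the `queued_resource` parameter are hypothetical names for illustration. The helper only encodes the rules in the table: no flag for on-demand, `--spot` for Spot VMs, and `--preemptible` for preemptible TPUs except with queued resources. In the usage lines, `my-tpu`, `ZONE`, and `RUNTIME_VERSION` are placeholders.

```python
from typing import List


def creation_flags(quota_type: str, queued_resource: bool = False) -> List[str]:
    """Return the extra creation flags for a quota type (hypothetical helper).

    quota_type is one of "on-demand", "spot", or "preemptible".
    """
    if quota_type == "on-demand":
        # On-demand is selected by default, so no flag is needed.
        return []
    if quota_type == "spot":
        return ["--spot"]
    if quota_type == "preemptible":
        if queued_resource:
            # --preemptible is not supported for queued resources.
            raise ValueError(
                "--preemptible is not supported for queued resources; use --spot"
            )
        return ["--preemptible"]
    raise ValueError(f"unknown quota type: {quota_type!r}")


# Usage: build (but don't run) a TPU VM creation command that uses Spot quota.
cmd = ["gcloud", "compute", "tpus", "tpu-vm", "create", "my-tpu",
       "--zone=ZONE", "--accelerator-type=v3-8", "--version=RUNTIME_VERSION"]
cmd += creation_flags("spot")
print(" ".join(cmd))
```

Raising an error for `--preemptible` with queued resources surfaces the unsupported combination before any command is sent.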

TPU quotas are specified in terms of TPU cores per project per zone or TPU cores per project per region.
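Because quota is counted in cores, you can work out how many TPUs fit within a quota by integer division. The sketch below applies this to the default v2-8 and v3-8 values from the table. It assumes that a v2-8 or v3-8 TPU has 8 TensorCores (4 chips with 2 TensorCores each). The names `DEFAULT_QUOTA`, `CORES_PER_8_CORE_TPU`, and `max_devices` are illustrative only.

```python
# Default TensorCore quotas for v2-8 and v3-8, from the quota table.
DEFAULT_QUOTA = {"on-demand": 16, "preemptible": 48}

# Assumption: a v2-8 or v3-8 TPU has 4 chips with 2 TensorCores each.
CORES_PER_8_CORE_TPU = 4 * 2


def max_devices(quota_cores: int, cores_per_device: int) -> int:
    """Return how many whole TPUs of a given size fit within a core quota."""
    return quota_cores // cores_per_device


for quota_type, cores in DEFAULT_QUOTA.items():
    print(quota_type, max_devices(cores, CORES_PER_8_CORE_TPU))
# on-demand 2
# preemptible 6
```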

TPU v5p quotas

You can use your TPU v5p quota in any combination of cores. For example, if you have quota for 32 cores, you can use this quota to create four TPU slices each with 8 cores.
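Because v5p quota can be used in any combination of cores, only the total number of requested cores matters, not how those cores are split across slices. The minimal sketch below checks a set of slice sizes against a single core quota. The function `fits_in_quota` is hypothetical. The sketch checks only one quota value; in practice, per-zone and per-region v5p quotas are both listed, and this sketch does not model how they interact.

```python
from typing import Iterable


def fits_in_quota(slice_cores: Iterable[int], quota_cores: int) -> bool:
    """Return True if the slices' combined cores fit within the quota.

    Only the sum matters, because quota can be split across slices in any
    combination of cores.
    """
    return sum(slice_cores) <= quota_cores


print(fits_in_quota([8, 8, 8, 8], 32))   # True: four 8-core slices (32 cores)
print(fits_in_quota([16, 16], 32))       # True: two 16-core slices (32 cores)
print(fits_in_quota([16, 8, 8, 8], 32))  # False: 40 cores requested
```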

Preemptible quotas:

Preemptible TPU v5p cores per project per region
Preemptible TPU v5p cores per project per zone

On-demand quotas:

TPU v5p cores per project per region
TPU v5p cores per project per zone

TPU v5e quotas

TPU v5e can be used for training and serving. There are separate quotas for training and serving, and separate quotas for single-host TPUs (lite cores) and multi-host TPUs (lite pod cores).

Serving quotas

Preemptible serving quotas:

Preemptible TPU v5 lite pod cores for serving per project per region
Preemptible TPU v5 lite pod cores for serving per project per zone

On-demand serving quotas:

TPU v5 lite pod cores for serving per project per region
TPU v5 lite pod cores for serving per project per zone

Training quotas

Preemptible training quotas:

Preemptible TPU v5 lite cores per project per region
Preemptible TPU v5 lite cores per project per zone
Preemptible TPU v5 lite pod cores per project per region
Preemptible TPU v5 lite pod cores per project per zone

On-demand training quotas:

TPU v5 lite cores per project per region
TPU v5 lite cores per project per zone
TPU v5 lite pod cores per project per region
TPU v5 lite pod cores per project per zone
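The TPU v5e quota names follow a regular pattern: an optional "Preemptible" prefix, a core type, and a region or zone scope. The sketch below assembles the quota name for a given workload. The function `v5e_quota_name` and its parameters are hypothetical. The returned strings mirror the names listed above. Only "lite pod cores for serving" quotas are listed for serving, so this sketch assumes those quotas apply to all serving workloads.

```python
def v5e_quota_name(workload: str, multi_host: bool, preemptible: bool, scope: str) -> str:
    """Return the TPU v5e quota name that applies (hypothetical helper).

    workload is "training" or "serving"; scope is "region" or "zone".
    """
    if workload == "serving":
        # Assumption: only pod-core serving quotas are listed, so every
        # serving request maps to them regardless of multi_host.
        cores = "lite pod cores for serving"
    elif workload == "training":
        # Single-host training uses lite cores; multi-host uses lite pod cores.
        cores = "lite pod cores" if multi_host else "lite cores"
    else:
        raise ValueError(f"unknown workload: {workload!r}")
    if scope not in ("region", "zone"):
        raise ValueError(f"unknown scope: {scope!r}")
    name = f"TPU v5 {cores} per project per {scope}"
    return f"Preemptible {name}" if preemptible else name


print(v5e_quota_name("training", multi_host=False, preemptible=True, scope="zone"))
print(v5e_quota_name("serving", multi_host=True, preemptible=False, scope="region"))
```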

TPU v4 quotas

You can use your TPU v4 quota in any combination of cores. For example, if you have quota for 32 cores, you can use this quota to create four TPU slices each with 8 cores.

Preemptible quotas:

Preemptible TPU v4 pod cores per project per region
Preemptible TPU v4 pod cores per project per zone

On-demand quotas:

TPU v4 pod cores per project per region
TPU v4 pod cores per project per zone

TPU v3 quotas

There are separate TPU v3 quotas for single-host TPUs (core) and multi-host TPUs (pod). You must use v3 pod quotas to create TPUs with more than 8 cores.

Preemptible quotas:

Preemptible TPU v3 cores per project per region
Preemptible TPU v3 cores per project per zone
Preemptible TPU v3 pod cores per project per region
Preemptible TPU v3 pod cores per project per zone

On-demand quotas:

TPU v3 cores per project per region
TPU v3 cores per project per zone
TPU v3 pod cores per project per region
TPU v3 pod cores per project per zone
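The TPU v3 rule that TPUs with more than 8 cores must use pod quotas can be expressed as a short lookup. The following is a minimal sketch. The function `v3_quota_name` and its parameters are hypothetical, and the returned strings mirror the v3 quota names listed above. The sketch also assumes that TPUs with 8 or fewer cores draw from the single-host (core) quotas.

```python
def v3_quota_name(num_cores: int, preemptible: bool, scope: str) -> str:
    """Return the TPU v3 quota name for a TPU with num_cores cores (hypothetical helper).

    scope is "region" or "zone".
    """
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    if scope not in ("region", "zone"):
        raise ValueError(f"unknown scope: {scope!r}")
    # TPUs with more than 8 cores must use the v3 pod quotas; smaller TPUs
    # are assumed to use the single-host core quotas.
    kind = "pod cores" if num_cores > 8 else "cores"
    name = f"TPU v3 {kind} per project per {scope}"
    return f"Preemptible {name}" if preemptible else name


print(v3_quota_name(8, preemptible=False, scope="zone"))
print(v3_quota_name(32, preemptible=True, scope="region"))
```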

TPU v2 quotas

There are separate TPU v2 quotas for single-host TPUs (core) and multi-host TPUs (pod).

Preemptible quotas:

Preemptible TPU v2 cores per project per region
Preemptible TPU v2 cores per project per zone
Preemptible TPU v2 pod cores per project per region
Preemptible TPU v2 pod cores per project per zone

On-demand quotas:

TPU v2 cores per project per region
TPU v2 cores per project per zone
TPU v2 pod cores per project per region
TPU v2 pod cores per project per zone

For more information about TPU chips and TensorCores, see TPU System architecture.

View and request additional quota

You can view the quota allocated for your Google Cloud project on the Quotas page in the Google Cloud console. If you need additional Cloud TPU quota, you can request it from the Quotas page. For more information, see Request a higher quota limit.

When a Google Cloud service increases the default quota values for resources and APIs, these changes take place gradually. This might result in ongoing rollouts across different regions or resources. During the rollout, the quota value that appears in the Google Cloud console or Cloud Quotas API won't reflect the new, increased quota value until the rollout completes. For more information, see View ongoing rollouts.