Cloud TPU quotas

This document lists the quotas that apply to Cloud TPU. For information about Cloud TPU pricing, see Cloud TPU pricing.

Google Cloud uses quotas to help ensure fairness and reduce spikes in resource use and availability. A quota restricts how much of a Google Cloud resource your Google Cloud project can use. Quotas apply to a range of resource types, including hardware, software, and network components. For example, quotas can restrict the number of API calls to a service, the number of load balancers used concurrently by your project, or the number of projects that you can create. Quotas protect the community of Google Cloud users by preventing the overloading of services. Quotas also help you to manage your own Google Cloud resources.

The Cloud Quotas system does the following:

  • Monitors your consumption of Google Cloud products and services
  • Restricts your consumption of those resources
  • Provides a means to request changes to the quota value

In most cases, when you attempt to consume more of a resource than its quota allows, the system blocks access to the resource, and the task that you're trying to perform fails.

Quotas generally apply at the Google Cloud project level. Your use of a resource in one project doesn't affect your available quota in another project. Within a Google Cloud project, quotas are shared across all applications and IP addresses.

TPU quota

Each TPU version has its own quotas; for example, TPU v2 and TPU v3 have separate quotas. For each TPU version, there are two types of quota: on-demand and preemptible (Spot). The following table describes each type.

Quota type | Description | Default value | How to request | Flags for TPU creation
---------- | ----------- | ------------- | -------------- | ----------------------
On-demand | The number of on-demand Cloud TPU resources that you have access to. On-demand resources won't be preempted, but on-demand quota does not guarantee that enough Cloud TPU resources will be available to satisfy your request. | v3-8 and v2-8: 16 TensorCores; all others: 0 | See Request additional quota. | No flags needed; selected by default.
Preemptible | The number of preemptible Cloud TPU resources that you have access to. This quota applies to both preemptible TPUs and TPU Spot VMs. Preemptible resources may be preempted to make room for higher-priority jobs. Preemptible quota does not guarantee that enough Cloud TPU resources will be available to satisfy your request. For more information, see Preemptible TPUs and Manage TPU Spot VMs. | v3-8 and v2-8: 48 TensorCores; all others: 0 | See Request additional quota. |
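The default values in the table can be read as a small lookup. The following is a minimal sketch, not an official API: the dictionary name DEFAULT_QUOTA_CORES and the function default_quota_cores are hypothetical, and keying the defaults by accelerator type (v2-8, v3-8) is a simplifying assumption based only on how the table phrases them.

```python
# Hypothetical lookup: default TPU quota in TensorCores, copied from the table.
# Keys are (quota_type, accelerator_type); this keying is an assumption.
DEFAULT_QUOTA_CORES = {
    ("on-demand", "v2-8"): 16,
    ("on-demand", "v3-8"): 16,
    ("preemptible", "v2-8"): 48,
    ("preemptible", "v3-8"): 48,
}


def default_quota_cores(quota_type: str, accelerator_type: str) -> int:
    """Return the default quota, in TensorCores, for a quota type and accelerator type."""
    if quota_type not in ("on-demand", "preemptible"):
        raise ValueError(f"unknown quota type: {quota_type!r}")
    # Every combination the table does not list ("All others") defaults to 0.
    return DEFAULT_QUOTA_CORES.get((quota_type, accelerator_type), 0)


print(default_quota_cores("on-demand", "v3-8"))    # 16
print(default_quota_cores("preemptible", "v2-8"))  # 48
print(default_quota_cores("on-demand", "v4-32"))   # 0
```

Anything outside the two listed accelerator types falls through to 0, which mirrors the "All others: 0" entries; a non-zero value for those requires a quota request.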

TPU quotas are specified in terms of TPU cores per project per zone or TPU cores per project per region.
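Because quotas exist both per zone and per region, a request can be blocked by either one. The sketch below assumes that a TPU request must fit within the remaining per-zone quota and the remaining per-region quota at the same time; CoreQuota and request_fits are hypothetical names used only for illustration, and the code does not query any Google Cloud API.

```python
from dataclasses import dataclass


@dataclass
class CoreQuota:
    """Hypothetical model of one TPU quota: a limit and current usage, in TPU cores."""
    limit: int
    used: int

    def remaining(self) -> int:
        # Never report negative headroom, even if usage exceeds the limit.
        return max(self.limit - self.used, 0)


def request_fits(cores: int, zonal: CoreQuota, regional: CoreQuota) -> bool:
    """Assume a request must fit within both the per-zone and per-region quota."""
    return cores <= zonal.remaining() and cores <= regional.remaining()


zone_quota = CoreQuota(limit=32, used=8)     # 24 cores left in the zone
region_quota = CoreQuota(limit=64, used=48)  # 16 cores left in the region
print(request_fits(16, zone_quota, region_quota))  # True
print(request_fits(24, zone_quota, region_quota))  # False: the regional quota is the limit
```

In this example the zone still has room for 24 cores, but the regional quota caps the request at 16.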

TPU v5p quotas

You can divide your TPU v5p quota among slices in any combination, as long as the total number of cores does not exceed your quota. For example, if you have quota for 32 cores, you can use it to create four TPU slices with 8 cores each.
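Splitting quota across slices comes down to a sum. The following sketch assumes, as the paragraph describes, that only the total core count of all slices matters; slices_fit_quota is a hypothetical helper, not part of any Cloud TPU library.

```python
def slices_fit_quota(slice_cores: list[int], quota_cores: int) -> bool:
    """Return True if slices with the given core counts fit in quota_cores in total."""
    if any(c <= 0 for c in slice_cores):
        raise ValueError("each slice must have a positive number of cores")
    return sum(slice_cores) <= quota_cores


print(slices_fit_quota([8, 8, 8, 8], 32))  # True: four 8-core slices
print(slices_fit_quota([32], 32))          # True: one 32-core slice
print(slices_fit_quota([16, 8, 16], 32))   # False: 40 cores requested
```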

Preemptible quotas:

  • Preemptible TPU v5p cores per project per region
  • Preemptible TPU v5p cores per project per zone

On-demand quotas:

  • TPU v5p cores per project per region
  • TPU v5p cores per project per zone

TPU v5e quotas

You can use TPU v5e for both training and serving. There are separate quotas for training and serving, and for single-host TPUs (lite cores) and multi-host TPUs (lite pod cores).

Serving quotas

Preemptible serving quotas:

  • Preemptible TPU v5 lite pod cores for serving per project per region
  • Preemptible TPU v5 lite pod cores for serving per project per zone

On-demand serving quotas:

  • TPU v5 lite pod cores for serving per project per region
  • TPU v5 lite pod cores for serving per project per zone

Training quotas

Preemptible training quotas:

  • Preemptible TPU v5 lite cores per project per region
  • Preemptible TPU v5 lite cores per project per zone
  • Preemptible TPU v5 lite pod cores per project per region
  • Preemptible TPU v5 lite pod cores per project per zone

On-demand training quotas:

  • TPU v5 lite cores per project per region
  • TPU v5 lite cores per project per zone
  • TPU v5 lite pod cores per project per region
  • TPU v5 lite pod cores per project per zone
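The quota names above follow a regular pattern, so picking the right TPU v5e quota can be sketched as a function of the workload, the host count, and whether the TPU is preemptible. This is an illustrative sketch only: v5e_quota_names is a hypothetical helper, and because the lists above name only pod-core quotas for serving, it returns the serving pod-core quotas regardless of host count.

```python
def v5e_quota_names(workload: str, multi_host: bool, preemptible: bool) -> list[str]:
    """Return the per-region and per-zone TPU v5e quota names that apply."""
    if workload == "serving":
        # The serving lists above contain only pod-core quotas, so multi_host is ignored.
        base = "TPU v5 lite pod cores for serving"
    elif workload == "training":
        # Training uses lite cores for single-host and lite pod cores for multi-host TPUs.
        base = "TPU v5 lite pod cores" if multi_host else "TPU v5 lite cores"
    else:
        raise ValueError(f"unknown workload: {workload!r}")
    if preemptible:
        base = "Preemptible " + base
    return [f"{base} per project per region", f"{base} per project per zone"]


for name in v5e_quota_names("training", multi_host=False, preemptible=True):
    print(name)
```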

TPU v4 quotas

You can divide your TPU v4 quota among slices in any combination, as long as the total number of cores does not exceed your quota. For example, if you have quota for 32 cores, you can use it to create four TPU slices with 8 cores each.

Preemptible quotas:

  • Preemptible TPU v4 pod cores per project per region
  • Preemptible TPU v4 pod cores per project per zone

On-demand quotas:

  • TPU v4 pod cores per project per region
  • TPU v4 pod cores per project per zone
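To estimate how much TPU v4 pod-core quota a set of slices consumes, you can add up the cores in each accelerator type. The sketch below assumes that the number after the hyphen in a TPU v4 accelerator type name (for example, v4-8 or v4-32) is the TensorCore count; v4_cores and v4_pod_cores_needed are hypothetical helpers.

```python
import re


def v4_cores(accelerator_type: str) -> int:
    """Parse a TPU v4 accelerator type such as "v4-32" into its core count.

    Assumes the number after the hyphen is the TensorCore count.
    """
    match = re.fullmatch(r"v4-(\d+)", accelerator_type)
    if match is None:
        raise ValueError(f"not a TPU v4 accelerator type: {accelerator_type!r}")
    return int(match.group(1))


def v4_pod_cores_needed(accelerator_types: list[str]) -> int:
    """Total TPU v4 pod cores consumed by a set of slices."""
    return sum(v4_cores(t) for t in accelerator_types)


needed = v4_pod_cores_needed(["v4-8", "v4-8", "v4-16"])
print(needed)  # 32
```

Compare the result against your "TPU v4 pod cores" quota (or its preemptible counterpart) in the target zone and region.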

TPU v3 quotas

There are separate TPU v3 quotas for single-host TPUs (core) and multi-host TPUs (pod). You must use v3 pod quotas to create TPUs with more than 8 cores.

Preemptible quotas:

  • Preemptible TPU v3 cores per project per region
  • Preemptible TPU v3 cores per project per zone
  • Preemptible TPU v3 pod cores per project per region
  • Preemptible TPU v3 pod cores per project per zone

On-demand quotas:

  • TPU v3 cores per project per region
  • TPU v3 cores per project per zone
  • TPU v3 pod cores per project per region
  • TPU v3 pod cores per project per zone
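The rule that TPU v3 TPUs with more than 8 cores need pod quota can be expressed as a small selector. This sketch assumes that TPU v3 TPUs with 8 or fewer cores count against the single-host (core) quota; v3_quota_name is a hypothetical helper that only builds the quota names listed above.

```python
def v3_quota_name(cores: int, preemptible: bool, scope: str) -> str:
    """Pick the TPU v3 quota that a TPU with `cores` cores counts against."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    if scope not in ("region", "zone"):
        raise ValueError(f"scope must be 'region' or 'zone', got {scope!r}")
    # More than 8 cores requires the pod quota; otherwise assume the core quota.
    kind = "pod cores" if cores > 8 else "cores"
    prefix = "Preemptible " if preemptible else ""
    return f"{prefix}TPU v3 {kind} per project per {scope}"


print(v3_quota_name(8, preemptible=False, scope="zone"))
print(v3_quota_name(32, preemptible=True, scope="region"))
```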

TPU v2 quotas

There are separate TPU v2 quotas for single-host TPUs (core) and multi-host TPUs (pod).

Preemptible quotas:

  • Preemptible TPU v2 cores per project per region
  • Preemptible TPU v2 cores per project per zone
  • Preemptible TPU v2 pod cores per project per region
  • Preemptible TPU v2 pod cores per project per zone

On-demand quotas:

  • TPU v2 cores per project per region
  • TPU v2 cores per project per zone
  • TPU v2 pod cores per project per region
  • TPU v2 pod cores per project per zone

For more information about TPU chips and TensorCores, see TPU System architecture.

View and request additional quota

You can view the quota allocated for your Google Cloud project on the Quotas page in the Google Cloud console. If you need additional Cloud TPU quota, you can request it from the Quotas page. For more information, see Request a higher quota limit.
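Before requesting more quota, it can help to work out how large an increase you need. The following is plain arithmetic under the assumption that you know your current limit and usage from the Quotas page; additional_quota_needed is a hypothetical helper and does not call the Cloud Quotas API.

```python
def additional_quota_needed(limit: int, used: int, requested: int) -> int:
    """Cores by which the quota limit must grow to fit `requested` more cores."""
    if min(limit, used, requested) < 0:
        raise ValueError("values must be non-negative")
    return max(used + requested - limit, 0)


print(additional_quota_needed(limit=32, used=24, requested=16))  # 8
print(additional_quota_needed(limit=32, used=8, requested=16))   # 0
```

In the first case you would ask for a limit of at least 40 cores; in the second, the existing quota already covers the request.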