Cloud TPU pricing

v2 and v3 TPU pricing and quota are divided into two systems:

  • Single device TPU type pricing for individual TPU devices that are available either on-demand or as preemptible devices. You cannot combine multiple single device TPU types to collaborate on a single workload.
  • TPU Pod type pricing for clusters of TPU devices that are connected to each other over dedicated high-speed networks. These TPU types are available only if you have evaluation quota or purchase a 1-year or 3-year commitment.

With Cloud TPU v4, all configurations consist of Pod slices, so there is only one v4 pricing system.

See the TPU System Architecture documentation for architectural details and the differences between v2, v3, and v4.

Charges for Cloud TPU accrue while your TPU node is in a READY state. You receive a bill at the end of each billing cycle that lists usage and charges for that billing cycle.

Cloud TPU v4 pricing

Cloud TPU v4 is the latest generation of Google’s custom silicon for machine learning and is now available in Preview. It retains backwards compatibility with Cloud TPU v2 and v3 and delivers more than 2x the raw compute performance per chip of Cloud TPU v3. Each TPU v4 chip also contains a single logical core, so one program can use the full 32 GiB of memory, compared to 8 GiB on v2 and 16 GiB on v3. Cloud TPU v4 Pod slices are connected with a custom interconnect that uses a 3D mesh topology, an upgrade from the 2D mesh in v2 and v3, and are available in configurations ranging from four chips (one TPU VM) to thousands of chips.

TPU v4 Pods are available in us-central2-b, a Google data center that operates at 90% carbon-free energy on an hourly basis within the same grid.

Use the Cloud TPU v4 sign-up form to learn more about Cloud TPU v4 Pods and to get pre-GA access.

The following table shows the pricing in place for Cloud TPU v4 configurations. The v4 pricing is based on the number of chips in the topology. There are 2 cores in each chip.

TPU v4 pricing         | Price per chip-hour | % discount from on-demand
on-demand / evaluation | $3.22               | -
1Y CUD                 | $2.03               | 37%
3Y CUD                 | $1.45               | 55%
preemptible            | $0.97               | 70%
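
As a rough illustration of how the per chip-hour prices translate into a charge, the sketch below multiplies chips, hours, and the listed rate, and recomputes the discount column. The function names and plan keys are hypothetical, and the calculation assumes a slice is billed simply as chips × hours × rate with no other fees.

```python
# Per chip-hour prices in USD, copied from the v4 pricing table above.
V4_PRICE_PER_CHIP_HOUR = {
    "on-demand": 3.22,
    "1y-cud": 2.03,
    "3y-cud": 1.45,
    "preemptible": 0.97,
}


def v4_slice_cost(chips: int, hours: float, plan: str = "on-demand") -> float:
    """Estimated charge in USD for `chips` v4 chips running for `hours` hours.

    Assumption: the charge is chips * hours * rate, rounded to cents.
    """
    if chips <= 0 or hours < 0:
        raise ValueError("chips must be positive and hours non-negative")
    return round(chips * hours * V4_PRICE_PER_CHIP_HOUR[plan], 2)


def discount_percent(plan: str) -> int:
    """Discount relative to the on-demand rate, rounded to a whole percent."""
    on_demand = V4_PRICE_PER_CHIP_HOUR["on-demand"]
    return round(100 * (1 - V4_PRICE_PER_CHIP_HOUR[plan] / on_demand))


if __name__ == "__main__":
    print(v4_slice_cost(8, 24))                  # 8 chips for one day: 618.24
    print(v4_slice_cost(8, 24, "preemptible"))   # same slice, preemptible: 186.24
    print({plan: discount_percent(plan) for plan in V4_PRICE_PER_CHIP_HOUR})
```

The recomputed discounts (37%, 55%, and 70%) match the table, which suggests the discount column is rounded to the nearest whole percent.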

Cloud TPU v3 and Cloud TPU v4 features and price comparison

Feature                      | Cloud TPU v3 Pod     | Cloud TPU v4 Pod
Key specifications
Peak compute per chip        | 123 teraflops (bf16) | 275 teraflops (bf16 or int8)
HBM2 capacity and bandwidth  | 32 GiB, 900 GB/s     | 32 GiB, 1200 GB/s
Measured min/mean/max power  | 123/220/262 W        | 90/170/192 W
TPU Pod size                 | 1024 chips           | 4096 chips
Interconnect topology        | 2D torus             | 3D torus
Peak compute per Pod         | 126 petaflops (bf16) | 1.1 exaflops (bf16 or int8)
All-reduce bandwidth per Pod | 340 TB/s             | 1.1 PB/s
Bisection bandwidth per Pod  | 6.4 TB/s             | 24 TB/s
Pricing per chip-hour
Evaluation                   | $2.00                | $3.22
1Y CUD (37%)                 | $1.26                | $2.03
3Y CUD (55%)                 | $0.90                | $1.45
Preemptible                  | $0.60                | $0.97

Pricing comparison notes

  • Cloud TPU v4 Pod pricing is shown for us-central2-b location.
  • Cloud TPU v3 Pod pricing is shown for us-east1-d location.
  • Each TPU v3 chip has two cores. The pricing per-chip is shown for comparative purposes.
  • CUD stands for "committed use discount".
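
One way to read the comparison is to relate price to peak compute. The sketch below is only an illustration: it uses the peak figures and evaluation prices from the table above, the dictionary layout and function names are hypothetical, and peak teraflops are not a measure of sustained performance on a real workload.

```python
# Figures copied from the comparison table above.
SPECS = {
    "v3": {"peak_tflops_per_chip": 123, "chips_per_pod": 1024, "eval_usd_per_chip_hour": 2.00},
    "v4": {"peak_tflops_per_chip": 275, "chips_per_pod": 4096, "eval_usd_per_chip_hour": 3.22},
}


def peak_pod_petaflops(version: str) -> float:
    """Peak compute of a full Pod: per-chip peak times chip count, in petaflops."""
    spec = SPECS[version]
    return spec["peak_tflops_per_chip"] * spec["chips_per_pod"] / 1000


def usd_per_peak_petaflop_hour(version: str) -> float:
    """Evaluation price divided by per-chip peak compute (expressed in petaflops)."""
    spec = SPECS[version]
    return spec["eval_usd_per_chip_hour"] / (spec["peak_tflops_per_chip"] / 1000)


if __name__ == "__main__":
    for version in SPECS:
        print(
            version,
            round(peak_pod_petaflops(version), 1),
            round(usd_per_peak_petaflop_hour(version), 2),
        )
```

The per-Pod products (about 126 petaflops for v3 and about 1,126 petaflops, or 1.1 exaflops, for v4) agree with the "Peak compute per Pod" row. On this measure, v4 costs less per unit of peak compute than v3 even though its per chip-hour price is higher.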

How to purchase v4 quota

Contact your sales team or fill in this order form.

Single device pricing

Single device TPU types are billed in one-second increments and are available at either an on-demand or preemptible price.

Single device TPU types are independent TPU devices without direct network connections to other TPU devices in a Google data center. If your workloads require more TPU cores and a larger pool of memory, use a TPU Pod type instead.

A preemptible TPU is one that Cloud TPU can shut down (preempt) at any time if it requires access to the resources for another task. The charges for a preemptible TPU are much lower than those for an on-demand TPU. You are not charged for preemptible TPUs if they are preempted in the first minute after you create them.
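
The billing rules above (one-second increments, and no charge when a preemptible TPU is preempted in its first minute) can be sketched as a small function. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, the hourly rate is a placeholder rather than a published price, billed time is assumed to start at creation, and "the first minute" is assumed to mean fewer than 60 seconds.

```python
def single_device_charge(
    seconds_since_creation: int,
    hourly_rate_usd: float,
    preemptible: bool = False,
    was_preempted: bool = False,
) -> float:
    """Estimate the charge in USD for a single device TPU billed per second.

    Assumptions (not confirmed by the pricing page): billing starts at
    creation, and "first minute" means strictly fewer than 60 seconds.
    """
    if seconds_since_creation < 0:
        raise ValueError("seconds_since_creation must be non-negative")
    # A preemptible TPU that is preempted in its first minute is not charged.
    if preemptible and was_preempted and seconds_since_creation < 60:
        return 0.0
    # Otherwise, charge for each whole second at the hourly rate.
    return seconds_since_creation * hourly_rate_usd / 3600


if __name__ == "__main__":
    PLACEHOLDER_RATE = 3.60  # hypothetical USD per hour, for illustration only
    print(single_device_charge(45, PLACEHOLDER_RATE, preemptible=True, was_preempted=True))
    print(round(single_device_charge(90, PLACEHOLDER_RATE, preemptible=True, was_preempted=True), 4))
```

Note that the waiver applies only to preemption: under these assumptions, a TPU that you delete yourself after 45 seconds is still charged for those 45 seconds.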

You can configure your TPU nodes with the following single device TPU types:

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

TPU Pod type pricing

TPU Pod types provide access to multiple TPU devices that are all connected on a dedicated high-speed network. These TPU types provide greater compute capacity and a larger pool of TPU memory to a single TPU node. To use TPU Pod types, you must request quota using one of the following options:

  • Request access to evaluation quota so that you can test the performance of TPU Pod types. TPU nodes that you create using evaluation quota are billed in one-second increments but do not guarantee the same level of service as on-demand TPU devices or devices that you create using commitment quota. Evaluation quota persists only for a limited amount of time on your project.
  • Purchase a 1-year or 3-year commitment and create TPU nodes with up to 2048 cores. Commitments are not billed incrementally. Instead, they give you access to reserved cores at all hours of the day, on an ongoing month-over-month basis, for the duration of the contract. You are billed a monthly fee for the duration of your commitment term even if you do not use any TPU resources.
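
Because a commitment is billed whether or not the reserved cores are used, it can help to estimate the utilization at which it pays off. The sketch below is an illustration only: the function names are hypothetical, it assumes the commitment fee equals the committed chip-hour rate applied to every hour of the period, and it uses the v3 evaluation rate purely as a reference price even though evaluation quota is time-limited.

```python
def cores_to_chips(cores: int, cores_per_chip: int = 2) -> int:
    """Convert a core count to chips (each v3 chip has two cores)."""
    if cores % cores_per_chip:
        raise ValueError("cores must be a multiple of cores_per_chip")
    return cores // cores_per_chip


def commitment_cost(chips: int, hours_in_period: float, committed_rate_usd: float) -> float:
    """Assumed fee for a period: every hour is billed, used or not."""
    return chips * hours_in_period * committed_rate_usd


def break_even_utilization(committed_rate_usd: float, reference_rate_usd: float) -> float:
    """Fraction of hours that must be used before the commitment costs less
    than paying the reference rate only for the hours actually used."""
    return committed_rate_usd / reference_rate_usd


if __name__ == "__main__":
    chips = cores_to_chips(2048)                      # 1024 v3 chips
    print(chips)
    print(round(commitment_cost(chips, 720, 1.26), 2))  # one 30-day period at the 1Y CUD rate
    print(round(break_even_utilization(1.26, 2.00), 2))  # 0.63
```

Under these assumptions, a 1-year v3 commitment is cheaper than the reference rate only if the reserved chips are busy more than about 63% of the time.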

You can configure your TPU nodes with the following TPU types:

If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

To learn about the differences between different TPU versions and configurations, read the TPU System Architecture documentation.
