TPU types

Overview

When you create TPU nodes to handle your machine learning workloads, you must select a TPU type. The TPU type defines the TPU version, the number of TPU cores, and the amount of TPU memory that is available for your machine learning workload.

For example, the v2-8 TPU type defines a TPU node with 8 TPU v2 cores and 64 GiB of total TPU memory. The v3-2048 TPU type defines a TPU node with 2048 TPU v3 cores and 32 TiB of total TPU memory.

To learn about the hardware differences between TPU versions and configurations, read the System Architecture documentation.

To see pricing for each TPU type in each region, see the Pricing page.

A model that runs on one TPU type can run with no code changes on another TPU type. Each TPU type runs the same TensorFlow software for training and evaluating models. However, scaling up to a larger TPU type typically requires significant model tuning to take advantage of the change in hardware structure. In particular, the transition from a single-device v2-8 or v3-8 TPU type to a larger TPU type requires significant tuning and optimization.

TPU types

The main differences between each TPU type are price, performance, and memory capacity.

Google Cloud Platform uses regions, subdivided into zones, to define the geographic location of physical computing resources. When you create a Cloud TPU, you specify the zone in which you want to create it.

You can configure your TPU nodes with the following TPU types:

US

TPU type (v2)    TPU v2 cores    Total TPU memory    Available zones
v2-8             8               64 GiB              us-central1-a, us-central1-b, us-central1-c (us-central1-f TFRC only)
v2-32 (Beta)     32              256 GiB             us-central1-a
v2-128 (Beta)    128             1 TiB               us-central1-a
v2-256 (Beta)    256             2 TiB               us-central1-a
v2-512 (Beta)    512             4 TiB               us-central1-a

TPU type (v3)    TPU v3 cores    Total TPU memory    Available zones
v3-8             8               128 GiB             us-central1-a, us-central1-b (us-central1-f TFRC only)

Europe

TPU type (v2)    TPU v2 cores    Total TPU memory    Available zones
v2-8             8               64 GiB              europe-west4-a
v2-32 (Beta)     32              256 GiB             europe-west4-a
v2-128 (Beta)    128             1 TiB               europe-west4-a
v2-256 (Beta)    256             2 TiB               europe-west4-a
v2-512 (Beta)    512             4 TiB               europe-west4-a

TPU type (v3)    TPU v3 cores    Total TPU memory    Available zones
v3-8             8               128 GiB             europe-west4-a
v3-32 (Beta)     32              512 GiB             europe-west4-a
v3-64 (Beta)     64              1 TiB               europe-west4-a
v3-128 (Beta)    128             2 TiB               europe-west4-a
v3-256 (Beta)    256             4 TiB               europe-west4-a
v3-512 (Beta)    512             8 TiB               europe-west4-a
v3-1024 (Beta)   1024            16 TiB              europe-west4-a
v3-2048 (Beta)   2048            32 TiB              europe-west4-a

Asia Pacific

TPU type (v2)    TPU v2 cores    Total TPU memory    Available zones
v2-8             8               64 GiB              asia-east1-c

TPU types with higher numbers of cores are available only in limited quantities. TPU types with lower core counts are more likely to be available.
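Because availability changes over time, it can be worth querying it directly rather than relying on a static table. A minimal sketch using standard gcloud commands (the zone name is only an example):

```shell
# List the zones that offer Cloud TPUs.
$ gcloud compute tpus locations list

# List the TPU types (accelerator types) available in a specific zone.
$ gcloud compute tpus accelerator-types list --zone us-central1-a
```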


Calculating price and performance tradeoffs

To decide which TPU type you want to use, you can do experiments using a Cloud TPU tutorial to train a model that is similar to your application.

Run the tutorial on both a v2-8 and a v3-8 TPU type for 5-10% of the number of steps you will use for the full training run. The result tells you how long it takes to run that number of steps for that model on each TPU type.

Because performance on TPU types scales approximately linearly with the number of cores, if you know how long it takes to run a task on a v2-8 or v3-8 TPU type, you can estimate how much you can reduce task time by running your model on a larger TPU type with more cores.

For example, if a v2-8 TPU type takes 60 minutes to run 10,000 steps, a v2-32 node, which has four times as many cores, should take approximately 15 minutes to perform the same task.
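The estimate above reduces to simple arithmetic: divide the measured time by the ratio of core counts. A quick sketch in shell:

```shell
# Estimate training time on a larger TPU type, assuming linear scaling:
#   estimated_minutes = measured_minutes * baseline_cores / target_cores
MEASURED_MINUTES=60   # 10,000 steps measured on a v2-8
BASELINE_CORES=8      # v2-8
TARGET_CORES=32       # v2-32
echo $(( MEASURED_MINUTES * BASELINE_CORES / TARGET_CORES ))  # prints 15 (minutes)
```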

To determine the difference in cost within your region between the different TPU types for Cloud TPU and the associated Compute Engine VM, see the TPU pricing page. When you know the approximate training time for your model on a few different TPU types, you can weigh the VM/TPU cost against training time to help you decide your best price/performance tradeoff.

Specifying the TPU type

You specify a TPU type when you create a TPU node. For example, you can select a TPU type using one of the following methods:

ctpu utility
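The ctpu steps are not shown above. As a sketch (flag names taken from ctpu's documented usage; verify them with `ctpu help up`), you select the TPU type with the `--tpu-size` flag:

```shell
# Sketch: bring up a Compute Engine VM and Cloud TPU pair with ctpu.
# The TPU type is selected with --tpu-size.
$ ctpu up --name my-tpu \
    --zone us-central1-b \
    --tpu-size v2-8 \
    --tf-version 1.14
```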

gcloud command

  • Use the gcloud compute tpus create command:

    $ gcloud compute tpus create [TPU name] \
      --zone us-central1-b \
      --range '10.240.0.0' \
      --accelerator-type 'v2-8' \
      --network my-tf-network \
      --version '1.14'
    

    where:

    • TPU name is a name for identifying the TPU that you're creating.
    • --zone is the zone in which you want to create your Cloud TPU. Make sure the requested accelerator type is supported in that zone.
    • --range specifies the address of the created Cloud TPU resource and can be any value in 10.240.*.*.
    • --accelerator-type is the type of accelerator and number of cores you want to use, for example, v2-32 (32 cores).
    • --network specifies the name of the network that your Compute Engine VM instance uses. You must be able to connect to instances on this network over SSH. For most situations, you can use the default network that your Google Cloud Platform project created automatically. However, an error results if the default network is a legacy network.
    • --version specifies the TensorFlow version to use with the TPU.
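After the node is created, you can confirm that it has the accelerator type you requested (a sketch; `my-tpu` is a placeholder for your TPU name):

```shell
# Inspect an existing TPU node; the output includes its acceleratorType.
$ gcloud compute tpus describe my-tpu --zone us-central1-b
```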

Cloud Console

  1. From the left navigation menu, select Compute Engine > TPUs.
  2. On the TPUs screen click Create TPU node. This brings up a configuration page for your TPU.
  3. Under TPU type, select one of the supported TPU types.
  4. Click the Create button.

What's next
