Single Cloud TPU device and Cloud TPU Pod (beta) configurations
Single Cloud TPU device and Cloud TPU Pod (beta) configurations are described here and in the system architecture document.
- A Cloud TPU v2-8 is a single Cloud TPU device that has 4 TPU chips. Each TPU chip has 2 cores, so the v2-8 has a total of 8 cores.
- A Cloud TPU v2 Pod (beta) consists of 64 TPU devices containing 256 TPU chips (512 cores) connected together with high-speed interfaces.
- A Cloud TPU v3-8 is a single Cloud TPU device that has 4 TPU chips. Each TPU chip has 2 cores, so the v3-8 has a total of 8 cores.
- A Cloud TPU v3 Pod (beta) consists of 256 TPU devices containing 1024 TPU chips (2048 cores) connected together with high-speed interfaces.
Since TPU resources can scale from a single Cloud TPU device to a Cloud TPU Pod (beta), you don't need to choose between a single Cloud TPU device and a Cloud TPU Pod (beta). Instead, you can request portions, or slices, of Cloud TPU Pods, with each slice containing a set of TPU cores. With slices, you can purchase only the processing power you need.
Scaling up from a single Cloud TPU device to a Cloud TPU Pod (beta) often requires only adjusting hyperparameters, such as the batch size, to match the larger number of TPU cores in the Cloud TPU Pod slice. With just a few minor changes, you can take a model that trains in hours on a single Cloud TPU device and accelerate it to reach the same accuracy in minutes on a full Cloud TPU Pod.
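As a rough sketch of that batch-size adjustment, the global batch size typically scales with the core count while the per-core batch size stays fixed. The per-core value of 128 below is an assumed illustrative number, not a recommendation:

```shell
# Sketch: scale the global batch size with the number of TPU cores.
# PER_CORE_BATCH=128 is an assumed illustrative value, not a recommendation.
PER_CORE_BATCH=128

# A single Cloud TPU v2-8 device has 8 cores.
DEVICE_CORES=8
DEVICE_BATCH=$((PER_CORE_BATCH * DEVICE_CORES))
echo "v2-8 global batch size: ${DEVICE_BATCH}"     # 1024

# A v2-256 Pod slice has 256 cores.
SLICE_CORES=256
SLICE_BATCH=$((PER_CORE_BATCH * SLICE_CORES))
echo "v2-256 global batch size: ${SLICE_BATCH}"    # 32768
```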
Cloud TPU Pod (beta) advantages
Cloud TPU Pod (beta) brings the following benefits relative to a single Cloud TPU device:
- Increased training speeds for fast iteration in R&D
- Increased human productivity by providing automatically scalable machine learning (ML) compute
- Ability to train much larger models than on a single Cloud TPU device
Cloud TPU Pod (beta) slices
Since not every ML model needs an entire Pod for training, Google Cloud Platform offers smaller sections called slices. Slices are internal allocations consisting of different numbers of TPU chips. Slice sizes are defined in the same way for Cloud TPU v2 Pods and for Cloud TPU v3 Pods; v3 Pods have more TPU chips and therefore support more and larger slices. For example, a v2 or v3 4x4 slice has 16 TPU chips for a total of 32 cores (16 TPU chips * 2 cores per chip), and an 8x8 slice has 64 TPU chips for a total of 128 cores.
Cloud TPU v2 and v3 Pod slices are available in the following sizes. Several sizes are pictured in the diagrams following the tables.
|Name|# of cores|# of TPU chips|
|---|---|---|
|v2-32|32 cores|16 TPU chips (4x4 slice)|
|v2-128|128 cores|64 TPU chips (8x8 slice)|
|v2-256|256 cores|128 TPU chips (8x16 slice)|
|v2-512|512 cores|256 TPU chips (16x16 slice)|
|Name|# of cores|# of TPU chips|
|---|---|---|
|v3-32|32 cores|16 TPU chips (4x4 slice)|
|v3-128|128 cores|64 TPU chips (8x8 slice)|
|v3-256|256 cores|128 TPU chips (8x16 slice)|
|v3-512|512 cores|256 TPU chips (16x16 slice)|
|v3-1024|1024 cores|512 TPU chips (16x32 slice)|
|v3-2048|2048 cores|1024 TPU chips (32x32 slice)|
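To see which of these slice sizes are actually offered in a particular zone, you can query the API. A minimal sketch, assuming the gcloud SDK is installed and authenticated; europe-west4-a is just an example zone:

```shell
# List the TPU accelerator types (including Pod slice sizes) offered in a zone.
# Requires an authenticated gcloud SDK; europe-west4-a is an example zone.
gcloud compute tpus accelerator-types list --zone=europe-west4-a
```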
Requesting Cloud TPU v2 and v3 Pod slices
You can set the type of slice you want to have allocated for your jobs using the names shown in the Pod slices table. The slice names correspond to the supported TPU versions. The smaller the slice, the more likely it is to be available.
You can specify the slice type in the following ways:
- When you run `ctpu up`, use the `--tpu-size` parameter to specify the slice name from the table above that corresponds to the number of cores you want to use. For example, to request 32 cores for a v2 Pod (beta), you would specify:
  $ ctpu up --tpu-size=[SLICE_NAME]
  where:
  [SLICE_NAME] is one of the names from the Pod slices table.
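A fuller invocation might look like the following sketch; `--name` and `--zone` are standard ctpu flags, and the values shown (my-tpu-pod, europe-west4-a) are assumptions for illustration:

```shell
# Bring up a Compute Engine VM plus a v2-32 Cloud TPU Pod slice with ctpu.
# my-tpu-pod and europe-west4-a are example values, not requirements.
ctpu up --name=my-tpu-pod \
        --zone=europe-west4-a \
        --tpu-size=v2-32
```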
- When you create a new Cloud TPU resource using the `gcloud compute tpus create` command, specify the accelerator type as one of the slice names from the Pod slices table. Note that the slice names specify the number of cores you are requesting. For example, to request 32 cores for a v2 Pod (beta), you would specify:
  $ gcloud compute tpus create [TPU-NAME] \
      --zone europe-west4-a \
      --range '10.240.0.0' \
      --accelerator-type 'v2-32' \
      --network my-tf-network \
      --version '1.13'
  where:
  [TPU-NAME] is a name identifying the TPU that you're creating.
  `--zone` is the Compute Engine zone. Make sure the requested accelerator type is supported in your zone.
  `--range` specifies the address range of the created Cloud TPU resource.
  `--accelerator-type` is the accelerator version and the number of cores you want to use, for example, v2-32 (32 cores).
  `--network` specifies the name of the network that your Compute Engine VM instance uses. You must be able to connect to instances on this network over SSH. For most situations, you can use the default network that your Google Cloud Platform project created automatically. However, an error results if the default network is a legacy network.
  `--version` specifies the TensorFlow version to use with the TPU.
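After creation, you can confirm the node's state and accelerator type. A sketch, again using assumed example values for the node name and zone:

```shell
# Show details of the Cloud TPU node, including its accelerator type and state.
# my-tpu-pod and europe-west4-a are example values.
gcloud compute tpus describe my-tpu-pod --zone=europe-west4-a
```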
- Using the Google Cloud Platform Console:
  1. From the left navigation menu, select Compute Engine > TPUs.
  2. On the TPUs screen, click Create TPU node. This brings up a configuration page for your TPU.
  3. Under TPU type, select the slice name associated with the number of cores you want to use, for example, v2-32 (32 cores).
  4. Click the Create button.
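Nodes created in the Console appear alongside CLI-created ones, so you can verify the result from the command line. A sketch, with europe-west4-a as an assumed example zone:

```shell
# List all Cloud TPU nodes in a zone, whether created via Console, ctpu, or gcloud.
# europe-west4-a is an example zone.
gcloud compute tpus list --zone=europe-west4-a
```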