Preemptible TPUs

Preemptible TPU nodes cost much less than non-preemptible TPU nodes. The Cloud TPU service might preempt (shut down) these nodes at any time if it requires additional TPU resources.

You can create a preemptible TPU node using the Cloud Console or the gcloud command-line tool. The gcloud command you use depends on whether you are using TPU VMs or legacy TPU nodes. For more information, see System Architecture.

Creating a preemptible TPU VM node

Console

  1. Go to the TPUs page under Compute Engine on the main page.
  2. Click CREATE TPU NODE to open the TPU node creation page.
  3. Type a name for your TPU node.
  4. Select the zone in which to create the TPU node.
  5. Select a TPU type for your TPU node.
  6. Select Turn on preemptibility for this node to make your TPU node preemptible.
  7. Select the TPU software version, for example v2-alpha for TPU VMs.

gcloud

$ gcloud alpha compute tpus tpu-vm create demo-tpu \
  --zone=europe-west4-a \
  --accelerator-type=v3-8 \
  --version=v2-alpha \
  --preemptible

where:

  • demo-tpu is a name for the TPU.
  • --zone specifies the zone in which to create the TPU.
  • --accelerator-type specifies the type of TPU.
  • --version specifies the version of TPU VM software to install.
  • --preemptible allows Cloud TPU to preempt the TPU.

Creating a preemptible legacy TPU node

Console

  1. Go to the TPUs page under Compute Engine on the main page.
  2. Click CREATE TPU NODE to open the TPU node creation page.
  3. Type a name for your TPU node.
  4. Select the zone in which to create the TPU node.
  5. Select a TPU type for your TPU node.
  6. Select Turn on preemptibility for this node to make your TPU node preemptible.
  7. Select the TensorFlow or PyTorch version to install on your VM.

gcloud

$ gcloud compute tpus execution-groups create demo-tpu \
  --zone=europe-west4-a \
  --accelerator-type=v3-8 \
  --version=2.5.0 \
  --preemptible

where:

  • demo-tpu is a name for the TPU.
  • --zone specifies the zone in which to create the TPU.
  • --accelerator-type specifies the type of TPU.
  • --version specifies the version of TensorFlow or PyTorch to install on your VM.
  • --preemptible allows Cloud TPU to preempt the TPU.

With legacy TPU nodes, the preemptible status of a TPU is independent of the preemptible status of your VM instance.

Pricing and quota for preemptible TPUs

Pricing for preemptible TPUs is significantly lower than for normal TPUs. For details, see the pricing page. You are not charged for TPUs if they are preempted in the first minute after you create them.

Quota for preemptible TPUs is generally higher, and is separate from the quota for normal TPUs. See the quota page.

Preemptible VMs and TPUs (legacy TPU nodes only)

As described in the quickstart guide, you need a Compute Engine virtual machine (VM) in order to connect to a TPU. Note that the preemptible status of the TPU is independent of the preemptible status of the VM. You can define your TPU as preemptible and the VM as not preemptible, or the other way round. You can also define them both as preemptible.

The most likely combination is a preemptible TPU and a non-preemptible VM. Note the following points:

  • The charges for the VM are likely to be low in relation to the charges for the TPU. The VM charges depend on the machine type you use. See the pricing page for a simple example of the relative costs.
  • Cloud TPU does not coordinate the preempting of the VM and the TPU. If you define them both as preemptible, the VM and the TPU can be preempted at different times.
  • If Compute Engine preempts your VM, you are still charged for the TPU (unless the TPU is itself preempted). Note that the TPU is idle while the VM is preempted.
  • Preemptible instances, both Compute Engine VM and Cloud TPU instances, are always preempted after they run for 24 hours. Certain actions reset this 24-hour counter.
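The 24-hour limit in the last point can be turned into a rough countdown, for example to decide when to force a final checkpoint. The following is a minimal sketch, not part of the Cloud TPU tooling: seconds_until_limit is a hypothetical helper, and it assumes you track yourself the time at which the 24-hour counter last started (at creation, or after an action that reset it).

```shell
# Hypothetical helper: print the seconds left before the 24-hour limit.
# $1 = time the 24-hour counter started, in Unix epoch seconds (assumed known)
# $2 = current time, in Unix epoch seconds
# Prints 0 once the limit has passed. The body runs in a subshell so the
# working variables do not leak into the caller.
seconds_until_limit() (
  start="$1"
  now="$2"
  limit=$((24 * 60 * 60))
  left=$((start + limit - now))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  printf '%s\n' "$left"
)

# Intended use (not run here), with START_EPOCH recorded by you:
#   seconds_until_limit "$START_EPOCH" "$(date +%s)"
```

Because the TPU or VM can be preempted well before 24 hours have passed, treat the result as an upper bound on remaining run time, not a guarantee.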

Detecting if a TPU has been preempted

Use the gcloud command-line tool to check whether the Cloud TPU service has preempted your TPU.

List your available TPUs:

(vm)$ gcloud compute tpus list

The above command displays the details of the TPUs created in your project. If a TPU has been preempted, the status changes from READY to PREEMPTED.

For example:

NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  PREEMPTED
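If you want a script to act on this output, one option is to filter the table for rows whose STATUS is PREEMPTED. The sketch below assumes the tabular layout shown in the example (a header row, NAME in the first column, STATUS in the last column); list_preempted_tpus is a hypothetical helper name, not a gcloud feature.

```shell
# Hypothetical helper: read the output of `gcloud compute tpus list` on
# stdin and print the NAME of every TPU whose last column is PREEMPTED.
# Assumes a header row followed by one whitespace-separated row per TPU.
list_preempted_tpus() {
  awk 'NR > 1 && $NF == "PREEMPTED" { print $1 }'
}

# Intended use (not run here):
#   gcloud compute tpus list | list_preempted_tpus
```

Parsing the human-readable table is fragile if the column layout changes, so check the output format of your gcloud version before relying on it.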

Detecting if a VM instance has been preempted (legacy TPU nodes only)

To check whether the VM instance has been preempted, use the gcloud compute operations list command to get a list of recent system operations. Add a name filter to only display the instances you currently have running or add the operationType filter to only display resources that have been preempted. For example, use the following command to display only the instances with the specified instance name:

$ gcloud compute operations list --filter="name=( 'NAME' my-vm)"

The following example displays only the resources that have been preempted:

$ gcloud compute operations list --filter="operationType=compute.instances.preempted"

For more details, see the Compute Engine guide.

Designing your machine learning application to run on preemptible TPUs

Make sure your application is resilient to restarts of the VM and TPU: save model checkpoints regularly, and configure your application to restore the most recent checkpoint on restart.
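As an illustration of the restore step, a launcher script can look up the newest checkpoint before starting training. The following is a minimal sketch under stated assumptions: checkpoints are files named ckpt-<step> in a single local directory (a hypothetical naming convention), and latest_checkpoint and the train.py flag in the comment are hypothetical names. Most training frameworks provide their own checkpoint-restore mechanisms, which are usually preferable.

```shell
# Hypothetical helper: print the path of the checkpoint with the highest
# step number in directory $1, or print nothing if there is none.
# Assumes checkpoint files are named ckpt-<step>, e.g. ckpt-100.
latest_checkpoint() {
  ls "$1" 2>/dev/null | awk -F- -v dir="$1" '
    # Keep the matching file name with the largest numeric step.
    /^ckpt-[0-9]+$/ && ($2 + 0 >= max) { max = $2 + 0; best = $0 }
    END { if (best != "") print dir "/" best }'
}

# Intended use (not run here); train.py and --resume-from are placeholders:
#   python3 train.py --resume-from "$(latest_checkpoint /path/to/checkpoints)"
```

Comparing step numbers numerically avoids the mistake of a plain alphabetical sort, which would rank ckpt-9 after ckpt-100.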