Preemptible TPUs

Preemptible TPUs cost much less than non-preemptible TPUs. In exchange, the Cloud TPU service can preempt (shut down) these TPUs at any time if it requires additional TPU resources.

If you are creating a preemptible TPU VM, use the gcloud command. If you are creating a preemptible TPU Node, you can use the gcloud command or the Console. For information on the differences between TPU VMs and TPU Nodes, see System Architecture.

Creating a preemptible TPU VM

gcloud

$ gcloud compute tpus tpu-vm create demo-tpu \
  --zone=europe-west4-a \
  --accelerator-type=v3-8 \
  --version=tpu-vm-tf-2.16.1-pjrt \
  --preemptible

where:

  • demo-tpu is a name for the TPU.
  • --zone specifies the zone in which to create the TPU.
  • --accelerator-type specifies the type of TPU.
  • --version specifies the version of TPU VM software to install.
  • --preemptible allows Cloud TPU to preempt the TPU.

Creating a preemptible TPU Node

Console

  1. Go to the TPUs page under Compute Engine on the main page.
  2. Click CREATE TPU NODE to open the TPU node creation page.
  3. Type a name for your TPU node.
  4. Select the zone in which to create the TPU node.
  5. Select a TPU type for your TPU node.
  6. Click Turn on preemptibility for this node to make your TPU node preemptible.
  7. Select the TensorFlow or PyTorch version to install on your VM.

gcloud

$ gcloud compute tpus execution-groups create \
  --name=demo-tpu \
  --zone=europe-west4-a \
  --accelerator-type=v3-8 \
  --tf-version=2.12.0 \
  --preemptible

where:

  • demo-tpu is a name for the TPU.
  • --zone specifies the zone in which to create the TPU.
  • --accelerator-type specifies the type of TPU.
  • --tf-version specifies the version of TensorFlow or PyTorch to install on your VM.
  • --preemptible allows Cloud TPU to preempt the TPU.

With TPU Nodes, the preemptible status of a TPU is independent of the preemptible status of your VM instance.

Pricing and quota for preemptible TPUs

Pricing for preemptible TPUs is significantly lower than for normal TPUs. For details, see the pricing page. You are not charged for TPUs if they are preempted in the first minute after you create them.

Quota for preemptible TPUs is generally higher than, and separate from, the quota for normal TPUs. See the quota page.

Detecting if a TPU has been preempted

To check whether the Cloud TPU service has preempted your TPU, list your available TPUs with the gcloud command for your TPU type:

TPU VM

$ gcloud compute tpus tpu-vm list --zone=us-central1-b

TPU Node

(vm)$ gcloud compute tpus list --zone=us-central1-b

The output of the command displays the details of the TPUs created in your project. If a TPU has been preempted, the status changes from READY to PREEMPTED.

For example:

NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  PREEMPTED
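
If you want to run this check from a script, the listing can be parsed programmatically. The following is a minimal Python sketch, not part of the Cloud TPU tooling. It assumes you have already captured the text output of the list command (for example with subprocess), and that NAME is the first column and STATUS is the last, as in the example table above. The function name preempted_tpus is hypothetical.

```python
from typing import List


def preempted_tpus(list_output: str) -> List[str]:
    """Return the names of TPUs whose STATUS column reads PREEMPTED."""
    lines = [line for line in list_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumption: NAME is the first column and STATUS is the last one,
    # matching the example table in this document.
    if header[0] != "NAME" or header[-1] != "STATUS":
        raise ValueError("unexpected header: " + lines[0])
    names = []
    for line in lines[1:]:
        fields = line.split()
        # Only the first and last fields are read, so empty middle
        # columns do not shift the result.
        if fields[-1] == "PREEMPTED":
            names.append(fields[0])
    return names


# The example listing from this document, used as sample input.
SAMPLE = """\
NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  PREEMPTED
"""

print(preempted_tpus(SAMPLE))  # prints ['demo-tpu']
```

Reading only the first and last columns keeps the sketch tolerant of listings whose middle columns differ between TPU VMs and TPU Nodes.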

Preemptible VMs and TPUs (TPU Nodes only)

As described in the quickstart guide for your framework, you need a Compute Engine virtual machine (VM) to connect to a TPU. The preemptible status of the TPU is independent of the preemptible status of the VM: you can define the TPU as preemptible and the VM as not preemptible, or the other way round, or define both as preemptible.

The most likely combination is a preemptible TPU and a non-preemptible VM. Note the following points:

  • The charges for the VM are likely to be low in relation to the charges for the TPU. The VM charges depend on the machine type you use. See the pricing page for a simple example of the relative costs.
  • Cloud TPU does not coordinate the preempting of the VM and the TPU. If you define them both as preemptible, the VM and the TPU can be preempted at different times.
  • If Compute Engine preempts your VM, you are still charged for the TPU (unless the TPU is itself preempted). Note that the TPU is idle while the VM is preempted.
  • Preemptible instances, both Compute Engine VM and Cloud TPU instances, are always preempted after they run for 24 hours. Certain actions reset this 24-hour counter.

Detecting if a VM instance has been preempted (TPU Nodes only)

To check whether the VM instance has been preempted, use the gcloud compute operations list command to get a list of recent system operations. Add a name filter to only display the instances you have running or add the operationType filter to only display resources that have been preempted. For example, use the following command to display only the instances with the specified instance name:

$ gcloud compute operations list --filter="name=( 'NAME' my-vm)"

The following example displays only the resources that have been preempted:

$ gcloud compute operations list --filter="operationType=compute.instances.preempted"

For more details, see the Compute Engine guide.
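
To act on preemption events from a script, one option is to request the operations list as JSON by adding the standard gcloud --format=json flag, then filter the records in code. The following Python sketch assumes each record carries operationType and targetLink fields, as in the Compute Engine operation resource. The function name preempted_instances is hypothetical, and the project and instance names in the sample are made up for illustration.

```python
import json
from typing import List


def preempted_instances(operations_json: str) -> List[str]:
    """Return instance names targeted by compute.instances.preempted operations."""
    operations = json.loads(operations_json)
    names = []
    for op in operations:
        if op.get("operationType") == "compute.instances.preempted":
            # Assumption: targetLink is a URL whose last path segment
            # is the instance name.
            target = op.get("targetLink", "")
            names.append(target.rsplit("/", 1)[-1])
    return names


# Hypothetical sample data in the shape assumed above.
SAMPLE_OPERATIONS = json.dumps([
    {
        "operationType": "compute.instances.preempted",
        "targetLink": "https://www.googleapis.com/compute/v1/projects/"
                      "my-project/zones/us-central1-b/instances/my-vm",
    },
    {
        "operationType": "insert",
        "targetLink": "https://www.googleapis.com/compute/v1/projects/"
                      "my-project/zones/us-central1-b/instances/other-vm",
    },
])

print(preempted_instances(SAMPLE_OPERATIONS))  # prints ['my-vm']
```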

Designing your machine learning application to run on preemptible TPUs

Make your application resilient to restarts of the VM and TPU: save model checkpoints regularly, and configure the application to restore the most recent checkpoint on restart.
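
The save-and-restore pattern can be sketched independently of any framework. The Python example below is an illustration only, under these assumptions: the training state fits in a small JSON-serializable dictionary, the checkpoint directory is on storage that survives a restart, and the function names (save_checkpoint, restore_latest, train) are hypothetical. In a real job you would more likely use your framework's own checkpoint utilities, such as tf.train.CheckpointManager in TensorFlow or torch.save in PyTorch.

```python
import json
import os
import tempfile


def save_checkpoint(directory, step, state):
    """Write a checkpoint atomically so a preemption never leaves a partial file."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"ckpt-{step:08d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    # os.replace renames atomically within the same directory.
    os.replace(tmp_path, final_path)
    return final_path


def restore_latest(directory):
    """Return (step, state) from the newest checkpoint, or (0, None) if none exists."""
    if not os.path.isdir(directory):
        return 0, None
    names = sorted(
        n for n in os.listdir(directory)
        if n.startswith("ckpt-") and n.endswith(".json")
    )
    if not names:
        return 0, None
    # Zero-padded step numbers make lexicographic order match numeric order.
    with open(os.path.join(directory, names[-1])) as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(directory, total_steps, save_every):
    """Resume from the latest checkpoint and run until total_steps."""
    step, state = restore_latest(directory)
    if state is None:
        state = {"weight": 0.0}  # fresh start
    while step < total_steps:
        state["weight"] += 1.0  # stand-in for one real training step
        step += 1
        if step % save_every == 0:
            save_checkpoint(directory, step, state)
    return step, state
```

If a run started with train("checkpoints", 1000, 100) is preempted, calling the same function again after the restart resumes from the last saved step rather than from step 0. The save interval is a trade-off: shorter intervals lose less work per preemption but spend more time writing checkpoints.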