Preemptible TPU nodes cost much less than non-preemptible TPU nodes. The
Cloud TPU service might preempt (shut down) these nodes at any
time if it requires additional TPU resources to service other requests. You can
create a preemptible TPU node using the Cloud Console or the `gcloud` command-line tool.
Creating a preemptible TPU node
- In the Cloud Console, go to the TPUs page under Compute Engine.
- Click CREATE TPU NODE to open the TPU node creation page.
- Type a name for your TPU node.
- Select the zone in which to create the TPU node.
- Select a TPU type for your TPU node.
- Click Turn on preemptibility for this node to make your TPU node preemptible.
- Select the TensorFlow or PyTorch version to install on your VM.
Alternatively, create a preemptible TPU node with the `gcloud` command:
$ gcloud compute tpus execution-groups create demo-tpu \
    --zone=europe-west4-a \
    --accelerator-type=v3-8 \
    --version=2.5.0 \
    --preemptible
- `demo-tpu` is a name for the TPU.
- `--accelerator-type` specifies the type of TPU.
- `--version` specifies the version of TensorFlow or PyTorch to install on your VM.
- `--preemptible` allows Cloud TPU to preempt the TPU.
The preemptible status of a TPU is independent of the preemptible status of your VM instance.
Pricing and quota for preemptible TPUs
Pricing for preemptible TPUs is significantly lower than for normal TPUs. For details, see the pricing page. You are not charged for TPUs if they are preempted in the first minute after you create them.
Quota for preemptible TPUs is generally higher, and is separate from the quota for normal TPUs. See the quota page.
Preemptible VMs and TPUs
As described in the quickstart guide, you need a Compute Engine virtual machine (VM) in order to connect to a TPU. Note that the preemptible status of the TPU is independent of the preemptible status of the VM. You can define your TPU as preemptible and the VM as not preemptible, or the other way round. You can also define them both as preemptible.
The most likely combination is a preemptible TPU and a non-preemptible VM. Note the following points:
- The charges for the VM are likely to be low in relation to the charges for the TPU. The VM charges depend on the machine type you use. See the pricing page for a simple example of the relative costs.
- Cloud TPU does not coordinate the preempting of the VM and the TPU. If you define them both as preemptible, the VM and the TPU can be preempted at different times.
- If Compute Engine preempts your VM, you are still charged for the TPU (unless the TPU is itself preempted). Note that the TPU is idle while the VM is preempted.
- Preemptible instances, both Compute Engine VM and Cloud TPU instances, are always preempted after they run for 24 hours. Certain actions reset this 24-hour counter.
Detecting if a TPU has been preempted
Use the `gcloud` command to check whether the Cloud TPU service has preempted your TPU:
List your available TPUs:
(vm)$ gcloud compute tpus list
The above command displays the details of the TPUs created in your project. If a
TPU has been preempted, its status changes to PREEMPTED:
NAME      ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT  NETWORK  RANGE          STATUS
demo-tpu  us-central1-b  v2-8              10.240.1.2:8470   default  10.240.1.0/29  PREEMPTED
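In an automation script, you might detect preemption by parsing the output of `gcloud compute tpus list`. A minimal Python sketch is below; the `parse_tpu_statuses` helper and the hard-coded sample listing are illustrative assumptions, not part of the gcloud tooling (in practice you would capture the command's real output, for example with `subprocess.run`):

```python
# Map each TPU name to its STATUS column in the whitespace-separated
# output of `gcloud compute tpus list`. Illustrative helper only.

def parse_tpu_statuses(listing: str) -> dict:
    lines = [line for line in listing.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    statuses = {}
    for line in lines[1:]:
        fields = line.split()
        statuses[fields[name_idx]] = fields[status_idx]
    return statuses

# Sample output, matching the listing shown above.
sample = """\
NAME      ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT  NETWORK  RANGE          STATUS
demo-tpu  us-central1-b  v2-8              10.240.1.2:8470   default  10.240.1.0/29  PREEMPTED
"""

statuses = parse_tpu_statuses(sample)
print(statuses["demo-tpu"])  # PREEMPTED
```

A monitoring loop could run this periodically and recreate the TPU (or fail over) when any status reads PREEMPTED.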
Detecting if a VM instance has been preempted
To check whether the VM instance has been preempted, use the
`gcloud compute operations list` command to get a list of recent system
operations. Add a `name` filter to display only the instances you currently
have running, or add the `operationType` filter to display only resources
that have been preempted.
For example, use the following command to display only the instances with
the specified instance name:
$ gcloud compute operations list --filter="name=my-vm"
The following example displays only the resources that have been preempted:
$ gcloud compute operations list --filter="operationType=compute.instances.preempted"
For more details, see the Compute Engine guide.
Designing your machine learning application to run on preemptible TPUs
Make sure your application is resilient to restarts of both the VM and the TPU: save model checkpoints regularly, and configure your application to restore the most recent checkpoint when it restarts.
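The save-and-restore pattern can be sketched as follows. This is a toy illustration in plain Python; in a real training job you would use your framework's checkpoint utilities (for example TensorFlow's `tf.train.CheckpointManager` or PyTorch's `torch.save`/`torch.load`) and write checkpoints to Cloud Storage rather than the local `checkpoint.json` path assumed here:

```python
import json
import os

CKPT_PATH = "checkpoint.json"  # hypothetical local path; real jobs use Cloud Storage

def save_checkpoint(step, path=CKPT_PATH):
    # Write atomically so a preemption mid-write cannot corrupt the file.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT_PATH):
    # On restart, resume from the most recent checkpoint if one exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["step"]
    return 0

start = load_checkpoint()
for step in range(start, start + 100):
    # ... run one training step on the TPU ...
    if step % 10 == 0:           # checkpoint regularly
        save_checkpoint(step)
save_checkpoint(start + 100)     # final checkpoint

print(load_checkpoint())
```

If the VM or TPU is preempted mid-run, rerunning the same script picks up from the last saved step instead of starting over, which is what bounds the work lost to a preemption.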