A preemptible TPU node is a Cloud TPU node that costs much less than a normal node. However, Cloud TPU might stop (preempt) a preemptible node at any time if it needs the resources for another purpose.
Creating a preemptible TPU
You can create a preemptible TPU node using the Cloud Console, the gcloud command-line tool, or the ctpu utility.
Console
- Go to the TPUs page under Compute Engine on the main page.
- Click CREATE TPU NODE to open the TPU node creation page.
- At the bottom of the "Create a Cloud TPU" page, click Labels and description to show the preemptibility option.
- Click the preemptibility option to make this new TPU node preemptible.
- Specify the remaining attributes for this TPU node.
- At the bottom of the page click Create to create the TPU node.
gcloud
If you need a custom setup, you may decide to use the gcloud command instead of ctpu to create and manage your TPU resources. For more about custom setups, see the Creating and deleting TPUs page.
To create a preemptible TPU, use the same gcloud compute tpus create command as when creating a normal TPU, but add the --preemptible flag:
$ gcloud compute tpus create demo-tpu \
--version=2.1 \
--preemptible
where:
- demo-tpu is a name for identifying the TPU that you're creating.
- --version specifies the TensorFlow version to use with the TPU.
- --preemptible allows Cloud TPU to preempt the TPU.
ctpu
To create a preemptible TPU, use the same ctpu up command as when creating a normal TPU, but add the --preemptible flag:
$ ctpu up --preemptible
Note that you must specify the --preemptible flag each time you run ctpu up for the preemptible TPU. The command-line flags and their default values apply to each command invocation individually.
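Because the flag is not remembered between invocations, one way to avoid forgetting it is a small shell wrapper. The following is only a sketch: the function name ctpu_up_preemptible is hypothetical, and it assumes ctpu is on your PATH.

```shell
# Hypothetical helper (not part of ctpu): always pass --preemptible
# to `ctpu up`, so the flag cannot be forgotten on a later run.
ctpu_up_preemptible() {
  # "$@" forwards any additional arguments unchanged.
  ctpu up --preemptible "$@"
}
```

With this defined in your shell, running ctpu_up_preemptible is equivalent to running ctpu up --preemptible, with any extra arguments passed through.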
The preemptible status of the TPU is independent of any preemptible status of your VM instance. See the discussion of preemptible VMs and TPUs below.
Pricing and quota for preemptible TPUs
Pricing for preemptible TPUs is significantly lower than for normal TPUs. For details, see the pricing page. You are not charged for TPUs if they are preempted in the first minute after you create them.
Quota for preemptible TPUs is generally higher, and is separate from the quota for normal TPUs. See the quota page.
Preemptible VMs and preemptible TPUs
As described in the quickstart guide, you need a Compute Engine virtual machine (VM) in order to connect to a TPU. Note that the preemptible status of the TPU is independent of the preemptible status of the VM. You can define your TPU as preemptible and the VM as not preemptible, or the other way round. You can also define them both as preemptible.
The most likely combination is a preemptible TPU and a non-preemptible VM. Note the following points:
- The charges for the VM are likely to be low in relation to the charges for the TPU. The VM charges depend on the machine type you use. See the pricing page for a simple example of the relative costs.
- Cloud TPU does not coordinate the preempting of the VM and the TPU. If you define them both as preemptible, the VM and the TPU can be preempted at different times.
- If Compute Engine preempts your VM, you are still charged for the TPU (unless the TPU is itself preempted). Note that the TPU is idle while the VM is preempted.
- Preemptible instances, both Compute Engine VM and Cloud TPU instances, are always preempted after they run for 24 hours. Certain actions reset this 24-hour counter.
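To make the 24-hour limit concrete, the arithmetic can be sketched as a small shell function. The function name seconds_until_limit is hypothetical, and the start time you pass in is an assumption on your side: it must be the moment the 24-hour counter last started, which may be later than the creation time if one of the counter-resetting actions has occurred.

```shell
# Hypothetical helper: seconds left before the 24-hour limit is reached.
# $1 = time the counter started (Unix epoch seconds)
# $2 = current time in epoch seconds (optional, defaults to now)
seconds_until_limit() {
  local start="$1"
  local now="${2:-$(date +%s)}"
  local limit=$((24 * 60 * 60))   # 24 hours expressed in seconds
  local remaining=$((start + limit - now))
  # Never report a negative value once the limit has passed.
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

A result of 0 means the instance has already reached the limit. An instance can still be preempted at any earlier time, so treat the value as an upper bound on the remaining run time, not a guarantee.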
You can use the ctpu command or the gcloud command to define a preemptible VM:
ctpu
To create a preemptible VM, use the same ctpu up command as when creating a TPU with a normal VM, but add the --preemptible-vm flag:
$ ctpu up --preemptible-vm
Note that you must specify the --preemptible-vm flag each time you run ctpu up for the preemptible VM. The command-line flags and their default values apply to each command invocation individually.
gcloud
If you need a custom setup, you may decide to use the gcloud command instead of ctpu to create and manage your TPU resources. For more about custom setups, see the Creating and deleting TPUs page.
To create a preemptible VM, use the same gcloud compute instances create command as when creating a normal VM, but add the --preemptible flag:
$ gcloud compute instances create tpu-demo-vm \
--machine-type=n1-standard-2 \
--image-project=ml-images \
--image-family=tf-1-15 \
--scopes=cloud-platform \
--preemptible
where:
- tpu-demo-vm is a name for identifying the VM instance that you're creating.
- --machine-type=n1-standard-2 is a standard machine type with 2 virtual CPUs and 7.5 GB of memory. See the available machine types.
- --image-project=ml-images is a shared collection of images that makes the tf-1-15 image available for your use.
- --image-family=tf-1-15 is an image with the required pip package for TensorFlow.
- --scopes=cloud-platform allows the VM to access Google Cloud APIs.
- --preemptible allows Compute Engine to preempt the VM instance.
See the Compute Engine documentation on creating preemptible VM instances.
Detecting if a TPU has been preempted
You can use the ctpu command or the gcloud command to check whether the Cloud TPU service has preempted your TPU:
ctpu
Check the status of your TPUs:
$ ctpu status
The above command prints the details of the TPUs you've created. The printed value for TPU Preemptible indicates whether the TPU is preemptible or not.
If the printed TPU State is READY, the TPU has not been preempted. If the TPU has been preempted, the state changes from READY to PREEMPTED.
gcloud
List your available TPUs:
(vm)$ gcloud compute tpus list
The above command prints the details of the TPUs you've created.
If the printed STATUS is READY, the TPU has not been preempted. If the TPU has been preempted, the status changes from READY to PREEMPTED.
For example:
NAME      ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT  NETWORK  RANGE          STATUS
demo-tpu  us-central1-b  v2-8              10.240.1.2:8470   default  10.240.1.0/29  PREEMPTED
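If you want to check the status from a script rather than by eye, the table output can be filtered with awk. This is a minimal sketch, not an official tool: the function name preempted_tpus is hypothetical, and it assumes the output has the same layout as the example above, with a header row, STATUS as the last column, and no spaces inside any field.

```shell
# Hypothetical helper: read the tabular output of
# `gcloud compute tpus list` on stdin and print the NAME of every
# TPU whose last column (STATUS) is PREEMPTED.
preempted_tpus() {
  # NR > 1 skips the header row; $NF is the last field on the line.
  awk 'NR > 1 && $NF == "PREEMPTED" { print $1 }'
}
```

You could use it as gcloud compute tpus list | preempted_tpus. Empty output means that no TPU in the list is currently in the PREEMPTED state.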
Detecting if a VM instance has been preempted
To check whether the VM instance has been preempted, use the gcloud compute operations list command to get a list of recent system operations. Add a name filter to only display the instances you currently have running or add the operationType filter to only display resources that have been preempted.
For example, use the following command to display only the instances with the specified instance name:
$ gcloud compute operations list --filter="name=( 'NAME' my-vm)"
The following example displays only the resources that have been preempted:
$ gcloud compute operations list --filter="operationType=compute.instances.preempted"
For more details, see the Compute Engine guide.
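To turn the list of preemption operations into a plain list of instance names, you can post-process the resource URLs that the operations refer to. The sketch below rests on an assumption: that adding --format="value(targetLink)" to the command makes gcloud print one target resource URL per line. Check this against your gcloud version. The function name instance_names_from_links is hypothetical.

```shell
# Hypothetical helper: read resource URLs (one per line) on stdin and
# print the unique final path segments, which for instance URLs are
# the instance names.
instance_names_from_links() {
  # Strip everything up to and including the last "/", then de-duplicate.
  sed 's|.*/||' | sort -u
}
```

Under that assumption, a possible invocation is gcloud compute operations list --filter="operationType=compute.instances.preempted" --format="value(targetLink)" | instance_names_from_links.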
Designing your machine learning application to run on preemptible TPUs
Make sure your application is resilient to restarts of the VM and TPU, by saving model checkpoints regularly and by configuring your application to restore the most recent checkpoint on restart.
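As an illustration of the restart side of this advice, the sketch below reruns a training command until it succeeds or an attempt budget runs out. It is an assumption-laden outline, not a recommended production setup: the function name run_until_done is hypothetical, the command you pass in is expected to restore its own most recent checkpoint on start, and the sketch does not recreate a preempted TPU or VM for you.

```shell
# Hypothetical supervisor: rerun a command until it exits successfully
# or the attempt budget is used up.
# $1 = maximum number of attempts, remaining arguments = command to run.
run_until_done() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # The command itself must resume from its latest checkpoint.
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed; restarting" >&2
    attempt=$((attempt + 1))
  done
  # Every attempt failed.
  return 1
}
```

For example, run_until_done 10 python train.py would try a hypothetical train.py script up to ten times. Without regular checkpoints each retry would start from scratch, which is why saving and restoring checkpoints matters more than the retry loop itself.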
The TensorFlow TPUEstimator API takes care of saving and restoring model checkpoints for you. If you use TPUEstimator, you don't need to worry about saving or restoring checkpoints of your TPUs or VMs. Read more about using the TPUEstimator with Cloud TPU.
Best practice is to use the TPUEstimator with Cloud TPU as described above. However, if you want to write the checkpoint saving and restoration functionality into your model yourself, see the checkpointing resources in the TensorFlow tf.train module.
What's next
- For a guide to creating your TPU resources, see the quickstart guide.