Managing TPUs
You can use the gcloud commands described in this document with both TPU configurations: TPU VMs and TPU Nodes. The gcloud command you use depends on your TPU configuration, so where the commands differ, both versions are shown, labeled TPU VMs and TPU Nodes. Use the version for the configuration you want to use. Unless you know you need to use TPU Nodes, we recommend using TPU VMs. For more information about TPU configurations, see System Architecture.
Running a machine learning (ML) model requires a Compute Engine VM and Cloud TPU resources. This page describes how to manage these resources using:
- The Google Cloud CLI, which provides the primary CLI to Google Cloud Platform (GCP).
- The Google Cloud console, which provides an integrated management console for your GCP resources.
Prerequisites
To run these procedures, you need to have a Google Cloud Platform (GCP) project set up. If you don't have a project, see Creating and managing projects to set one up.
If you are using the gcloud command, you can use Google Cloud Shell or install the gcloud command locally. Cloud Shell lets you interact with Cloud TPUs without installing any software. However, Cloud Shell may disconnect after a period of inactivity, so if you are running long-running commands, we recommend installing gcloud on your local machine. The gcloud command is part of the Google Cloud CLI.
- Install the Google Cloud CLI.
- Configure gcloud to use your project.
$ gcloud config set project project-name
- Configure gcloud to use the zone where you plan to create your Cloud TPU resources, for example us-central1-b.
$ gcloud config set compute/zone zone
For more information on the gcloud command, see the gcloud Reference.
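If you script your setup, the two gcloud config commands above can be wrapped in a single function. This is a minimal sketch; configure_tpu_defaults is a hypothetical name, and the function only runs the two gcloud config set commands shown in the steps above.

```shell
# Hypothetical helper: set the default project and zone that later
# gcloud commands use, by running the two gcloud config commands above.
configure_tpu_defaults() {
  if [ "$#" -ne 2 ]; then
    echo "usage: configure_tpu_defaults PROJECT_ID ZONE" >&2
    return 2
  fi
  # Default project for gcloud commands; stop if this step fails.
  gcloud config set project "$1" || return
  # Default zone, for example us-central1-b.
  gcloud config set compute/zone "$2"
}
```

For example, configure_tpu_defaults my-project us-central1-b sets both defaults, and gcloud config list shows the active values.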
Creating a Cloud TPU
When you create a Cloud TPU, you create Compute Engine VM and TPU resources.
Creating a Cloud TPU with gcloud
If you want to use the Cloud Shell, click Open Cloud Shell. Otherwise, open a command prompt/terminal window on your local computer.
Create your Cloud TPU resources. The commands you use depend on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=zone \
--accelerator-type=v3-8 \
--version=tpu-vm-tf-2.8.0
TPU Nodes
$ gcloud compute tpus execution-groups create --name=tpu-name \
--zone=zone \
--tf-version=2.8.0 \
--machine-type=n1-standard-1 \
--accelerator-type=v3-8
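Command flag descriptions
zone
- The zone where you plan to create your Cloud TPU.
tf-version
- (TPU Nodes only) The version of TensorFlow that the gcloud command installs on your VM.
machine-type
- (TPU Nodes only) The machine type of the Compute Engine VM to create.
accelerator-type
- The type of the Cloud TPU to create.
version
- (TPU VMs only) The TPU software version, which specifies the version of the TPU runtime to install.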
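The two create commands differ only in the gcloud command group and a few flags, so a script that supports both configurations can switch on the configuration type. The following is a sketch under that assumption; create_tpu and its vm/node argument are hypothetical, and the flag values are the ones from the commands above.

```shell
# Hypothetical helper: create a Cloud TPU with the flags shown above.
# CONFIG is "vm" (TPU VM) or "node" (TPU Node); ACCELERATOR defaults to v3-8.
create_tpu() {
  local config="$1" name="$2" zone="$3" accelerator="${4:-v3-8}"
  if [ -z "$name" ] || [ -z "$zone" ]; then
    echo "usage: create_tpu vm|node NAME ZONE [ACCELERATOR_TYPE]" >&2
    return 2
  fi
  case "$config" in
    vm)
      # TPU VM: --version selects the TPU software version (runtime).
      gcloud compute tpus tpu-vm create "$name" \
        --zone="$zone" \
        --accelerator-type="$accelerator" \
        --version=tpu-vm-tf-2.8.0
      ;;
    node)
      # TPU Node: also creates a Compute Engine VM with TensorFlow installed.
      gcloud compute tpus execution-groups create --name="$name" \
        --zone="$zone" \
        --tf-version=2.8.0 \
        --machine-type=n1-standard-1 \
        --accelerator-type="$accelerator"
      ;;
    *)
      echo "unknown TPU configuration: $config (use vm or node)" >&2
      return 2
      ;;
  esac
}
```

For example, create_tpu vm tpu-name us-central1-b runs the TPU VMs command above with a v3-8 accelerator.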
Run standard installation scripts
You can run a startup script on each TPU VM by specifying the --metadata startup-script parameter when you create the TPU VM. The following example uses a startup script to install a Python package on a TPU VM.
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=zone \
--accelerator-type=tpu-type \
--version=tpu-vm-tf-2.8.0 \
--metadata startup-script='#!/bin/bash
pip3 install numpy'
After the TPU VM is created, you can view the logs from the startup script by connecting to the TPU VM using SSH and running:
$ grep startup-script /var/log/syslog
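If the startup script grows beyond a line or two, keeping it in a local file avoids quoting problems inside --metadata. The sketch below writes the example script with a heredoc and passes it with --metadata-from-file. It assumes your gcloud version supports --metadata-from-file on tpu-vm create (check gcloud compute tpus tpu-vm create --help); both function names are hypothetical.

```shell
# Hypothetical helper: write the example startup script to a local file.
# The quoted 'EOF' delimiter stops the local shell from expanding $ variables.
write_startup_script() {
  cat > "$1" <<'EOF'
#!/bin/bash
pip3 install numpy
EOF
}

# Hypothetical helper: create a TPU VM whose startup script is read from a file.
# Assumes gcloud supports --metadata-from-file for tpu-vm create.
create_tpu_vm_with_startup_file() {
  local name="$1" zone="$2" accelerator="$3" script_file="$4"
  if [ ! -r "$script_file" ]; then
    echo "cannot read startup script: $script_file" >&2
    return 1
  fi
  gcloud compute tpus tpu-vm create "$name" \
    --zone="$zone" \
    --accelerator-type="$accelerator" \
    --version=tpu-vm-tf-2.8.0 \
    --metadata-from-file=startup-script="$script_file"
}
```

For example, run write_startup_script startup.sh and then create_tpu_vm_with_startup_file tpu-name zone v3-8 startup.sh.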
Creating a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs.
- Click CREATE TPU NODE.
- In the Name box, type a TPU instance name.
- In the Zone box, select the zone in which to create the TPU.
- Under TPU Configuration, select either TPU VM or TPU Node. The TPU configuration determines whether you create the TPU as a TPU VM or a TPU Node. For more information, see System Architecture.
- For TPU type, select the TPU type you want to create.
- For TPU software version, select the software version. When creating a Cloud TPU VM, the TPU software version specifies the version of the TPU runtime to install. When creating a Cloud TPU Node, the TPU software version allows you to choose the ML framework installed on the node's VM. No other settings are required. For more information, see Supported Models.
- Click CREATE to create your resources.
Connecting to a Cloud TPU
By default, the gcloud command you use to create TPU Nodes automatically attempts to SSH into the Compute Engine VM for your TPU Node. If you are using TPU Nodes and the gcloud command did not connect you to the Compute Engine instance, you can connect by running the following TPU Nodes command. When using TPU VMs, you must explicitly SSH into your TPU VM using the following TPU VMs command.
TPU VMs
$ gcloud compute tpus tpu-vm ssh tpu-name \
--zone=zone
TPU Nodes
$ gcloud compute ssh tpu-name \
--zone=zone
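Both SSH commands also accept a --command flag that runs a single command on the remote machine instead of opening an interactive shell. The sketch below wraps both variants; tpu_ssh is a hypothetical name.

```shell
# Hypothetical helper: open an SSH session to a Cloud TPU, or run one
# command over SSH when a fourth argument is given.
tpu_ssh() {
  local config="$1" name="$2" zone="$3" remote_cmd="$4"
  # Build the gcloud argument list for the chosen TPU configuration.
  case "$config" in
    vm)   set -- gcloud compute tpus tpu-vm ssh "$name" --zone="$zone" ;;
    node) set -- gcloud compute ssh "$name" --zone="$zone" ;;
    *)
      echo "usage: tpu_ssh vm|node NAME ZONE [COMMAND]" >&2
      return 2
      ;;
  esac
  # --command runs the given command remotely instead of an interactive shell.
  if [ -n "$remote_cmd" ]; then
    set -- "$@" --command="$remote_cmd"
  fi
  "$@"
}
```

For example, tpu_ssh vm tpu-name zone 'grep startup-script /var/log/syslog' prints the startup-script log lines without an interactive session.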
Listing your Cloud TPU resources
You can list all of your Cloud TPUs in a specified zone.
Listing your Cloud TPU resources using gcloud
The commands you use depend on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm list --zone=zone
TPU Nodes
$ gcloud compute tpus execution-groups list --zone=zone
These commands list the Cloud TPU resources in the specified zone. If no resources are currently set up, the output shows dashes for the VM and TPU. If one resource is running and the other is not, the status is reported as unhealthy; start or restart whichever resource is not running.
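In scripts it is often easier to work with bare values than with the formatted table. The sketch below uses gcloud's global --format flag with a value() projection to print one TPU VM name per line, then counts them. It covers TPU VMs only; the function names are hypothetical, and because gcloud may print the name field as a full resource path, the sketch keeps only the text after the last slash.

```shell
# Hypothetical helper: print the short name of each TPU VM in a zone.
# --format='value(name)' prints only the name field; sed strips any
# leading resource path such as projects/.../nodes/.
list_tpu_vm_names() {
  gcloud compute tpus tpu-vm list --zone="$1" --format='value(name)' |
    sed 's|.*/||'
}

# Hypothetical helper: count the TPU VMs in a zone (prints 0 when there are none).
count_tpu_vms() {
  list_tpu_vm_names "$1" | grep -c . || true
}
```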
Listing your Cloud TPU resources using the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
Retrieving information about your Cloud TPU
You can retrieve information about a specified Cloud TPU.
Retrieve information about a Cloud TPU using gcloud
The commands you use depend on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm describe tpu-name \
--zone=zone
TPU Nodes
$ gcloud compute tpus execution-groups describe tpu-name \
--zone=zone
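For scripts that need to wait, for example until a newly created or restarted TPU VM is usable, you can combine describe with the global --format flag and poll. This sketch assumes the TPU VM resource reports its status in a field named state, with values such as READY and STOPPED; confirm this against the describe output for your TPU. It covers TPU VMs only, and both function names are hypothetical.

```shell
# Hypothetical helper: print the current state of a TPU VM (for example READY).
tpu_vm_state() {
  gcloud compute tpus tpu-vm describe "$1" --zone="$2" --format='value(state)'
}

# Hypothetical helper: poll until the TPU VM reaches TARGET_STATE.
# Arguments: NAME ZONE TARGET_STATE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 when the state matches, 1 if the timeout passes first.
wait_for_tpu_vm_state() {
  local name="$1" zone="$2" target="$3" timeout="${4:-600}" interval="${5:-15}"
  local waited=0 state
  while :; do
    state=$(tpu_vm_state "$name" "$zone")
    [ "$state" = "$target" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out: $name is ${state:-unknown}, expected $target" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, wait_for_tpu_vm_state tpu-name zone READY 900 30 checks every 30 seconds for up to 15 minutes.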
Retrieve information about a Cloud TPU using the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Click the name of your Cloud TPU. The console displays the Cloud TPU detail page.
Stopping your Cloud TPU resources
You can stop a single Cloud TPU to stop incurring charges without losing your VM's configuration and software. Stopping TPU Pods is not supported.
Stopping a Cloud TPU with gcloud
The command you use to stop a Cloud TPU depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm stop tpu-name \
--zone=zone
TPU Nodes
$ gcloud compute tpus stop tpu-name \
--zone=zone
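When you have several single-device Cloud TPUs to stop, a loop over the commands above saves repetition. The sketch keeps going if one stop command fails and reports the failure through its exit status. stop_tpus is a hypothetical name, and it does not check whether a name refers to a TPU Pod, which cannot be stopped.

```shell
# Hypothetical helper: stop one or more Cloud TPUs in the same zone.
# Usage: stop_tpus vm|node ZONE NAME...
stop_tpus() {
  if [ "$#" -lt 3 ]; then
    echo "usage: stop_tpus vm|node ZONE NAME..." >&2
    return 2
  fi
  local config="$1" zone="$2" status=0 name
  case "$config" in
    vm|node) ;;
    *)
      echo "unknown TPU configuration: $config (use vm or node)" >&2
      return 2
      ;;
  esac
  shift 2
  # Stop each TPU in turn; keep going after a failure, but remember it.
  for name in "$@"; do
    if [ "$config" = vm ]; then
      gcloud compute tpus tpu-vm stop "$name" --zone="$zone" || status=1
    else
      gcloud compute tpus stop "$name" --zone="$zone" || status=1
    fi
  done
  return "$status"
}
```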
Stopping a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Select the checkbox next to your Cloud TPU and click Stop.
Starting your Cloud TPU resources
You can start a Cloud TPU when it is stopped.
Starting a Cloud TPU with gcloud
The command you use to start a Cloud TPU depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm start tpu-name --zone=zone
TPU Nodes
$ gcloud compute tpus start tpu-name --zone=zone
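A script may want to check a TPU's state before sending a start command. The following sketch, for TPU VMs only, reads the state with describe and starts the TPU VM only when it reports STOPPED. The state field name and the STOPPED value are assumptions to verify against your describe output; the function name is hypothetical.

```shell
# Hypothetical helper: start a TPU VM only if it is currently STOPPED.
# Assumes describe exposes a "state" field with values such as STOPPED.
start_tpu_vm_if_stopped() {
  local name="$1" zone="$2" state
  # Read the current state; return the describe error status if it fails.
  state=$(gcloud compute tpus tpu-vm describe "$name" --zone="$zone" \
    --format='value(state)') || return
  if [ "$state" = STOPPED ]; then
    gcloud compute tpus tpu-vm start "$name" --zone="$zone"
  else
    echo "$name is $state; not starting" >&2
  fi
}
```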
Starting a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Select the checkbox next to your Cloud TPU and click Start.
Deleting your Compute Engine VM and Cloud TPU resources
You can delete your Cloud TPU resources when you are done using them.
Deleting a Cloud TPU using gcloud
The command you use depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm delete tpu-name \
--zone=zone
Command flag descriptions
zone
- The zone where your Cloud TPU is located.
TPU Nodes
$ gcloud compute tpus execution-groups delete tpu-name \
--zone=zone
Command flag descriptions
zone
- The zone where your Cloud TPU is located.
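The delete commands may ask for confirmation before removing anything. In a cleanup script you can add gcloud's global --quiet flag, which disables interactive prompts and uses the default answers. Because deletion cannot be undone, the sketch below also requires an explicit --yes argument from its caller; the function name and the --yes convention are hypothetical.

```shell
# Hypothetical helper: delete Cloud TPU resources without an interactive prompt.
# Deletion cannot be undone, so the caller must pass --yes as the first argument.
delete_tpu() {
  if [ "$1" != --yes ] || [ "$#" -ne 4 ]; then
    echo "usage: delete_tpu --yes vm|node NAME ZONE" >&2
    return 2
  fi
  # --quiet is a global gcloud flag that disables interactive prompts.
  case "$2" in
    vm)   gcloud compute tpus tpu-vm delete "$3" --zone="$4" --quiet ;;
    node) gcloud compute tpus execution-groups delete "$3" --zone="$4" --quiet ;;
    *)
      echo "unknown TPU configuration: $2 (use vm or node)" >&2
      return 2
      ;;
  esac
}
```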
Deleting a Cloud TPU using the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Select the checkbox next to your Cloud TPU and click Delete.