Manage TPUs
This page describes how to manage Cloud TPU resources using:
- The Google Cloud CLI, which provides the primary CLI to Google Cloud.
- The Google Cloud console, which provides an integrated management console for your Google Cloud resources.
Cloud TPU has two VM architectures, TPU Node and TPU VM, which are described
in System Architecture. You can use the gcloud commands described in this
document with both architectures, but the exact command depends on which one
you are using. Each gcloud command is shown under a TPU VMs or TPU Nodes
label; use the version that matches your architecture. Unless you know you
need to use TPU Nodes, we recommend using TPU VMs. For Cloud TPU v4, only the
TPU VM architecture is supported.
Prerequisites
Before you run these procedures, you must install the Google Cloud CLI, create a Google Cloud project, and enable the Cloud TPU API. For instructions, see Set up a project and enable the Cloud TPU API.
If you are using the Google Cloud CLI, you can use the Google Cloud Shell,
a Compute Engine VM, or install the Google Cloud CLI locally. The Google
Cloud Shell allows you to interact with Cloud TPUs without having to install
any software. The Google Cloud Shell may disconnect after a period of
inactivity. If you're running long-running commands, we recommend installing the
Google Cloud CLI on your local machine. For more information on the Google Cloud CLI,
see the gcloud Reference.
TPUs are not available in all regions. For more information, see Cloud TPU regions and zones.
Creating a Cloud TPU
You can create a Cloud TPU using gcloud or the Google Cloud console. You can also
make a request to the Cloud TPU API using curl.
Creating a Cloud TPU using gcloud
The commands you use depend on whether you are using the TPU VM or TPU Node architecture. For more information, see System Architecture.
TPU VMs
You can specify TPU configurations in terms of TensorCores or TPU chips. For more information, see the section for the TPU version you are using in System architecture.
The following command uses a TensorCore-based configuration:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=v4-8 \
--version=tpu-software-version
Command flag descriptions
zone
- The zone where you plan to create your Cloud TPU.
accelerator-type
- The type of the Cloud TPU to create.
version
- The TPU software version.
shielded-secure-boot (optional)
- Specifies that the TPU instances are created with secure boot enabled. This implicitly makes them Shielded VM instances. See What is shielded VM? for more details.
The following command creates a TPU using a chip-based configuration:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--type=v4 \
--topology=2x2x1 \
--version=tpu-software-version
Required flags
tpu-name
- The name of the TPU VM you are creating.
zone
- The zone where you are creating your Cloud TPU.
type
- The TPU type to create. For more information on supported TPU types, see TPU types.
topology
- See the topology section for the supported topologies.
version
- The TPU software version you want to use. For more information, see TPU software versions.
For more information on supported TPU types and topologies, see Types and topologies.
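To see how a chip-based topology relates to a TensorCore-based accelerator type, the following sketch multiplies out the topology dimensions to get a chip count. It assumes Cloud TPU v4, where each chip has two TensorCores, so the 2x2x1 topology (4 chips) corresponds to v4-8. The name topology_chips is a hypothetical helper and is not part of gcloud.

```shell
# Hypothetical helper: count the chips in a topology string such as 2x2x1.
# The body runs in a subshell, so changing IFS does not affect the caller.
topology_chips() (
  product=1
  IFS=x
  for dim in $1; do
    product=$((product * dim))
  done
  printf '%s\n' "$product"
)

# Assumption: Cloud TPU v4 has two TensorCores per chip.
chips=$(topology_chips 2x2x1)
echo "chips=$chips tensorcores=$((chips * 2))"
```

For the 2x2x1 topology used above, this reports 4 chips and 8 TensorCores, which matches the v4-8 accelerator type in the TensorCore-based command.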
TPU Nodes
$ gcloud compute tpus execution-groups create --name=tpu-name \
--zone=us-central2-b \
--tf-version=2.12.0 \
--machine-type=n1-standard-1 \
--accelerator-type=v3-8
Command flag descriptions
zone
- The zone where you plan to create your Cloud TPU.
tf-version
- The version of TensorFlow the gcloud command installs on your VM.
machine-type
- The machine type of the Compute Engine VM to create.
accelerator-type
- The type of the Cloud TPU to create.
Creating a Cloud TPU queued resource using gcloud
Using gcloud, you can also create a Cloud TPU as a queued resource. When you
make a request for a queued resource, your request is added to a queue managed
by Cloud TPU. When a resource becomes available, it is allocated to you and
is available for your exclusive use. For more information, see Cloud TPU queued resources.
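As a rough illustration, the sketch below prints a queued resource request so you can review it before running it. The function name and its argument order are hypothetical, and the flag names (--node-id, --accelerator-type, --runtime-version) are assumptions about the queued-resources command group; confirm them with gcloud compute tpus queued-resources create --help for your gcloud version.

```shell
# Hypothetical helper: print (do not run) a queued resource request.
# Arguments: queued resource ID, node ID, zone, accelerator type, runtime version.
# Assumption: the flag names below match your installed gcloud version.
queued_resource_create_cmd() {
  printf 'gcloud compute tpus queued-resources create %s --node-id=%s --zone=%s --accelerator-type=%s --runtime-version=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Placeholders only; replace them with your own values.
queued_resource_create_cmd my-request tpu-name us-central2-b v4-8 tpu-software-version
```

Printing the command first is a deliberate choice: a queued request can wait for capacity for some time, so it is worth checking the zone and accelerator type before submitting it.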
Run a startup script
You can run a startup script on each TPU VM by specifying the
--metadata startup-script parameter when creating the TPU VM. The following
command creates a TPU VM using a startup script.
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=tpu-type \
--version=tpu-vm-tf-2.12.0 \
--metadata startup-script='#! /bin/bash
pip3 install numpy'
After the TPU VM is created, you can view the logs from the startup script by
connecting to the TPU VM using SSH and running:
$ cat /var/log/syslog | grep startup-script
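For longer startup scripts, quoting the whole script inline becomes error-prone. The sketch below keeps the script in a local file and prints a create command that reads it from there. The helper names and the file path are hypothetical, and the --metadata-from-file flag is an assumption about your gcloud version; check gcloud compute tpus tpu-vm create --help before relying on it.

```shell
# Hypothetical helper: write the startup script to a local file.
# Argument: the path of the file to write.
write_startup_script() {
  cat > "$1" <<'EOF'
#!/bin/bash
pip3 install numpy
EOF
}

# Hypothetical helper: print (do not run) a create command that loads the script.
# Arguments: TPU name, zone, accelerator type, software version, script path.
# Assumption: --metadata-from-file is supported by your gcloud version.
startup_create_cmd() {
  printf 'gcloud compute tpus tpu-vm create %s --zone=%s --accelerator-type=%s --version=%s --metadata-from-file=startup-script=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

A typical use would be to call write_startup_script ./startup.sh, review the file, and then run the command printed by startup_create_cmd with your own TPU name, zone, type, and version.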
Creating a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs.
- Click CREATE TPU NODE.
- In the Name box, type a TPU instance name.
- In the Zone box, select the zone in which to create the TPU.
- Under TPU settings, select either TPU VM architecture or TPU node architecture. The TPU configuration determines whether you create the TPU as a TPU VM or a TPU Node. For more information, see System Architecture.
- For TPU type, select the TPU type you want to create.
- For TPU software version, select the software version. When creating a Cloud TPU VM, the TPU software version specifies the version of the TPU runtime to install. When creating a Cloud TPU Node, the TPU software version allows you to choose the ML framework installed on the node's VM. No other settings are required. For more information, see Supported Models.
- Click CREATE to create your resources.
Creating a Cloud TPU VM using curl
The following command uses curl to create a TPU.
$ curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d "{accelerator_type: 'v4-8', \
runtime_version:'tpu-vm-tf-2.10.0-v4', \
network_config: {enable_external_ips: true}, \
shielded_instance_config: { enable_secure_boot: true }}" \
"https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name"
Required fields
runtime_version
- The Cloud TPU runtime version that you want to use.
project
- The name of your enrolled Google Cloud project.
zone
- The zone where you're creating your Cloud TPU.
node_name
- The name of the TPU VM you're creating.
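The project, zone, and node name from the required fields all end up in the request URL. As a small sketch, the hypothetical helper below assembles that URL from the three values, following the URL pattern in the curl command above.

```shell
# Hypothetical helper: print the Cloud TPU API URL used to create a node.
# Arguments: project ID, zone, node name (placeholders that you supply).
tpu_create_url() {
  printf 'https://tpu.googleapis.com/v2/projects/%s/locations/%s/nodes?node_id=%s\n' \
    "$1" "$2" "$3"
}

# Example with the placeholder values used in this document.
tpu_create_url project-id us-central2-b node_name
```

Quote the result when passing it to curl, for example "$(tpu_create_url project-id us-central2-b node_name)", so that the shell does not interpret the ? character.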
Connecting to a Cloud TPU
You can connect to a TPU using SSH.
TPU VMs
When using TPU VMs, you must explicitly SSH into your TPU using the following:
$ gcloud compute tpus tpu-vm ssh tpu-name \
--zone=zone
To connect to other TPU VMs associated with the TPU Pod, append
--worker=worker-number to the command, where worker-number is a
0-based index.
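Because worker numbers are 0-based, a Pod slice with N TPU VMs has workers 0 through N-1. The sketch below prints one SSH command per worker. The helper name is hypothetical, and the number of workers is assumed to be something you already know for your slice.

```shell
# Hypothetical helper: print one SSH command per worker of a TPU Pod slice.
# Arguments: TPU name, zone, number of workers.
# The body runs in a subshell so the loop counter does not leak to the caller.
print_worker_ssh_cmds() (
  i=0
  while [ "$i" -lt "$3" ]; do
    printf 'gcloud compute tpus tpu-vm ssh %s --zone=%s --worker=%s\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
)

# Example: a slice with four TPU VMs has workers 0, 1, 2, and 3.
print_worker_ssh_cmds tpu-name zone 4
```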
TPU Nodes
By default, the gcloud command you use to create TPU Nodes automatically
attempts to SSH into your TPU Node. If you are using TPU Nodes and are not
connected to the Compute Engine instance by the gcloud command, you can
connect by running the following:
$ gcloud compute ssh tpu-name \
--zone=zone
Listing your Cloud TPU resources
You can list all of your Cloud TPUs in a specified zone.
Listing your Cloud TPU resources using gcloud
The commands you use depend on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm list --zone=zone
TPU Nodes
$ gcloud compute tpus execution-groups list --zone=zone
This command lists the Cloud TPU resources in the specified zone. If no resources are set up, the output shows only dashes for the VM and TPU. If one resource is active and the other is not, the status is reported as unhealthy, and you need to start or restart whichever resource is not running.
Listing your Cloud TPU resources in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
Retrieving information about your Cloud TPU
You can retrieve information about a specified Cloud TPU.
Retrieve information about a Cloud TPU using gcloud
The commands you use depend on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm describe tpu-name \
--zone=zone
TPU Nodes
$ gcloud compute tpus execution-groups describe tpu-name \
--zone=zone
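One common use of the describe command is to wait until a newly created TPU VM is ready. The sketch below polls the describe output for a TPU VM and stops when the state is READY. It assumes that the describe output has a state field whose value is READY once the TPU VM is usable; the helper names and the TPU_STATE_CMD and TPU_POLL_SECONDS variables are hypothetical and exist only so the loop can be tried without calling gcloud.

```shell
# Hypothetical helper: print the current state of a TPU VM.
# Arguments: TPU name, zone.
# If TPU_STATE_CMD is set, it is run instead of gcloud (useful for dry runs).
tpu_state() {
  if [ -n "${TPU_STATE_CMD:-}" ]; then
    $TPU_STATE_CMD
  else
    # Assumption: the describe output exposes a "state" field.
    gcloud compute tpus tpu-vm describe "$1" --zone="$2" --format='value(state)'
  fi
}

# Hypothetical helper: poll until the TPU VM is READY.
# Arguments: TPU name, zone, maximum number of attempts (default 30).
# Returns 0 when READY is seen, 1 if the attempts run out.
wait_for_ready() {
  tries=0
  while [ "$tries" -lt "${3:-30}" ]; do
    if [ "$(tpu_state "$1" "$2")" = "READY" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "${TPU_POLL_SECONDS:-10}"
  done
  return 1
}
```

For example, wait_for_ready tpu-name zone 60 checks up to 60 times, 10 seconds apart by default, before giving up.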
Retrieve information about a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Click the name of your Cloud TPU. The Cloud TPU detail page is displayed.
Stopping your Cloud TPU resources
You can stop a single Cloud TPU to stop incurring charges without losing your VM's configuration and software. Stopping TPU Pods or TPUs allocated through the queued resources API is not supported. To stop incurring charges for TPUs allocated through the queued resources API, you must delete the TPU.
Stopping a Cloud TPU using gcloud
The commands you use for stopping a Cloud TPU depend on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm stop tpu-name \
--zone=zone
TPU Nodes
$ gcloud compute tpus stop tpu-name \
--zone=zone
Stopping a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Select the checkbox next to your Cloud TPU and click Stop.
Starting your Cloud TPU resources
You can start a Cloud TPU when it is stopped.
Starting a Cloud TPU using gcloud
The command you use for starting a stopped Cloud TPU depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm start tpu-name --zone=zone
TPU Nodes
$ gcloud compute tpus start tpu-name --zone=zone
Starting a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Select the checkbox next to your Cloud TPU and click Start.
Deleting your Compute Engine VM and Cloud TPU resources
You can delete your Cloud TPU when you are done using it.
Deleting a Cloud TPU using gcloud
The command you use depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.
TPU VMs
$ gcloud compute tpus tpu-vm delete tpu-name \
--zone=zone
Command flag descriptions
zone
- The zone of the Cloud TPU you want to delete.
TPU Nodes
$ gcloud compute tpus execution-groups delete tpu-name \
--zone=zone
Command flag descriptions
zone
- The zone of the Cloud TPU you want to delete.
Deleting a Cloud TPU in the Google Cloud console
- Navigate to the Google Cloud console.
- From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
- Select the checkbox next to your Cloud TPU and click Delete.
Advanced Configurations
Custom Network Resources
When you create the TPU, you can choose to specify the network, a
subnetwork, or both. You can do this either by submitting a gcloud command or a curl
call.
To specify the network or subnetwork in the gcloud CLI, use:
--network [NETWORK] --subnetwork [SUBNETWORK]
To specify the network or subnetwork in a curl call, use:
network_config: {network: '[NETWORK]', subnet: '[SUBNETWORK]', enable_external_ips: true}
Network
You can optionally specify the network to use for the TPU. If not specified,
the default network is used.
Valid network formats:
https://www.googleapis.com/compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-##}/global/networks/{network}
projects/{proj-id}/global/networks/{network}
projects/{proj-##}/global/networks/{network}
global/networks/{network}
{network}
Subnetwork
You can optionally specify the subnetwork to use for the TPU. The specified subnetwork needs to be in the same region as the zone where the TPU runs.
Valid subnetwork formats:
https://www.googleapis.com/compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
regions/{region}/subnetworks/{subnetwork}
{subnetwork}
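To make the path formats concrete, the sketch below builds the project-qualified network and subnetwork paths and derives the region from a zone name, since the subnetwork must be in the same region as the TPU's zone. All three helper names are hypothetical, and the region derivation assumes zone names of the form region-letter, such as us-central2-b.

```shell
# Hypothetical helper: print a project-qualified network path.
# Arguments: project ID, network name.
network_path() {
  printf 'projects/%s/global/networks/%s\n' "$1" "$2"
}

# Hypothetical helper: print a project-qualified subnetwork path.
# Arguments: project ID, region, subnetwork name.
subnetwork_path() {
  printf 'projects/%s/regions/%s/subnetworks/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: derive the region from a zone name.
# Assumption: the zone is the region followed by "-" and a suffix (us-central2-b).
zone_to_region() {
  printf '%s\n' "${1%-*}"
}

# Example: build both paths for a TPU in us-central2-b.
region=$(zone_to_region us-central2-b)
network_path project-id my-network
subnetwork_path project-id "$region" my-subnet
```

Deriving the region from the zone in one place helps keep the subnetwork and the TPU in the same region when you change zones later.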
Private Google Access
In order to SSH into the TPU VMs, you need to either add access configs for the TPU VMs, or turn on the Private Google Access for the subnetwork to which the TPU VMs are connected.
To add access configs, enable_external_ips must be set. When you
create a TPU, enable_external_ips is set by default. If you want to opt out,
specify the following flag:
--internal-ips
Or, in a curl call, set:
network_config: {enable_external_ips: false}
After you have configured Private Google Access, connect to the VM via SSH.
Custom Service Account
Each TPU VM has an associated service account it uses to make API requests on your behalf. TPU VMs use this service account to access files on Cloud Storage and access other services. See Service Accounts to learn more about them.
When you create a TPU node, you can choose to specify a custom service account for the TPU VM identities. By default, the Compute Engine default service account is used. A custom service account needs to be in the same project that you use to create the TPU. Use the following flag or field to specify a custom service account.
Specify in the gcloud CLI:
--service-account=[SERVICE_ACCOUNT]
Specify using curl:
service_account: {email: '[SERVICE_ACCOUNT]'}
To use a custom service account, you need to authorize the service account for your Google Cloud Storage buckets. See Connecting to Cloud Storage Buckets for instructions.
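The network, subnetwork, and service account flags described in this section can be combined in a single create command. The sketch below prints such a command, adding each optional flag only when a value is provided. The helper name and the TPU_NETWORK, TPU_SUBNETWORK, and TPU_SERVICE_ACCOUNT variables are hypothetical conventions chosen for this example.

```shell
# Hypothetical helper: print (do not run) a create command with optional flags.
# Arguments: TPU name, zone, accelerator type, software version.
# Optional environment variables: TPU_NETWORK, TPU_SUBNETWORK, TPU_SERVICE_ACCOUNT.
# The body runs in a subshell so the cmd variable does not leak to the caller.
advanced_create_cmd() (
  cmd="gcloud compute tpus tpu-vm create $1 --zone=$2 --accelerator-type=$3 --version=$4"
  if [ -n "${TPU_NETWORK:-}" ]; then
    cmd="$cmd --network=$TPU_NETWORK"
  fi
  if [ -n "${TPU_SUBNETWORK:-}" ]; then
    cmd="$cmd --subnetwork=$TPU_SUBNETWORK"
  fi
  if [ -n "${TPU_SERVICE_ACCOUNT:-}" ]; then
    cmd="$cmd --service-account=$TPU_SERVICE_ACCOUNT"
  fi
  printf '%s\n' "$cmd"
)
```

With none of the variables set, the output is the plain create command, so the default network and the default service account are used.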
Custom VM SSH methods
Set up a firewall for SSH.
The default network comes preconfigured to allow SSH access to all VMs. If you don't use the default network, or the default network settings were edited, you may need to explicitly enable SSH access by adding a firewall-rule:
$ gcloud compute firewall-rules create allow-ssh \
--network=network \
--allow=tcp:22
SSH into the TPU VMs.
$ gcloud compute tpus tpu-vm ssh tpu-name \
--zone=us-central2-b \
--project=project-id
Required fields
tpu-name
- Name of the TPU node.
zone
- The location of the TPU node. Currently, only us-central2-b is supported.
project-id
- The project you created above.
For a list of optional fields, see the gcloud API documentation.