Manage TPU resources
This page describes how to manage Cloud TPU resources using:
- The Google Cloud CLI, which provides the primary CLI to Google Cloud.
- The Google Cloud console, which provides an integrated management console for your Google Cloud resources.
Prerequisites
Before you run these procedures, you must install the Google Cloud CLI, create a Google Cloud project, and enable the Cloud TPU API. For instructions, see Set up a project and enable the Cloud TPU API.
If you are using the Google Cloud CLI, you can run it in the Google Cloud Shell, on a Compute Engine VM, or install it locally. The Google Cloud Shell lets you interact with Cloud TPUs without having to install any software, but it may disconnect after a period of inactivity. If you run long-running commands, we recommend installing the Google Cloud CLI on your local machine. For more information on the Google Cloud CLI, see the gcloud Reference.
Provision Cloud TPUs
You can provision a Cloud TPU using gcloud, the Google Cloud console, or the Cloud TPU API.
Using gcloud, there are two methods for provisioning TPUs:
- Using queued resources:
gcloud alpha compute tpus queued-resources create
- Using the Create Node API:
gcloud compute tpus tpu-vm create
The best practice is to provision TPUs using queued resources. When you request queued resources, the request is added to a queue maintained by the Cloud TPU service. When the requested resource becomes available, it's assigned to your Google Cloud project for your immediate exclusive use.
To create a TPU using queued resources, see Queued Resources.
If you will be using Multislice, see the Multislice introduction for more information.
When using Multislice, specify the following additional parameters when you request queued resources:
export NODE_COUNT=node_count
export NODE_PREFIX=your_tpu_prefix # Optional
where ${NODE_COUNT} is the number of slices to create and ${NODE_PREFIX} is the prefix you specify to generate names for each slice. A number is appended to the prefix for each slice. For example, if you set ${NODE_PREFIX} to mySlice, the slices are named mySlice-0, mySlice-1, and so on.
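The naming scheme can be sketched in plain shell (assumption: slices are numbered 0 through NODE_COUNT - 1, as described above):

```shell
# Sketch of the slice-naming scheme described above:
# a number from 0 to NODE_COUNT - 1 is appended to NODE_PREFIX.
NODE_COUNT=3
NODE_PREFIX=mySlice
i=0
while [ "$i" -lt "$NODE_COUNT" ]; do
  echo "${NODE_PREFIX}-${i}"
  i=$((i + 1))
done
```

This prints mySlice-0, mySlice-1, and mySlice-2, one per line.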
Create a Cloud TPU using the Create Node API
To create a TPU using the Create Node API, run the gcloud compute tpus tpu-vm create command.
To determine which TPU VM software you should use, see TPU VM images.
You can specify TPU configurations in terms of TensorCores or TPU chips. For more information, see the section for the TPU version you are using in System architecture.
The following command uses a TensorCore-based configuration:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=v4-8 \
--version=tpu-software-version
Command flag descriptions
zone
- The zone where you plan to create your Cloud TPU.
accelerator-type
- The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version
- The TPU software version.
shielded-secure-boot (optional)
- Specifies that the TPU instances are created with secure boot enabled. This implicitly makes them Shielded VM instances. For more details, see What is Shielded VM?.
The following command creates a TPU with a specific topology:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--type=v4 \
--topology=2x2x1 \
--version=tpu-software-version
Required flags
tpu-name
- The name of the TPU VM you are creating.
zone
- The zone where you are creating your Cloud TPU.
type
- The version of the Cloud TPU you want to create. For more information about TPU versions, see TPU versions.
topology
- See the topology section for the supported topologies.
version
- The TPU software version you want to use. For more information, see TPU software versions.
Details for all optional flags are shown in the gcloud reference documentation.
For more information on supported TPU types and topologies, see TPU versions.
Creating a Cloud TPU in the Google Cloud console
Go to the TPUs page:
Click CREATE TPU NODE.
In the Name box, type a TPU instance name.
In the Zone box, select the zone in which to create the TPU.
In the TPU type box, select the accelerator type you are using. The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
For TPU software version, select the software version. The TPU software version specifies the version of the TPU runtime to install and determines the ML framework installed on the TPU VM. No other settings are required. For more information, see Supported Models.
Click CREATE to create your resources.
Creating a Cloud TPU VM using curl
The following command uses curl to create a TPU.
$ curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d "{accelerator_type: 'v4-8', \
runtime_version:'tpu-vm-tf-2.16.1-pjrt', \
network_config: {enable_external_ips: true}, \
shielded_instance_config: { enable_secure_boot: true }}" \
https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name
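The request body above uses a relaxed, unquoted JSON style. As a sketch, the same body can be written in strict JSON and validated locally before sending it with curl (the field names follow the example above; validating with python3 -m json.tool is just one convenient local check):

```shell
# Sketch: strict-JSON version of the Create Node request body shown
# above, validated locally before use. Field names match the example
# above; nothing is sent to the API here.
BODY='{
  "accelerator_type": "v4-8",
  "runtime_version": "tpu-vm-tf-2.16.1-pjrt",
  "network_config": {"enable_external_ips": true},
  "shielded_instance_config": {"enable_secure_boot": true}
}'

# Fail fast if the body is not valid JSON.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "body is valid JSON"
```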
Required fields
runtime_version
- The Cloud TPU runtime version that you want to use.
project
- The name of your enrolled Google Cloud project.
zone
- The zone where you're creating your Cloud TPU.
node_name
- The name of the TPU VM you're creating.
Run a startup script
You can run a startup script on each TPU VM by specifying the --metadata startup-script parameter when creating the TPU VM. The following command creates a TPU VM that runs a startup script.
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=tpu-type \
--version=tpu-vm-tf-2.16.1-pjrt \
--metadata startup-script='#! /bin/bash
pip3 install numpy'
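Startup scripts run unattended, so a slightly more defensive sketch logs each step (the log path here is an arbitrary choice for this example, not something Cloud TPU requires):

```shell
#!/bin/bash
# Hypothetical startup script sketch: exit on the first error and
# log progress so failures are easy to debug. The log path is an
# arbitrary choice for this example.
set -e
LOG=/tmp/startup.log
{
  echo "startup began: $(date -u)"
  # Install the Python packages your job needs, for example:
  # pip3 install numpy
  echo "startup finished"
} >> "$LOG"
```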
Connecting to a Cloud TPU
You must explicitly connect to your TPU VM using SSH.
$ gcloud compute tpus tpu-vm ssh tpu-name --zone=zone
When you request slices with more than 4 chips, Cloud TPU creates a TPU VM for each group of 4 chips.
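That grouping determines how many TPU VMs a slice has. A minimal sketch (assumption: you already know the slice's chip count; the rounding handles smaller slices, and chip counts are typically multiples of 4):

```shell
# Sketch: one TPU VM per group of 4 chips, as described above.
# CHIPS is the slice's chip count, assumed known here.
CHIPS=16
VMS=$(( (CHIPS + 3) / 4 ))
echo "a ${CHIPS}-chip slice has ${VMS} TPU VMs"
```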
To install binaries or run code, connect to each TPU VM using the tpu-vm ssh command.
$ gcloud compute tpus tpu-vm ssh tpu-name
To connect to a specific TPU VM or to install binaries on each TPU VM using SSH, use the --worker flag, which takes a 0-based index:
$ gcloud compute tpus tpu-vm ssh ${TPU_NAME} --worker=1
To run a command on all TPU VMs at the same time, use the --worker=all and --command flags.
For example:
$ gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
--project=your_project_ID \
--zone=zone \
--worker=all \
--command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html'
For Multislice, you can either run a command on a single VM using the enumerated TPU name (each slice prefix with its appended number), or run the command on all TPU VMs in all slices using the --node=all, --worker=all, and --command flags, with an optional --batch-size flag.
$ gcloud compute tpus queued-resources ssh ${QUEUED_RESOURCE_ID} \
--project=project_ID \
--zone=zone \
--node=all \
--worker=all \
--command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html' \
--batch-size=4
Use SSH-in-browser by doing the following:
In the Google Cloud console, go to the TPUs page:
In the list of TPU VMs, click SSH in the row of the TPU VM that you want to connect to.
Listing your Cloud TPU resources
You can list all of your Cloud TPUs in a specified zone.
Listing your Cloud TPU resources using gcloud
$ gcloud compute tpus tpu-vm list --zone=zone
This command lists the Cloud TPU resources in the specified zone. If no resources are set up, the output shows only dashes for the VM and TPU. If one resource is active and the other is not, the status is shown as unhealthy. In that case, start or restart whichever resource is not running.
Listing your Cloud TPU resources in the Google Cloud console
Go to the TPUs page. All of your provisioned TPUs are shown in the list.
Retrieving information about your Cloud TPU
You can retrieve information about a specified Cloud TPU.
Retrieving information about a Cloud TPU using gcloud
$ gcloud compute tpus tpu-vm describe tpu-name \
--zone=zone
Retrieving information about a Cloud TPU in the Google Cloud console
Go to the TPUs page:
Click the name of your Cloud TPU. The Cloud TPU detail page is displayed.
Stopping your Cloud TPU resources
You can stop a single Cloud TPU to stop incurring charges without losing your VM's configuration and software. Stopping TPU Pods or TPUs allocated through the queued resources API is not supported. To stop incurring charges for TPUs allocated through the queued resources API, you must delete the TPU.
Stopping a Cloud TPU using gcloud
$ gcloud compute tpus tpu-vm stop tpu-name \
--zone=zone
Stopping a Cloud TPU in the Google Cloud console
Navigate to the Google Cloud console.
From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
Select the checkbox next to your Cloud TPU and click STOP from the menu bar at the top of the page.
Starting your Cloud TPU resources
You can start a Cloud TPU when it is stopped.
Starting a Cloud TPU using gcloud
You can start a stopped Cloud TPU to resume using it.
$ gcloud compute tpus tpu-vm start tpu-name \
--zone=zone
Starting a Cloud TPU in the Google Cloud console
Navigate to the Google Cloud console.
Select the checkbox next to your Cloud TPU and click
START
from the menu bar at the top of the screen.
Deleting a Cloud TPU
Delete your TPU VM slices at the end of your session.
Deleting a Cloud TPU using gcloud
$ gcloud compute tpus tpu-vm delete ${TPU_NAME} \
--project=project-id \
--zone=zone \
--quiet
Command flag descriptions
zone
- The zone where the Cloud TPU you want to delete is located.
Deleting a Cloud TPU in the Google Cloud console
Go to the TPUs page:
Select the checkbox next to your Cloud TPU and click Delete.
Advanced Configurations
Custom Network Resources
When you create the TPU, you can choose to specify the network and/or a subnetwork, either with a gcloud command or a curl call.
To specify the network or subnetwork with the gcloud CLI, use:
--network [NETWORK] --subnetwork [SUBNETWORK]
To specify the network or subnetwork in a curl call, use:
network_config: {network: '[NETWORK]', subnet: '[SUBNETWORK]', enable_external_ips: true}
Network
You can optionally specify the network to use for the TPU. If not specified, the default network is used.
Valid network formats:
https://www.googleapis.com/compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-##}/global/networks/{network}
projects/{proj-id}/global/networks/{network}
projects/{proj-##}/global/networks/{network}
global/networks/{network}
{network}
Subnetwork
You can optionally specify a subnetwork for the TPU to use. The specified subnetwork must be in the same region as the zone where the TPU runs.
Valid Formats:
https://www.googleapis.com/compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
regions/{region}/subnetworks/{subnetwork}
{subnetwork}
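These formats can be sanity-checked with a rough pattern. The following is only a sketch: the regex is a simplification for illustration, not the API's actual validation.

```shell
# Sketch: rough check that a string matches one of the subnetwork
# formats listed above. The regex is a simplification; the API's
# real validation may differ.
is_subnetwork() {
  echo "$1" | grep -Eq \
    '^(https://www\.googleapis\.com/)?(compute/[^/]+/)?(projects/[^/]+/)?(regions/[^/]+/subnetworks/)?[A-Za-z0-9_-]+$'
}

is_subnetwork "projects/my-proj/regions/us-central2/subnetworks/my-subnet" && echo ok
is_subnetwork "my-subnet" && echo ok
```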
Private Google Access
To SSH into the TPU VMs, you need to either add access configs for the TPU VMs or turn on Private Google Access for the subnetwork to which the TPU VMs are connected.
To add access configs, enable_external_ips must be set. When you create a TPU, enable_external_ips is set by default. If you want to opt out, specify the following flag:
--internal-ips
Or, in a curl call, use:
network_config: {enable_external_ips: false}
After you have configured Private Google Access, connect to the VM via SSH.
Custom Service Account
Each TPU VM has an associated service account that it uses to make API requests on your behalf. TPU VMs use this service account to call Cloud TPU APIs and to access Cloud Storage and other services. By default, your TPU VM uses the default Compute Engine service account.
You can specify a custom service account when creating a TPU VM using the --service-account flag. The service account must be defined in the same Google Cloud project where you create your TPU VM. Custom service accounts used for TPU VMs must have the TPU Viewer role to call the Cloud TPU API. If the code running in your TPU VM calls other Google Cloud services, it must have the roles necessary to access those services. For more information about service accounts, see Service Accounts.
Use the following commands to specify a custom service account.
Create a TPU VM using the gcloud CLI
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=tpu-type \
--version=tpu-vm-tf-2.16.1-pjrt \
--service-account=your-service-account
Create a TPU VM using curl
$ curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" -d "{accelerator_type: 'v4-8', \
runtime_version:'tpu-vm-tf-2.16.1-pjrt', \
network_config: {enable_external_ips: true}, \
shielded_instance_config: { enable_secure_boot: true }, \
service_account: {email: 'your-service-account'}}" \
https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name
To use a custom service account, you need to authorize the service account for your Google Cloud Storage buckets. For more information, see Connecting to Cloud Storage buckets.
Custom VM SSH methods
Set up a firewall for SSH.
The default network comes preconfigured to allow SSH access to all VMs. If you don't use the default network, or you have changed the default network settings, you may need to explicitly enable SSH access by adding a firewall rule:
$ gcloud compute firewall-rules create allow-ssh \
--network=network \
--allow=tcp:22
Connect to the TPU VMs using SSH.
$ gcloud compute tpus tpu-vm ssh tpu-name \
--zone=us-central2-b \
--project=project-id
Required fields
tpu-name
- The name of the TPU VM.
zone
- The zone where you created the TPU VM.
project-id
- The name of your Google Cloud project.
For a list of optional fields, see the gcloud API documentation.