Manage TPU resources
This page describes how to create, list, stop, start, delete, and connect to Cloud TPUs using the Create Node API. The Create Node API is called when you run the gcloud compute tpus tpu-vm create command using the Google Cloud CLI and when you create a TPU using the Google Cloud console. When you use the Create Node API, your request is processed immediately. If there is not enough capacity to fulfill your request, the request fails.
The best practice is to create TPUs using queued resources instead of the Create Node API. When you request queued resources, the request is added to a queue maintained by the Cloud TPU service. When the requested resource becomes available, it's assigned to your Google Cloud project for your immediate exclusive use. For more information, see Manage queued resources.
When using Multislice, you must use queued resources. For more information, see Multislice introduction.
If you want to use Google Kubernetes Engine (GKE) to manage TPU resources, you first have to create a GKE cluster. You then add node pools containing TPU slices to your cluster. For more information, see About TPUs in GKE.
Prerequisites
Before you run these procedures, you must install the Google Cloud CLI, create a Google Cloud project, and enable the Cloud TPU API. For instructions, see Set up the Cloud TPU environment.
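The project and API steps above can be scripted. The following is a minimal sketch, assuming the Google Cloud CLI is already installed and authenticated. The function name `setup_tpu_env` is hypothetical, and `tpu.googleapis.com` is taken from the API endpoint used later on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: select a project and enable the Cloud TPU API.
# Assumes the gcloud CLI is installed and authenticated.
setup_tpu_env() {
  local project_id="$1"
  if [ -z "${project_id}" ]; then
    echo "usage: setup_tpu_env PROJECT_ID" >&2
    return 2
  fi
  # Make the project the default for later gcloud commands.
  gcloud config set project "${project_id}" || return 1
  # Enable the Cloud TPU API for that project.
  gcloud services enable tpu.googleapis.com --project="${project_id}"
}
```

For example, `setup_tpu_env project-id` runs both steps for the project whose ID is `project-id`.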
If you are using the Google Cloud CLI, you can run commands using Cloud Shell, a Compute Engine VM, or your local machine. Cloud Shell lets you interact with Cloud TPUs without having to install any software, but it disconnects after a period of inactivity. If you're running long-running commands, we recommend installing the Google Cloud CLI on your local machine. For more information on the Google Cloud CLI, see the gcloud reference.
Create a Cloud TPU using the Create Node API
You can create a Cloud TPU using gcloud, the Google Cloud console, or the Cloud TPU API.
When creating a Cloud TPU, you must specify the TPU VM image (also called TPU software version). To determine which VM image you should use, see TPU VM images.
You also need to specify the TPU configuration in terms of TensorCores or TPU chips. For more information, see the section for the TPU version you are using in System architecture.
gcloud
To create a TPU using the Create Node API, use the gcloud compute tpus tpu-vm create command.
To configure specific internal or external IP addresses, see the instructions in External and internal IP addresses.
The following command uses a v4-8 TPU configuration:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=v4-8 \
--version=tpu-software-version
Command flag descriptions
zone
- The zone where you plan to create your Cloud TPU.
accelerator-type
- The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version
- The TPU software version.
shielded-secure-boot (optional)
- Specifies that the TPU instances are created with secure boot enabled. This implicitly makes them Shielded VM instances. For more details, see What is Shielded VM?
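Because a Create Node request fails immediately when capacity is unavailable, a script might retry the command a few times. The following is a minimal sketch of that idea, not a recommended replacement for queued resources. The function name `create_tpu_with_retry`, the default of 3 attempts, and the 60-second delay are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a Create Node request a fixed number of times.
# Every failure is retried, not only capacity errors.
# Arguments: NAME ZONE ACCELERATOR_TYPE VERSION [ATTEMPTS] [DELAY_SECONDS]
create_tpu_with_retry() {
  local name="$1"
  local zone="$2"
  local accel="$3"
  local version="$4"
  local attempts="${5:-3}"
  local delay="${6:-60}"
  local i=1
  while [ "${i}" -le "${attempts}" ]; do
    if gcloud compute tpus tpu-vm create "${name}" \
        --zone="${zone}" \
        --accelerator-type="${accel}" \
        --version="${version}"; then
      return 0
    fi
    echo "create attempt ${i}/${attempts} failed" >&2
    # Wait between attempts, but not after the last one.
    if [ "${i}" -lt "${attempts}" ]; then
      sleep "${delay}"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `create_tpu_with_retry tpu-name us-central2-b v4-8 tpu-software-version` makes up to three attempts. Queued resources remain the better fit when capacity is the usual cause of failure.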
The following command creates a TPU with a specific topology:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--type=v4 \
--topology=2x2x1 \
--version=tpu-software-version
Required flags
tpu-name
- The name of the TPU VM you are creating.
zone
- The zone where you are creating your Cloud TPU.
type
- The TPU version you want to use. For more information, see TPU versions.
topology
- The physical arrangement of TPU chips, specifying the number of chips in each dimension. For more information about supported topologies for each TPU version, see TPU versions.
version
- The TPU software version you want to use. For more information, see TPU software versions.
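Because a topology gives the number of chips in each dimension, the total chip count is the product of its dimensions. The following sketch computes that product from a topology string; the function name `topology_chips` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: multiply the dimensions of a topology string
# such as 2x2x1 to get the number of TPU chips it describes.
topology_chips() {
  local topology="$1"
  local chips=1
  local dim
  local old_ifs="${IFS}"
  # Reject empty input, non-numeric characters, and empty dimensions.
  case "${topology}" in
    '' | *[!0-9x]* | x* | *x | *xx*)
      echo "invalid topology: ${topology}" >&2
      return 2
      ;;
  esac
  # Split the string on "x" and multiply the parts together.
  IFS='x'
  for dim in ${topology}; do
    chips=$((chips * dim))
  done
  IFS="${old_ifs}"
  echo "${chips}"
}
```

For example, `topology_chips 2x2x1` prints `4`.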
Console
In the Google Cloud console, go to the TPUs page.
Click Create TPU.
In the Name field, enter a name for your TPU.
In the Zone box, select the zone in which to create the TPU.
In the TPU type box, select an accelerator type. The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
In the TPU software version box, select a software version. When creating a Cloud TPU VM, the TPU software version specifies the version of the TPU runtime to install. For more information, see TPU VM images.
Click Create to create your resources.
curl
The following command uses curl to create a TPU.
$ curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"accelerator_type": "v4-8",
"runtime_version": "tpu-vm-tf-2.18.0-pjrt",
"network_config": {"enable_external_ips": true},
"shielded_instance_config": {"enable_secure_boot": true}}' \
"https://tpu.googleapis.com/v2/projects/project-id/locations/us-central2-b/nodes?node_id=node_name"
Required fields
runtime_version
- The Cloud TPU runtime version that you want to use.
project-id
- The ID of your Google Cloud project.
zone
- The zone where you're creating your Cloud TPU.
node_name
- The name of the TPU VM you're creating.
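The project, zone, and node name all appear in the request URL rather than in the JSON body. The following sketch shows how they fit together, using the URL pattern from the curl command above. The function name `tpu_create_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Create Node request URL from its parts.
# Arguments: PROJECT_ID ZONE NODE_NAME
tpu_create_url() {
  local project_id="$1"
  local zone="$2"
  local node_id="$3"
  printf 'https://tpu.googleapis.com/v2/projects/%s/locations/%s/nodes?node_id=%s\n' \
    "${project_id}" "${zone}" "${node_id}"
}
```

You can pass the result to curl as a quoted argument, for example `"$(tpu_create_url project-id us-central2-b node_name)"`. Quoting keeps the shell from treating the `?` as a wildcard.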
Run a startup script
You can run a startup script on each TPU VM by specifying the --metadata startup-script flag when creating the TPU VM. The following command creates a TPU VM using a startup script.
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=tpu-type \
--version=tpu-vm-tf-2.18.0-pjrt \
--metadata startup-script='#! /bin/bash
pip3 install numpy'
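For longer startup scripts, it can be easier to keep the script body in a heredoc and pass it through a variable. The following is a minimal sketch of that approach; the function name `create_tpu_with_startup_script` is hypothetical. The sketch assumes the script body contains no commas, because gcloud may treat commas in a --metadata value as separators between entries (see `gcloud topic escaping`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a TPU VM whose startup script comes from a heredoc.
# Arguments: NAME ZONE ACCELERATOR_TYPE VERSION
create_tpu_with_startup_script() {
  local name="$1"
  local zone="$2"
  local accel="$3"
  local version="$4"
  local script
  # The quoted 'EOF' delimiter prevents local expansion of the script body.
  script="$(cat <<'EOF'
#! /bin/bash
pip3 install numpy
EOF
)"
  gcloud compute tpus tpu-vm create "${name}" \
    --zone="${zone}" \
    --accelerator-type="${accel}" \
    --version="${version}" \
    --metadata "startup-script=${script}"
}
```

Here the `EOF` lines only mark the start and end of the heredoc, so they are not part of the script that runs on the TPU VM.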
Connect to a Cloud TPU
gcloud
Connect to your Cloud TPU using SSH:
$ gcloud compute tpus tpu-vm ssh tpu-name --zone=zone
When you request a slice larger than a single host, Cloud TPU creates a TPU VM for each host. The number of TPU chips per host depends on the TPU version.
To install binaries or run code, connect to each TPU VM using the tpu-vm ssh command.
$ gcloud compute tpus tpu-vm ssh tpu-name
To connect to a specific TPU VM using SSH, use the --worker flag, which takes a 0-based index:
$ gcloud compute tpus tpu-vm ssh tpu-name --worker=1
To run a command on all TPU VMs with a single command, use the --worker=all and --command flags:
$ gcloud compute tpus tpu-vm ssh tpu-name \
--project=project-id \
--zone=zone \
--worker=all \
--command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html'
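If you want to see which TPU VM a command failed on, you can loop over the worker indexes instead of using --worker=all. The following is a minimal sketch, assuming you already know how many TPU VMs the slice has. The function name `run_on_workers` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command on each TPU VM in turn and report failures.
# Arguments: NAME ZONE NUM_WORKERS COMMAND
run_on_workers() {
  local name="$1"
  local zone="$2"
  local num_workers="$3"
  local cmd="$4"
  local failed=0
  local i=0
  # --worker takes a 0-based index, so indexes run from 0 to NUM_WORKERS - 1.
  while [ "${i}" -lt "${num_workers}" ]; do
    if ! gcloud compute tpus tpu-vm ssh "${name}" \
        --zone="${zone}" \
        --worker="${i}" \
        --command="${cmd}"; then
      echo "worker ${i} failed" >&2
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  # Succeed only if every worker succeeded.
  [ "${failed}" -eq 0 ]
}
```

This runs the workers one after another, so it is slower than --worker=all on large slices.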
For Multislice, you can run a command on a single VM by using the enumerated TPU name, which consists of the slice prefix with the slice number appended. To run a command on all TPU VMs in all slices, use the --node=all, --worker=all, and --command flags, with an optional --batch-size flag.
$ gcloud compute tpus queued-resources ssh ${QUEUED_RESOURCE_ID} \
--project=project-id \
--zone=zone \
--node=all \
--worker=all \
--command='pip install "jax[tpu]==0.4.20" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html' \
--batch-size=4
Console
To connect to your TPUs in the Google Cloud console, use SSH-in-browser:
In the Google Cloud console, go to the TPUs page.
In the list of TPU VMs, click SSH in the row of the TPU VM that you want to connect to.
List your Cloud TPU resources
You can list all of your Cloud TPUs in a specified zone.
gcloud
$ gcloud compute tpus tpu-vm list --zone=zone
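Because the list command covers one zone at a time, checking several zones means running it once per zone. The following is a minimal sketch; the function name `list_tpus_in_zones` is hypothetical and the zone names are whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Cloud TPUs in each of the given zones.
# Arguments: ZONE [ZONE ...]
list_tpus_in_zones() {
  local zone
  for zone in "$@"; do
    # Print a header so the output for each zone is easy to tell apart.
    echo "== ${zone} =="
    gcloud compute tpus tpu-vm list --zone="${zone}"
  done
}
```

For example, `list_tpus_in_zones us-central2-b europe-west4-a` lists the TPUs in both zones.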
Console
In the Google Cloud console, go to the TPUs page.
Retrieve information about your Cloud TPU
You can retrieve information about a specified Cloud TPU.
gcloud
$ gcloud compute tpus tpu-vm describe tpu-name \
--zone=zone
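A script can also use the describe command to wait until a TPU reaches a given state. The following is a minimal sketch. It assumes the describe output has a `state` field with values such as `READY`, which you should confirm against the output for your own TPU. The function name `wait_for_tpu_state` and the default of 30 tries at 10-second intervals are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll describe until the TPU reports the wanted state.
# Arguments: NAME ZONE WANTED_STATE [TRIES] [DELAY_SECONDS]
wait_for_tpu_state() {
  local name="$1"
  local zone="$2"
  local want="$3"
  local tries="${4:-30}"
  local delay="${5:-10}"
  local state=""
  local i=1
  while [ "${i}" -le "${tries}" ]; do
    # Assumption: the resource exposes its status in a field named "state".
    # A failed describe call is treated as an unknown state and retried.
    state="$(gcloud compute tpus tpu-vm describe "${name}" \
      --zone="${zone}" --format='value(state)')" || state=""
    if [ "${state}" = "${want}" ]; then
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "timed out waiting for ${name} to reach ${want} (last: ${state})" >&2
  return 1
}
```

For example, `wait_for_tpu_state tpu-name zone READY` returns successfully once the TPU reports `READY`, or fails after the last try.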
Console
In the Google Cloud console, go to the TPUs page.
Click the name of your Cloud TPU. The console displays the Cloud TPU detail page.
Stop your Cloud TPU resources
You can stop a single Cloud TPU to stop incurring charges without losing your VM's configuration and software.
gcloud
$ gcloud compute tpus tpu-vm stop tpu-name \
--zone=zone
Console
In the Google Cloud console, go to the TPUs page.
Select the checkbox next to your Cloud TPU.
Click Stop.
Start your Cloud TPU resources
You can start a Cloud TPU when it is stopped.
gcloud
$ gcloud compute tpus tpu-vm start tpu-name \
--zone=zone
Console
In the Google Cloud console, go to the TPUs page.
Select the checkbox next to your Cloud TPU.
Click Start.
Delete a Cloud TPU
Delete your TPU VM slices at the end of your session.
gcloud
$ gcloud compute tpus tpu-vm delete tpu-name \
--project=project-id \
--zone=zone \
--quiet
Command flag descriptions
zone
- The zone where you plan to delete your Cloud TPU.
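To clean up several TPUs at the end of a session, you can run the delete command once per TPU. The following is a minimal sketch; the function name `delete_tpus` is hypothetical. Because it passes --quiet, it deletes each TPU without asking for confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete several Cloud TPUs in one zone.
# Arguments: ZONE NAME [NAME ...]
delete_tpus() {
  local zone="$1"
  shift
  local name
  local failed=0
  for name in "$@"; do
    # --quiet skips the confirmation prompt, so check the names carefully.
    if ! gcloud compute tpus tpu-vm delete "${name}" \
        --zone="${zone}" --quiet; then
      echo "failed to delete ${name}" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every delete succeeded.
  [ "${failed}" -eq 0 ]
}
```

For example, `delete_tpus zone tpu-a tpu-b` deletes two TPUs and returns a nonzero status if either delete fails.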
Console
In the Google Cloud console, go to the TPUs page.
Select the checkbox next to your Cloud TPU.
Click Delete.