Managing TPUs

You can use the gcloud commands described in this document with both TPU configurations: TPU VMs and TPU Nodes. The commands you use depend on the TPU configuration you are using, so each command is shown under a TPU VMs or TPU Nodes label. Unless you know you need to use TPU Nodes, we recommend using TPU VMs. For more information about TPU configurations, see System Architecture.

Running a Machine Learning (ML) model requires a Compute Engine VM and Cloud TPU resources. This page describes how to manage these resources using the gcloud CLI and the Google Cloud console.

Prerequisites

To run these procedures, you need to have a Google Cloud project set up. If you don't have a project, see Creating and managing projects to set one up.

If you are using the Google Cloud CLI, you can run it from the Google Cloud Shell, from a Compute Engine VM, or from a local installation. The Google Cloud Shell lets you interact with Cloud TPUs without installing any software, but it may disconnect after a period of inactivity. If you're running long-running commands, we recommend installing the Google Cloud CLI on your local machine.

  1. Install the Google Cloud CLI.
  2. Configure gcloud to use your project.

    $ gcloud config set project project-name
    
  3. Configure gcloud to use the zone where you plan to create your Cloud TPU resources. For example, us-central1-b.

    $ gcloud config set compute/zone zone
    

For more information on the gcloud command, see the gcloud Reference.
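Before you create resources, it can help to confirm that both settings took effect. The following is a minimal sketch rather than part of the documented procedure: check_setting is a hypothetical helper, and the sketch assumes that gcloud config get-value prints nothing on standard output when a property is unset.

```shell
#!/usr/bin/env bash
# check_setting is a hypothetical helper: given NAME and VALUE, it reports
# whether the gcloud property has a usable value.
check_setting() {
  local name="$1" value="$2"
  if [ -z "$value" ] || [ "$value" = "(unset)" ]; then
    echo "missing: $name"
    return 1
  fi
  echo "ok: $name=$value"
}

# Query gcloud only when the CLI is installed.
if command -v gcloud >/dev/null 2>&1; then
  check_setting project "$(gcloud config get-value project 2>/dev/null)" || true
  check_setting compute/zone "$(gcloud config get-value compute/zone 2>/dev/null)" || true
fi
```

If either line reports missing, rerun the matching gcloud config set command from the steps above.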

Creating a Cloud TPU

When you create a Cloud TPU, you create Compute Engine VM and TPU resources.

Creating a Cloud TPU with gcloud

If you want to use the Cloud Shell, click Open Cloud Shell. Otherwise, open a command prompt/terminal window on your local computer.

Create your Cloud TPU resources. The commands you use depend on whether you are using TPU VMs or TPU nodes. For more information, see System Architecture.

TPU VMs

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=zone \
  --accelerator-type=v3-8 \
  --version=tpu-vm-tf-2.12.0

Command flag descriptions

zone
The zone where you plan to create your Cloud TPU.

accelerator-type
The type of the Cloud TPU to create.

version
The Cloud TPU software version.

shielded-secure-boot (optional)
Specifies that the TPU instances are created with secure boot enabled. This implicitly makes them Shielded VM instances. See What is shielded VM? for more details.
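If you script the TPU VMs command, keeping the placeholders in shell variables makes the values easier to review before anything is created. The following sketch assumes a hypothetical TPU name (my-tpu) and reuses the example zone, accelerator type, and software version from this page. The create_tpu_vm function and the DRY_RUN variable are illustrative names, not gcloud features.

```shell
#!/usr/bin/env bash
# Placeholder values: my-tpu is hypothetical; the zone, accelerator type,
# and software version are the examples used on this page.
TPU_NAME="${TPU_NAME:-my-tpu}"
ZONE="${ZONE:-us-central1-b}"
ACCELERATOR_TYPE="${ACCELERATOR_TYPE:-v3-8}"
RUNTIME_VERSION="${RUNTIME_VERSION:-tpu-vm-tf-2.12.0}"

# Prints the command by default; set DRY_RUN=0 to actually run it.
create_tpu_vm() {
  set -- gcloud compute tpus tpu-vm create "$TPU_NAME" \
    --zone="$ZONE" \
    --accelerator-type="$ACCELERATOR_TYPE" \
    --version="$RUNTIME_VERSION"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_tpu_vm
```

Printing the command first is a deliberate choice here: creating a TPU starts billing, so the sketch only runs gcloud when you opt in with DRY_RUN=0.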

TPU Nodes

$ gcloud compute tpus execution-groups create --name=tpu-name \
  --zone=zone \
  --tf-version=2.12.0 \
  --machine-type=n1-standard-1 \
  --accelerator-type=v3-8

Command flag descriptions

zone
The zone where you plan to create your Cloud TPU.

tf-version
The version of TensorFlow the gcloud command installs on your VM.

machine-type
The machine type of the Compute Engine VM to create.

accelerator-type
The type of the Cloud TPU to create.

Creating a Cloud TPU queued resource with gcloud

Using gcloud, you can also create a Cloud TPU as a queued resource. When you make a request for a queued resource, your request is added to a queue managed by Cloud TPU. When a resource becomes available, it is allocated to you for your exclusive use. For more information, see Cloud TPU queued resources.

Run standard installation scripts

You can run a startup script on each TPU VM by specifying the --metadata startup-script parameter when you create the TPU VM. The following example creates a TPU VM with a startup script:

$ gcloud compute tpus tpu-vm create tpu-name \
    --zone=zone \
    --accelerator-type=tpu-type \
    --version=tpu-vm-tf-2.12.0 \
    --metadata startup-script='#! /bin/bash
      pip3 install numpy'

After the TPU VM is created, you can view the logs from the startup script by connecting to the TPU VM using SSH and running:

$ cat /var/log/syslog | grep startup-script
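On a busy VM the syslog can be long, so you may want only the most recent startup-script lines. The following sketch wraps the same grep in a hypothetical helper named startup_log. The default line count of 20 is an arbitrary choice, and the default path is the one used in the command above.

```shell
#!/usr/bin/env bash
# startup_log is a hypothetical helper. It prints the last COUNT
# startup-script lines from a syslog file (default: /var/log/syslog).
# Usage: startup_log [COUNT] [FILE]
startup_log() {
  local count="${1:-20}"
  local file="${2:-/var/log/syslog}"
  grep 'startup-script' "$file" | tail -n "$count"
}
```

For example, startup_log 50 run on the TPU VM shows the last 50 matching lines.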

Creating a Cloud TPU in the Google Cloud console

  1. Navigate to the Google Cloud console.
  2. From the navigation menu, select Compute Engine > TPUs.
  3. Click CREATE TPU NODE.
  4. In the Name box, type a TPU instance name.
  5. In the Zone box, select the zone in which to create the TPU.
  6. Under TPU Configuration, select either TPU VM or TPU Node. The TPU configuration determines whether you create the TPU as a TPU VM or a TPU Node. For more information, see System Architecture.
  7. For TPU type, select the TPU type you want to create.
  8. For TPU software version, select the software version. When creating a Cloud TPU VM, the TPU software version specifies the version of the TPU runtime to install. When creating a Cloud TPU Node, the TPU software version allows you to choose the ML framework installed on the node's VM. No other settings are required. For more information, see Supported Models.
  9. Click CREATE to create your resources.

Connecting to a Cloud TPU VM

By default, the gcloud command you use to create TPU Nodes automatically attempts to SSH into your TPU Node. If you are using TPU Nodes and the gcloud command did not connect you to the Compute Engine instance, you can connect by running the following TPU Nodes command. When using TPU VMs, you must explicitly SSH into your TPU using the following TPU VMs command.

TPU VMs

$ gcloud compute tpus tpu-vm ssh tpu-name \
  --zone=zone

TPU Nodes

$ gcloud compute ssh tpu-name \
  --zone=zone
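If you only need to run a single command on a TPU VM rather than open an interactive session, the TPU VMs SSH command accepts a --command flag. The following sketch wraps it in a hypothetical function named run_on_tpu_vm. The DRY_RUN variable is also an illustrative name: it prints the command, without shell quoting, instead of running it.

```shell
#!/usr/bin/env bash
# run_on_tpu_vm is a hypothetical wrapper around the TPU VMs SSH command.
# Usage: run_on_tpu_vm TPU_NAME ZONE REMOTE_COMMAND
run_on_tpu_vm() {
  local tpu_name="$1" zone="$2" remote_command="$3"
  if [ -z "$tpu_name" ] || [ -z "$zone" ] || [ -z "$remote_command" ]; then
    echo "usage: run_on_tpu_vm TPU_NAME ZONE REMOTE_COMMAND" >&2
    return 2
  fi
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gcloud compute tpus tpu-vm ssh $tpu_name --zone=$zone --command=$remote_command"
  else
    gcloud compute tpus tpu-vm ssh "$tpu_name" --zone="$zone" --command="$remote_command"
  fi
}
```

For example, run_on_tpu_vm tpu-name zone 'python3 --version' would print the Python version installed on the TPU VM.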

Listing your Cloud TPU resources

You can list all of your Cloud TPUs in a specified zone.

Listing your Cloud TPU resources using gcloud

The commands you use depend on whether you are using TPU VMs or TPU nodes. For more information, see System Architecture.

TPU VMs

$ gcloud compute tpus tpu-vm list --zone=zone

TPU Nodes

$ gcloud compute tpus execution-groups list --zone=zone

This command lists the Cloud TPU resources in the specified zone. If no resources are currently set up, the output shows only dashes for the VM and TPU. If one resource is active and the other is not, a message reports that the status is unhealthy. In that case, start or restart whichever resource is not running.
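In a script, a machine-readable listing is easier to check than the default table. The following sketch assumes that the listed TPU VM resources expose name and state fields and that you select them with gcloud's general --format flag. The not_ready filter is a hypothetical helper that works on tab-separated NAME and STATE lines.

```shell
#!/usr/bin/env bash
# not_ready is a hypothetical filter. It reads "NAME<TAB>STATE" lines on
# standard input and prints the names whose state is not READY.
not_ready() {
  awk -F '\t' 'NF >= 2 && $2 != "READY" { print $1 }'
}

# Intended use (requires gcloud; the field names are an assumption):
#   gcloud compute tpus tpu-vm list --zone=zone \
#     --format='value(name.basename(),state)' | not_ready
```

An empty result means every listed TPU VM reported READY.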

Listing your Cloud TPU resources using the Google Cloud console

  1. Navigate to the Google Cloud console.

  2. From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.

Retrieving information about your Cloud TPU

You can retrieve information about a specified Cloud TPU.

Retrieve information about a Cloud TPU using gcloud

The commands you use depend on whether you are using TPU VMs or TPU nodes. For more information, see System Architecture.

TPU VMs

$ gcloud compute tpus tpu-vm describe tpu-name \
  --zone=zone

TPU Nodes

$ gcloud compute tpus execution-groups describe tpu-name \
  --zone=zone
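The describe command is also a convenient way to poll a TPU until it reaches a given state, for example after you create or start it. The following sketch assumes that the described resource has a state field whose value is READY once the TPU is usable. The wait_for_state function and the POLL_SECONDS variable are hypothetical names.

```shell
#!/usr/bin/env bash
# wait_for_state is a hypothetical helper.
# Usage: wait_for_state WANTED_STATE MAX_TRIES COMMAND [ARGS...]
# It runs COMMAND until its output equals WANTED_STATE or the tries run out.
wait_for_state() {
  local want="$1" tries="$2"
  shift 2
  local i=0 state=""
  while [ "$i" -lt "$tries" ]; do
    state="$("$@")"
    if [ "$state" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts (default: 10 seconds).
    sleep "${POLL_SECONDS:-10}"
  done
  echo "timed out; last state: ${state:-unknown}" >&2
  return 1
}

# Intended use (requires gcloud):
#   wait_for_state READY 30 gcloud compute tpus tpu-vm describe tpu-name \
#     --zone=zone --format='value(state)'
```

Passing the command as arguments keeps the loop independent of gcloud, so the same helper works with the TPU Nodes describe command.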

Retrieve information about a Cloud TPU using the Google Cloud console

  1. Navigate to the Google Cloud console.
  2. From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.
  3. Click the name of your Cloud TPU. The console displays the Cloud TPU detail page.

Stopping your Cloud TPU resources

You can stop a single Cloud TPU to stop incurring charges without losing your VM's configuration and software. Stopping TPU Pods or TPUs allocated through the queued resources API is not supported. To stop incurring charges for TPUs allocated through the queued resources API, you must delete the TPU.

Stopping a Cloud TPU with gcloud

The command you use for stopping a Cloud TPU depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.

TPU VMs

$ gcloud compute tpus tpu-vm stop tpu-name \
  --zone=zone

TPU Nodes

$ gcloud compute tpus stop tpu-name \
  --zone=zone

Stopping a Cloud TPU in the Google Cloud console

  1. Navigate to the Google Cloud console.

  2. From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.

  3. Select the checkbox next to your Cloud TPU and click Stop.

Starting your Cloud TPU resources

You can start a Cloud TPU when it is stopped.

Starting a Cloud TPU with gcloud

You can start a stopped Cloud TPU to resume using it.

The command you use for starting a stopped Cloud TPU depends on whether you are using TPU VMs or TPU Nodes. For more information, see System Architecture.

TPU VMs

$ gcloud compute tpus tpu-vm start tpu-name --zone=zone

TPU Nodes

$ gcloud compute tpus start tpu-name --zone=zone

Starting a Cloud TPU in the Google Cloud console

  1. Navigate to the Google Cloud console.

  2. From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.

  3. Select the checkbox next to your Cloud TPU and click Start.

Deleting your Compute Engine VM and Cloud TPU resources

You can delete your Cloud TPU resources when you are done using them.

Deleting a Cloud TPU using gcloud

The command you use depends on whether you are using TPU VMs or TPU nodes. For more information, see System Architecture.

TPU VMs

$ gcloud compute tpus tpu-vm delete tpu-name \
  --zone=zone

Command flag descriptions

zone
The zone where the Cloud TPU you want to delete is located.

TPU Nodes

$ gcloud compute tpus execution-groups delete tpu-name \
  --zone=zone

Command flag descriptions

zone
The zone where the Cloud TPU you want to delete is located.

Deleting a Cloud TPU using the Google Cloud console

  1. Navigate to the Google Cloud console.

  2. From the navigation menu, select Compute Engine > TPUs. The console displays the TPUs page.

  3. Select the checkbox next to your Cloud TPU and click Delete.

Advanced Configurations

Custom Network Resources

When you create the TPU, you can choose to specify the network, the subnetwork, or both. You can do this by submitting either a gcloud command or a curl call.

To specify the network or subnetwork in the gcloud CLI, use:

--network [NETWORK] --subnetwork [SUBNETWORK]

To specify the network or subnetwork in a curl call, use:

network_config: {network: '[NETWORK]', subnet: '[SUBNETWORK]', enable_external_ips: true}

Network

You can optionally specify the network to use for the TPU. If not specified, the default network is used.

Valid network formats:

https://www.googleapis.com/compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-id}/global/networks/{network}
compute/{version}/projects/{proj-##}/global/networks/{network}
projects/{proj-id}/global/networks/{network}
projects/{proj-##}/global/networks/{network}
global/networks/{network}
{network}
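To catch typos before you submit a request, you can check a value against the shapes listed above. The following is a deliberately loose sketch: is_network_format is a hypothetical helper that only tests the overall shape of the string, not whether the version, project, or network exists.

```shell
#!/usr/bin/env bash
# is_network_format is a hypothetical, deliberately loose check. It succeeds
# when its argument looks like one of the network formats listed above.
is_network_format() {
  case "$1" in
    "") return 1 ;;
    https://www.googleapis.com/compute/*/projects/*/global/networks/?*) return 0 ;;
    compute/*/projects/*/global/networks/?*) return 0 ;;
    projects/*/global/networks/?*) return 0 ;;
    global/networks/?*) return 0 ;;
    */*) return 1 ;;   # any other path-like value is rejected
    *) return 0 ;;     # a bare network name
  esac
}
```

For example, is_network_format global/networks/my-net succeeds, while is_network_format networks/my-net fails because that path matches none of the listed forms. The network name my-net is a placeholder.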

Subnetwork

You can optionally specify the subnetwork to use for the TPU. The specified subnetwork needs to be in the same region as the zone where the TPU runs.

Valid subnetwork formats:

https://www.googleapis.com/compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
compute/{version}/projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-id}/regions/{region}/subnetworks/{subnetwork}
projects/{proj-##}/regions/{region}/subnetworks/{subnetwork}
regions/{region}/subnetworks/{subnetwork}
{subnetwork}
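Because the subnetwork must be in the same region as the TPU's zone, it can be useful to derive the region from the zone and compare. The following sketch assumes that zone names have the form REGION-LETTER, as in us-central1-b. Both zone_to_region and same_region are hypothetical helpers.

```shell
#!/usr/bin/env bash
# zone_to_region is a hypothetical helper. It assumes zone names have the
# form REGION-LETTER (for example us-central1-b) and strips the last part.
zone_to_region() {
  echo "${1%-*}"
}

# same_region ZONE SUBNETWORK succeeds when a subnetwork written with a
# regions/{region}/ component names the zone's region. A bare subnetwork
# name carries no region, so it is accepted as-is.
same_region() {
  local region
  region="$(zone_to_region "$1")"
  case "$2" in
    *regions/"$region"/subnetworks/?*) return 0 ;;
    */*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, same_region us-central1-b regions/us-central1/subnetworks/my-subnet succeeds, where my-subnet is a placeholder name.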

Private Google Access

In order to SSH into the TPU VMs, you need to either add access configs for the TPU VMs, or turn on the Private Google Access for the subnetwork to which the TPU VMs are connected.

To add access configs, enable_external_ips must be set. When you create a TPU, enable_external_ips is set by default. If you want to opt out, specify the following flag in the gcloud CLI:

--internal-ips

Or use a curl call:

network_config: {enable_external_ips: false}

After you have configured Private Google Access, connect to the VM via SSH.
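This page does not show how to turn on Private Google Access. One way, sketched below, is the gcloud compute networks subnets update command with its --enable-private-ip-google-access flag. The enable_pga function, the DRY_RUN variable, and the subnetwork and region values in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# enable_pga is a hypothetical wrapper. It turns on Private Google Access
# for one subnetwork, or prints the command when DRY_RUN=1.
# Usage: enable_pga SUBNETWORK REGION
enable_pga() {
  local subnetwork="$1" region="$2"
  if [ -z "$subnetwork" ] || [ -z "$region" ]; then
    echo "usage: enable_pga SUBNETWORK REGION" >&2
    return 2
  fi
  set -- gcloud compute networks subnets update "$subnetwork" \
    --region="$region" \
    --enable-private-ip-google-access
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 enable_pga my-subnet us-central1 prints the command so that you can review it before running it.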

Custom Service Account

Each TPU VM has an associated service account it uses to make API requests on your behalf. TPU VMs use this service account to access files on Cloud Storage and access other services. See Service Accounts to learn more about them.

When you create a TPU node, you can choose to specify a custom service account for the TPU VM identities. By default, the Compute Engine default service account is used. A custom service account needs to be in the project that you use to create the TPU. Use the following options to specify a custom service account.

Specify in the gcloud CLI:

--service-account=[SERVICE_ACCOUNT]

Specify using curl:

service_account: {email: '[SERVICE_ACCOUNT]'}

To use a custom service account, you need to authorize the service account for your Google Cloud Storage buckets. See Connecting to Cloud Storage Buckets for instructions.

Custom VM SSH methods

  1. Set up a firewall for SSH

    The default network comes preconfigured to allow SSH access to all VMs. If you don't use the default network, or the default network was edited, you may need to explicitly enable SSH access by adding a firewall-rule:

    $ gcloud compute firewall-rules create allow-ssh \
    --network=NETWORK \
    --allow=tcp:22
    
  2. SSH into the TPU VMs

    $ gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
    --zone ${ZONE} \
    --project ${PROJECT_ID}
    

    Required fields

    • TPU_NAME: Name of the TPU node.
    • ZONE: The location of the TPU node. Currently, only us-central2-b is supported.
    • PROJECT_ID: The project you created above.

    See the gcloud API documentation for a list of optional fields.