Creating and deleting TPUs

Overview

Running a machine learning (ML) model on Cloud TPU requires a Compute Engine VM and Cloud TPU resources. This page describes how to manage these resources using:

  • The ctpu utility, which provides a CLI specifically designed for managing Cloud TPU resources
  • The gcloud command-line tool, which provides the primary CLI to Google Cloud Platform (GCP)
  • The GCP Console, which provides an integrated management console for your GCP resources

Setting up a Compute Engine VM and Cloud TPU resources

You can allocate and start your VM and TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the following command in Cloud Shell. The ctpu utility creates the Compute Engine VM and Cloud TPU resources together and gives them the same name.

$ ctpu up [optional: --name --zone --tpu-size --machine-type --disk-size-gb]

gcloud commands

The Cloud SDK is a set of tools that you can use to interact with GCP in Cloud Shell.

  1. Install the gcloud command-line tool via the Cloud SDK.
  2. Use the gcloud command-line tool to specify your GCP project:
    $ gcloud config set project [YOUR-CLOUD-PROJECT]
    
  3. Specify the zone where you plan to create your Compute Engine VM and Cloud TPU resource. Replace [YOUR-ZONE] with the name of your zone, for example us-central1-b:
    $ gcloud config set compute/zone [YOUR-ZONE]
    

    For reference, Cloud TPU is available in the following zones:

    US

    TPU type (v2)   TPU v2 cores  Total TPU memory  Available zones
    v2-8            8             64 GiB            us-central1-a
                                                    us-central1-b
                                                    us-central1-c
                                                    (us-central1-f TFRC only)
    v2-32 (Beta)    32            256 GiB           us-central1-a
    v2-128 (Beta)   128           1 TiB             us-central1-a
    v2-256 (Beta)   256           2 TiB             us-central1-a
    v2-512 (Beta)   512           4 TiB             us-central1-a

    TPU type (v3)   TPU v3 cores  Total TPU memory  Available zones
    v3-8            8             128 GiB           us-central1-a
                                                    us-central1-b
                                                    (us-central1-f TFRC only)

    Europe

    TPU type (v2)   TPU v2 cores  Total TPU memory  Available zones
    v2-8            8             64 GiB            europe-west4-a
    v2-32 (Beta)    32            256 GiB           europe-west4-a
    v2-128 (Beta)   128           1 TiB             europe-west4-a
    v2-256 (Beta)   256           2 TiB             europe-west4-a
    v2-512 (Beta)   512           4 TiB             europe-west4-a

    TPU type (v3)   TPU v3 cores  Total TPU memory  Available zones
    v3-8            8             128 GiB           europe-west4-a
    v3-32 (Beta)    32            512 GiB           europe-west4-a
    v3-64 (Beta)    64            1 TiB             europe-west4-a
    v3-128 (Beta)   128           2 TiB             europe-west4-a
    v3-256 (Beta)   256           4 TiB             europe-west4-a
    v3-512 (Beta)   512           8 TiB             europe-west4-a
    v3-1024 (Beta)  1024          16 TiB            europe-west4-a
    v3-2048 (Beta)  2048          32 TiB            europe-west4-a

    Asia Pacific

    TPU type (v2)   TPU v2 cores  Total TPU memory  Available zones
    v2-8            8             64 GiB            asia-east1-c

  4. Create a Compute Engine VM to interact with your Cloud TPU.

    Since we specified the zone in the previous command, the VM instance is created in that zone.

    $ gcloud compute instances create [YOUR-VM/TPU-NAME] \
       --zone=[YOUR-ZONE] \
       --machine-type=n1-standard-1 \
       --image-project=ml-images \
       --image-family=tf-1-14 \
       --scopes=cloud-platform
    

    This will generate output similar to the following:

    NAME         ZONE           MACHINE_TYPE    PREEMPTIBLE INTERNAL_IP  EXTERNAL_IP    STATUS
    demo-vm-tpu  us-central1-b  n1-standard-1               10.138.0.2   35.247.15.162  RUNNING
    
  5. Create a new Cloud TPU resource. For [RANGE], specify a CIDR block, for example 192.168.0.0/29.
    $ gcloud compute tpus create [YOUR-VM/TPU-NAME] \
          --zone=[YOUR-ZONE] \
          --network=[YOUR_NETWORK_ID or default] \
          --accelerator-type=[YOUR-TPU-VERSION] \
          --range=[RANGE] \
          --version=1.14
    

    This will generate output similar to the following:

    NAME         ZONE           ACCELERATOR_TYPE NETWORK_ENDPOINT  NETWORK  RANGE         STATUS
    demo-vm-tpu  us-central1-b  v2-8             10.240.1.2:8470   default  10.240.1.0/29 READY
    
  6. Remotely connect to your Compute Engine VM:
    $ gcloud compute ssh --zone=[YOUR-ZONE] [YOUR-VM/TPU-NAME]
    
  7. Create an environment variable containing the name of your TPU:
    $ export TPU_NAME=[YOUR-VM/TPU-NAME]
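
The gcloud steps above can be collected into one script. The following is a minimal sketch, not an official tool: the `run` helper, the `DRY_RUN` variable, the `setup_vm_and_tpu` function, and the example values (my-project, demo-vm-tpu, us-central1-b, v2-8, 192.168.0.0/29) are assumptions made for illustration. By default the script only prints the commands so that you can review them before running anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values (assumptions); replace them with your own.
PROJECT="${PROJECT:-my-project}"
ZONE="${ZONE:-us-central1-b}"
NAME="${NAME:-demo-vm-tpu}"
TPU_TYPE="${TPU_TYPE:-v2-8}"
RANGE="${RANGE:-192.168.0.0/29}"
NETWORK="${NETWORK:-default}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 2 through 5 of the procedure, in order.
setup_vm_and_tpu() {
  run gcloud config set project "$PROJECT"
  run gcloud config set compute/zone "$ZONE"
  run gcloud compute instances create "$NAME" \
      --zone="$ZONE" \
      --machine-type=n1-standard-1 \
      --image-project=ml-images \
      --image-family=tf-1-14 \
      --scopes=cloud-platform
  run gcloud compute tpus create "$NAME" \
      --zone="$ZONE" \
      --network="$NETWORK" \
      --accelerator-type="$TPU_TYPE" \
      --range="$RANGE" \
      --version=1.14
}

setup_vm_and_tpu
```

Run the script once as is to inspect the printed commands, then run it again with `DRY_RUN=0` to execute them. Connecting over SSH and exporting TPU_NAME remain manual steps because they happen in your interactive session.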
    

Console

  1. Create and start your VM.
    1. Go to Compute Engine > VM instances on the left-hand navigation bar and click CREATE INSTANCE.
    2. On the Create an instance page, specify an instance name, the region, and the machine type.
  2. Create, start and connect to your Cloud TPU.
    1. Go to Compute Engine > TPUs on the left-hand navigation bar and click CREATE TPU NODE.
    2. On the Create a Cloud TPU page, specify the TPU resource name, the zone, and an internal IP address for the Cloud TPU to use.
    3. Go to Compute Engine > VM instances. Find the instance with your VM name, and click SSH to connect to it.

Stopping your Cloud TPU resources

Charges for Cloud TPU resources begin when the Cloud TPU starts, even if it is not yet actively training a model. To avoid being charged while the Cloud TPU is inactive, you can stop it and restart it when you are ready to train a model.

This section shows how to stop the Cloud TPU using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

  1. Run the ctpu status command, specifying the zone where your Cloud TPU is set up.

    $ ctpu status --zone=[YOUR-ZONE]

    This will display the status of Compute Engine VM and Cloud TPU resources within the zone.

    Your cluster is running!
    Compute Engine VM:  RUNNING
    Cloud TPU:          RUNNING
    
  2. If the Cloud TPU resource is running, use the following command to stop it.

    $ ctpu pause --zone=[YOUR-ZONE]

    This stops the Compute Engine VM and deletes the Cloud TPU resources in the specified zone. To restart only the Compute Engine VM, run the following command, including any other flags you need for the VM:

    $ ctpu up --zone=[YOUR-ZONE] --vm-only

    To restart the Cloud TPU, run ctpu up with the flags you set when you first started the Cloud TPU.

gcloud

  1. Run the following command in Cloud Shell to list the available Cloud TPU resources in your zone.

    $ gcloud compute tpus list --zone=[YOUR-ZONE]
    

    The Cloud TPU resource in your zone is displayed:

    NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
    demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
    
  2. Run the following command to stop the Cloud TPU:

    $ gcloud compute tpus stop [YOUR-TPU-NAME] --zone=[YOUR-ZONE]
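
The two steps above can be chained: read the table printed by `gcloud compute tpus list` and stop every Cloud TPU whose STATUS is READY. This is a minimal sketch. The `ready_tpus` helper is a hypothetical name, and the parsing assumes the column layout of the sample output above, with the name in the first column and the status in the last.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the NAME of every row whose last column (STATUS) is READY.
# Reads the table produced by `gcloud compute tpus list` on stdin
# and skips the header row.
ready_tpus() {
  awk 'NR > 1 && $NF == "READY" { print $1 }'
}

# Demonstration with the sample output from this page.
sample='NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY'

printf '%s\n' "$sample" | ready_tpus   # prints: demo-tpu

# Against a live project (not run here; ZONE is your zone):
#   gcloud compute tpus list --zone="$ZONE" | ready_tpus |
#     while read -r name; do
#       gcloud compute tpus stop "$name" --zone="$ZONE"
#     done
```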
    

Console

  1. Select Compute Engine > TPUs from the left-hand navigation bar. Click STOP from the menu bar at the top of the page.
  2. To restart the Cloud TPU, click START.

Viewing your Compute Engine VM and Cloud TPU resources

This section shows how you can view your currently active VM and TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the ctpu status command and specify the zone where your Compute Engine VM and Cloud TPU resources are set up.

$ ctpu status --zone=[YOUR-ZONE]

This will display the status of Compute Engine VM and Cloud TPU resources within the zone.

Your cluster is running!
Compute Engine VM:  RUNNING
Cloud TPU:          RUNNING

If no resources are currently set up, the output will just show dashes for the VM and TPU. If one resource is active and the other is not, you will see a message saying the status is unhealthy. You need to start or restart whichever resource is not running.
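
The rule above (both resources must be running, otherwise start whichever is not) can be sketched as a small check over the `ctpu status` output. This is an illustration only: `check_ctpu_status` is a hypothetical helper, its result strings are invented for this example, and it assumes the two status lines are labeled as in the sample output above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read `ctpu status` output on stdin and report which resource,
# if any, is not RUNNING.
check_ctpu_status() {
  local input vm tpu
  input="$(cat)"
  # Take the last whitespace-separated field of each labeled line.
  vm="$(printf '%s\n' "$input" | awk '/Compute Engine VM:/ { print $NF }')"
  tpu="$(printf '%s\n' "$input" | awk '/Cloud TPU:/ { print $NF }')"

  if [ "$vm" = "RUNNING" ] && [ "$tpu" = "RUNNING" ]; then
    echo "ok"
  elif [ "$vm" != "RUNNING" ] && [ "$tpu" != "RUNNING" ]; then
    echo "start: vm tpu"
  elif [ "$vm" != "RUNNING" ]; then
    echo "start: vm"
  else
    echo "start: tpu"
  fi
}

# Demonstration with the sample output from this page.
printf 'Your cluster is running!\nCompute Engine VM:  RUNNING\nCloud TPU:          RUNNING\n' |
  check_ctpu_status   # prints: ok
```

With a live setup, the same function could be fed directly, for example `ctpu status --zone=[YOUR-ZONE] | check_ctpu_status`.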

gcloud

  1. Run the following command in Cloud Shell to list the available Compute Engine VM resources in specific zones. In this example, VM resources in us-central1-b and europe-west4-a will be displayed:

    $ gcloud compute instances list --filter="zone:( us-central1-b europe-west4-a )"
    

    The above command prints the details of the VM resources you've created. For example:

    NAME      ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
    demo-tpu  us-central1-b  n1-standard-1               10.128.0.33  35.232.214.205  RUNNING
    
    
  2. Run the following command from Cloud Shell to list the available Cloud TPU resources in your zone. This example uses the us-central1-b zone.

    $ gcloud compute tpus list --zone=us-central1-b
    

    The Cloud TPU resource in us-central1-b is displayed:

    NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
    demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
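
The `--filter` expression in step 1 takes a space-separated list of zones. As a small sketch, the hypothetical helper `zone_filter` below builds that expression from any number of zone names, so the same listing command can be reused for other zones.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a gcloud filter expression that matches any of the given zones.
zone_filter() {
  echo "zone:( $* )"
}

zone_filter us-central1-b europe-west4-a
# prints: zone:( us-central1-b europe-west4-a )

# Against a live project (not run here):
#   gcloud compute instances list \
#       --filter="$(zone_filter us-central1-b europe-west4-a)"
```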
    

Console

  1. From the left navigation menu, select Compute Engine > TPUs.

    A list of all active Cloud TPU resources appears.

  2. From the left navigation menu, select Compute Engine > VM Instances.

    A list of all active Compute Engine VM resources appears.

Deleting your Compute Engine VM and Cloud TPU resources

You can delete your VM and TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the following command in Cloud Shell. The ctpu utility deletes the Compute Engine VM and Cloud TPU resources together.

$ ctpu delete [optional: --zone]

gcloud

Run the following commands from Cloud Shell to delete your Cloud TPU and Compute Engine VM resources. You must include the resource name, the zone, and the project ID on each command line. Delete the Cloud TPU first since you need a running Compute Engine VM to delete the Cloud TPU resource.

  1. Delete your Cloud TPU resource:
    $ gcloud compute tpus delete [YOUR-VM/TPU-NAME] --project=[YOUR-CLOUD-PROJECT] --zone=[YOUR-ZONE]
    
  2. Delete your Compute Engine instance:
    $ gcloud compute instances delete [YOUR-VM/TPU-NAME] --project=[YOUR-CLOUD-PROJECT] --zone=[YOUR-ZONE]
    
  3. Delete the VPC network that Google automatically created as part of the Cloud TPU setup.

    Go to the VPC Networking page on the GCP Console.

  4. Select the VPC network. Its ID starts with cp-to-tp-peering.

    At the top of the page, click Delete to delete the selected VPC network.

  5. When you've finished deleting the resources, use the gsutil command to delete any Cloud Storage buckets you created. Replace YOUR-BUCKET-NAME with the name of your Cloud Storage bucket:
    $ gsutil rm -r gs://[YOUR-BUCKET-NAME]
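
The deletion order described above (Cloud TPU first, then the VM, then any bucket) can be sketched as a script. This is a minimal sketch under assumptions: the `run` helper, the `DRY_RUN` and `BUCKET` variables, the `delete_resources` function, and the example names are hypothetical, and by default the script only prints the commands. It does not cover the VPC network, which the procedure above deletes in the GCP Console.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values (assumptions); replace them with your own.
PROJECT="${PROJECT:-my-project}"
ZONE="${ZONE:-us-central1-b}"
NAME="${NAME:-demo-vm-tpu}"
BUCKET="${BUCKET:-}"   # optional; leave empty to keep your buckets

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

delete_resources() {
  # Delete the Cloud TPU before the VM, as the procedure requires.
  run gcloud compute tpus delete "$NAME" --project="$PROJECT" --zone="$ZONE"
  run gcloud compute instances delete "$NAME" --project="$PROJECT" --zone="$ZONE"
  # Remove the bucket only when one was named explicitly.
  if [ -n "$BUCKET" ]; then
    run gsutil rm -r "gs://$BUCKET"
  fi
}

delete_resources
```

Keeping the bucket step opt-in is a deliberate choice in this sketch, since `gsutil rm -r` removes the bucket contents recursively.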
    

Console

  1. Delete your VM.
    1. Go to Compute Engine > VM instances on the left-hand navigation bar.
    2. Select your VM instance from the list. Click the trash can icon at the top of the page.
  2. Delete your Cloud TPU.
    1. Go to Compute Engine > TPUs on the left-hand navigation bar.
    2. Select your TPU resource from the list. Click the trash can icon at the top of the page.