Managing VM and TPU Resources

Overview

Running a machine learning (ML) model on Cloud TPU requires a Compute Engine VM and Cloud TPU resources. This page describes how to manage these resources using:

  • The ctpu utility, which provides a CLI specifically designed for managing Cloud TPU resources
  • The gcloud command-line tool, which provides the primary CLI to Google Cloud Platform (GCP)
  • The Cloud Console, which provides an integrated management console for your GCP resources

Setting up a Compute Engine VM and Cloud TPU resources

You can allocate and start your VM and TPU resources using the ctpu utility, gcloud commands, or the Cloud Console.

ctpu

Run the following command from a Cloud Shell. The `ctpu` utility creates the Compute Engine VM and Cloud TPU resources together and gives them the same name.

$ ctpu up [optional: --name --zone --tpu-size --machine-type --disk-size-gb]
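As an illustration, the optional flags can be assembled from whichever settings you want to override. This is a minimal sketch, not part of ctpu itself: the `ctpu_up_command` function and the example values are hypothetical, and the function only prints the command so that you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch: build a `ctpu up` command line from optional settings.
# The function name and the sample values are illustrative only.
set -euo pipefail

# Usage: ctpu_up_command NAME ZONE TPU_SIZE MACHINE_TYPE DISK_SIZE_GB
# Pass an empty string for any setting to omit its flag.
ctpu_up_command() {
  local cmd="ctpu up"
  if [ -n "${1:-}" ]; then cmd="$cmd --name=$1"; fi
  if [ -n "${2:-}" ]; then cmd="$cmd --zone=$2"; fi
  if [ -n "${3:-}" ]; then cmd="$cmd --tpu-size=$3"; fi
  if [ -n "${4:-}" ]; then cmd="$cmd --machine-type=$4"; fi
  if [ -n "${5:-}" ]; then cmd="$cmd --disk-size-gb=$5"; fi
  echo "$cmd"
}

# Only the name and zone are set here; the other flags are omitted.
ctpu_up_command demo-vm-tpu us-central1-b
# prints: ctpu up --name=demo-vm-tpu --zone=us-central1-b
```

Any flag you leave out falls back to whatever default ctpu applies.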

gcloud commands

The Cloud SDK is a set of tools that you can use to interact with GCP from the Cloud Shell command line.

  1. Install the `gcloud` command-line tool via the Cloud SDK.
  2. Use the gcloud command-line tool to specify your GCP project:
    $ gcloud config set project [YOUR-CLOUD-PROJECT]
    
  3. Specify the zone where you plan to create your Compute Engine VM and Cloud TPU resource, for example `us-central1-b`:
    $ gcloud config set compute/zone [YOUR-ZONE]
    

    For reference, Cloud TPU is available in the following zones:

    US

      Cloud TPU v2 and Preemptible v2:  us-central1-b, us-central1-c, us-central1-f (TFRC program only)
      Cloud TPU v3 and Preemptible v3:  us-central1-a, us-central1-b, us-central1-f (TFRC program only)
      Cloud TPU v2 Pod (beta):          us-central1-a

    Europe

      Cloud TPU v2 and Preemptible v2:  europe-west4-a
      Cloud TPU v3 and Preemptible v3:  europe-west4-a
      Cloud TPU v2 Pod (beta):          europe-west4-a
      Cloud TPU v3 Pod (beta):          europe-west4-a

    Asia Pacific

      Cloud TPU v2 and Preemptible v2:  asia-east1-c

  4. Create a Compute Engine VM to interact with your Cloud TPU.

    Because you set the zone in the previous step, the VM instance is created in that zone. The `--zone` flag below makes that choice explicit.

    $ gcloud compute instances create [YOUR-VM/TPU-NAME] \
       --zone=[YOUR-ZONE] \
       --machine-type=n1-standard-2 \
       --image-project=ml-images \
       --image-family=tf-1-13 \
       --scopes=cloud-platform
    

    This will generate output similar to the following:

    NAME         ZONE           MACHINE_TYPE    PREEMPTIBLE INTERNAL_IP  EXTERNAL_IP    STATUS
    demo-vm-tpu  us-central1-b  n1-standard-2               10.138.0.2   35.247.15.162  RUNNING
    
  5. Create a new Cloud TPU resource. For [YOUR-TPU-VERSION], specify an accelerator type such as v2-8. For [RANGE], specify a CIDR block for the Cloud TPU, for example 192.168.0.0/29:
    $ gcloud compute tpus create [YOUR-VM/TPU-NAME] \
          --zone=[YOUR-ZONE] \
          --network=[YOUR_NETWORK_ID or default] \
          --accelerator-type=[YOUR-TPU-VERSION] \
          --range=[RANGE] \
          --version=1.13
    

    This will generate output similar to the following:

    NAME         ZONE           ACCELERATOR_TYPE NETWORK_ENDPOINT  NETWORK  RANGE         STATUS
    demo-vm-tpu  us-central1-b  v2-8             10.240.1.2:8470   default  10.240.1.0/29 READY
    
  6. Remotely connect to your Compute Engine VM:
    $ gcloud compute ssh --zone=[YOUR-ZONE] [YOUR-VM/TPU-NAME]
    
  7. Create an environment variable containing the name of your TPU:
    $ export TPU_NAME=[YOUR-VM/TPU-NAME]
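
Steps 2 through 5 above can be collected into one script. The following is a minimal sketch rather than an official tool: the variable names (`PROJECT`, `ZONE`, `NAME`, `TPU_TYPE`, `RANGE`, `DRY_RUN`), the `run` helper, and the default values are all hypothetical placeholders based on the examples on this page. By default the script only prints the commands; setting `DRY_RUN=0` would execute them.

```shell
#!/usr/bin/env bash
# Sketch: create a Compute Engine VM and a Cloud TPU that share one name.
# Every variable name and default value below is an illustrative placeholder.
set -euo pipefail

PROJECT="${PROJECT:-my-project}"     # hypothetical project ID
ZONE="${ZONE:-us-central1-b}"
NAME="${NAME:-demo-vm-tpu}"
TPU_TYPE="${TPU_TYPE:-v2-8}"
RANGE="${RANGE:-192.168.0.0/29}"
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

create_vm_and_tpu() {
  run gcloud config set project "$PROJECT"
  run gcloud config set compute/zone "$ZONE"
  run gcloud compute instances create "$NAME" \
    --zone="$ZONE" \
    --machine-type=n1-standard-2 \
    --image-project=ml-images \
    --image-family=tf-1-13 \
    --scopes=cloud-platform
  run gcloud compute tpus create "$NAME" \
    --zone="$ZONE" \
    --network=default \
    --accelerator-type="$TPU_TYPE" \
    --range="$RANGE" \
    --version=1.13
}

create_vm_and_tpu
```

Printing first makes it easy to confirm the zone and accelerator type before any billable resource is created. Connecting over SSH and exporting `TPU_NAME` (steps 6 and 7) are left as manual steps.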
    

Cloud Console

  1. Create and start your VM.
    1. Go to Compute Engine > VM instances on the left-hand navigation bar and click CREATE INSTANCE.
    2. On the Create an instance page specify an instance name, the region, and the machine type.
  2. Create, start and connect to your Cloud TPU.
    1. Go to Compute Engine > TPUs on the left-hand navigation bar and click CREATE TPU NODE.
    2. On the Create a Cloud TPU page specify the TPU resource name, the zone, and an internal IP address for the Cloud TPU to use.
    3. Go to Compute Engine > VM instances. Find the instance with your VM name, and click SSH to connect to it.

Stopping your Cloud TPU resources

Charges for Cloud TPU resources begin when the Cloud TPU starts, even if it is not yet actively training a model. To avoid being charged while the Cloud TPU is inactive, you can stop it and restart it when you are ready to train a model.

This section shows how to stop the Cloud TPU using the ctpu utility, gcloud commands, or the Cloud Console.

ctpu

  1. Run the `ctpu status` command, specifying the zone where your Cloud TPU is set up.

    $ ctpu status --zone=[YOUR-ZONE]

    This will display the status of Compute Engine VM and Cloud TPU resources within the zone.

    Your cluster is running!
    Compute Engine VM:  RUNNING
    Cloud TPU:          RUNNING
    
  2. If the Cloud TPU resource is running, use the following command to stop it.

    $ ctpu pause --zone=[YOUR-ZONE]

    This stops the Compute Engine VM and deletes the Cloud TPU resources in the specified zone. To restart only the Compute Engine VM, run the following command, including any other flags you need for the VM:

    $ ctpu up --zone=[YOUR-ZONE] --vm-only

    To restart the Cloud TPU, run ctpu up with the flags you set when you first started the Cloud TPU.

gcloud commands

  1. Run the following command from your Cloud Shell to list the available Cloud TPU resources in your zone.

    $ gcloud compute tpus list --zone=[YOUR-ZONE]
    

    The Cloud TPU resource in your zone is displayed:

    NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
    demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
    
  2. Run the following command to stop the Cloud TPU:

    $ gcloud compute tpus stop [YOUR-TPU-NAME] --zone=[YOUR-ZONE]
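
When a zone holds several Cloud TPU resources, the two steps above can be combined by reading the listing and emitting one stop command per running TPU. This is a rough sketch: the `ready_tpus` and `stop_commands` helpers are hypothetical, and the parsing assumes the tabular layout shown above, with NAME in the first column and STATUS in the last. The sample listing stands in for real `gcloud compute tpus list` output.

```shell
#!/usr/bin/env bash
# Sketch: turn a `gcloud compute tpus list` table into stop commands.
# Helper names are illustrative; parsing assumes the layout shown on this page.
set -euo pipefail

# Print the NAME of every row whose STATUS (last column) is READY.
ready_tpus() {
  awk 'NR > 1 && $NF == "READY" { print $1 }'
}

# Read a listing on stdin and print one stop command per READY TPU.
stop_commands() {
  local zone="$1" name
  ready_tpus | while read -r name; do
    echo "gcloud compute tpus stop $name --zone=$zone"
  done
}

sample='NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY'

printf '%s\n' "$sample" | stop_commands us-central1-b
# prints: gcloud compute tpus stop demo-tpu --zone=us-central1-b
```

In real use you would pipe the output of `gcloud compute tpus list --zone=[YOUR-ZONE]` into `stop_commands` and review the printed commands before running them.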
    

Cloud Console

  1. Select Compute Engine > TPUs from the left-hand navigation bar. Click STOP from the menu bar at the top of the page.
  2. To restart the Cloud TPU, click START.

Viewing your Compute Engine VM and Cloud TPU resources

This section shows how you can view your currently active VM and TPU resources using the ctpu utility, gcloud commands, or the Cloud Console.

ctpu

Run the `ctpu status` command and specify the zone where your Compute Engine VM and Cloud TPU resources are set up.

$ ctpu status --zone=[YOUR-ZONE]

This will display the status of Compute Engine VM and Cloud TPU resources within the zone.

Your cluster is running!
Compute Engine VM:  RUNNING
Cloud TPU:          RUNNING

If no resources are currently set up, the output will just show dashes for the VM and TPU. If one resource is active and the other is not, you will see a message saying the status is unhealthy. You need to start or restart whichever resource is not running.
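
The three cases described above can be told apart by a small script. The sketch below is an assumption-laden illustration: the `cluster_state` helper and its result labels are hypothetical, and the parsing relies on the "Compute Engine VM:" and "Cloud TPU:" lines having the format shown in the sample output. Any value other than RUNNING is treated as not running.

```shell
#!/usr/bin/env bash
# Sketch: classify `ctpu status` output read from stdin.
# The helper name and the result labels are illustrative only.
set -euo pipefail

cluster_state() {
  awk -F': *' '
    /Compute Engine VM:/ { vm = $2;  sub(/[ \t\r]+$/, "", vm) }
    /Cloud TPU:/         { tpu = $2; sub(/[ \t\r]+$/, "", tpu) }
    END {
      if (vm == "RUNNING" && tpu == "RUNNING") {
        print "running"
      } else if (vm == "RUNNING" || tpu == "RUNNING") {
        print "unhealthy"      # exactly one of the two resources is up
      } else {
        print "not running"
      }
    }'
}

# Sample input copied from the output shown above.
cluster_state <<'EOF'
Your cluster is running!
Compute Engine VM:  RUNNING
Cloud TPU:          RUNNING
EOF
# prints: running
```

With a live setup the input would come from `ctpu status --zone=[YOUR-ZONE]` piped into `cluster_state`.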

gcloud commands

  1. Run the following command from your Cloud Shell to list the available Compute Engine VM resources in specific zones. In this example, VM resources in us-central1-b and europe-west4-a will be displayed:

    $ gcloud compute instances list --filter="zone:( us-central1-b europe-west4-a )"
    

    The above command prints the details of the VM resources you've created. For example:

    NAME      ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
    demo-tpu  us-central1-b  n1-standard-1               10.128.0.33  35.232.214.205  RUNNING
    
    
  2. Run the following command from your Cloud Shell to list the available Cloud TPU resources in your zone. This example uses the us-central1-b zone:

    $ gcloud compute tpus list --zone=us-central1-b
    

    The Cloud TPU resource in us-central1-b is displayed:

    NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
    demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
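
The VM listing accepts several zones in one filter, while the TPU listing above takes one zone at a time. As a sketch of how both listings could be generated for the same set of zones, the hypothetical helpers `zone_filter` and `list_commands` below build the filter expression and print the commands without running them.

```shell
#!/usr/bin/env bash
# Sketch: build the listing commands for a set of zones.
# Both helper names are illustrative; nothing here calls gcloud.
set -euo pipefail

# Build the filter expression, e.g. "zone:( us-central1-b europe-west4-a )".
zone_filter() {
  echo "zone:( $* )"
}

# Print one VM listing command for all zones and one TPU listing per zone.
list_commands() {
  local zone
  echo "gcloud compute instances list --filter=\"$(zone_filter "$@")\""
  for zone in "$@"; do
    echo "gcloud compute tpus list --zone=$zone"
  done
}

list_commands us-central1-b europe-west4-a
```

For the two zones used in this section, the first printed line matches the `gcloud compute instances list` command in step 1.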
    

Cloud Console

  1. From the left navigation menu, select Compute Engine > TPUs.

    A list of all active Cloud TPU resources appears.

  2. From the left navigation menu, select Compute Engine > VM Instances.

    A list of all active Compute Engine VM instances appears.

Deleting your Compute Engine VM and Cloud TPU resources

You can delete your VM and TPU resources using the ctpu utility, gcloud commands, or the Cloud Console.

ctpu

Run the following command from a Cloud Shell. The `ctpu` utility deletes the Compute Engine VM and Cloud TPU resources together.

$ ctpu delete [optional: --zone]

gcloud commands

If you haven't set the project and zone for this session, do so now.

(vm)$ gcloud config set project [YOUR-CLOUD-PROJECT]
(vm)$ gcloud config set compute/zone [YOUR-ZONE]

Then follow this cleanup procedure:

  1. Exit from the Compute Engine VM instance:
    (vm)$ exit
    
  2. Delete your Compute Engine instance:
    $ gcloud compute instances delete [YOUR-VM/TPU-NAME]
    
  3. Delete your Cloud TPU resource:
    $ gcloud compute tpus delete [YOUR-VM/TPU-NAME]
    
  4. Delete the VPC network that Google automatically created as part of the Cloud TPU setup.

    Go to the VPC Networking page on the Google Cloud Platform Console.

  5. Select the VPC network. The network entry starts with cp-to-tp-peering in the ID.

    At the top of the page, click Delete to delete the selected VPC network.

  6. When you've finished examining the data, use the gsutil command to delete any Cloud Storage buckets you created. Replace [YOUR-BUCKET-NAME] with the name of your Cloud Storage bucket:
    $ gsutil rm -r gs://[YOUR-BUCKET-NAME]
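
The command-line parts of this cleanup (steps 2, 3, and 6) can be scripted. The following is a minimal sketch under stated assumptions: the variable names, the `run` helper, and the default values are hypothetical, and `--quiet` is added so that gcloud does not prompt for confirmation. The script prints the commands by default and would execute them only with `DRY_RUN=0`. Deleting the peering VPC network (steps 4 and 5) is still done in the console.

```shell
#!/usr/bin/env bash
# Sketch: delete the VM, the Cloud TPU, and optionally a Cloud Storage bucket.
# Variable names and defaults are illustrative placeholders.
set -euo pipefail

ZONE="${ZONE:-us-central1-b}"
NAME="${NAME:-demo-vm-tpu}"
BUCKET="${BUCKET:-}"                 # leave empty to keep all buckets
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

cleanup() {
  run gcloud compute instances delete "$NAME" --zone="$ZONE" --quiet
  run gcloud compute tpus delete "$NAME" --zone="$ZONE" --quiet
  if [ -n "$BUCKET" ]; then
    run gsutil rm -r "gs://$BUCKET"
  fi
}

cleanup
```

Because deletion cannot be undone, keeping the dry run as the default gives you a chance to check the resource names first.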
    

Cloud Console

  1. Delete your VM.
    1. Go to Compute Engine > VM instances on the left-hand navigation bar.
    2. Select your VM instance from the list. Click the trash can icon at the top of the page.
  2. Delete your Cloud TPU.
    1. Go to Compute Engine > TPUs on the left-hand navigation bar.
    2. Select your TPU resource from the list. Click the trash can icon at the top of the page.