Creating and deleting TPUs

Running a machine learning (ML) model requires a Compute Engine VM and Cloud TPU resources. This page describes how to manage these resources using:

  • The ctpu utility, which provides a CLI specifically designed for managing Cloud TPU resources
  • The gcloud command-line tool, which provides the primary CLI to Google Cloud Platform (GCP)
  • The GCP Console, which provides an integrated management console for your GCP resources

Prerequisites

To run these procedures, you need to have a Google Cloud Platform (GCP) project set up. If you don't have a project, see Creating and managing projects to set one up.

Setting up a Compute Engine VM

ctpu

The ctpu utility can create the Compute Engine VM and Cloud TPU resources together or separately. In this procedure, ctpu is used to create only the Compute Engine VM.

  1. In the Cloud Shell, run the following command to create a Compute Engine VM:

    $ ctpu up --vm-only --zone=your-zone [optional: --name --machine-type --disk-size-gb]

    Parameter     Description
    vm-only       Create only the Compute Engine VM.
    zone          The zone where you plan to create your Cloud TPU. For example, us-central1-b.
    name          Specify a name for the Compute Engine VM.
    machine-type  The type of machine to use for the VM. See machine types for the supported machine types.
    disk-size-gb  The disk size for the VM, in GB. For example, 300.

See the ctpu Reference for all of the ctpu options.

gcloud commands

Use gcloud commands to interact with GCP in the Cloud Shell.

  1. If you are not using the Cloud Shell as your command interface, set up the gcloud command-line tool by installing the Cloud SDK for your operating system.
  2. Configure gcloud to use your project.
    $ gcloud config set project project-name
    
  3. Specify the zone where you plan to create your Compute Engine VM. For example, us-central1-b.
    $ gcloud config set compute/zone your-zone
    
  4. Launch a Compute Engine virtual machine.

    Since you specified the zone in the previous command, the VM instance is created in that zone.

    $ gcloud compute instances create vm-name \
       --machine-type=machine-type \
       --image-project=ml-images \
       --image-family=tf-1-14 \
       --boot-disk-size=boot-disk-size \
       --scopes=cloud-platform

    Parameter       Description
    vm-name         Specify a name for the Compute Engine VM.
    machine-type    The type of machine to use for the VM. See machine types for the supported machine types.
    image-project   The project against which all image and image family references are resolved. Use ml-images.
    image-family    The family of the image that the boot disk is initialized with.
    boot-disk-size  The boot disk size for the VM, specified as a number followed by a unit. For example, 300GB.
    scopes          Use cloud-platform.

    This will generate output similar to the following:

    NAME         ZONE           MACHINE_TYPE    PREEMPTIBLE INTERNAL_IP  EXTERNAL_IP    STATUS
    demo-vm-tpu  us-central1-b  n1-standard-1               10.138.0.2   35.247.15.162  RUNNING
    
  5. Remotely connect to your Compute Engine VM:
    $ gcloud compute ssh your-vm-name --zone=your-zone
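
The gcloud compute instances create command above can be previewed before it is run. The following is a minimal sketch, not part of the Cloud SDK: build_vm_create_cmd is a hypothetical helper that only prints the command, and its defaults (n1-standard-1, 300GB) are taken from the examples on this page rather than from any recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the VM creation command without running it.
# Arguments: VM name, optional machine type, optional boot disk size.
build_vm_create_cmd() {
  local vm_name="$1"
  local machine_type="${2:-n1-standard-1}"  # assumed default, from the sample output
  local disk_size="${3:-300GB}"             # assumed default, from the parameter table

  if [ -z "$vm_name" ]; then
    echo "usage: build_vm_create_cmd VM_NAME [MACHINE_TYPE] [BOOT_DISK_SIZE]" >&2
    return 1
  fi

  printf 'gcloud compute instances create %s' "$vm_name"
  printf ' --machine-type=%s' "$machine_type"
  printf ' --image-project=ml-images --image-family=tf-1-14'
  printf ' --boot-disk-size=%s --scopes=cloud-platform\n' "$disk_size"
}

# Review the printed command, then paste it into the Cloud Shell to run it.
build_vm_create_cmd demo-vm-tpu
```

Printing first makes it easy to check the name, machine type, and disk size before any billable resource is created.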
    

Console

From the Google Cloud Platform Console, create your VM and establish remote access to it.

  1. Select Compute Engine > VM instances from the left-hand navigation bar and click CREATE INSTANCE.
  2. On the Create an instance page, specify an instance name, the region, and a machine type.

    Parameter     Description
    name          Specifies the name of the Compute Engine VM. You can specify any instance name, but use the same one for both the VM instance and the Cloud TPU.
    region        This should match the Location setting you used when setting up your Cloud Storage bucket.
    machine type  Specifies the machine type to use for your Compute Engine VM. Select a machine type from the drop-down menu.
  3. Go to Compute Engine > VM instances. Find the instance with your VM name, and click SSH to connect to it.

Setting up a Cloud TPU

Set up your Compute Engine VM using the VM setup procedure before setting up your Cloud TPU. You can allocate and start your TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the following command in the Cloud Shell to create your Cloud TPU.

$ ctpu up --tpu-only --name=tpu-name --zone=your-zone [optional: --tpu-size]

Parameter  Description
name       Specifies the name of the Cloud TPU. Use the same name as you used for the Compute Engine VM.
zone       The zone where you plan to create your Cloud TPU. This should be the same zone you used for the Compute Engine VM. For example, us-central1-b.
tpu-size   The TPU type to use. The default is v2-8. See Types and zones for the supported TPU types and zones.

gcloud commands

The Cloud SDK is a set of tools that you can use to interact with GCP in the Cloud Shell.

  1. Install the gcloud command-line tool via the Cloud SDK.
  2. Use the gcloud command-line tool to specify your GCP project:
    $ gcloud config set project your-cloud-project
    
  3. Specify the zone where you plan to create your Cloud TPU resource. This should be the same zone you used for the Compute Engine VM. For example, us-central1-b.
    $ gcloud config set compute/zone your-zone
    

    For reference, Cloud TPU is available in the following zones:

    US

    TPU type (v2)   TPU v2 cores  Total TPU memory  Available zones
    v2-8            8             64 GiB            us-central1-b, us-central1-c (us-central1-f TFRC only)
    v2-32 (Beta)    32            256 GiB           us-central1-a
    v2-128 (Beta)   128           1 TiB             us-central1-a
    v2-256 (Beta)   256           2 TiB             us-central1-a
    v2-512 (Beta)   512           4 TiB             us-central1-a

    TPU type (v3)   TPU v3 cores  Total TPU memory  Available zones
    v3-8            8             128 GiB           us-central1-a, us-central1-b (us-central1-f TFRC only)

    Europe

    TPU type (v2)   TPU v2 cores  Total TPU memory  Available zones
    v2-8            8             64 GiB            europe-west4-a
    v2-32 (Beta)    32            256 GiB           europe-west4-a
    v2-128 (Beta)   128           1 TiB             europe-west4-a
    v2-256 (Beta)   256           2 TiB             europe-west4-a
    v2-512 (Beta)   512           4 TiB             europe-west4-a

    TPU type (v3)   TPU v3 cores  Total TPU memory  Available zones
    v3-8            8             128 GiB           europe-west4-a
    v3-32 (Beta)    32            512 GiB           europe-west4-a
    v3-64 (Beta)    64            1 TiB             europe-west4-a
    v3-128 (Beta)   128           2 TiB             europe-west4-a
    v3-256 (Beta)   256           4 TiB             europe-west4-a
    v3-512 (Beta)   512           8 TiB             europe-west4-a
    v3-1024 (Beta)  1024          16 TiB            europe-west4-a
    v3-2048 (Beta)  2048          32 TiB            europe-west4-a

    Asia Pacific

    TPU type (v2)   TPU v2 cores  Total TPU memory  Available zones
    v2-8            8             64 GiB            asia-east1-c

  4. Create a new Cloud TPU resource.

    Since you specified the zone in the previous command, the Cloud TPU is created in that zone.

    $ gcloud compute tpus create your-tpu-name \
          --network=your-network-ID or default \
          --accelerator-type=your-tpu-version \
          --range=range \
          --version=1.14
    
    Parameter              Description
    your-tpu-name          Specifies the name of the Cloud TPU. Use the same name you used for the Compute Engine VM name.
    network-ID or default  If you know your network ID, use that, otherwise enter default.
    accelerator-type       Your TPU type. See TPU types for the supported TPU types for your zone.
    range                  An internal IP address range for your TPU node, for example 192.168.0.0/29. See internal IP addresses to learn how to specify the range.
    version                The current TensorFlow version to use with your Cloud TPU.

    This will generate output similar to the following:

    NAME         ZONE           ACCELERATOR_TYPE NETWORK_ENDPOINT  NETWORK  RANGE         STATUS
    demo-vm-tpu  us-central1-b  v2-8             10.240.1.2:8470   default  10.240.1.0/29 READY
    
  5. Remotely connect to your Compute Engine VM:
    $ gcloud compute ssh your-vm-name
    
  6. Create an environment variable containing the name of your TPU:
    $ export TPU_NAME=your-tpu-name
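
The --range value used when creating the TPU is written in CIDR notation, where the number after the slash fixes how many addresses the block contains (2 raised to the power of 32 minus the prefix length). The sketch below works through that arithmetic for the example range on this page; cidr_size is a hypothetical helper name, and the snippet makes no claim about which block size a given TPU type needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of IPv4 addresses in a CIDR block.
cidr_size() {
  local prefix="${1##*/}"   # text after the last slash, e.g. 29

  # Reject input without a numeric prefix length.
  case "$prefix" in
    ''|*[!0-9]*) echo "invalid range: $1" >&2; return 1 ;;
  esac
  if [ "$prefix" -gt 32 ]; then
    echo "invalid prefix length: $prefix" >&2
    return 1
  fi

  echo $(( 1 << (32 - prefix) ))
}

cidr_size 192.168.0.0/29   # prints 8
```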
    

Console

    Create, start and connect to your Cloud TPU.

  1. Go to Compute Engine > TPUs on the left-hand navigation bar and click CREATE TPU NODE.
  2. On the Create a Cloud TPU page, use the drop-down menus to specify the TPU name, the zone, TPU type, TPU software version, network, and IP address range.

    Parameter              Description
    name                   Specifies the name of the Cloud TPU. Use the same name you used for the Compute Engine VM name.
    zone                   The zone where you plan to create your Cloud TPU. For example, us-central1-b.
    TPU type               Your TPU type. See TPU types for the supported TPU types for your zone.
    TPU software version   The current TensorFlow or PyTorch version to use with your Cloud TPU.
    network-ID or default  If you know your network ID, use that, otherwise enter default.
    range                  An internal IP address range for your TPU node, for example 192.168.0.0/29. See internal IP addresses to learn how to specify the range.
  3. Go to Compute Engine > VM instances. Find the instance with your VM name, and click SSH to connect to it.
  4. From your VM, create an environment variable containing the name of your TPU:
    $ export TPU_NAME=your-tpu-name
    

Setting up a Compute Engine VM and Cloud TPU resources

You can allocate and start your VM and TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the following command in the Cloud Shell. The ctpu utility creates the Compute Engine VM and Cloud TPU resources together and gives them the same name.

$ ctpu up [optional: --name --zone --tpu-size --machine-type --disk-size-gb]

Parameter     Description
name          Specifies the name for both the Compute Engine VM and the Cloud TPU.
zone          The zone where you plan to create your Cloud TPU. For example, us-central1-b.
tpu-size      The TPU type to use. The default is v2-8. See Types and zones for the supported TPU types and zones.
machine-type  The type of machine to use for the VM. See machine types for the supported machine types.
disk-size-gb  The disk size for the VM, in GB. For example, 300.

gcloud commands

The Cloud SDK is a set of tools that you can use to interact with GCP in the Cloud Shell.

  1. Install the gcloud command-line tool via the Cloud SDK.
  2. Use the gcloud command-line tool to specify your GCP project:
    $ gcloud config set project your-cloud-project
    
  3. Specify the zone where you plan to create your Compute Engine VM and Cloud TPU resource. For this example, use the us-central1-b zone:
    $ gcloud config set compute/zone your-zone
    

    For the zones where each TPU type is available, see the zone tables under Setting up a Cloud TPU.

  4. Create a Compute Engine VM to interact with your Cloud TPU.

    Since you specified the zone in the previous command, the VM instance is created in that zone.

    $ gcloud compute instances create your-vm-and-tpu-name \
       --machine-type=n1-standard-1 \
       --image-project=ml-images \
       --image-family=tf-1-14 \
       --boot-disk-size=boot-disk-size \
       --scopes=cloud-platform
    
    Parameter       Description
    name            Specify a name for the Compute Engine VM.
    machine-type    The type of machine to use for the VM. See machine types for the supported machine types.
    image-project   The project against which all image and image family references are resolved. Use ml-images.
    image-family    The family of the image that the boot disk is initialized with.
    boot-disk-size  The boot disk size for the VM, specified as a number followed by a unit. For example, 300GB.
    scopes          Use cloud-platform.

    This will generate output similar to the following:

    NAME         ZONE           MACHINE_TYPE    PREEMPTIBLE INTERNAL_IP  EXTERNAL_IP    STATUS
    demo-vm-tpu  us-central1-b  n1-standard-1               10.138.0.2   35.247.15.162  RUNNING
    
  5. Create a new Cloud TPU resource.
    $ gcloud compute tpus create your-vm-and-tpu-name \
          --zone=your-zone \
          --network=your-network-id or default \
          --accelerator-type=your-tpu-version \
          --range=range \
          --version=1.14
    
    Parameter              Description
    your-vm-and-tpu-name   Specifies the name of the Cloud TPU. Use the same name you used for the Compute Engine VM name.
    zone                   The zone where you plan to create your Cloud TPU. For example, us-central1-b.
    network-ID or default  If you know your network ID, use that, otherwise enter default.
    accelerator-type       Your TPU type. See TPU types for the supported TPU types for your zone.
    range                  An internal IP address range for your TPU node, for example 192.168.0.0/29. See internal IP addresses to learn how to specify the range.
    version                The current TensorFlow version to use with your Cloud TPU.

    This will generate output similar to the following:

    NAME         ZONE           ACCELERATOR_TYPE NETWORK_ENDPOINT  NETWORK  RANGE         STATUS
    demo-vm-tpu  us-central1-b  v2-8             10.240.1.2:8470   default  10.240.1.0/29 READY
    
  6. Remotely connect to your Compute Engine VM:
    $ gcloud compute ssh your-vm-and-tpu-name --zone=your-zone
    
  7. Create an environment variable containing the name of your TPU:
    $ export TPU_NAME=your-vm-and-tpu-name
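
When choosing a value for --accelerator-type, it can help to see how the type name relates to the figures in the zone tables. In every row listed on this page, the number after the hyphen is the core count, and total memory works out to 8 GiB per core for v2 and 16 GiB per core for v3. The sketch below encodes that pattern; tpu_memory_gib is a hypothetical helper, and the per-core figures are inferred from the table rows, not taken from a published specification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total TPU memory in GiB for a type such as v2-8.
# Per-core figures are inferred from the zone tables on this page.
tpu_memory_gib() {
  local type="$1"
  local version="${type%%-*}"   # v2 or v3
  local cores="${type##*-}"     # 8, 32, 128, ...
  local per_core

  case "$cores" in
    ''|*[!0-9]*) echo "invalid TPU type: $type" >&2; return 1 ;;
  esac
  case "$version" in
    v2) per_core=8 ;;
    v3) per_core=16 ;;
    *)  echo "unknown TPU version: $version" >&2; return 1 ;;
  esac

  echo $(( cores * per_core ))
}

tpu_memory_gib v2-8      # prints 64
tpu_memory_gib v3-2048   # prints 32768, that is 32 TiB
```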
    

Console

  1. Create and start your VM.
    1. Go to Compute Engine > VM instances on the left-hand navigation bar and click CREATE INSTANCE.
    2. On the Create an instance page, specify an instance name, the region, and the machine type.

      Parameter     Description
      name          Specifies the name of the Compute Engine VM. You can specify any instance name, but use the same one for both the VM instance and the Cloud TPU.
      region        This should match the Location setting you used when setting up your Cloud Storage bucket.
      machine type  Specifies the machine type to use for your Compute Engine VM. Select a machine type from the drop-down menu.
  2. Create, start, and connect to your Cloud TPU.
    1. Go to Compute Engine > TPUs on the left-hand navigation bar and click CREATE TPU NODE.
    2. On the Create a Cloud TPU page, use the drop-down menus to specify the TPU name, the zone, TPU type, TPU software version, network, and an internal IP address range for the Cloud TPU to use.

      Parameter              Description
      name                   Specifies the name of the Cloud TPU. Use the same name you used for the Compute Engine VM name.
      zone                   The zone where you plan to create your Cloud TPU. For example, us-central1-b.
      TPU type               Your TPU type. See TPU types for the supported TPU types for your zone.
      TPU software version   The current TensorFlow or PyTorch version to use with your Cloud TPU.
      network-ID or default  If you know your network ID, use that, otherwise enter default.
      range                  An internal IP address range for your TPU node, for example 192.168.0.0/29. See internal IP addresses to learn how to specify the range.
    3. Go to Compute Engine > VM instances. Find the instance with your VM name, and click SSH to connect to it.

Stopping your Cloud TPU resources

Charges for Cloud TPU resources begin when the Cloud TPU starts, even if it is not yet actively training a model. To avoid being charged while the Cloud TPU is inactive, you can stop it and restart it when you are ready to train a model.

This section shows how to stop the Cloud TPU using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

  1. Run the ctpu status command, specifying the zone where your Cloud TPU is set up.

    $ ctpu status --zone=your-zone

    This will display the status of Compute Engine VM and Cloud TPU resources within the zone.

    Your cluster is running!
    Compute Engine VM:  RUNNING
    Cloud TPU:          RUNNING
    
  2. If the Cloud TPU resource is running, use the following command to stop it.

    $ ctpu pause --zone=your-zone

    This stops the Compute Engine VM and deletes the Cloud TPU resources in the specified zone. To only restart the Compute Engine VM, run the following command, including any other flags you need for the VM:

    $ ctpu up --vm-only --zone=your-zone 

    To restart the Cloud TPU, run ctpu up with the flags you set when you first started the Cloud TPU.

gcloud

  1. Run the following command in the Cloud Shell to list the available Cloud TPU resources in your zone.

    $ gcloud compute tpus list --zone=your-zone
    

    The Cloud TPU resource in your zone is displayed:

    NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
    demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
    
  2. Run the following command to stop the Cloud TPU:

    $ gcloud compute tpus stop your-tpu-name --zone=your-zone
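
When a zone holds several TPUs, the listing can be filtered to find the ones still in the READY state before stopping them. The sketch below assumes the column layout shown in the sample output above (name first, status last); ready_tpus is a hypothetical helper, and the listing is fed in from a copy of that sample so the snippet runs without a project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a TPU listing on stdin and print the names of
# the TPUs whose last column (STATUS) is READY. The header row is skipped.
ready_tpus() {
  awk 'NR > 1 && $NF == "READY" { print $1 }'
}

# Sample listing copied from this page. In practice, pipe in the output of
#   gcloud compute tpus list --zone=your-zone
ready_tpus <<'EOF'
NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
EOF
```

Each printed name can then be passed to gcloud compute tpus stop. Parsing aligned columns is fragile if the layout changes, so treat this as an illustration of the workflow rather than a hardened script.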
    

Console

    Select Compute Engine > TPUs from the left-hand navigation bar. Click STOP from the menu bar at the top of the page.

    To restart the Cloud TPU, click START.

Viewing your Compute Engine VM and Cloud TPU resources

This section shows how you can view your currently active VM and TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the ctpu status command and specify the zone where your Compute Engine VM and Cloud TPU resources are set up.

$ ctpu status --zone=your-zone 

This will display the status of Compute Engine VM and Cloud TPU resources within the zone.

Your cluster is running!
Compute Engine VM:  RUNNING
Cloud TPU:          RUNNING

If no resources are currently set up, the output will just show dashes for the VM and TPU. If one resource is active and the other is not, you will see a message saying the status is unhealthy. You need to start or restart whichever resource is not running.
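
The rule described above (both resources running is healthy, exactly one running is unhealthy) can be written as a small decision function. This is a sketch of that logic only: cluster_health is a hypothetical helper that takes the two status strings as arguments, it does not parse ctpu output, and it treats any value other than RUNNING as not running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a VM status and a TPU status.
# Arguments: the Compute Engine VM status, then the Cloud TPU status.
cluster_health() {
  local running=0

  if [ "$1" = "RUNNING" ]; then running=$((running + 1)); fi
  if [ "$2" = "RUNNING" ]; then running=$((running + 1)); fi

  case "$running" in
    2) echo "running" ;;      # both resources are up
    1) echo "unhealthy" ;;    # start or restart the resource that is down
    0) echo "not running" ;;  # nothing is active
  esac
}

cluster_health RUNNING RUNNING   # prints running
cluster_health RUNNING STOPPED   # prints unhealthy
```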

gcloud

  1. Run the following command in the Cloud Shell to list the available Compute Engine VM resources in specific zones. In this example, VM resources in us-central1-b and europe-west4-a will be displayed:

    $ gcloud compute instances list --filter="zone:( us-central1-b europe-west4-a )"
    

    The above command prints the details of the VM resources you've created. For example:

    NAME      ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
    demo-tpu  us-central1-b  n1-standard-1               10.128.0.33  35.232.214.205  RUNNING
    
    
  2. Run the following command from the Cloud Shell to list the available Cloud TPU resources in your zone. In this example, the selected zone is us-central1-b.

    $ gcloud compute tpus list --zone=us-central1-b
    

    The Cloud TPU resource in us-central1-b is displayed:

    NAME       ZONE           ACCELERATOR_TYPE  NETWORK_ENDPOINT   NETWORK  RANGE          STATUS
    demo-tpu   us-central1-b  v2-8              10.240.1.2:8470    default  10.240.1.0/29  READY
    

Console

  1. From the left navigation menu, select Compute Engine > TPUs.

    A list of all active Cloud TPU resources appears.

  2. From the left navigation menu, select Compute Engine > VM Instances.

    A list of all active Compute Engine VM instances appears.

Deleting your Compute Engine VM and Cloud TPU resources

You can delete your VM and TPU resources using the ctpu utility, gcloud commands, or the GCP Console.

ctpu

Run the following command from the Cloud Shell. The ctpu utility deletes the Compute Engine VM and Cloud TPU resources together.

$ ctpu delete [optional: --zone]

gcloud

Run the following commands from the Cloud Shell to delete your Cloud TPU and Compute Engine VM resources. You must include the resource name, the zone, and the project ID on the command line. Delete the Cloud TPU first since you need a running Compute Engine VM to delete the Cloud TPU resource.

  1. Delete your Cloud TPU resource:
    $ gcloud compute tpus delete your-vm-and-tpu-name --project=your-cloud-project --zone=your-zone
    
  2. Delete your Compute Engine instance:
    $ gcloud compute instances delete your-vm-and-tpu-name --project=your-cloud-project --zone=your-zone
    
  3. Delete the VPC network that Google automatically created as part of the Cloud TPU setup.

    Go to the VPC Networking page on the GCP Console.

  4. Select the VPC network. The network entry starts with cp-to-tp-peering in the ID.

    At the top of the page, click Delete to delete the selected VPC network.

  5. When you've finished deleting the resources, use the gsutil command to delete any Cloud Storage buckets you created. Replace your-bucket-name with the name of your Cloud Storage bucket:
    $ gsutil rm -r gs://your-bucket-name
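
Because the order of deletion matters, it can be useful to print the two delete commands in sequence and review them before running anything destructive. The sketch below does only that; print_delete_plan is a hypothetical helper, it assumes the VM and the TPU share one name as recommended on this page, and the project and zone values in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the delete commands in the order this section
# gives, Cloud TPU first and Compute Engine VM second. Nothing is executed.
print_delete_plan() {
  local name="$1" project="$2" zone="$3"

  if [ -z "$name" ] || [ -z "$project" ] || [ -z "$zone" ]; then
    echo "usage: print_delete_plan NAME PROJECT ZONE" >&2
    return 1
  fi

  echo "gcloud compute tpus delete $name --project=$project --zone=$zone"
  echo "gcloud compute instances delete $name --project=$project --zone=$zone"
}

print_delete_plan demo-vm-tpu your-cloud-project us-central1-b
```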
    

Console

  1. Delete your VM.
    1. Go to Compute Engine > VM instances on the left-hand navigation bar.
    2. Select your VM instance from the list. Click the trash can icon at the top of the page.
  2. Delete your Cloud TPU.
    1. Go to Compute Engine > TPUs on the left-hand navigation bar.
    2. Select your TPU resource from the list. Click the trash can icon at the top of the page.