Run a calculation on a Cloud TPU VM using TensorFlow

This quickstart shows you how to create a Cloud TPU, install TensorFlow, and run a simple calculation on a Cloud TPU. For a more in-depth tutorial showing how to train a model on a Cloud TPU, see one of the Cloud TPU Tutorials.

Before you begin

Before you follow this quickstart, you must create a Google Cloud Platform account, install the Google Cloud CLI, and configure the gcloud command. For more information, see Set up an account and a Cloud TPU project.

Create a Cloud TPU VM or Node with gcloud

Launch a Compute Engine Cloud TPU using the gcloud command. The command you use depends on whether you are using a TPU VM or a TPU Node. For more information on the two VM architectures, see System Architecture. For more information on the gcloud command, see the gcloud reference.

TPU VM

$ gcloud compute tpus tpu-vm create tpu-name \
--zone=europe-west4-a \
--accelerator-type=v3-8 \
--version=tpu-vm-tf-2.16.1-pjrt

Command flag descriptions

zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version
The Cloud TPU software version.

TPU Node

$ gcloud compute tpus execution-groups create \
--name=tpu-name \
--zone=europe-west4-a \
--disk-size=300 \
--machine-type=n1-standard-16 \
--accelerator-type=v3-8 \
--tf-version=2.12.0

Command flag descriptions

project
Your Google Cloud project ID.
name
The name of the Cloud TPU to create.
zone
The zone where you plan to create your Cloud TPU.
disk-size
The size of the hard disk in GB of the VM created by the gcloud command.
machine-type
The machine type of the Compute Engine VM to create.
tf-version
The version of TensorFlow that gcloud installs on the VM.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.

Connect to your Cloud TPU VM

When using TPU VMs, you must explicitly connect to your TPU VM using SSH. When using TPU Nodes, you should be automatically connected to your Compute Engine VM through SSH. If you are not automatically connected, use the following command.

TPU VM

$ gcloud compute tpus tpu-vm ssh tpu-name \
  --zone europe-west4-a

TPU Node

$ gcloud compute ssh tpu-name \
    --zone=europe-west4-a

Run a simple example using TensorFlow

TPU VM

Once you are connected to the TPU VM, set the following environment variable.

  (vm)$ export TPU_NAME=local

When creating your TPU, if you set the --version parameter to a version ending with -pjrt, set the following environment variables to enable the PJRT runtime:

  (vm)$ export NEXT_PLUGGABLE_DEVICE_USE_C_API=true
  (vm)$ export TF_PLUGGABLE_DEVICE_LIBRARY_PATH=/lib/libtpu.so
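
To confirm that these variables are visible to Python before you run anything, you can use the following optional check (not part of the quickstart itself):

  import os

  # Both variables should print the values exported above.
  for name in ("NEXT_PLUGGABLE_DEVICE_USE_C_API", "TF_PLUGGABLE_DEVICE_LIBRARY_PATH"):
    print(name, "=", os.environ.get(name))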

Create a file named tpu-test.py in the current directory, then copy and paste the following script into it.

  import tensorflow as tf
  print("TensorFlow version " + tf.__version__)

  @tf.function
  def add_fn(x, y):
    z = x + y
    return z

  # Connect to the TPU cluster and initialize the TPU system.
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)

  # Create a distribution strategy that replicates work across all TPU cores.
  strategy = tf.distribute.TPUStrategy(cluster_resolver)

  # Run add_fn on every TPU core; the result is one value per core.
  x = tf.constant(1.)
  y = tf.constant(1.)
  z = strategy.run(add_fn, args=(x, y))
  print(z)
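
Before running the full script, you can optionally confirm that TensorFlow sees the TPU cores. This is a minimal sanity check, not part of the quickstart itself, that assumes the same TPU VM environment as above:

  import tensorflow as tf

  # Connect to the TPU and initialize the runtime.
  resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
  tf.config.experimental_connect_to_cluster(resolver)
  tf.tpu.experimental.initialize_tpu_system(resolver)

  # List the logical TPU devices; a v3-8 should report eight TPU cores.
  print(tf.config.list_logical_devices('TPU'))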

TPU Node

Create a file named tpu-test.py in the current directory, then copy and paste the following script into it.

import tensorflow as tf
print("TensorFlow version " + tf.__version__)

# TPU detection: replace 'your-tpu-name' with the name you gave your Cloud TPU.
tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='your-tpu-name')
print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])

# Connect to the TPU cluster and initialize the TPU system.
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# Create a distribution strategy that replicates work across all TPU cores.
strategy = tf.distribute.experimental.TPUStrategy(tpu)

@tf.function
def add_fn(x, y):
    z = x + y
    return z

# Run add_fn on every TPU core; the result is one value per core.
x = tf.constant(1.)
y = tf.constant(1.)
z = strategy.run(add_fn, args=(x, y))
print(z)

Run this script with the following command:

(vm)$ python3 tpu-test.py

This script performs a simple computation on each TensorCore of a TPU. The output will look similar to the following:

PerReplica:{
  0: tf.Tensor(2.0, shape=(), dtype=float32),
  1: tf.Tensor(2.0, shape=(), dtype=float32),
  2: tf.Tensor(2.0, shape=(), dtype=float32),
  3: tf.Tensor(2.0, shape=(), dtype=float32),
  4: tf.Tensor(2.0, shape=(), dtype=float32),
  5: tf.Tensor(2.0, shape=(), dtype=float32),
  6: tf.Tensor(2.0, shape=(), dtype=float32),
  7: tf.Tensor(2.0, shape=(), dtype=float32)
}
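
Each numbered entry in the PerReplica value is the result from one TensorCore. If you want a single aggregated tensor instead of one result per core, you can reduce across replicas. The following lines are a minimal sketch that could be appended to tpu-test.py; the value 16.0 assumes the eight TensorCores of a v3-8:

# Appended to tpu-test.py: combine the per-replica results into one tensor.
print(strategy.num_replicas_in_sync)  # 8 on a v3-8
summed = strategy.reduce(tf.distribute.ReduceOp.SUM, z, axis=None)
print(summed)  # tf.Tensor(16.0, shape=(), dtype=float32): 8 replicas x 2.0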

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. Disconnect from the Compute Engine instance, if you have not already done so:

    (vm)$ exit
    

    Your prompt should now be username@projectname, showing you are in the Cloud Shell.

  2. Delete your Cloud TPU.

    TPU VM

    $ gcloud compute tpus tpu-vm delete tpu-name \
    --zone=europe-west4-a
    

    TPU Node

    $ gcloud compute tpus execution-groups delete tpu-name \
    --zone=europe-west4-a
    
  3. Verify the resources have been deleted by running one of the following commands, depending on whether you created a TPU VM or a TPU Node. The deletion might take several minutes.

    TPU VM

    $ gcloud compute tpus tpu-vm list --zone=europe-west4-a
    

    TPU Node

    $ gcloud compute tpus execution-groups list --zone=europe-west4-a
    

What's next

For more information about Cloud TPU, see the Cloud TPU documentation.