Run a calculation on a Cloud TPU VM by using PyTorch
This quickstart shows you how to create a Cloud TPU, install PyTorch, and run a simple calculation on a Cloud TPU. For a more in-depth tutorial that shows you how to train a model on a Cloud TPU, see one of the Cloud TPU PyTorch Tutorials.
Before you begin
Before you follow this quickstart, you must create a Google Cloud Platform
account, install the Google Cloud CLI, and configure the gcloud
command.
For more information, see Set up an account and a Cloud TPU project.
Create a Cloud TPU with gcloud
Launch a Compute Engine VM and Cloud TPU using the gcloud
command. The command you use depends on whether you are using a TPU VM or a TPU
node. For more information on the two VM architectures, see
System Architecture. For more information on the gcloud command, see the
gcloud Reference.
TPU VM
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central1-b \
--accelerator-type=v3-8 \
--version=tpu-vm-pt-1.11
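If you script TPU creation rather than typing the command by hand, one option is to assemble the same argument list in Python. The following is a minimal sketch, not part of the official tooling: the helper name build_tpu_vm_create_command is hypothetical, and it only builds and prints the TPU VM command shown above (same flags, same values) without running it.

```python
import shlex


def build_tpu_vm_create_command(name, zone, accelerator_type, version):
    """Return the argv list for the TPU VM create command shown above.

    Hypothetical helper for illustration; it only uses the flags that
    appear in this quickstart.
    """
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "create", name,
        f"--zone={zone}",
        f"--accelerator-type={accelerator_type}",
        f"--version={version}",
    ]


cmd = build_tpu_vm_create_command(
    name="tpu-name",
    zone="us-central1-b",
    accelerator_type="v3-8",
    version="tpu-vm-pt-1.11",
)

# Print a shell-safe, copy-pasteable form of the command.
print(shlex.join(cmd))
```

To run the command from Python instead of printing it, you could pass the list to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell quoting problems.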
TPU Node
When creating a TPU Node for PyTorch, you first create a Compute Engine VM instance.
$ gcloud compute instances create tpu-name \
--zone=us-central1-b \
--machine-type=n1-standard-16 \
--image-family=torch-xla \
--image-project=ml-images \
--boot-disk-size=200GB \
--scopes=https://www.googleapis.com/auth/cloud-platform
Command flag descriptions
project
- Your GCP project ID
name
- The name of the Cloud TPU to create.
zone
- The zone where you plan to create your Cloud TPU.
disk-size
- The size of the hard disk in GB of the VM created by the gcloud command.
machine-type
- The machine type of the Compute Engine VM to create.
tf-version
- The version of TensorFlow that gcloud installs on the VM.
accelerator-type
- The type of the Cloud TPU to create.
Next, create the TPU instance.
$ gcloud compute tpus create tpu-name \
--zone=us-central1-b \
--network=default \
--version=pytorch-1.11 \
--accelerator-type=v3-8
Connect to your Cloud TPU VM
TPU VM
$ gcloud compute tpus tpu-vm ssh tpu-name \
--zone us-central1-b
TPU Node
$ gcloud compute ssh tpu-name --zone=us-central1-b
Set XRT TPU device configuration
TPU VM
Configure the Torch-XLA environment.
(vm)$ export XRT_TPU_CONFIG="localservice;0;localhost:51011"
TPU Node
Find the IP address of the TPU Node.
(vm)$ gcloud compute tpus describe tpu-name --zone=us-central1-b
Configure the Torch-XLA environment. Make sure to replace your-tpu-ip-address with the IP address of your TPU.
(vm)$ conda activate torch-xla-1.11
(vm)$ export TPU_IP_ADDRESS=your-tpu-ip-address
(vm)$ export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
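Both XRT_TPU_CONFIG values above follow the same pattern of three fields separated by semicolons. Judging from the two examples, the fields appear to be a worker name, a worker index, and a host:port address. The following Python sketch, offered as an illustration and not as an official API, composes that string. The function name make_xrt_tpu_config is hypothetical, and the IP address used for the TPU Node case is a placeholder.

```python
def make_xrt_tpu_config(worker, index, host, port):
    """Compose an XRT_TPU_CONFIG value as 'worker;index;host:port'.

    Hypothetical helper; the field layout is inferred from the two
    examples in this quickstart.
    """
    return f"{worker};{index};{host}:{port}"


# TPU VM: reproduces the value exported in the TPU VM step.
tpu_vm_config = make_xrt_tpu_config("localservice", 0, "localhost", 51011)

# TPU Node: uses the TPU's IP address (10.0.0.2 is a placeholder).
tpu_node_config = make_xrt_tpu_config("tpu_worker", 0, "10.0.0.2", 8470)

print(tpu_vm_config)    # localservice;0;localhost:51011
print(tpu_node_config)  # tpu_worker;0;10.0.0.2:8470
```

In a script, you could assign the result to os.environ["XRT_TPU_CONFIG"] before importing torch_xla. This sketch assumes that doing so has the same effect as the export commands.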
Perform a simple calculation:
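Create a file named tpu-test.py in the current directory and copy and paste the following script into it.
import torch
import torch_xla.core.xla_model as xm

# Get the XLA device that represents the TPU.
dev = xm.xla_device()
# Create two random 3x3 tensors directly on the TPU.
t1 = torch.randn(3, 3, device=dev)
t2 = torch.randn(3, 3, device=dev)
# Add the tensors on the TPU and print the result.
print(t1 + t2)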
Run the script:
(vm)$ python3 tpu-test.py
Output from the script shows the result of the computation:
tensor([[-0.2121,  1.5589, -0.6951],
        [-0.7886, -0.2022,  0.9242],
        [ 0.8555, -1.8698,  1.4333]], device='xla:1')
OpKernel ('op: "TPURoundRobin" device_type: "CPU"') for unknown op: TPURoundRobin
OpKernel ('op: "TpuHandleToProtoKey" device_type: "CPU"') for unknown op: TpuHandleToProtoKey
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Disconnect from the Compute Engine instance, if you have not already done so:
(vm)$ exit
Your prompt should now be username@projectname, showing you are in the Cloud Shell.
Delete your Cloud TPU.
TPU VM
$ gcloud compute tpus tpu-vm delete tpu-name \
--zone=us-central1-b
TPU Node
$ gcloud compute tpus execution-groups delete tpu-name \
--zone=us-central1-b
The output of this command should confirm that your TPU has been deleted.
What's next
Read more about Cloud TPU VMs: