Run a calculation on a Cloud TPU VM using PyTorch
This quickstart shows you how to create a Cloud TPU, install PyTorch, and run a simple calculation on a Cloud TPU. For a more in-depth tutorial showing you how to train a model on a Cloud TPU, see one of the Cloud TPU PyTorch Tutorials.
Before you begin
Before you follow this quickstart, you must create a Google Cloud Platform
account, install the Google Cloud CLI, and configure the gcloud
command.
For more information, see
Set up an account and a Cloud TPU project.
Create a Cloud TPU with gcloud
Launch a Compute Engine VM and Cloud TPU using the gcloud command. The command
you use depends on whether you are using a TPU VM or a TPU Node. For more
information on the two VM architectures, see System Architecture. For more
information on the gcloud command, see the gcloud Reference.
TPU VM
To create a TPU VM in the default user project, network, and zone, run:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=v3-8 \
--version=tpu-vm-pt-2.0
While creating your TPU, you can pass the additional --network and
--subnetwork flags if you want to specify the default network and subnetwork.
If you do not want to use the default network, you must pass the --network
flag. The --subnetwork flag is optional and can be used to specify a default
subnetwork for whatever network you are using (default or user-specified). See
the gcloud API reference page for details on these flags.
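For example, a create command that specifies both flags might look like the following sketch; my-network and my-subnetwork are placeholder names, not values from this quickstart:
$ gcloud compute tpus tpu-vm create tpu-name \
--zone=us-central2-b \
--accelerator-type=v3-8 \
--version=tpu-vm-pt-2.0 \
--network=my-network \
--subnetwork=my-subnetwork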
TPU Node
When creating a TPU Node for PyTorch, you first create a Compute Engine VM instance.
$ gcloud compute instances create tpu-name \
--zone=us-central2-b \
--machine-type=n1-standard-16 \
--image-family=torch-xla \
--image-project=ml-images \
--boot-disk-size=200GB \
--scopes=https://www.googleapis.com/auth/cloud-platform
Command flag descriptions
project - Your Cloud project ID.
name - The name of the Cloud TPU to create.
zone - The zone where you plan to create your Cloud TPU.
disk-size - The size of the hard disk in GB of the VM created by the gcloud command.
machine-type - The machine type of the Compute Engine VM to create.
tf-version - The version of TensorFlow gcloud installs on the VM.
accelerator-type - The type of the Cloud TPU to create.
Next, create the TPU instance.
$ gcloud compute tpus create tpu-name \
--zone=us-central2-b \
--network=default \
--version=pytorch-1.11 \
--accelerator-type=v3-8
Connect to your Cloud TPU VM
TPU VM
$ gcloud compute tpus tpu-vm ssh tpu-name \
--zone=us-central2-b
TPU Node
$ gcloud compute ssh tpu-name --zone=us-central2-b
Set TPU runtime configuration
TPU VM
Configure the Torch-XLA environment.
There are two PyTorch/XLA runtime options: PJRT and XRT. We recommend you use PJRT unless you have a reason to use XRT. To learn more about the different runtime configurations for PyTorch/XLA, see the PJRT runtime documentation.
PJRT
(vm) $ export PJRT_DEVICE=TPU
XRT (Legacy)
(vm) $ export XRT_TPU_CONFIG="localservice;0;localhost:51011"
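To confirm that PyTorch/XLA picks up the runtime configuration, you can optionally print the default XLA device from Python. This sanity check is not part of the original quickstart; it only assumes the torch_xla package that ships with the TPU VM image:
(vm)$ python3 -c "import torch_xla.core.xla_model as xm; print(xm.xla_device())"
If the runtime is configured correctly, this prints an XLA device string (for example, xla:0) instead of raising an error.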
TPU Node
Find the IP address of the TPU Node.
(vm)$ gcloud compute tpus describe tpu-name \
--zone=us-central2-b
Configure the Torch-XLA environment. Make sure to replace your-tpu-ip-address with the IP address of your TPU.
(vm)$ conda activate torch-xla-1.11
(vm)$ export TPU_IP_ADDRESS=your-tpu-ip-address
(vm)$ export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
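As an optional sanity check (not part of the original quickstart), you can echo the variable to verify that your TPU's IP address was substituted into the configuration string:
(vm)$ echo "$XRT_TPU_CONFIG"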
Perform a simple calculation
Create a file named tpu-test.py in the current directory and copy and paste the following script into it:
import torch
import torch_xla.core.xla_model as xm

# Get the default XLA (TPU) device.
dev = xm.xla_device()

# Create two random tensors directly on the TPU device and add them.
t1 = torch.randn(3, 3, device=dev)
t2 = torch.randn(3, 3, device=dev)
print(t1 + t2)
Run the script:
(vm)$ python3 tpu-test.py
Output from the script shows the result of the computation:
tensor([[-0.2121,  1.5589, -0.6951],
        [-0.7886, -0.2022,  0.9242],
        [ 0.8555, -1.8698,  1.4333]], device='xla:1')
The end of the script output may also include the following warnings, which you can safely ignore:
OpKernel ('op: "TPURoundRobin" device_type: "CPU"') for unknown op: TPURoundRobin
OpKernel ('op: "TpuHandleToProtoKey" device_type: "CPU"') for unknown op: TpuHandleToProtoKey
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Disconnect from the Compute Engine instance, if you have not already done so:
(vm)$ exit
Your prompt should now be username@projectname, showing you are in the Cloud Shell.
Delete your Cloud TPU.
TPU VM
$ gcloud compute tpus tpu-vm delete tpu-name \
--zone=us-central2-b
TPU Node
$ gcloud compute tpus execution-groups delete tpu-name \
--zone=us-central2-b
The output of this command should confirm that your TPU has been deleted.
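To double-check (an optional step, not part of the original quickstart), you can list the TPUs remaining in the zone; the deleted TPU should no longer appear:
$ gcloud compute tpus tpu-vm list --zone=us-central2-b
For a TPU Node, the analogous command is gcloud compute tpus list.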
What's next
Read more about Cloud TPU VMs: