
Cloud TPU PyTorch/XLA user guide

Run ML workloads with PyTorch/XLA

Before starting the procedures in this guide, set up a TPU VM and connect to it with SSH as described in the Cloud TPU VM user's guide.

See PyTorch supported versions for a list of the TPU software versions available for PyTorch/XLA.

Basic setup

Set the XRT TPU device configuration:

   (vm)$ export XRT_TPU_CONFIG="localservice;0;localhost:51011"
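
As a quick, optional sanity check, you can confirm that PyTorch/XLA can see the TPU after setting this variable. If the configuration is correct, the following command prints an XLA device string such as xla:1:

   (vm)$ python3 -c "import torch_xla.core.xla_model as xm; print(xm.xla_device())"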

For models with sizeable, frequent allocations, tcmalloc performs better than the standard C/C++ runtime malloc, and tcmalloc is the default malloc on the TPU VM. You can force the TPU VM software to use the standard malloc by unsetting the LD_PRELOAD environment variable:

   (vm)$ unset LD_PRELOAD
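
Because tcmalloc is enabled through LD_PRELOAD, you can check which malloc is in effect by inspecting that variable; if it is unset or empty, the standard malloc is used:

   (vm)$ echo $LD_PRELOAD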

Changing PyTorch version

If you don't want to use the PyTorch version preinstalled on TPU VMs, install the version you want. For example, to use PyTorch 1.10:

(tpuvm):$ cd /usr/share/
(tpuvm):$ sudo git clone -b release/1.10 --recursive https://github.com/pytorch/pytorch 
(tpuvm):$ cd pytorch/
(tpuvm):$ sudo git clone -b r1.10 --recursive https://github.com/pytorch/xla.git
(tpuvm):$ cd xla/
(tpuvm):$ yes | sudo pip3 uninstall torch_xla
(tpuvm):$ yes | sudo pip3 uninstall torch
(tpuvm):$ yes | sudo pip3 uninstall torchvision
(tpuvm):$ sudo pip3 install torch==1.10.0
(tpuvm):$ sudo pip3 install torchvision==0.11.1
(tpuvm):$ sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-1.10-cp38-cp38-linux_x86_64.whl
(tpuvm):$ sudo mv /usr/lib/libtpu.so /tmp
(tpuvm):$ sudo /snap/bin/gsutil cp gs://tpu-pytorch/v4_wheel/110/libtpu.so /lib/libtpu.so
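
To confirm the new versions are active, you can print them from Python; for this example, you should see version strings starting with 1.10:

(tpuvm):$ python3 -c "import torch, torch_xla; print(torch.__version__, torch_xla.__version__)"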

Perform a simple calculation

  1. Start the Python interpreter on the TPU VM:

    (vm)$ python3
    
  2. Import the following PyTorch packages:

    import torch
    import torch_xla.core.xla_model as xm
    
  3. Enter the following script:

    dev = xm.xla_device()               # get the default XLA (TPU) device
    t1 = torch.randn(3, 3, device=dev)  # create tensors directly on the TPU
    t2 = torch.randn(3, 3, device=dev)
    print(t1 + t2)                      # the addition runs on the TPU
    

    Because the tensors are randomly initialized, your exact values will vary. Output similar to the following is displayed:

    tensor([[-0.2121,  1.5589, -0.6951],
            [-0.7886, -0.2022,  0.9242],
            [ 0.8555, -1.8698,  1.4333]], device='xla:1')
    
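PyTorch/XLA tensors interoperate with regular PyTorch code, and you can copy results back to the host when you need them on the CPU. Continuing the script above, here is a minimal sketch (t3 is an illustrative name, not part of the guide):

    t3 = (t1 + t2).cpu()  # compute on the TPU, then copy the result to host memory
    print(t3.device)      # prints: cpu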

Running ResNet on a single-device TPU

At this point, you can run any PyTorch/XLA code you want. For example, you can train a ResNet model with fake data:

(vm)$ git clone --recursive https://github.com/pytorch/xla.git
(vm)$ python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1

The ResNet sample trains for 1 epoch and takes about 7 minutes. It returns output similar to the following:

Epoch 1 test end 20:57:52, Accuracy=100.00 Max Accuracy: 100.00%
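
The training script uses PyTorch/XLA's multiprocessing API to drive the TPU cores. The following is a minimal sketch of that pattern with a toy model and fake data; the model and names here are illustrative and are not taken from the ResNet script:

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    # Each spawned process drives one XLA device (TPU core).
    dev = xm.xla_device()
    model = torch.nn.Linear(128, 10).to(dev)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for step in range(10):
        data = torch.randn(8, 128, device=dev)           # fake inputs
        target = torch.randint(0, 10, (8,), device=dev)  # fake labels
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(data), target)
        loss.backward()
        # Reduce gradients across cores and apply the update; barrier=True
        # cuts the lazy graph here because this loop doesn't use ParallelLoader.
        xm.optimizer_step(optimizer, barrier=True)

if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())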

After the ResNet training ends, delete the TPU VM.

(vm)$ exit
$ gcloud compute tpus tpu-vm delete tpu-name \
--zone=zone

The deletion might take several minutes. Verify the resources have been deleted by running gcloud compute tpus list --zone=zone.

Advanced setup

In the previous examples (the simple calculation and ResNet50), the PyTorch/XLA program starts the local XRT server in the same process as the Python interpreter. You can also choose to start the XRT local service in a separate process:

(vm)$ python3 -m torch_xla.core.xrt_run_server --port 51011 --restart

The advantage of this approach is that the compilation cache persists across training runs. When you run the XLA server in a separate process, server-side logging is written to /tmp/xrt_server_log:

(vm)$ ls /tmp/xrt_server_log/
server_20210401-031010.log
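
To follow the most recent server log while a run is in progress, you can use standard shell tools, for example:

(vm)$ tail -f /tmp/xrt_server_log/server_20210401-031010.log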

TPU VM performance profiling

For more information about profiling your models on TPU VM, see PyTorch XLA performance profiling.

PyTorch/XLA TPU Pod examples

See PyTorch TPU VM Pod for setup information and examples for running PyTorch/XLA on a TPU VM Pod.

Docker on TPU VM

This section shows how to run a Docker container with PyTorch/XLA preinstalled on a TPU VM.

Available Docker images

Refer to the GitHub README for a list of all available TPU VM Docker images.

Run Docker images on a TPU VM

(tpuvm):$ sudo docker pull gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
(tpuvm):$ sudo docker run --privileged --shm-size 16G --name tpuvm_docker -it -d gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
(tpuvm):$ sudo docker exec --privileged -it tpuvm_docker /bin/bash
(pytorch) root:/#

Verify libtpu

To verify libtpu is installed, run:

(pytorch) root:/# ls /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/ | grep libtpu

This should generate output similar to the following:

libtpu
libtpu_nightly-0.1.dev20220518.dist-info

If no results are displayed, you can manually install the corresponding libtpu using:

(pytorch) root:/# pip install torch_xla[tpuvm]

Verify tcmalloc

tcmalloc is the default malloc on the TPU VM; for more information, see the tcmalloc discussion in the Basic setup section. The library should be preinstalled on newer TPU VM Docker images, but it is best to verify it manually. Run the following command to check that the library is loaded:

(pytorch) root:/# echo $LD_PRELOAD

This should generate output similar to:

/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4

If LD_PRELOAD is not set, you can manually run:

(pytorch) root:/# sudo apt-get install -y google-perftools
(pytorch) root:/# export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"

Verify device

You can verify that the TPU VM device is available by running:

(pytorch) root:/# ls /dev | grep accel

This should generate the following results:

accel0
accel1
accel2
accel3

If no results are shown, most likely you did not start the container with the --privileged flag.

Run a model

Finally, set the XRT TPU device configuration and run a model with fake data:

(pytorch) root:/# export XRT_TPU_CONFIG="localservice;0;localhost:51011"
(pytorch) root:/# python3 pytorch/xla/test/test_train_mp_imagenet.py --fake_data --num_epochs 1