Cloud TPU PyTorch/XLA user guide
Before starting the procedures in this guide, set up a TPU VM and SSH into it as described in the Cloud TPU VM user's guide.
See PyTorch supported versions for a list of the TPU software versions available for PyTorch/XLA.
Basic setup
Set the XRT TPU device configuration:
(vm)$ export XRT_TPU_CONFIG="localservice;0;localhost:51011"
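To confirm that the configuration took effect, you can enumerate the visible XLA devices from Python. A minimal sketch (device names and counts vary by TPU type):

import torch_xla.core.xla_model as xm

# With XRT_TPU_CONFIG exported, torch_xla should enumerate the TPU cores.
devices = xm.get_xla_supported_devices()
print(devices)  # for example, eight devices on a v3-8 TPU VM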
For models that have sizeable, frequent allocations, tcmalloc improves performance compared to the C/C++ runtime function malloc. The default malloc used on TPU VMs is tcmalloc. You can force the TPU VM software to use the standard malloc by unsetting the LD_PRELOAD environment variable:
(vm)$ unset LD_PRELOAD
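To check which allocator is active, you can inspect LD_PRELOAD from Python. A minimal sketch:

import os

# A value containing libtcmalloc means tcmalloc is preloaded;
# an empty value means the standard C/C++ runtime malloc is in use.
preload = os.environ.get("LD_PRELOAD", "")
print("tcmalloc" if "libtcmalloc" in preload else "standard malloc")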
Changing PyTorch version
If you don't want to use the PyTorch version preinstalled on TPU VMs, install the version you want to use. For example, if you want to use PyTorch 1.13:
(tpuvm):$ cd /usr/share/
(tpuvm):$ sudo git clone -b release/1.13 --recursive https://github.com/pytorch/pytorch
(tpuvm):$ cd pytorch/
(tpuvm):$ sudo git clone -b r1.13 --recursive https://github.com/pytorch/xla.git
(tpuvm):$ cd xla/
(tpuvm):$ yes | sudo pip3 uninstall torch_xla
(tpuvm):$ yes | sudo pip3 uninstall torch
(tpuvm):$ yes | sudo pip3 uninstall torch_vision
(tpuvm):$ sudo pip3 install torch==1.13.0
(tpuvm):$ sudo pip3 install torchvision==0.14.0
(tpuvm):$ sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-1.13-cp38-cp38-linux_x86_64.whl
(tpuvm):$ sudo rm -rf /usr/local/lib/python3.8/dist-packages/libtpu*
(tpuvm):$ sudo pip3 install torch_xla[tpuvm]
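After the installation completes, you can sanity-check the installed versions from Python. A minimal sketch:

import torch
import torch_xla

# Both versions should correspond to the 1.13 release installed above.
print(torch.__version__)
print(torch_xla.__version__)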
Perform a simple calculation
Start the Python interpreter on the TPU VM:
(vm)$ python3
Import the following PyTorch packages:
import torch
import torch_xla.core.xla_model as xm
Enter the following script:
dev = xm.xla_device()
t1 = torch.randn(3,3,device=dev)
t2 = torch.randn(3,3,device=dev)
print(t1 + t2)
The following output is displayed:
tensor([[-0.2121,  1.5589, -0.6951],
        [-0.7886, -0.2022,  0.9242],
        [ 0.8555, -1.8698,  1.4333]], device='xla:1')
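Note that XLA tensors are evaluated lazily: operations are recorded into a graph and compiled and executed only when a result is needed, such as when a tensor is printed or copied to the host. A minimal sketch continuing the session above:

result = (t1 + t2).cpu()  # forces execution and copies the result to the host
print(result.device)      # cpu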
Running ResNet on a single-device TPU
At this point, you can run any PyTorch/XLA code you like. For instance, you can run a ResNet model with fake data:
(vm)$ git clone --recursive https://github.com/pytorch/xla.git
(vm)$ python3 xla/test/test_train_mp_imagenet.py --fake_data --model=resnet50 --num_epochs=1
The ResNet sample trains for 1 epoch and takes about 7 minutes. It returns output similar to the following:
Epoch 1 test end 20:57:52, Accuracy=100.00
Max Accuracy: 100.00%
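The mp in the script name refers to torch_xla's multiprocessing launcher, which starts one Python process per TPU core. A minimal sketch of that pattern, assuming an 8-core TPU VM:

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    # Each spawned process sees a single TPU core as its XLA device.
    device = xm.xla_device()
    t = torch.randn(2, 2, device=device)
    print(index, t.sum().item())

if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())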
After the ResNet training ends, delete the TPU VM.
(vm)$ exit
$ gcloud compute tpus tpu-vm delete tpu-name \
--zone=zone
The deletion might take several minutes. Verify the resources have been deleted by running gcloud compute tpus list --zone=${ZONE}.
Advanced setup
In the previous examples (the simple calculation and ResNet50), the PyTorch/XLA program starts the local XRT server in the same process as the Python interpreter. You can also choose to start the XRT local service in a separate process:
(vm)$ python3 -m torch_xla.core.xrt_run_server --port 51011 --restart
The advantage of this approach is that the compilation cache persists across training runs. When running the XLA server in a separate process, server-side logging information is written to /tmp/xrt_server_log.
(vm)$ ls /tmp/xrt_server_log/
server_20210401-031010.log
TPU VM performance profiling
For more information about profiling your models on TPU VM, see PyTorch XLA performance profiling.
PyTorch/XLA TPU Pod examples
See PyTorch TPU VM Pod for setup information and examples for running PyTorch/XLA on a TPU VM Pod.
Docker on TPU VM
This section shows how to run Docker on a TPU VM with PyTorch/XLA preinstalled.
Available Docker images
You can refer to the GitHub README to find all of the available TPU VM Docker images.
Run Docker images on TPU VM
(tpuvm): sudo docker pull gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
(tpuvm): sudo docker run --privileged --shm-size 16G --name tpuvm_docker -it -d gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
(tpuvm): sudo docker exec --privileged -it tpuvm_docker /bin/bash
(pytorch) root:/#
Verify libtpu
To verify libtpu is installed, run:
(pytorch) root:/# ls /root/anaconda3/envs/pytorch/lib/python3.8/site-packages/ | grep libtpu
This should generate output similar to the following:
libtpu
libtpu_nightly-0.1.dev20220518.dist-info
If no results are displayed, you can manually install the corresponding libtpu using:
(pytorch) root:/# pip install torch_xla[tpuvm]
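You can also confirm the installed libtpu wheel from Python. A minimal sketch (the distribution name libtpu-nightly is taken from the dist-info shown above):

import importlib.metadata

# Prints the installed libtpu version, e.g. 0.1.dev20220518.
print(importlib.metadata.version("libtpu-nightly"))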
Verify tcmalloc
tcmalloc
is the default malloc we use on TPU VM. For more information read
this section. This library should be pre installed on newer
TPU VM Docker images, but it is always better to manually verify it. You can run
the following command to verify the library is installed.
(pytorch) root:/# echo $LD_PRELOAD
This should generate output similar to:
/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
If LD_PRELOAD is not set, you can manually run:
(pytorch) root:/# sudo apt-get install -y google-perftools
(pytorch) root:/# export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"
Verify device
You can verify that the TPU VM device is available by running:
(pytorch) root:/# ls /dev | grep accel
This should generate the following results:
accel0
accel1
accel2
accel3
If no results are shown, most likely you did not start the container with the --privileged flag.
Run a model
You can run a model by setting the XRT TPU device configuration and starting the training script:
(pytorch) root:/# export XRT_TPU_CONFIG="localservice;0;localhost:51011"
(pytorch) root:/# python3 pytorch/xla/test/test_train_mp_imagenet.py --fake_data --num_epochs 1