Add a Persistent Disk to a TPU VM

A TPU VM includes a 100 GB boot disk. For some scenarios, your TPU VM might need additional storage for training or preprocessing. You can add a Persistent Disk to expand the VM's storage capacity.

Overview

A Persistent Disk attached to a single-device TPU (v2-8, v3-8, v4-8, and so on) can be configured as read-write or read-only. When you attach a Persistent Disk to a TPU VM that is part of a TPU Pod, the disk is attached to every TPU VM in that Pod. To prevent two or more TPU VMs in a Pod from writing to the same Persistent Disk at once, all Persistent Disks attached to TPU VMs in a Pod must be configured as read-only. Read-only disks are useful for storing a dataset that is processed on a TPU Pod.

After creating and attaching a Persistent Disk to your TPU VM, you must mount the Persistent Disk, specifying where in the file system the Persistent Disk can be accessed. For more information, see Mounting a disk.

Prerequisites

You need to have a Google Cloud account and project set up before using the following procedures. If you don't already have a Cloud TPU project set up, follow the procedure in Set up an account and a Cloud TPU project before continuing.

High-level steps

The high-level steps to set up a Persistent Disk:

  1. Create a Persistent Disk
  2. Attach a Persistent Disk to a TPU VM
  3. Mount the Persistent Disk
  4. Clean up TPU VM and Persistent Disk resources

Setting up a TPU VM and a Persistent Disk

You can attach a Persistent Disk to a TPU VM when you create the TPU VM. You can also attach a Persistent Disk to an existing TPU VM.

Create a Persistent Disk

Use the following command to create a Persistent Disk:

  $ gcloud compute disks create disk-name \
    --size disk-size  \
    --zone zone \
    --type pd-balanced

Command flag descriptions

disk-name
A name of your choosing for the Persistent Disk.
disk-size
The size of the Persistent Disk in GB.
zone
The zone in which to create the Persistent Disk. This needs to be the same zone used to create the TPU.
type
The disk type to add. Supported types are pd-standard, pd-ssd, and pd-balanced.
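
For example, the following command creates a 200 GB balanced Persistent Disk. The disk name and zone here are hypothetical example values; substitute your own:

  # Example values only: replace the disk name and zone with your own.
  $ gcloud compute disks create tpu-data-disk \
    --size 200GB \
    --zone us-central1-b \
    --type pd-balanced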

Attach a Persistent Disk

You can attach a Persistent Disk to your TPU VM when you create the TPU VM or you can add one after the TPU VM is created.

Attach a Persistent Disk when you create a TPU VM

Use the --data-disk flag to attach a Persistent Disk when you create a TPU VM. If you are creating a TPU Pod, you must specify mode=read-only. If you are creating a single TPU device, you can specify mode=read-only or mode=read-write. The following command creates a single-device TPU and attaches the Persistent Disk in read-write mode:

  $ gcloud compute tpus tpu-vm create tpu-name \
    --project=project-id \
    --zone=zone \
    --accelerator-type=v3-8 \
    --version=tpu-software-version \
    --data-disk source=projects/project-id/zones/zone/disks/disk-name,mode=read-write

Command flag descriptions

tpu-name
The name you have chosen for the TPU resources.
project
Your project ID.
zone
The zone to create your Cloud TPU in.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version
The Cloud TPU software version for your framework.
data-disk
The name and read/write mode of the Persistent Disk to attach to the TPU VM.
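
For example, the following command creates a v3-8 TPU VM and attaches the disk from the earlier example in read-write mode. The TPU name, project ID, zone, and disk name are hypothetical, and the software version shown is only an example; use the Cloud TPU software version that matches your framework:

  # Example values only: replace the names, zone, and software version with your own.
  $ gcloud compute tpus tpu-vm create my-tpu \
    --project=my-project \
    --zone=us-central1-b \
    --accelerator-type=v3-8 \
    --version=tpu-vm-base \
    --data-disk source=projects/my-project/zones/us-central1-b/disks/tpu-data-disk,mode=read-write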

Attach a Persistent Disk to an existing TPU VM

Use the gcloud alpha compute tpus tpu-vm attach-disk command to attach a Persistent Disk to an existing TPU VM. See the gcloud documentation for more details and examples.

  $ gcloud alpha compute tpus tpu-vm attach-disk tpu-name \
    --zone=zone \
    --disk=disk-name \
    --mode=disk-mode

Command flag descriptions

tpu-name
The name of the TPU resources.
zone
The zone where the Cloud TPU is located.
disk-name
The name of the Persistent Disk to attach to the TPU VM.
mode
The mode of the disk. Mode must be one of: read-only or read-write.
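
For example, the following command (again using the hypothetical names from the earlier examples) attaches the disk to an existing TPU VM in read-write mode:

  # Example values only: replace the TPU name, zone, and disk name with your own.
  $ gcloud alpha compute tpus tpu-vm attach-disk my-tpu \
    --zone=us-central1-b \
    --disk=tpu-data-disk \
    --mode=read-write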

If you want to delete the Persistent Disk when you delete the TPU VM, you need to set the auto-delete state of the Persistent Disk using the following command:

  $ gcloud compute instances set-disk-auto-delete vm-instance \
    --zone=zone \
    --auto-delete \
    --disk=disk-name

Command flag descriptions

vm-instance
The generated Compute Engine VM instance name. After you SSH into the TPU VM, your shell prompt changes to include your user ID followed by the generated VM instance name (for example, pjohnston@t1v-n-...$). Replace vm-instance with that generated name.
zone
The zone in which the Persistent Disk is located.
auto-delete
Automatically delete the Persistent Disk when the TPU resources are deleted.
disk-name
The name of your Persistent Disk.
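
For example, assuming the generated instance name is t1v-n-1234abcd-w-0 (a hypothetical value; use the name shown in your own prompt) and the hypothetical disk and zone from the earlier examples, the command looks like this:

  # Example values only: use the generated instance name from your prompt.
  $ gcloud compute instances set-disk-auto-delete t1v-n-1234abcd-w-0 \
    --zone=us-central1-b \
    --auto-delete \
    --disk=tpu-data-disk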

If your VM shuts down for any reason, the Persistent Disk might be disconnected. To make the Persistent Disk mount automatically when the VM restarts, see Configure automatic mounting on system restart.

For more information about automatically deleting a Persistent Disk, see Modify a Persistent Disk.

Mount a Persistent Disk

To access a Persistent Disk from a TPU VM, you must mount the disk. Mounting specifies the location in the TPU VM file system where the Persistent Disk can be accessed.

  1. Connect to your TPU VM using SSH:

    $ gcloud compute tpus tpu-vm ssh tpu-name --zone zone
    

    When working with a TPU Pod, there is one TPU VM for each TPU in the Pod. The preceding command works for both single TPU devices and TPU Pods. If you are using a TPU Pod, this command connects you to the first TPU VM in the Pod (also called worker 0).

  2. From the TPU VM, list the disks attached to the TPU VM:

    (vm)$ sudo lsblk
    

    The output from the lsblk command should look like the following:

    NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    loop0     7:0    0  55.5M  1 loop /snap/core18/1997
    loop1     7:1    0  67.6M  1 loop /snap/lxd/20326
    loop2     7:2    0  32.3M  1 loop /snap/snapd/11588
    loop3     7:3    0  32.1M  1 loop /snap/snapd/11841
    loop4     7:4    0  55.4M  1 loop /snap/core18/2066
    sda       8:0    0   300G  0 disk
    ├─sda1    8:1    0 299.9G  0 part /
    ├─sda14   8:14   0     4M  0 part
    └─sda15   8:15   0   106M  0 part /boot/efi
    sdb       8:16   0    10G  0 disk    <== Persistent Disk
    

    In this example, sda is the boot disk and sdb is the newly attached Persistent Disk. The device name assigned to the Persistent Disk depends on how many persistent disks are attached to the VM.

    When using a TPU Pod, you must mount the Persistent Disk on all TPU VMs in your Pod. The device name should be the same on all TPU VMs, but this is not guaranteed. For example, if you detach and then re-attach the Persistent Disk, the device name is incremented, changing from sdb to sdc, and so on. (A more stable way to reference the disk, using its filesystem UUID, is shown after these steps.)

  3. If the disk has not been formatted, format the attached Persistent Disk now:

    (vm)$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
    
  4. Create a directory to mount the Persistent Disk:

    If you are using a TPU device, run the following command to create a directory to mount the Persistent Disk:

    (vm)$ sudo mkdir -p /mnt/disks/persist
    

    If you are using a TPU Pod, run the following command outside of your TPU VM (for example, from Cloud Shell). It creates the directory on all TPU VMs in the Pod.

    $ gcloud compute tpus tpu-vm ssh tpu-name --zone=zone --worker=all --command="sudo mkdir -p /mnt/disks/persist"
    
  5. Mount the Persistent Disk:

    If you are using a TPU device, run the following command to mount the Persistent Disk on your TPU VM.

    (vm)$ sudo mount -o discard,defaults /dev/sdb /mnt/disks/persist

    If you are using a TPU Pod, run the following command outside of your TPU VM (for example, from Cloud Shell). It mounts the Persistent Disk on all TPU VMs in your Pod.

    $ gcloud compute tpus tpu-vm ssh tpu-name --zone=zone --worker=all --command="sudo mount -o discard,defaults /dev/sdb /mnt/disks/persist"
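
If you want the Persistent Disk to be remounted automatically after a VM restart (see Configure automatic mounting on system restart), one common approach is to add an /etc/fstab entry keyed on the filesystem UUID, since the device name (sdb, sdc, and so on) is not guaranteed to be stable. The following is a minimal sketch rather than the only way to do it; the mount options match the mount command above, and the UUID is a placeholder for the value reported by blkid on your own VM:

  # Print the UUID of the Persistent Disk's filesystem.
  (vm)$ sudo blkid /dev/sdb

  # Append an fstab entry so the disk mounts at boot.
  # Replace your-disk-uuid with the UUID printed above.
  (vm)$ echo "UUID=your-disk-uuid /mnt/disks/persist ext4 discard,defaults,nofail 0 2" | sudo tee -a /etc/fstab

The nofail option lets the VM boot even if the disk is detached. When using a TPU Pod, run these commands on every TPU VM, for example with the --worker=all form of the gcloud compute tpus tpu-vm ssh command shown in the preceding steps.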

Clean up

Delete your TPU resources when you are done with them.

  1. Disconnect from the Compute Engine instance, if you have not already done so:

    (vm)$ exit
    

    Your prompt should now be username@projectname, showing you are in the Cloud Shell.

  2. Delete your Cloud TPU and Compute Engine resources.

    $ gcloud compute tpus tpu-vm delete tpu-name \
     --zone=zone
    
  3. Verify that the resources have been deleted by running the following gcloud list commands. The deletion might take several minutes. The output shouldn't display any of the TPU VM resources created by this procedure.

    TPU VM

    $ gcloud compute tpus tpu-vm list --zone=zone
    

    TPU Node

    $ gcloud compute tpus execution-groups list --zone zone
    
  4. Verify that the Persistent Disk was automatically deleted when the TPU VM was deleted by listing all disks in the zone where you created the Persistent Disk:

    $ gcloud compute disks list --filter="zone:( us-central1-b )"
    

    If the Persistent Disk was not deleted when the TPU VM was deleted, use the following commands to delete it:

    $ gcloud compute disks delete disk-name \
      --zone zone