Adding a persistent disk to a TPU VM

This document describes how to set up and attach a persistent disk to a TPU VM.

Overview

A TPU VM includes a 100 GB boot disk. Some training or preprocessing workloads need more local storage than that. In those cases, you can add a persistent disk to expand the VM's local disk capacity.

Prerequisites

You need a GCP account and project before you use the following procedures. If you have not yet set up a Cloud TPU project, follow the procedure in Set up an account and a Cloud TPU project before continuing.

High-level steps

The high-level steps to set up a persistent disk with a TPU VM are:

  1. Create a persistent disk
  2. Launch a TPU VM with a persistent disk
  3. SSH into the TPU VM
  4. List the attached disks
  5. Format the attached persistent disk
  6. Create a directory to mount the persistent disk
  7. Mount the persistent disk
  8. Set permissions for the persistent disk
  9. Optionally, set the persistent disk to be deleted along with the TPU VM
  10. Clean up TPU VM and persistent disk resources

Setting up a TPU VM and a persistent disk

  1. In a Cloud Shell, create a persistent disk:

    $ gcloud compute disks create disk-name \
    --size disk-size \
    --zone zone \
    --type pd-balanced
    

    Command flag descriptions

    disk-name
    A name of your choosing for the persistent disk.
    disk-size
    The size of the persistent disk in GB.
    zone
    The zone in which to create the persistent disk. This must be the same zone in which you create the TPU VM.
    type
    The disk type to add. Supported types are: 'pd-standard', 'pd-ssd' or 'pd-balanced'.
  2. Launch a TPU VM with the persistent disk attached:

    $ gcloud alpha compute tpus tpu-vm create tpu-name \
    --project project-id \
    --zone=zone \
    --accelerator-type=v3-8 \
    --version=v2-alpha \
    --data-disk source=projects/project-id/zones/zone/disks/disk-name,mode=read-write
    

    Command flag descriptions

    tpu-name
    The name you have chosen for the TPU resources.
    project
    Your project ID.
    zone
    The zone where you plan to create your Cloud TPU.
    accelerator-type
    The type of the Cloud TPU to create.
    version
    The Cloud TPU runtime version.
    data-disk
    The resource path of the persistent disk to attach to the TPU VM and the mode in which to attach it (read-write in this example).
  3. SSH into the TPU VM:

    $ gcloud alpha compute tpus tpu-vm ssh tpu-name --zone zone
    
  4. From the TPU VM, list the disks attached to the TPU VM:

    (vm)$ sudo lsblk
    
    NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    loop0     7:0    0  55.5M  1 loop /snap/core18/1997
    loop1     7:1    0  67.6M  1 loop /snap/lxd/20326
    loop2     7:2    0  32.3M  1 loop /snap/snapd/11588
    loop3     7:3    0  32.1M  1 loop /snap/snapd/11841
    loop4     7:4    0  55.4M  1 loop /snap/core18/2066
    sda       8:0    0   300G  0 disk
    ├─sda1    8:1    0 299.9G  0 part /
    ├─sda14   8:14   0     4M  0 part
    └─sda15   8:15   0   106M  0 part /boot/efi
    sdb       8:16   0    10G  0 disk    # persistent disk
    

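    In this example, sda is the boot disk for the VM and sdb is the attached persistent disk. The device name of a persistent disk depends on how many persistent disks are attached to the VM, so substitute the name shown in your own output for /dev/sdb in the commands that follow.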

  5. Format the attached persistent disk. Formatting erases any data already on the disk:

    (vm)$ sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
    
  6. Create a directory to mount the persistent disk:

    (vm)$ sudo mkdir -p /mnt/disks/persist
    
  7. Mount the persistent disk:

    (vm)$ sudo mount -o discard,defaults /dev/sdb /mnt/disks/persist

  8. Set permissions for the persistent disk:

    (vm)$ sudo chmod a+w /mnt/disks/persist
    
  9. Optionally, to have the persistent disk deleted automatically when you delete the TPU VM, set the auto-delete state of the persistent disk using the following command:

    $ gcloud alpha compute instances set-disk-auto-delete vm-instance \
     --zone=zone \
     --auto-delete \
     --disk=disk-name
    

    Command flag descriptions

    vm-instance
    After you SSH into the TPU VM, your shell prompt changes to include your user ID followed by a generated VM instance name (for example, pjohnston@t1v-n-...$). Replace vm-instance with the generated VM instance name.
    zone
    The zone in which the persistent disk is located.
    auto-delete
    Automatically delete the persistent disk when the TPU resources are deleted.
    disk-name
    The name of your persistent disk.

    If you do not want the persistent disk to be deleted automatically, skip this command. At any time, you can use the command shown in Clean up to remove the persistent disk manually.
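
Steps 5 through 8 can be gathered into one small script. The following is a minimal sketch, not part of the official procedure: it assumes the persistent disk appears as /dev/sdb (check the lsblk output first), and the run helper and DRY_RUN switch are hypothetical names introduced here so that the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch: steps 5-8 as one script. By default it only prints the commands.
set -eu

DEVICE="${DEVICE:-/dev/sdb}"                 # assumption: name reported by lsblk
MOUNT_DIR="${MOUNT_DIR:-/mnt/disks/persist}" # mount point used in this document
DRY_RUN="${DRY_RUN:-1}"                      # hypothetical switch: 1 = print only

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_disk() {
  # Formatting erases any data already on the device.
  run sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard "$DEVICE"
  run sudo mkdir -p "$MOUNT_DIR"
  run sudo mount -o discard,defaults "$DEVICE" "$MOUNT_DIR"
  run sudo chmod a+w "$MOUNT_DIR"
}

setup_disk
```

Review the printed commands, then run the script on the TPU VM with DRY_RUN=0 to execute them. Keeping the device name in a single variable reduces the chance of formatting one device and mounting another.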

If your VM shuts down for any reason, the persistent disk might be disconnected. To mount the persistent disk automatically when the VM restarts, see Configure automatic mounting on system restart. For details on managing persistent disks, refer to the persistent disk documentation.
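
As a rough illustration of what automatic mounting involves, the sketch below builds an /etc/fstab entry for the disk. It assumes the ext4 file system and the /mnt/disks/persist mount point used above. The fstab_line helper is a hypothetical name, the UUID shown is a made-up placeholder, and the nofail option is an assumption chosen so the VM can still boot if the disk is not attached. The linked procedure remains the authoritative reference.

```shell
#!/bin/sh
# Sketch: build an /etc/fstab line for the persistent disk.
# Nothing here modifies the system; the line is only printed.
set -eu

# fstab_line UUID MOUNT_DIR: print one fstab entry for an ext4 disk.
fstab_line() {
  printf 'UUID=%s %s ext4 discard,defaults,nofail 0 2\n' "$1" "$2"
}

# On the TPU VM, the real UUID would come from:
#   sudo blkid -s UUID -o value /dev/sdb
# The value below is a placeholder for illustration only.
EXAMPLE_UUID="00000000-0000-0000-0000-000000000000"
fstab_line "$EXAMPLE_UUID" /mnt/disks/persist
```

Referring to the disk by UUID rather than by device name matters because, as noted earlier, a name such as sdb depends on how many disks are attached. Back up /etc/fstab before appending the printed line to it.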

Clean up

  1. Disconnect from the TPU VM, if you have not already done so:

    (vm)$ exit
    

    Your prompt should now be username@projectname, showing you are in the Cloud Shell.

  2. Delete your Cloud TPU and Compute Engine resources:

    $ gcloud alpha compute tpus tpu-vm delete tpu-name \
     --zone=zone
    
  3. Verify the resources have been deleted by running the list command for your TPU configuration. The deletion might take several minutes. The output should not display any of the TPU VM resources created by this procedure.

    TPU VM

    $ gcloud alpha compute tpus tpu-vm list --zone=zone
    

    TPU Node

    $ gcloud compute tpus execution-groups list --zone zone
    
  4. Verify that the persistent disk was automatically deleted when the TPU VM was deleted by listing all disks in the zone where you created the persistent disk:

    $ gcloud compute disks list --filter="zone:( zone )"
    

    If the persistent disk was not deleted when the TPU VM was deleted, use the following command to delete it:

    $ gcloud compute disks delete disk-name \
    --zone zone
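
The check in step 4 can also be scripted instead of read by eye. The sketch below is one possible approach, not part of the official procedure: disk_listed is a hypothetical helper that reads disk names from standard input, one per line, and succeeds only if the given name matches a whole line exactly. The example feeds it canned names in place of real gcloud output.

```shell
#!/bin/sh
# Sketch: test whether a disk name appears in a list of disk names.
set -eu

# disk_listed NAME: read disk names (one per line) on stdin and
# succeed only if NAME matches a whole line exactly.
disk_listed() {
  grep -qxF -- "$1"
}

# Canned names standing in for gcloud output.
if printf 'disk-a\ndisk-b\n' | disk_listed disk-b; then
  echo "disk-b still exists"
fi
```

In Cloud Shell, the names could come from the list command in step 4 with --format="value(name)" added, piped into disk_listed disk-name. Matching whole lines avoids a false positive when one disk name is a prefix of another.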