Running the Transformer with Tensor2Tensor

This tutorial shows you how to train the Transformer model (from Attention Is All You Need) with Tensor2Tensor on a Cloud TPU.

Model description

The Transformer model uses stacks of self-attention layers and feed-forward layers to process sequential input like text. It supports the following variants:

  • transformer (encoder-decoder) for sequence to sequence modeling. Example use case: translation.
  • transformer (decoder-only) for single sequence modeling. Example use case: language modeling.
  • transformer_encoder (encoder-only) runs only the encoder for sequence to class modeling. Example use case: sentiment classification.

The Transformer is just one of the models in the Tensor2Tensor library. Tensor2Tensor (T2T) is a library of deep learning models and datasets as well as a set of scripts that allow you to train the models and to download and prepare the data.

Before you begin

Before starting this tutorial, follow the steps below to check that your Google Cloud Platform project is correctly set up.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. This walkthrough uses billable components of Google Cloud Platform. Check the Cloud TPU pricing page to estimate your costs, and follow the instructions to clean up resources when you've finished with them.

Create a Cloud Storage bucket

You need a Cloud Storage bucket to store the data that you use to train your machine learning model and the results of the training.

  1. Go to the Cloud Storage page on the GCP Console.

  2. Create a new bucket, specifying the following options:

    • A unique name of your choosing.
    • Default storage class: Regional
    • Location: us-central1
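
If you prefer the command line to the console, gsutil can create an equivalent bucket. The sketch below only prints the command so that you can review it before running it. The helper name bucket_create_cmd and the bucket name my-t2t-bucket are placeholders for illustration; the storage class and location mirror the options listed above.

```shell
# Print (do not run) a gsutil command that creates a regional bucket.
# $1: bucket name (your choice), $2: bucket location.
bucket_create_cmd() {
  printf 'gsutil mb -c regional -l %s gs://%s\n' "$2" "$1"
}

# Example with a placeholder bucket name.
bucket_create_cmd my-t2t-bucket us-central1
```

Copy the printed command into your Cloud Shell once the name and location look right.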

Open Cloud Shell and use the ctpu tool

This guide uses the Cloud TPU Provisioning Utility (ctpu) as a simple tool for setting up and managing your Cloud TPU. The guide runs ctpu from a Cloud Shell. For more advanced setup options, see the custom setup.

The ctpu tool is pre-installed in your Cloud Shell. Follow these steps to check your ctpu configuration:

  1. Open a Cloud Shell window.

  2. Type the following into your Cloud Shell, to check your ctpu configuration:

    $ ctpu print-config

    You should see a message like this:

    2018/04/29 05:23:03 WARNING: Setting zone to "us-central1-b"
    ctpu configuration:
            name: [your TPU's name]
            project: [your-project-name]
            zone: us-central1-b
    If you would like to change the configuration for a single command invocation, please use the command line flags.

  3. Take a look at the ctpu commands:

    $ ctpu

    You should see a usage guide, including a list of subcommands and flags with a brief description of each one.

Create a Compute Engine VM and a Cloud TPU

Run the following command to set up a Compute Engine virtual machine (VM) and a Cloud TPU with associated services. This combination of resources and services is called a Cloud TPU flock:

$ ctpu up [optional: --name --zone]

You should see a message like this:

ctpu will use the following configuration: 
   Name: [your TPU's name]
   Zone: [your project's zone]
   GCP Project: [your project's name]
   TensorFlow Version: 1.9
     Machine Type: [your machine type]
     Disk Size: [your disk size]
     Preemptible: [true or false]
   Cloud TPU:
     Size: [your TPU size]
     Preemptible: [true or false]
OK to create your Cloud TPU resources with the above configuration? [Yn]:

Press y to create your Cloud TPU resources.

The ctpu up command performs the following tasks:

  • Enables the Compute Engine and Cloud TPU services.
  • Creates a Compute Engine VM with the latest stable TensorFlow version pre-installed. The default zone is us-central1-b. For reference, Cloud TPU is available in the following zones:

    • United States (US)
    • Europe (EU)
      • europe-west4-a
    • Asia Pacific (APAC)
      • asia-east1-c

  • Creates a Cloud TPU with the corresponding version of TensorFlow, and passes the name of the Cloud TPU to the Compute Engine VM as an environment variable (TPU_NAME).

  • Ensures your Cloud TPU has access to resources it needs from your GCP project, by granting specific IAM roles to your Cloud TPU service account.
  • Performs a number of other checks.
  • Logs you in to your new Compute Engine VM.

You can run ctpu up as often as you like. For example, if you lose the SSH connection to the Compute Engine VM, run ctpu up to restore the connection, specifying --name and --zone if you changed the default values. See the ctpu documentation for details.

From this point on, a prefix of (vm)$ means you should run the command on the Compute Engine VM instance.

Verify your Compute Engine VM

When the ctpu up command has finished executing, verify that your shell prompt has changed from username@project to username@tpuname. This change shows that you are now logged into your Compute Engine VM.
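
As an extra check, you can confirm that the TPU_NAME environment variable, which ctpu up passes to the VM, is present. The following is a minimal sketch; the function name check_tpu_name is illustrative and is not part of ctpu.

```shell
# Report whether TPU_NAME (passed to the VM by ctpu up) is set.
check_tpu_name() {
  if [ -n "${TPU_NAME:-}" ]; then
    echo "TPU_NAME is set to: $TPU_NAME"
  else
    echo "TPU_NAME is not set; are you on the Compute Engine VM?" >&2
    return 1
  fi
}

# Run the check without aborting the shell if the variable is missing.
check_tpu_name || true
```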

Use the default or change the Cloud Storage access permissions

The ctpu up command set up default permissions for your Cloud TPU service account. If you want finer-grained permissions, review and update the access level permissions.

Add disk space to your VM

T2T conveniently packages data generation for many common open-source datasets in its t2t-datagen script. The script downloads the data, preprocesses it, and makes it ready for training. To do so, it needs local disk space.

You can skip this step if you run t2t-datagen on your local machine (pip install tensor2tensor and then see the t2t-datagen command below).

  • Follow the Compute Engine guide to add a disk to your Compute Engine VM.
  • Set the disk size to 200 GB (the recommended minimum size).
  • Set When deleting instance to Delete disk to ensure that the disk is removed when you remove the VM.

Make a note of the path to your new disk. For example: /mnt/disks/mnt-dir.
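
Before generating data, it can help to confirm how much free space the new disk actually provides. The sketch below uses the standard df utility; the helper name free_gb is an illustrative choice, and the path is the example path from this tutorial, so substitute your own.

```shell
# Print the free space, in whole GB, on the filesystem that holds directory $1.
free_gb() {
  # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# /mnt/disks/mnt-dir is the example mount path used in this tutorial;
# fall back to the current directory if it does not exist.
DISK_DIR=/mnt/disks/mnt-dir
[ -d "$DISK_DIR" ] || DISK_DIR=.
echo "Free space under $DISK_DIR: $(free_gb "$DISK_DIR") GB"
```

If the number is far below the size you requested, the disk may not be formatted or mounted at the path you expect.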

Generate the training dataset

On your Compute Engine VM:

  1. Create the following environment variables:

    (vm)$ STORAGE_BUCKET=gs://YOUR-BUCKET-NAME
    (vm)$ DATA_DIR=$STORAGE_BUCKET/data/
    (vm)$ TMP_DIR=/mnt/disks/mnt-dir/t2t_tmp

    • YOUR-BUCKET-NAME is the name of your Cloud Storage bucket.
    • DATA_DIR is a location on Cloud Storage.
    • TMP_DIR is a location on the disk that you added to your Compute Engine VM in the previous section.

  2. Create a temporary directory on the disk that you added to your Compute Engine VM:

    (vm)$ mkdir /mnt/disks/mnt-dir/t2t_tmp

  3. Use the t2t-datagen script to generate the training and evaluation data on the Cloud Storage bucket, so that the Cloud TPU can access the data:

    (vm)$ t2t-datagen --problem=translate_ende_wmt32k_packed --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR

You can view the data on Cloud Storage by going to the Google Cloud Platform Console and choosing Storage from the left-hand menu. Click the name of the bucket that you created for this tutorial. You should see sharded files named translate_ende_wmt32k_packed-train and translate_ende_wmt32k_packed-dev.
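
You can also check the generated files from the command line by listing the data directory with gsutil ls and counting the shards. The sketch below counts matching lines from a listing supplied on standard input. The helper name count_shards, the bucket name, and the shard suffixes in the demonstration are made up for illustration; the real file names may differ.

```shell
# Count the lines on stdin that contain the given shard prefix.
# Intended use on the VM: gsutil ls "$DATA_DIR" | count_shards <prefix>
count_shards() {
  # grep -c prints the count; "|| true" keeps a zero count from being an error.
  grep -c -F -- "$1" || true
}

# Demonstration with a made-up listing (bucket name and suffixes are illustrative).
printf '%s\n' \
  'gs://my-bucket/data/translate_ende_wmt32k_packed-train-00000-of-00002' \
  'gs://my-bucket/data/translate_ende_wmt32k_packed-train-00001-of-00002' \
  'gs://my-bucket/data/translate_ende_wmt32k_packed-dev-00000-of-00001' |
  count_shards translate_ende_wmt32k_packed-train
```

A count of zero for either the train or the dev prefix suggests that t2t-datagen did not finish writing to the bucket.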

Train an English-German translation model

Run the following commands on your Compute Engine VM:

  1. Set up environment variables for the TPU machine's IP address and port. To find the IP address, run the following command:

    (vm)$ gcloud compute tpus list

    The above command prints the IP address under NETWORK_ENDPOINT:

    demo-tpu   us-central1-b  v2-8        default  READY

    Set these environment variables:

    (vm)$ TPU_IP=YOUR-TPU-IP-ADDRESS
    (vm)$ TPU_MASTER=grpc://$TPU_IP:8470

  2. Set up an environment variable for the training directory, which must be a Cloud Storage location:

    (vm)$ OUT_DIR=$STORAGE_BUCKET/training/transformer_ende_1

  3. Run t2t-trainer to train and evaluate the model:

    (vm)$ t2t-trainer \
      --model=transformer \
      --hparams_set=transformer_tpu \
      --problem=translate_ende_wmt32k_packed \
      --train_steps=10 \
      --eval_steps=3 \
      --data_dir=$DATA_DIR \
      --output_dir=$OUT_DIR \
      --use_tpu=True \
      --master=$TPU_MASTER

    The above command runs 10 training steps, then 3 evaluation steps. You can (and should) increase the number of training steps by adjusting the --train_steps flag. Translations usually begin to be reasonable after ~40k steps. The model typically converges to its maximum quality after ~250k steps.

  4. View the output in your Cloud Storage bucket by going to the Google Cloud Platform Console and choosing Storage from the left-hand menu. Click the name of the bucket that you created for this tutorial. Within the bucket, navigate to the training directory, for example, /training/transformer_ende_1, to see the model output. You can launch TensorBoard pointing at that directory to see training and evaluation metrics.
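
Because a long training run fails late if a variable is empty or points at the wrong place, you may want to check the variables from the steps above before starting. The following is a minimal sketch under the assumptions stated in this tutorial: DATA_DIR and OUT_DIR are Cloud Storage locations, and TPU_MASTER has the form grpc://IP:8470. The function name check_training_env is illustrative and is not part of T2T.

```shell
# Return 0 only if the variables used by t2t-trainer look plausible.
check_training_env() {
  rc=0
  case "${DATA_DIR:-}" in
    gs://*) ;;
    *) echo "DATA_DIR should be a gs:// location" >&2; rc=1 ;;
  esac
  case "${OUT_DIR:-}" in
    gs://*) ;;
    *) echo "OUT_DIR should be a gs:// location" >&2; rc=1 ;;
  esac
  case "${TPU_MASTER:-}" in
    grpc://?*:8470) ;;
    *) echo "TPU_MASTER should look like grpc://<TPU IP>:8470" >&2; rc=1 ;;
  esac
  return "$rc"
}
```

For example, run check_training_env && echo "Ready to train" on the VM, and fix any variable it reports before launching t2t-trainer.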

Train a language model

You can use the transformer model for language modeling as well. Run the following commands to generate the training data and specify the output directory:

(vm)$ t2t-datagen --problem=languagemodel_lm1b8k_packed --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR
(vm)$ OUT_DIR=$STORAGE_BUCKET/training/transformer_lang_model

Run the following command to train and evaluate the model:

(vm)$ t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_tpu \
  --problem=languagemodel_lm1b8k_packed \
  --train_steps=10 \
  --eval_steps=8 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu=True \
  --master=$TPU_MASTER

This model converges after approximately 250,000 steps.

Train a sentiment classifier

You can use the transformer_encoder model for sentiment classification. Run the following commands to generate the training data and specify the output directory:

(vm)$ t2t-datagen --problem=sentiment_imdb --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR
(vm)$ OUT_DIR=$STORAGE_BUCKET/training/transformer_sentiment_classifier

Run the following command to train and evaluate the model:

(vm)$ t2t-trainer \
  --model=transformer_encoder \
  --hparams_set=transformer_tiny_tpu \
  --problem=sentiment_imdb \
  --train_steps=10 \
  --eval_steps=2 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu=True \
  --master=$TPU_MASTER

This model achieves approximately 85% accuracy after approximately 2,000 steps.
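
The three training commands in this tutorial differ only in the model, hyperparameter set, problem, and step counts. As a convenience, the sketch below prints a t2t-trainer command from those values so that you can review it before running it. The helper name trainer_cmd is hypothetical; it emits only flags that appear in this tutorial and passes any extra arguments through unchanged.

```shell
# Print a t2t-trainer command for the given settings.
# Arguments: model, hparams_set, problem, train_steps, eval_steps, [extra flags...]
trainer_cmd() {
  model=$1 hparams=$2 problem=$3 train_steps=$4 eval_steps=$5
  shift 5
  echo t2t-trainer \
    "--model=$model" "--hparams_set=$hparams" "--problem=$problem" \
    "--train_steps=$train_steps" "--eval_steps=$eval_steps" \
    "--data_dir=${DATA_DIR:-}" "--output_dir=${OUT_DIR:-}" \
    --use_tpu=True "$@"
}

# Example: the sentiment classifier with the step count at which it reaches
# roughly 85% accuracy according to this tutorial.
trainer_cmd transformer_encoder transformer_tiny_tpu sentiment_imdb 2000 2
```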

Clean up

  1. Disconnect from the Compute Engine VM:

    (vm)$ exit

    Your prompt should now be username@project, showing you are in your Cloud Shell.

  2. In your Cloud Shell, run the following command to delete your Compute Engine VM and your Cloud TPU:

    $ ctpu delete

  3. Run ctpu status to make sure you have no instances allocated to avoid unnecessary charges for TPU usage. The deletion might take several minutes. A response like the one below indicates there are no more allocated instances:

    2018/04/28 16:16:23 WARNING: Setting zone to "us-central1-b"
    No instances currently exist.
            Compute Engine VM:     --
            Cloud TPU:             --

  4. When you no longer need the Cloud Storage bucket you created during this tutorial, use the gsutil command to delete it. Replace YOUR-BUCKET-NAME with the name of your Cloud Storage bucket:

    $ gsutil rm -r gs://YOUR-BUCKET-NAME

    See the Cloud Storage pricing guide for free storage limits and other pricing information.
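
If you script the cleanup, you can test the ctpu status output for the message shown in step 3 instead of reading it by eye. The following is a minimal sketch; the function name no_instances_left is illustrative, and redirecting stderr with 2>&1 is an assumption about where ctpu writes its messages.

```shell
# Succeed only if the text on stdin reports that no instances remain.
# Intended use in Cloud Shell: ctpu status 2>&1 | no_instances_left
no_instances_left() {
  grep -q 'No instances currently exist'
}

# Demonstration with the sample response from step 3.
if printf '%s\n' 'WARNING: Setting zone to "us-central1-b"' \
    'No instances currently exist.' | no_instances_left; then
  echo "All Cloud TPU resources are gone."
fi
```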
