Quickstart

Overview: Learn how to use Cloud TPU to train a model on MNIST, a canonical dataset of handwritten digits that is often used to test new machine learning approaches.

This topic is intended for users new to Cloud TPU. For a more detailed exploration of Cloud TPU, try running one of our Colab notebooks. You can also view one of the many examples in the Tutorials section.


Before you begin

Before starting this tutorial, check that your Google Cloud project is correctly set up. For more information, see Set up an account and a Cloud TPU project.

This tutorial uses the following billable components of Google Cloud:

  • Compute Engine
  • Cloud TPU
  • Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

This section provides information on setting up a Cloud Storage bucket and a Compute Engine VM.

  1. Open a Cloud Shell window.

  2. Create an environment variable for your project's ID.

    export PROJECT_ID=project-id
    
  3. Configure the gcloud command-line tool to use the project where you want to create the Cloud TPU.

    gcloud config set project $PROJECT_ID
    

    The first time you run this command in a new Cloud Shell VM, an Authorize Cloud Shell page is displayed. Click Authorize at the bottom of the page to allow gcloud to make GCP API calls with your credentials.
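
    If you want to confirm that gcloud is now pointing at the intended project (an optional check, not part of the required steps), you can print the active project:

    gcloud config get-value project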

  4. Create a Cloud Storage bucket using the following command:

    gsutil mb -p ${PROJECT_ID} -c standard -l us-central1 -b on gs://bucket-name
    

    This Cloud Storage bucket stores the data you use to train your model and the training results.
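
    To optionally confirm that the bucket was created before moving on (bucket-name is the same placeholder you chose above), you can list it:

    gsutil ls -b gs://bucket-name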

  5. Launch a Compute Engine VM and Cloud TPU using the gcloud command.

    $ gcloud compute tpus execution-groups create \
     --name=mnist-tutorial \
     --zone=us-central1-b \
     --tf-version=2.4.1 \
     --machine-type=n1-standard-1 \
     --accelerator-type=v3-8
    

    Command flag descriptions

    name
      The name of the Cloud TPU to create.
    zone
      The zone where you plan to create your Cloud TPU.
    tf-version
      The version of TensorFlow that the gcloud command installs on your VM.
    machine-type
      The machine type of the Compute Engine VM to create.
    accelerator-type
      The type of the Cloud TPU to create.

    For more information on the gcloud command, see the gcloud Reference.
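
    As an optional sanity check, not part of the required steps, you can confirm that the execution group exists before connecting to the VM:

    $ gcloud compute tpus execution-groups list --zone=us-central1-b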

  6. When the gcloud compute tpus execution-groups command has finished executing, verify that your shell prompt has changed from username@projectname to username@vm-name. This change shows that you are now logged into your Compute Engine VM. If you are not connected to the Compute Engine instance, you can connect by running the following command:

    gcloud compute ssh mnist-tutorial --zone=us-central1-b
    

    As you continue these instructions, run each command that begins with (vm)$ in your VM session window.

Run the MNIST TPU model

The source code for the MNIST TPU model is available on GitHub.

Set up environment variables

Create the following variables. Replace bucket-name with your bucket name:

(vm)$ export STORAGE_BUCKET=gs://bucket-name
(vm)$ export TPU_NAME=mnist-tutorial
(vm)$ export MODEL_DIR=$STORAGE_BUCKET/mnist
(vm)$ export DATA_DIR=$STORAGE_BUCKET/data
(vm)$ export PYTHONPATH="$PYTHONPATH:/usr/share/models"
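
The PYTHONPATH entry points at the TensorFlow model code that the setup in the previous section makes available on the VM under /usr/share/models. As an optional check, not part of the required steps, you can confirm that the variables are set and the model code is present:

(vm)$ echo $STORAGE_BUCKET $TPU_NAME $MODEL_DIR $DATA_DIR
(vm)$ ls /usr/share/models/official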

Train the model on Cloud TPU

  1. Change to the directory that stores the model:

    (vm)$ cd /usr/share/models/official/vision/image_classification
    
  2. Run the MNIST training script:

    (vm)$ python3 mnist_main.py \
      --tpu=$TPU_NAME \
      --model_dir=$MODEL_DIR \
      --data_dir=$DATA_DIR \
      --train_epochs=10 \
      --distribution_strategy=tpu \
      --download
    

The training script runs in under 5 minutes on a v3-8 Cloud TPU and displays output similar to:

I1203 03:43:15.936553 140096948798912 mnist_main.py:165]
Run stats: {'loss': 0.11427700750786683, 'training_accuracy_top_1': 0.9657697677612305,
'accuracy_top_1': 0.9730902910232544, 'eval_loss': 0.08600160645114051}
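
When training finishes, the checkpoints and summaries are written under MODEL_DIR in your Cloud Storage bucket. As an optional check (the exact file names depend on the TensorFlow version), you can list them from the VM:

(vm)$ gsutil ls $MODEL_DIR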

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. Disconnect from the Compute Engine instance, if you have not already done so:

    (vm)$ exit
    

    Your prompt should now be username@projectname, showing you are in the Cloud Shell.

  2. Delete your Cloud TPU and Compute Engine resources.

    $ gcloud compute tpus execution-groups delete mnist-tutorial \
      --zone=us-central1-b
    
  3. Verify the resources have been deleted by running gcloud compute tpus execution-groups list. The deletion might take several minutes. A response like the one below indicates your instances have been successfully deleted.

    $ gcloud compute tpus execution-groups list --zone=us-central1-b
    
    NAME             STATUS
    
  4. Delete your Cloud Storage bucket using gsutil as shown below. Replace bucket-name with the name of your Cloud Storage bucket.

    $ gsutil rm -r gs://bucket-name
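
    To optionally verify that the bucket is gone (PROJECT_ID is the variable you exported at the start of this quickstart), you can list the buckets that remain in your project:

    $ gsutil ls -p ${PROJECT_ID}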
    

What's next

This quickstart provided you with a brief introduction to working with Cloud TPU. At this point, you have the foundation for the following:

  • Learning more about Cloud TPU
  • Setting up Cloud TPU for your own applications

Learning more

  • MNIST on Keras: Try out Cloud TPU by running the MNIST model in a Colab environment.
  • Product Overview: Review the key features and benefits of Cloud TPU.
  • Cloud Tensor Processing Units (TPUs): Read more about Cloud TPU, its capabilities, and its advantages.
  • Pricing: Review the pricing information for Cloud TPU.

Setting up

  • Choosing a TPU service: Understand different options for working with Cloud TPU, such as Compute Engine, Google Kubernetes Engine, or AI Platform.
  • TPU types and zones: Learn what TPU types are available in each zone.
  • TPU versions: Understand the different TPU versions and learn how to select the right one for your application.