Setting up TensorBoard

This document describes how to set up and run TensorBoard for visualizing and analyzing program performance on Cloud TPU.

Overview

TensorBoard offers a suite of tools designed to present TensorFlow data visually. When used for monitoring, TensorBoard can help identify bottlenecks in processing and suggest ways to improve performance.

Prerequisites

The following instructions assume:

  • You have already installed cloud-tpu-profiler 1.12, which provides the capture_tpu_profile script.
  • You have already set up your Cloud TPU in Cloud Shell and are ready to run your training application.

If you don't have a model ready to train, you can get started with the MNIST tutorial.
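A typical training invocation from the MNIST tutorial looks roughly like the sketch below. The TPU and bucket names are placeholders, and the exact script path and flags come from the tutorial and may differ in your setup, so treat this as a sketch; it only prints the command rather than executing it:

```shell
# Placeholder values -- substitute your own TPU and bucket names.
TPU_NAME=my-tpu
STORAGE_BUCKET=gs://my-tpu-bucket

# Print the training command the first Cloud Shell would run.
echo "python mnist_tpu.py" \
     "--tpu=${TPU_NAME}" \
     "--data_dir=${STORAGE_BUCKET}/data" \
     "--model_dir=${STORAGE_BUCKET}/output" \
     "--use_tpu=True"
```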

Run TensorBoard

When you ran ctpu up to create your Compute Engine VM and Cloud TPU, the tool automatically set up port forwarding for the Cloud Shell environment to make TensorBoard available. You need to run TensorBoard in a new Cloud Shell, not the shell that is running your training application.

Follow these steps to run TensorBoard in a separate Cloud Shell:

  1. Open a second Cloud Shell to capture profiling data for TensorBoard.

  2. In the second Cloud Shell, run ctpu up to set the environment variables needed in the new shell:

    $ ctpu up

    This should return output similar to the following:

    2018/08/02 12:53:12 VM already running.
    2018/08/02 12:53:12 TPU already running.
    About to ssh (with port forwarding enabled -- see docs for details)...
    

  3. Create an environment variable containing the address of your Cloud Storage bucket, for use in this Cloud Shell instance:

    (vm)$ export STORAGE_BUCKET=gs://[YOUR STORAGE BUCKET NAME]
    

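Before starting TensorBoard, it can help to confirm that STORAGE_BUCKET expands to the path you expect and that the bucket is reachable. A minimal sketch, assuming a placeholder bucket name (the gsutil check is skipped when the CLI is not installed):

```shell
# Placeholder bucket name -- replace with your own.
export STORAGE_BUCKET=gs://my-tpu-bucket

# Confirm the log directory path that TensorBoard will be given.
echo "TensorBoard log directory: ${STORAGE_BUCKET}/output"

# If gsutil is available, verify the bucket is reachable.
command -v gsutil >/dev/null 2>&1 && gsutil ls "${STORAGE_BUCKET}" || true
```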
Run the model and capture monitoring output

There are two ways to view TensorBoard trace information: static Trace Viewer or streaming Trace Viewer. Static Trace Viewer is limited to 1 million events per Cloud TPU. If you need access to more events, use streaming Trace Viewer. Setup for both is shown below.

  1. In the first Cloud Shell, run your TensorFlow model training application. For example, if you're using the MNIST TPU model, run mnist_tpu.py as described in the MNIST tutorial.

  2. In the second Cloud Shell, start TensorBoard specifying the type of Trace viewer you want to use:

    • For static Trace Viewer, run the following command, replacing OUTPUT-FILE with the name of the directory where checkpoints, summaries, and TensorBoard output are stored during model training:

      (vm)$ tensorboard --logdir=${STORAGE_BUCKET}/[OUTPUT-FILE] &
      
    • For streaming Trace Viewer, copy the IP address of your TPU host from the Google Cloud Platform Console before running the TensorBoard command.

      1. In the GCP Console navigation sidebar, select Compute Engine -> TPUs, and copy the Internal IP address of your Cloud TPU.

      2. Run the following command to start TensorBoard with streaming Trace Viewer, replacing OUTPUT-FILE with the name of the directory where checkpoints, summaries, and TensorBoard output are stored during model training:

      (vm)$ tensorboard --logdir=${STORAGE_BUCKET}/[OUTPUT-FILE] \
        --master_tpu_unsecure_channel=[YOUR TPU IP] &
      
  3. In the second Cloud Shell, start the profiling capture application:

    (vm)$ capture_tpu_profile --tpu=[YOUR TPU NAME] --logdir=${STORAGE_BUCKET}/output

  4. Click the Web preview button in Cloud Shell and open port 8080 to view the TensorBoard output.
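Taken together, the second-shell workflow in steps 2 through 4 can be sketched as the script below. Because ctpu, tensorboard, and capture_tpu_profile all require a live Cloud TPU, the sketch only prints each command in order; the bucket name, TPU name, internal IP, and output directory are placeholders:

```shell
# Placeholder values -- substitute your own.
STORAGE_BUCKET=gs://my-tpu-bucket
TPU_NAME=my-tpu
TPU_IP=10.240.1.2   # Internal IP from the GCP Console (streaming Trace Viewer only)
OUTPUT_FILE=output

# Commands the second Cloud Shell runs, in order:
echo "ctpu up"
echo "tensorboard --logdir=${STORAGE_BUCKET}/${OUTPUT_FILE} --master_tpu_unsecure_channel=${TPU_IP} &"
echo "capture_tpu_profile --tpu=${TPU_NAME} --logdir=${STORAGE_BUCKET}/${OUTPUT_FILE}"
```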
