Setting up TensorBoard

This document describes how to set up and run TensorBoard for visualizing and analyzing program performance on Cloud TPU.

Overview

TensorBoard offers a suite of tools designed to present TensorFlow data visually. When used for monitoring, TensorBoard can help identify bottlenecks in processing and suggest ways to improve performance.

Prerequisites

The following instructions assume you have already set up your Cloud TPU in Cloud Shell and are ready to run your training application.

If you don't have a model ready to train, you can get started with the MNIST tutorial.

Install the Cloud TPU profiler

Install cloud-tpu-profiler 1.14 or later to get the capture_tpu_profile script.
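
For example, a minimal install sketch, assuming pip is available on the VM and that the profiler is published as the cloud-tpu-profiler package on PyPI:

    (vm)$ pip install --upgrade "cloud-tpu-profiler>=1.14"
    (vm)$ capture_tpu_profile --help

If the install succeeded, the second command should print the script's usage information.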

Run TensorBoard

When you ran ctpu up to create your Compute Engine VM and Cloud TPU, the tool automatically set up port forwarding for the Cloud Shell environment to make TensorBoard available. You need to run TensorBoard in a new Cloud Shell, not in the shell that is running your training application.
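
If you are not using ctpu and need to set up this forwarding yourself (for example, from a local terminal rather than Cloud Shell), a manual sketch with gcloud might look like the following. The VM name and zone are placeholders, and the port mapping is an assumption: 6006 is TensorBoard's default serving port on the VM, exposed here on local port 8080.

    $ gcloud compute ssh your-tpu-vm --zone=us-central1-b --ssh-flag="-L 8080:localhost:6006"

With a tunnel like this open, a TensorBoard instance started on the VM is reachable at localhost:8080 on your local machine.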

Follow these steps to run TensorBoard in a separate Cloud Shell:

  1. Open a second Cloud Shell to capture profiling data and to start TensorBoard.

  2. In the second Cloud Shell, run ctpu up to set the required environment variables in the new shell:

    $ ctpu up

    This should return output similar to the following:

    2018/08/02 12:53:12 VM already running.
    2018/08/02 12:53:12 TPU already running.
    About to ssh (with port forwarding enabled -- see docs for details)...
    

  3. In the second Cloud Shell, create environment variables for your Cloud Storage bucket and model directory. The model directory variable (MODEL_DIR) contains the Cloud Storage path where checkpoints, summaries, and TensorBoard output are stored during model training, for example MODEL_DIR=${STORAGE_BUCKET}/model. (A quick way to verify that the bucket exists is shown after these steps.)

    (vm)$ export STORAGE_BUCKET=gs://[YOUR-BUCKET-NAME]
    (vm)$ export MODEL_DIR=${STORAGE_BUCKET}/[MODEL-DIRECTORY]
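
If you are not sure the bucket exists yet, a quick check along these lines can help; the region in the second command (us-central1) is only an example and should match the region of your Cloud TPU:

    (vm)$ gsutil ls ${STORAGE_BUCKET}
    (vm)$ gsutil mb -l us-central1 ${STORAGE_BUCKET}

The first command lists the bucket's contents if it exists; the second creates the bucket if it does not.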
    

Run the model, capture monitoring output, and display it in TensorBoard

There are two ways to view TensorBoard trace information: static trace viewer and streaming trace viewer. Static trace viewer is limited to 1 million events per Cloud TPU; if you need to access more events, use streaming trace viewer. Both setups are shown below.

  1. In the first Cloud Shell, run your TensorFlow model training application. For example, if you're using the MNIST model, run `mnist_tpu.py` as described in the MNIST tutorial.
  2. Select the type of trace viewer you want to use: static trace viewer or streaming trace viewer.
  3. Perform one of the following procedures:

    Static trace viewer

    1. In the second Cloud Shell, run the following TensorBoard command:

      (vm)$ tensorboard --logdir=${MODEL_DIR} &
    2. On the bar at the top right-hand side of the Cloud Shell, click the **Web preview** button and open port 8080 to view the TensorBoard output. The TensorBoard UI appears as a tab in your browser.
    3. Do one of the following to capture the profile:
    • If you are running TensorBoard 1.14 or later, click the CAPTURE PROFILE button at the top of the TensorBoard window. A dialog appears where you can specify how to identify the Cloud TPU to capture from: by IP address or by TPU name.

      Enter the IP address or the TPU name to start capturing trace data, which is then displayed in TensorBoard. See the Cloud TPU tools guide for more information about changing the defaults for the Profiling Duration and Trace dataset ops values; a command-line sketch for adjusting the capture duration appears after these procedures.

    • To capture a profile from the command line instead of using the CAPTURE PROFILE button, run the following command in the second Cloud Shell:
      (vm)$ capture_tpu_profile --tpu=[YOUR TPU NAME] --logdir=${MODEL_DIR}
      

    Streaming trace viewer

    For streaming trace viewer, copy the IP address of your TPU host from the Google Cloud Platform Console before running the TensorBoard command.

    1. In the GCP Console navigation sidebar, select Compute Engine > TPUs, then copy the Internal IP address for your Cloud TPU. This is the value you specify for the --master_tpu_unsecure_channel flag in the TensorBoard command.
    2. Run the following TensorBoard command:

      (vm)$ tensorboard --logdir=${MODEL_DIR} --master_tpu_unsecure_channel=[YOUR TPU IP] &
    3. On the bar at the top right-hand side of the Cloud Shell, click the **Web preview** button and open port 8080 to view the TensorBoard output. The TensorBoard UI appears as a tab in your browser.
    4. To capture streaming trace viewer output, run the following capture_tpu_profile command in the second Cloud Shell:

      (vm)$ capture_tpu_profile --tpu=[YOUR TPU NAME] --logdir=${MODEL_DIR}

      This starts capturing profile data and displays it in TensorBoard.
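
As noted above, the profiling duration can also be adjusted when capturing from the command line. A minimal sketch, assuming your installed version of capture_tpu_profile supports the duration_ms flag (the 10-second value below is only an example):

    (vm)$ capture_tpu_profile --tpu=[YOUR TPU NAME] --logdir=${MODEL_DIR} --duration_ms=10000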
