Google Cloud

Training an object detector using Cloud Machine Learning Engine

Today we announced the release of the TensorFlow Object Detection API, a new open source framework for object detection that makes model development and research easier. A key feature of the TensorFlow Object Detection API is that users can train models on Cloud Machine Learning Engine, the fully managed Google Cloud Platform (GCP) service for easily building and running machine learning models using any type of data at virtually any scale.

In this tutorial, you’ll walk through training a new object detection model on the Oxford-IIIT Pet dataset that detects the location of cats and dogs and identifies the breed of each animal.


Example detections from our final trained model. Image source.

This document assumes you’re running on Ubuntu 16.04. Before following along, we need to set up the environment:

  1. Set up a Google Cloud Project, configure billing, and enable the necessary Cloud APIs by completing the “Before you begin” section on this page. Skip the “Set up your environment” section as this is covered in steps 2 and 3 below and tailored for Ubuntu.
  2. Set up the Google Cloud SDK.
  3. Install TensorFlow.

Define Environment Variables

After setting up your GCP project, define the following environment variables to make following this walkthrough a bit easier.

  export PROJECT=$(gcloud config list project --format "value(core.project)")
  export YOUR_GCS_BUCKET="gs://${PROJECT}-ml"

Installing the TensorFlow Object Detection API

Assuming that you've already installed TensorFlow, the Object Detection API and its other dependencies can be installed using the following commands:

  git clone https://github.com/tensorflow/models.git
  cd models/research
  sudo apt-get install protobuf-compiler python-pil python-lxml
  protoc object_detection/protos/*.proto --python_out=.
  export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

You can test your installation by running the following command:

  python object_detection/builders/model_builder_test.py

Download the Oxford-IIIT Pet Dataset, convert to TFRecords and upload to GCS

The TensorFlow Object Detection API uses the TFRecord format for training and validation datasets. Use the following commands to download the Oxford-IIIT Pet dataset and convert it to TFRecords:

  wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
  wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
  tar -xvf annotations.tar.gz
  tar -xvf images.tar.gz
  python object_detection/dataset_tools/create_pet_tf_record.py \
    --label_map_path=object_detection/data/pet_label_map.pbtxt \
    --data_dir=`pwd` \
    --output_dir=`pwd`
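
Under the hood, a TFRecord file is just a sequence of length-prefixed, CRC-checked byte strings. As a rough illustration of that framing (not the real implementation: the CRC fields here are placeholder zeros, which TensorFlow's own readers would reject), here is a pure-Python sketch:

```python
import struct

# Each TFRecord is framed as:
#   uint64 length (little-endian), uint32 masked CRC of the length,
#   payload bytes, uint32 masked CRC of the payload.
# For illustration the CRC fields are written as zeros; real TensorFlow
# readers verify them, so this writer shows the layout only.

def write_records(path, payloads):
    with open(path, "wb") as f:
        for data in payloads:
            f.write(struct.pack("<Q", len(data)))   # record length
            f.write(struct.pack("<I", 0))           # length CRC (placeholder)
            f.write(data)                           # serialized tf.Example bytes
            f.write(struct.pack("<I", 0))           # data CRC (placeholder)

def read_records(path):
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            f.read(4)                               # skip length CRC
            records.append(f.read(length))
            f.read(4)                               # skip data CRC
    return records
```

In the real dataset files, each payload is a serialized tf.Example proto holding the image bytes, bounding boxes, and class labels.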

You should see two new generated files: pet_train.record and pet_val.record. To use the dataset on GCP, we’ll need to upload it to our Cloud Storage bucket using the following commands. Note that we similarly upload a “label map” (included in the git repository), which maps the numerical indices predicted by our model to category names (e.g., 4 -> “basset hound”, 5 -> “beagle”).

  gsutil cp pet_train.record ${YOUR_GCS_BUCKET}/data/pet_train.record
  gsutil cp pet_val.record ${YOUR_GCS_BUCKET}/data/pet_val.record
  gsutil cp object_detection/data/pet_label_map.pbtxt \
    ${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt
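
The label map itself is a plain-text protocol buffer made of `item { id: ... name: ... }` entries. As an illustration only (the real API parses it with the `string_int_label_map` proto), a regex-based sketch of the index-to-name lookup:

```python
import re

# Parse the simple two-field label map layout used by the Pet dataset:
#   item { id: 4 name: 'basset_hound' } ...
# This regex-based reader is a sketch for understanding the file; the
# Object Detection API uses a proper protobuf parser instead.

def parse_label_map(text):
    mapping = {}
    for item in re.finditer(r"item\s*\{([^}]*)\}", text):
        body = item.group(1)
        idx = int(re.search(r"id:\s*(\d+)", body).group(1))
        name = re.search(r"name:\s*['\"]([^'\"]+)['\"]", body).group(1)
        mapping[idx] = name
    return mapping

example = """
item { id: 4 name: 'basset_hound' }
item { id: 5 name: 'beagle' }
"""
```

With the example above, `parse_label_map(example)` yields `{4: 'basset_hound', 5: 'beagle'}`, which is exactly the index-to-breed mapping the model's integer predictions are decoded with.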

Upload our pretrained COCO Model for Transfer Learning

Training an object detector from scratch can take days! To speed up training, we’ll initialize the pet model using parameters from our provided model that has been pre-trained on the COCO dataset. The weights from this ResNet101-based Faster R-CNN model will be the starting point in our new model (which we'll call a fine-tune checkpoint) and will cut down the training time from days to just a few hours. To initialize from this model, we’ll need to download it and put it in Cloud Storage.
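
Conceptually, initializing from a fine-tune checkpoint is a name-matching exercise: variables in the new model that also exist in the COCO checkpoint (with compatible shapes) are restored from it, while pet-specific layers such as the new box classifier start fresh. A toy sketch of that idea, with made-up variable names:

```python
def init_from_checkpoint(new_model_vars, checkpoint_vars):
    """Return initial values for the new model: reuse checkpoint weights
    where names and shapes match, otherwise keep the fresh value."""
    initialized, reused = {}, []
    for name, fresh_value in new_model_vars.items():
        ckpt_value = checkpoint_vars.get(name)
        if ckpt_value is not None and len(ckpt_value) == len(fresh_value):
            initialized[name] = ckpt_value      # transfer pre-trained weights
            reused.append(name)
        else:
            initialized[name] = fresh_value     # new layer: train from scratch
    return initialized, reused

# Hypothetical variables: the feature extractor matches the COCO
# checkpoint, the pet-specific classifier head does not.
fresh = {"resnet/conv1": [0.0, 0.0], "classifier/pets": [0.0, 0.0, 0.0]}
ckpt = {"resnet/conv1": [0.4, -0.2], "classifier/coco": [1.0, 2.0]}
```

Because the shared ResNet-101 backbone starts from weights that already encode useful visual features, only the detection-specific layers have to be learned mostly from scratch, which is what cuts training from days to hours.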

  wget https://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
  tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
  gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* ${YOUR_GCS_BUCKET}/data/

Configuring the pipeline

In the TensorFlow Object Detection API, jobs are configured using protocol buffers. Sample configuration files can be found in object_detection/samples/configs/. These files control model and training parameters such as learning rates, dropout, and regularization. The provided samples must be edited so that every PATH_TO_BE_CONFIGURED string points to the dataset files and fine-tune checkpoint you’ve uploaded to your Cloud Storage bucket. Afterwards, we’ll also need to upload the configuration file itself to Cloud Storage.

  sed -i "s|PATH_TO_BE_CONFIGURED|"${YOUR_GCS_BUCKET}"/data|g" \
    object_detection/samples/configs/faster_rcnn_resnet101_pets.config
  gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
    ${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
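
The `sed` invocation is plain string templating; the same step sketched in Python, with a hypothetical bucket name in the example:

```python
def configure_pipeline(template, gcs_bucket):
    """Point every PATH_TO_BE_CONFIGURED placeholder at the bucket's
    data/ directory, mirroring the sed command above."""
    return template.replace("PATH_TO_BE_CONFIGURED", gcs_bucket + "/data")

# One line from a sample config, before substitution:
sample = 'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"'
```

For example, with the bucket `gs://my-project-ml`, the line above becomes `fine_tune_checkpoint: "gs://my-project-ml/data/model.ckpt"`.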

Running training and evaluation jobs

Before you can run on GCP, you must first package the TensorFlow Object Detection API and TF Slim.

  python setup.py sdist
  (cd slim && python setup.py sdist)

It’s also a good idea to double check that you’ve uploaded the dataset and configuration correctly to your Cloud Storage bucket. You can inspect your bucket using the Cloud Storage browser. The directory structure should look like the following:

  + data/
    - faster_rcnn_resnet101_pets.config
    - model.ckpt.data-00000-of-00001
    - model.ckpt.index
    - model.ckpt.meta
    - pet_label_map.pbtxt
    - pet_train.record
    - pet_val.record

After the code has been packaged, we’re ready to start our training and evaluation jobs:

  gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --job-dir=${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --train_dir=${YOUR_GCS_BUCKET}/train \
    --pipeline_config_path=${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config

You should see your jobs on the Machine Learning Engine dashboard, and you can check the logs to verify that they are progressing. Note that this training job uses distributed asynchronous gradient descent with five worker GPUs and three parameter servers.
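
To make "distributed asynchronous gradient descent" concrete, here is a toy single-process sketch: several simulated workers repeatedly read the shared parameters, compute a gradient on their own schedule, and apply their update without waiting for each other. The shared state is just a dict here, standing in for the parameter servers:

```python
import random

# Toy asynchronous SGD: workers interleave in arbitrary order, each
# reading a (possibly stale) copy of the parameters and pushing its own
# update. Here they jointly minimize (w - 3)^2; in the real job, five
# GPU workers compute gradients on different image batches and three
# parameter servers hold the shared weights.

def async_sgd(steps_per_worker=200, workers=5, lr=0.05):
    params = {"w": 0.0}                        # "parameter server" state
    pending = [steps_per_worker] * workers
    while any(pending):
        i = random.randrange(workers)          # workers run in any order
        if pending[i] == 0:
            continue
        pending[i] -= 1
        w = params["w"]                        # read current (maybe stale) value
        grad = 2.0 * (w - 3.0)                 # gradient of (w - 3)^2
        params["w"] = w - lr * grad            # apply update asynchronously
    return params["w"]
```

Whatever order the workers interleave in, each update shrinks the error, so the shared parameter converges to 3.0; with real gradient noise and staleness, asynchrony trades a little update quality for much higher throughput.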

Monitoring progress with TensorBoard

You can also monitor the progress of your training and evaluation jobs using TensorBoard. If this is your first time, you’ll likely need to authenticate your local machine with the Google Cloud SDK:

  gcloud auth application-default login

You can then launch TensorBoard:

  tensorboard --logdir=${YOUR_GCS_BUCKET}

Navigate to the link shown in the terminal (typically localhost:6006). After several hours of training, you should see training curves like the ones below. Typically, we reach ~92% mean average precision on the validation set within the first couple of hours of training.
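
For reference, average precision for a single class is the average of the precision values at each rank where a true positive appears, and mean average precision averages this over all classes. A minimal sketch for one class:

```python
def average_precision(detections, num_positives):
    """detections: list of (score, is_true_positive) pairs, any order.
    num_positives: number of ground-truth boxes for this class.
    Computes the simple, interpolation-free form of AP: the mean of the
    precision values at each rank where a true positive occurs."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    precisions = []
    for rank, (_score, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            precisions.append(tp / rank)   # precision at this recall point
    return sum(precisions) / num_positives
```

For example, detections scored [0.9 TP, 0.8 FP, 0.7 TP] against 2 ground-truth boxes give precisions 1/1 and 2/3 at the two true positives, so AP = (1 + 2/3) / 2 ≈ 0.83. Note that evaluation jobs (and the pet benchmark's ~92% figure) compute this per breed and average.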


You can also click on the images link to visualize the outputs of the model:


Exporting the TensorFlow graph

Now that you’ve trained an amazing pet detector, you're probably going to want to run your detector on images of your family pet or those of your friends! In order to run detection on a few example images after training, we recommend trying out the Jupyter notebook demo. However, before doing so, you'll have to export your trained model to a TensorFlow graph proto with learned weights baked in as constants. First, you need to identify a candidate checkpoint to export. You can search your bucket using the Google Cloud Storage Browser. The checkpoint should be stored under ${YOUR_GCS_BUCKET}/train. The checkpoint will typically consist of three files:

  1. model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001
  2. model.ckpt-${CHECKPOINT_NUMBER}.index
  3. model.ckpt-${CHECKPOINT_NUMBER}.meta

After you've identified a candidate checkpoint to export (typically the most recent), run the following command from tensorflow/models:

  # Please define CHECKPOINT_NUMBER based on the checkpoint you’d like to export
  # From tensorflow/models
  gsutil cp ${YOUR_GCS_BUCKET}/train/model.ckpt-${CHECKPOINT_NUMBER}.* .
  python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
    --checkpoint_path model.ckpt-${CHECKPOINT_NUMBER} \
    --inference_graph_path output_inference_graph.pb
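
"Baking learned weights in as constants" means every variable read in the graph is replaced by a constant node holding the trained value, so inference no longer needs a checkpoint. A toy illustration on a made-up expression graph of nested tuples:

```python
# A graph node is ("const", value), ("var", name), or (op, *children).
# "Freezing" rewrites every variable node into a constant node holding
# the trained value from the checkpoint; this is a conceptual sketch,
# not the real GraphDef transformation.

def freeze(node, weights):
    op = node[0]
    if op == "const":
        return node
    if op == "var":                       # bake the learned value in
        return ("const", weights[node[1]])
    return (op,) + tuple(freeze(child, weights) for child in node[1:])

# Hypothetical graph computing w * 2 + b:
graph = ("add", ("mul", ("var", "w"), ("const", 2.0)), ("var", "b"))
```

After `freeze(graph, {"w": 3.0, "b": 1.0})`, no `var` nodes remain, which is why the exported .pb file can be shipped and run on its own.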

If all has gone well, you should see your exported graph, which will be stored in a file named output_inference_graph.pb.

Next steps

Congratulations, you've now trained an object detector for various cats and dogs! There are a few things you can do now:

  1. Test your exported model using the provided Jupyter notebook.
  2. Experiment with different model configurations.

If you have feedback on this walkthrough or run into other issues, please report them on GitHub; pull requests and contributions are also welcome. Stay tuned for new improvements in the near future!


This walkthrough was created in collaboration with Jonathan Huang and Vivek Rathod from Machine Perception at Google.