Downloading, preprocessing, and uploading the COCO dataset

Before you can train a model, you must prepare the training data for TPU use.

This topic describes how to prepare the COCO dataset for models on Cloud TPU.

COCO is a large-scale object detection, segmentation, and captioning dataset. In this step, you convert this dataset into a set of TFRecords (*.tfrecord) that the training application can use.

To prepare the COCO dataset, start a VM and run the COCO setup script. You do not need to set up the Cloud TPU until after the dataset is prepared. Because Cloud TPU charges begin when the TPU is set up, the best practice is to set up the Compute Engine VM, prepare the dataset, and only then set up the Cloud TPU.

Use the TPU setup procedure to set up the Cloud TPU after the dataset is prepared.

Machine learning models that use the COCO dataset include:

  • Mask R-CNN
  • RetinaNet

Prepare the dataset

The COCO dataset is stored in a Cloud Storage bucket. If you have not already set the storage bucket variable, do so now, replacing your-bucket-name with the name of your bucket:

(vm)$ export STORAGE_BUCKET=gs://your-bucket-name
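Before continuing, you can sanity-check that the variable is set and has the expected gs:// prefix. This is a minimal sketch; your-bucket-name is a placeholder for your own bucket name:

```shell
# Fail fast if STORAGE_BUCKET is unset or malformed.
# "your-bucket-name" is a placeholder; substitute your own bucket name.
export STORAGE_BUCKET=gs://your-bucket-name
case "${STORAGE_BUCKET}" in
  gs://*) echo "STORAGE_BUCKET is set: ${STORAGE_BUCKET}" ;;
  *)      echo "STORAGE_BUCKET must start with gs://" >&2; exit 1 ;;
esac
```

Catching a malformed bucket path here avoids a failed upload after the hour-long conversion step.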

Run the script to convert the COCO dataset into a set of TFRecords (*.tfrecord) that the training application expects.

(vm)$ cd /usr/share/tpu/tools/datasets
(vm)$ sudo bash download_and_preprocess_coco.sh ./data/dir/coco

This installs the required libraries and then runs the preprocessing script. It outputs a number of *.tfrecord files in your local data directory. The COCO download and conversion script takes approximately 1 hour to complete.
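When the script finishes, you can confirm that the TFRecord shards were actually written before uploading them. This sketch assumes the default output directory passed to the conversion script above:

```shell
# Count the TFRecord shards produced in the local data directory.
# ./data/dir/coco is the output path passed to the conversion script above.
DATA_DIR=./data/dir/coco
COUNT=$(ls "${DATA_DIR}"/*.tfrecord 2>/dev/null | wc -l)
echo "Found ${COUNT} TFRecord file(s) in ${DATA_DIR}"
```

If the count is zero, re-run the conversion script and check its output for errors before copying anything to Cloud Storage.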

Copy the data to your Cloud Storage bucket

After you convert the data into TFRecords, copy them from local storage to your Cloud Storage bucket using the gsutil command. You must also copy the annotation files, which are used to evaluate the model's performance.

(vm)$ gsutil -m cp ./data/dir/coco/*.tfrecord ${STORAGE_BUCKET}/coco
(vm)$ gsutil cp ./data/dir/coco/raw-data/annotations/*.json ${STORAGE_BUCKET}/coco