Training using the built-in image object detection algorithm

Training with built-in algorithms on AI Platform Training allows you to submit your dataset and train a model without writing any training code. This page explains how the built-in image object detection algorithm works, and how to use it.


The built-in image object detection algorithm uses your training and validation datasets to train models continuously, and then it outputs the most accurate SavedModel generated during the course of the training job. You can also use hyperparameter tuning to achieve the best model accuracy. The exported SavedModel can be used directly for prediction, either locally or deployed to AI Platform Prediction for production service.


Image built-in algorithms support training with single CPUs, GPUs or TPUs. The resulting SavedModel is compatible with serving on CPUs and GPUs.

Because training runs on a single CPU, GPU, or TPU, distributed training and multi-GPU training are not supported with the built-in image object detection algorithm.

Supported machine types

The following AI Platform Training scale tiers and machine types are supported:

  • BASIC scale tier
  • BASIC_TPU scale tier
  • CUSTOM scale tier with any of the Compute Engine machine types supported by AI Platform Training.
  • CUSTOM scale tier with any of the following legacy machine types:
    • standard
    • large_model
    • complex_model_s
    • complex_model_m
    • complex_model_l
    • standard_gpu
    • standard_p100
    • standard_v100
    • large_model_v100
    • complex_model_m_gpu
    • complex_model_l_gpu
    • complex_model_m_p100
    • complex_model_m_v100
    • complex_model_l_v100
    • TPU_V2 (8 cores)

Authorize your Cloud TPU to access your project

Follow these steps to authorize the Cloud TPU service account associated with your Google Cloud project:

  1. Get your Cloud TPU service account name by calling projects.getConfig. Example:

    curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \$PROJECT_ID:getConfig
  2. Save the values of the serviceAccountProject and tpuServiceAccount fields returned by the API.

  3. Initialize the Cloud TPU service account:

    curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \
      -H "Content-Type: application/json" -d '{}'  \<serviceAccountProject>/services/

Now add the Cloud TPU service account as a member in your project, with the role Cloud ML Service Agent. Complete the following steps in the Google Cloud console or using the gcloud command:


  1. Log in to the Google Cloud console and choose the project in which you're using the TPU.
  2. Choose IAM & Admin > IAM.
  3. Click the Add button to add a member to the project.
  4. Enter the TPU service account in the Members text box.
  5. Click the Roles dropdown list.
  6. Enable the Cloud ML Service Agent role (Service Agents > Cloud ML Service Agent).


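Alternatively, use the gcloud command:

  1. Set environment variables containing your project ID and the Cloud TPU service account:

    # Replace the placeholder values with your own.
    PROJECT_ID="YOUR_PROJECT_ID"
    SVC_ACCOUNT="YOUR_TPU_SERVICE_ACCOUNT"

  2. Grant the ml.serviceAgent role to the Cloud TPU service account:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member serviceAccount:$SVC_ACCOUNT --role roles/ml.serviceAgent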

For more details about granting roles to service accounts, see the IAM documentation.

Format input data for training

The built-in image object detection algorithm requires your input data to be formatted as tf.Examples, saved in TFRecord file(s). The tf.Example data structure and TFRecord file format are both designed for efficient data reading with TensorFlow.

The TFRecord format is a simple format for storing a sequence of binary records. In this case, all the records contain binary representations of images. Each image, along with its class labels and bounding boxes, is represented as a tf.Example. You can save many tf.Examples to a single TFRecord file. You can also shard a large dataset among multiple TFRecord files.

Learn more about TFRecord and tf.Example.
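Sharding can be sketched in plain Python as follows. This is only an illustration: the `prefix-00000-of-00004.tfrecord` naming pattern and both helper names are assumptions (a common convention, not a requirement of the algorithm), and writing the records themselves with TensorFlow is left out.

```python
def shard_names(prefix, num_shards):
    """Return file names such as 'train-00000-of-00004.tfrecord'."""
    return [
        "{}-{:05d}-of-{:05d}.tfrecord".format(prefix, i, num_shards)
        for i in range(num_shards)
    ]


def assign_to_shards(items, num_shards):
    """Distribute items round-robin so shards end up with similar sizes."""
    shards = [[] for _ in range(num_shards)]
    for index, item in enumerate(items):
        shards[index % num_shards].append(item)
    return shards


names = shard_names("train", 4)
shards = assign_to_shards(["img{}.jpg".format(i) for i in range(10)], 4)
for name, members in zip(names, shards):
    print(name, len(members))
```

With a naming scheme like this, a single wildcard path such as `train-*.tfrecord` selects every shard.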

Convert your images to TFRecords

To convert images to the format required for training, follow the TensorFlow Model Garden's guide to preparing inputs for object detection.

Check Cloud Storage bucket permissions

To store your data, use a Cloud Storage bucket in the same Google Cloud project you're using to run AI Platform Training jobs. Otherwise, grant AI Platform Training access to the Cloud Storage bucket where your data is stored.

Required input format

To train with the built-in image object detection algorithm, your image data must be structured as tf.Examples that include the following fields:

  • image/encoded is the raw (not decoded) image bytes, stored as a string.

  • image/object/class/label is a list of integer labels for the corresponding image (one label per box).

    The set of integer labels used for your dataset must be a consecutive sequence starting at 1. For example, if your dataset has five classes, then each label must be an integer in the interval [1, 5].

  • image/object/bbox/xmin is a list of normalized left x coordinates for the corresponding image (one coordinate per box). Each coordinate must be in the interval [0, 1].

  • image/object/bbox/xmax is a list of normalized right x coordinates for the corresponding image (one coordinate per box). Each coordinate must be in the interval [0, 1].

  • image/object/bbox/ymin is a list of normalized top y coordinates for the corresponding image (one coordinate per box). Each coordinate must be in the interval [0, 1].

  • image/object/bbox/ymax is a list of normalized bottom y coordinates for the corresponding image (one coordinate per box). Each coordinate must be in the interval [0, 1].
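Annotation tools often store boxes in pixel units, so they need to be normalized before they fit the fields listed above. The following is a minimal sketch, assuming each box arrives as a hypothetical `(label, left, top, right, bottom)` tuple in pixels; the function name and input layout are illustrative and not part of the algorithm.

```python
def boxes_to_fields(boxes, image_width, image_height):
    """Convert pixel boxes to the normalized per-image annotation lists.

    Each box is (label, left, top, right, bottom) in pixels.
    """
    fields = {
        "image/object/class/label": [],
        "image/object/bbox/xmin": [],
        "image/object/bbox/xmax": [],
        "image/object/bbox/ymin": [],
        "image/object/bbox/ymax": [],
    }
    for label, left, top, right, bottom in boxes:
        fields["image/object/class/label"].append(int(label))
        # Divide x values by the width and y values by the height
        # so that every coordinate lands in [0, 1].
        fields["image/object/bbox/xmin"].append(left / float(image_width))
        fields["image/object/bbox/xmax"].append(right / float(image_width))
        fields["image/object/bbox/ymin"].append(top / float(image_height))
        fields["image/object/bbox/ymax"].append(bottom / float(image_height))
    return fields


# Two boxes on a 640x480 image.
fields = boxes_to_fields(
    [(1, 64, 192, 320, 384), (2, 192, 240, 256, 336)], 640, 480
)
print(fields["image/object/bbox/xmin"])
print(fields["image/object/bbox/ymax"])
```

In a real pipeline these lists would then be stored as integer and float features of a tf.Example, next to the image/encoded bytes.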

The following example shows the structure of a tf.Example for an image containing two bounding boxes. The first box has the label 1, its top-left corner is at the normalized coordinates (0.1, 0.4), and its bottom-right corner is at the normalized coordinates (0.5, 0.8). The second box has the label 2, its top-left corner is at the normalized coordinates (0.3, 0.5), and its bottom-right corner is at the normalized coordinates (0.4, 0.7).

    {
        'image/encoded': '<encoded image data>',
        'image/object/class/label': [1, 2],
        'image/object/bbox/xmin': [0.1, 0.3],
        'image/object/bbox/xmax': [0.5, 0.4],
        'image/object/bbox/ymin': [0.4, 0.5],
        'image/object/bbox/ymax': [0.8, 0.7]
    }

This tf.Example format matches the one used by the TFRecord object detection script.
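The constraints in this section can be checked before any records are written. The sketch below is illustrative only: `validate_fields` and its messages are hypothetical, and the `xmin < xmax` / `ymin < ymax` check is an added sanity check that the requirements above do not state. It looks at one image at a time, so it verifies that labels fall in [1, num_classes] but not that the dataset as a whole uses every label.

```python
BOX_KEYS = (
    "image/object/bbox/xmin",
    "image/object/bbox/xmax",
    "image/object/bbox/ymin",
    "image/object/bbox/ymax",
)


def validate_fields(fields, num_classes):
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    labels = fields["image/object/class/label"]
    if any(not 1 <= label <= num_classes for label in labels):
        problems.append("labels must be integers in [1, {}]".format(num_classes))
    for key in BOX_KEYS:
        values = fields[key]
        if len(values) != len(labels):
            problems.append("{} needs one value per box".format(key))
        if any(not 0.0 <= value <= 1.0 for value in values):
            problems.append("{} has a coordinate outside [0, 1]".format(key))
    # Extra sanity check (an assumption, not a documented requirement).
    for low_key, high_key in ((BOX_KEYS[0], BOX_KEYS[1]), (BOX_KEYS[2], BOX_KEYS[3])):
        if any(low >= high for low, high in zip(fields[low_key], fields[high_key])):
            problems.append("{} must be smaller than {}".format(low_key, high_key))
    return problems


example = {
    "image/encoded": b"<encoded image data>",
    "image/object/class/label": [1, 2],
    "image/object/bbox/xmin": [0.1, 0.3],
    "image/object/bbox/xmax": [0.5, 0.4],
    "image/object/bbox/ymin": [0.4, 0.5],
    "image/object/bbox/ymax": [0.8, 0.7],
}
print(validate_fields(example, num_classes=2))

bad = dict(example)
bad["image/object/class/label"] = [0, 2]  # label 0 is outside [1, 2]
print(validate_fields(bad, num_classes=2))
```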

Getting the best SavedModel as output

When the training job completes, AI Platform Training writes a TensorFlow SavedModel to the Cloud Storage bucket you specified as jobDir when you submitted the job. The SavedModel is written to jobDir/model. For example, if you submit the job to gs://your-bucket-name/your-job-dir, then AI Platform Training writes the SavedModel to gs://your-bucket-name/your-job-dir/model.

If you enabled hyperparameter tuning, AI Platform Training returns the TensorFlow SavedModel with the highest accuracy achieved during the training process. For example, if you submitted a training job with 2,500 training steps, and the accuracy was highest at 2,000 steps, you get a TensorFlow SavedModel saved from that particular point.

Each hyperparameter tuning trial writes the TensorFlow SavedModel with the highest accuracy to its own directory within your Cloud Storage bucket, for example gs://your-bucket-name/your-job-dir/model/trial_{trial_id}.
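The output locations described above can be derived from the job directory. A minimal sketch follows; the helper name `saved_model_dir` is hypothetical, and the path layout is taken directly from the examples in this section.

```python
def saved_model_dir(job_dir, trial_id=None):
    """Return the expected SavedModel location for a job or a tuning trial."""
    base = job_dir.rstrip("/") + "/model"
    if trial_id is None:
        return base
    return "{}/trial_{}".format(base, trial_id)


print(saved_model_dir("gs://your-bucket-name/your-job-dir"))
print(saved_model_dir("gs://your-bucket-name/your-job-dir", trial_id=3))
```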

The signature of the output SavedModel is:

  The given SavedModel SignatureDef contains the following input(s):
    inputs['encoded_image'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: encoded_image_string_tensor:0
    inputs['key'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: key:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 100, 4)
        name: detection_boxes:0
    outputs['detection_classes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 100)
        name: detection_classes:0
    outputs['detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 100)
        name: detection_scores:0
    outputs['key'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: Identity:0
    outputs['num_detections'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: num_detections:0
  Method name is: tensorflow/serving/predict


Inputs:

  • encoded_image: The raw (not decoded) image bytes. This is the same as image/encoded stored in tf.Example.
  • key: The string value identifier of prediction input. This value is passed through to the output key. In batch prediction, this helps to map the prediction output to the input.


Outputs:

  • num_detections: The number of detected bounding boxes.
  • detection_boxes: A list of relative (value in [0,1]) coordinates ([ymin, xmin, ymax, xmax]) of the detection bounding boxes.
  • detection_classes: A list of predicted class (integer) labels for each detection box in detection_boxes.
  • detection_scores: A list of scores for each detection box in detection_boxes.
  • key: The output key.

The following is an example of prediction outputs:

{u'detection_classes': [1.0, 3.0, 3.0, ...],
u'key': u'test_key',
u'num_detections': 100.0,
u'detection_scores': [0.24401935935020447, 0.19375669956207275, 0.18359294533729553, ...]}
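Prediction outputs like this usually need some post-processing before they are useful. The sketch below keeps detections above a score threshold and converts their relative [ymin, xmin, ymax, xmax] boxes to pixel coordinates. The function name, the `min_score` threshold of 0.5, and the sample values (including `detection_boxes`) are assumptions made for illustration.

```python
def select_detections(prediction, image_width, image_height, min_score=0.5):
    """Keep confident detections and convert their boxes to pixel units."""
    results = []
    for i in range(int(prediction["num_detections"])):
        score = prediction["detection_scores"][i]
        if score < min_score:
            continue
        # Boxes are ordered [ymin, xmin, ymax, xmax] with values in [0, 1].
        ymin, xmin, ymax, xmax = prediction["detection_boxes"][i]
        results.append({
            "label": int(prediction["detection_classes"][i]),
            "score": score,
            # (left, top, right, bottom) in pixels
            "box_pixels": (
                round(xmin * image_width),
                round(ymin * image_height),
                round(xmax * image_width),
                round(ymax * image_height),
            ),
        })
    return results


# Hypothetical prediction for a 640x480 image.
prediction = {
    "key": "test_key",
    "num_detections": 2.0,
    "detection_classes": [1.0, 3.0],
    "detection_scores": [0.9, 0.2],
    "detection_boxes": [[0.4, 0.1, 0.8, 0.5], [0.5, 0.3, 0.7, 0.4]],
}
for detection in select_detections(prediction, 640, 480):
    print(detection)
```

A suitable threshold depends on the model and the use case, so treat 0.5 as a starting point only.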

Example configurations

If you submit a job using gcloud, you need to create a config.yaml file for your machine type and hyperparameter tuning specifications. If you use the Google Cloud console, you don't need to create this file. Learn how to submit a training job.

The following example config.yaml file shows how to allocate TPU resources for your training job:

cat << EOF > config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: n1-standard-16
  masterConfig:
    acceleratorConfig:
      type: NVIDIA_TESLA_P100
      count: 1
  workerType: cloud_tpu
  workerConfig:
    tpuTfVersion: 1.14
    acceleratorConfig:
      type: TPU_V2
      count: 8
  workerCount: 1
EOF

Next, use your config.yaml file to submit a training job.

Hyperparameter tuning configuration

To use hyperparameter tuning, include your hyperparameter tuning configuration in the same config.yaml file as your machine configuration.

You can find brief explanations of each hyperparameter within the Google Cloud console, and a more comprehensive explanation in the reference for the built-in image object detection algorithm.

The following example config.yaml file shows how to allocate TPU resources for your training job, and includes hyperparameter tuning configuration:

cat << EOF > config.yaml
trainingInput:
  # Use a custom cluster with one master and one Cloud TPU worker.
  scaleTier: CUSTOM
  masterType: n1-standard-16
  masterConfig:
    acceleratorConfig:
      type: NVIDIA_TESLA_P100
      count: 1
  workerType: cloud_tpu
  workerConfig:
    tpuTfVersion: 1.14
    acceleratorConfig:
      type: TPU_V2
      count: 8
  workerCount: 1
  # The following are hyperparameter configs.
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: "AP"
    maxTrials: 6
    maxParallelTrials: 3
    enableTrialEarlyStopping: True
    params:
    - parameterName: initial_learning_rate
      type: DOUBLE
      minValue: 0.001
      maxValue: 0.1
      scaleType: UNIT_LOG_SCALE
EOF
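The UNIT_LOG_SCALE setting means the learning rate range is treated logarithmically rather than linearly, so each order of magnitude between 0.001 and 0.1 gets equal weight. The sketch below illustrates only that scaling, using hypothetical helper names. It does not show how the tuning service actually chooses values.

```python
import math


def to_unit_log_scale(value, min_value, max_value):
    """Map a value in [min_value, max_value] to [0, 1] on a log scale."""
    span = math.log(max_value) - math.log(min_value)
    return (math.log(value) - math.log(min_value)) / span


def from_unit_log_scale(unit, min_value, max_value):
    """Inverse mapping: a point in [0, 1] back to the original range."""
    return min_value * (max_value / min_value) ** unit


for learning_rate in (0.001, 0.01, 0.1):
    print(learning_rate, round(to_unit_log_scale(learning_rate, 0.001, 0.1), 3))
```

On this scale the midpoint of the range is 0.01, whereas a linear scale would put it near 0.05.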

Submit an image object detection training job

This section explains how to submit a training job using the built-in image object detection algorithm.


Select your algorithm

  1. Go to the AI Platform Training Jobs page in the Google Cloud console:

    AI Platform Training Jobs page

  2. Click the New training job button. From the options that display below, click Built-in algorithm training.

  3. On the Create a new training job page, select image object detection and click Next.

Select your training and validation data

  1. In the drop-down box under Training data, specify whether you are using a single file or multiple files:

    • For a single file, leave "Use single file in a GCS bucket" selected.
    • For multiple files, select "Use multiple files stored in one Cloud Storage directory".
  2. For Directory path, click Browse. In the right panel, click the name of the bucket where you uploaded the training data, and navigate to your file.

    If you're selecting multiple files, place your wildcard characters in Wildcard name. The "Complete GCS path" displays below to help you confirm that the path is correct.

  3. In the drop-down box under Validation data, specify whether you are using a single file or multiple files:

    • For a single file, leave "Use single file in a GCS bucket" selected.
    • For multiple files, select "Use multiple files stored in one Cloud Storage directory".
  4. For Directory path, click Browse. In the right panel, click the name of the bucket where you uploaded the validation data, and navigate to your file.

    If you're selecting multiple files, place your wildcard characters in Wildcard name. The "Complete GCS path" displays below to help you confirm that the path is correct.

  5. In Output directory, enter the path to the Cloud Storage bucket where you want AI Platform Training to store the outputs from your training job. You can fill in your Cloud Storage bucket path directly, or click the Browse button to select it.

    To keep things organized, create a new directory within your Cloud Storage bucket for this training job. You can do this within the Browse pane.

    Click Next.

Set the algorithm arguments

Each algorithm-specific argument displays a default value for training jobs without hyperparameter tuning. If you enable hyperparameter tuning on an algorithm argument, you must specify its minimum and maximum value.

To learn more about the algorithm arguments, follow the links in the Google Cloud console and refer to the built-in image object detection reference.

Submit the job

On the Job settings tab:

  1. Enter a unique Job ID.
  2. Enter an available region (such as "us-central1").
  3. To select machine types, select "CUSTOM" for the scale tier. A section to provide your Custom cluster specification displays.
    1. Select an available machine type for Master type.
    2. If you want to use TPUs, set the Worker type to cloud_tpu. The worker count defaults to 1.

Click Done to submit the training job.


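To submit the job using gcloud instead of the Google Cloud console:

  1. Set environment variables for your job:

    # Replace each YOUR_... placeholder with your own value.
    PROJECT_ID="YOUR_PROJECT_ID"
    # Specify the same region where your data is stored
    REGION="YOUR_REGION"
    gcloud config set project $PROJECT_ID
    gcloud config set compute/region $REGION
    # Set Cloud Storage paths to your training and validation data
    # Include a wildcard if you select multiple files.
    TRAINING_DATA_PATH="gs://YOUR_BUCKET_NAME/YOUR_TRAINING_DATA_PATH"
    VALIDATION_DATA_PATH="gs://YOUR_BUCKET_NAME/YOUR_VALIDATION_DATA_PATH"
    # Specify the Docker container for your built-in algorithm selection
    IMAGE_URI="YOUR_IMAGE_URI"
    # Variables for constructing descriptive names for JOB_ID and JOB_DIR
    DATE="$(date '+%Y%m%d_%H%M%S')"
    # Specify an ID for this job
    JOB_ID="YOUR_JOB_NAME_${DATE}"
    # Specify the directory where you want your training outputs to be stored
    JOB_DIR="gs://YOUR_BUCKET_NAME/YOUR_OUTPUT_DIRECTORY/${DATE}"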
  2. Submit the job:

    gcloud ai-platform jobs submit training $JOB_ID \
      --region=$REGION \
      --config=config.yaml \
      --job-dir=$JOB_DIR \
      -- \
      --training_data_path=$TRAINING_DATA_PATH \
      --validation_data_path=$VALIDATION_DATA_PATH \
      --train_batch_size=64 \
      --num_eval_images=500 \
      --train_steps_per_eval=2000 \
      --max_steps=22500 \
      --num_classes=90 \
      --warmup_steps=500 \
      --initial_learning_rate=0.08 \
      --fpn_type="nasfpn" \
      --aug_scale_min=0.8

  3. After the job is submitted successfully, you can view the logs using the following gcloud commands:

    gcloud ai-platform jobs describe $JOB_ID
    gcloud ai-platform jobs stream-logs $JOB_ID
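If you submit jobs from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list for the gcloud command shown above and does not run it. Both helper names are hypothetical, and the timestamp format mirrors the DATE variable used earlier.

```python
import datetime


def timestamped_job_id(prefix, now=None):
    """Build a job ID such as 'prefix_20200102_030405'."""
    now = now or datetime.datetime.now()
    return "{}_{}".format(prefix, now.strftime("%Y%m%d_%H%M%S"))


def build_submit_command(job_id, region, job_dir, algorithm_args,
                         config="config.yaml"):
    """Return the gcloud argument list for submitting a training job."""
    command = [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_id,
        "--region={}".format(region),
        "--config={}".format(config),
        "--job-dir={}".format(job_dir),
        "--",  # arguments after this separator go to the algorithm
    ]
    for name, value in algorithm_args.items():
        command.append("--{}={}".format(name, value))
    return command


command = build_submit_command(
    timestamped_job_id("object_detection"),
    "us-central1",
    "gs://your-bucket-name/your-job-dir",
    {"train_batch_size": 64, "max_steps": 22500, "num_classes": 90},
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where gcloud is installed and authenticated.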

