Before you begin continuous evaluation

Before you can use continuous evaluation, you must do two things:

  • Train and deploy a model version to AI Platform Prediction for evaluation
  • Enable Google Cloud APIs required for continuous evaluation

Deploy a model version to AI Platform Prediction

AI Platform Data Labeling Service can only perform continuous evaluation on machine learning models that you have deployed to AI Platform Prediction. When you create an evaluation job, you attach it to an AI Platform Prediction model version. Each model version can only have one evaluation job attached.

For an introduction to training a machine learning model and deploying it as a model version to AI Platform Prediction, work through the getting started guide to training and prediction with Keras.

Model version requirements

Continuous evaluation supports several types of machine learning models:

  • Image classification: a model that takes an image as input and returns one or more labels as output. The input image can be provided as a base64-encoded image or a path to a Cloud Storage file.
  • Text classification: a model that takes text as input and returns one or more labels as output.
  • General classification: a model that takes an arbitrary array or string as input and returns one or more labels as output.

    Data Labeling Service cannot assign human reviewers to provide ground truth labels for general classification. If you set up continuous evaluation for a general classification model, you must tag ground truth labels yourself.
  • Image object detection: a model that takes an image as input and returns one or more labeled bounding boxes as output. These bounding boxes predict the locations of objects in the image.

Depending on which task your machine learning model performs, you must train it so that the deployed model version accepts online prediction input and produces online prediction output in a specific format that continuous evaluation supports.

Note that the specifications in the next section extend the JSON body requirements for online prediction requests to, and responses from, AI Platform Prediction. Continuous evaluation currently samples only online predictions; it does not sample batch predictions. Learn about the differences between online and batch prediction.

The placeholder fields in the following templates (yourDataKey, YOUR_LABEL_KEY, YOUR_SCORE_KEY, and yourBoundingBoxKey) denote the fields required for continuous evaluation. You may include additional fields in your prediction input and output, but Data Labeling Service ignores them when it assigns your input to human reviewers for ground truth labeling and when it calculates evaluation metrics.

You can replace the keys of these placeholder fields with any strings that you choose, as long as the strings don't contain the / character. You must specify these strings when you create your evaluation job.
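As a minimal sketch of the rule above, the following helper rejects a placeholder key that contains the / character. The key names used here are purely illustrative; use whatever names your deployed model actually expects:

```python
def validate_key(key: str) -> str:
    """Return the key unchanged, or raise if it contains '/', which is disallowed."""
    if "/" in key:
        raise ValueError(f"invalid key {key!r}: keys must not contain '/'")
    return key

# Hypothetical key names for illustration only.
data_key = validate_key("image_bytes")
label_key = validate_key("sentiments")
```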

The following sections describe the model requirements for each type of model:

Image classification

Input format

Your model must receive online prediction requests in the following format:

{
  "instances": [
    {
      "yourDataKey": <image input (string)>,
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}

The <image input (string)> can be either of the following:

  • A base64-encoded image.
  • The URI of an image stored in Cloud Storage. To ensure Data Labeling Service has permission to read the image, the Cloud Storage bucket should be in the same Google Cloud project where you are creating your evaluation job.
Base64-encoded example

The following example shows an online prediction request body with a base64-encoded image:

{
  "instances": [
    {
      "image_bytes": {
        "b64": "iVBORw0KGgoAAAANSUhEUgAAAAYAAAAGCAYAAADgzO9IAAAAhUlEQVR4AWOAgZeONnHvHcXiGJDBqyDTXa+dVC888oy51F9+eRdY8NdWwYz/RyT//znEsAjEt277+syt5VMJw989DM/+H2MI/L8tVBQk4d38xcWp7ctLhi97ZCZ0rXV6yLA4b6dH59sjTq3fnji1fp4AsWS5j7PXstRg+/b3gU7N351AQgA8+jkf43sjaQAAAABJRU5ErkJggg=="
      }
    }
  ]
}

Note that the input data is nested in the image_bytes field in this example. If your model accepts prediction input like this, make sure to specify this nested structure when you create your evaluation job.
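As a sketch, a request body with this nested structure could be assembled as follows. The image_bytes key and the raw bytes are placeholders; substitute the data key your model actually accepts:

```python
import base64

def build_image_instance(image_data: bytes, data_key: str = "image_bytes") -> dict:
    # Nest the base64 payload under "b64", matching the example above.
    return {data_key: {"b64": base64.b64encode(image_data).decode("ascii")}}

# "...raw image bytes..." stands in for the contents of a real image file.
request_body = {"instances": [build_image_instance(b"...raw image bytes...")]}
```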

Cloud Storage reference example

The following example shows an online prediction request body with a reference to an image in Cloud Storage:

{
  "instances": [
    {
      "image_path": "gs://cloud-samples-data/datalabeling/image/flower_1.jpeg"
    }
  ]
}

Output format

Your model must return online predictions in the following format:

{
  "predictions": [
    {
      "YOUR_LABEL_KEY": [
        <label (string)>,
        <additional label (string)>,
        ...
      ],
      "YOUR_SCORE_KEY": [
        <score (number)>,
        <additional score (number)>,
        ...
      ],
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}

The array of labels and the array of scores must have the same length. If your model performs single label classification, there must be one label and one score. If your model performs multilabel classification, each label in the label array must correspond to the score at the same index in the score array.

Scores are required. They represent the confidence or probability your machine learning model assigns to its predictions. Typically, models select a label by comparing the score calculated for a particular input to a classification threshold.
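The length requirement and threshold-based selection described above can be sketched with a short helper. The key names and the 0.5 threshold are illustrative assumptions, not fixed by the service:

```python
def labels_above_threshold(prediction: dict, label_key: str, score_key: str,
                           threshold: float = 0.5) -> list:
    """Return (label, score) pairs whose score meets the threshold."""
    labels = prediction[label_key]
    scores = prediction[score_key]
    if len(labels) != len(scores):
        raise ValueError("label and score arrays must have the same length")
    return [(label, score) for label, score in zip(labels, scores)
            if score >= threshold]

prediction = {"sentiments": ["happy"], "confidence": [0.8]}
selected = labels_above_threshold(prediction, "sentiments", "confidence")
```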

Classification output example

The following example shows an online prediction response body for classification:

{
  "predictions": [
    {
      "sentiments": [
        "happy"
      ],
      "confidence": [
        0.8
      ]
    }
  ]
}

Text classification

Input format

Your model must receive online prediction requests in the following format:

{
  "instances": [
    {
      "yourDataKey": <text input (string)>,
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}
Text input example

The following example shows an online prediction request body with text data:

{
  "instances": [
    {
      "text": "If music be the food of love, play on;"
    }
  ]
}

Output format

Your model must return online predictions in the following format:

{
  "predictions": [
    {
      "YOUR_LABEL_KEY": [
        <label (string)>,
        <additional label (string)>,
        ...
      ],
      "YOUR_SCORE_KEY": [
        <score (number)>,
        <additional score (number)>,
        ...
      ],
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}

The array of labels and the array of scores must have the same length. If your model performs single label classification, there must be one label and one score. If your model performs multilabel classification, each label in the label array must correspond to the score at the same index in the score array.

Scores are required. They represent the confidence or probability your machine learning model assigns to its predictions. Typically, models select a label by comparing the score calculated for a particular input to a classification threshold.

Classification output example

The following example shows an online prediction response body for classification:

{
  "predictions": [
    {
      "sentiments": [
        "happy"
      ],
      "confidence": [
        0.8
      ]
    }
  ]
}

General classification

Input format

Your model must receive online prediction requests in the following format:

{
  "instances": [
    {
      "yourDataKey": <input data (string or array)>,
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}
General input example

The following example shows an online prediction request body with an array of input data that contains strings and numbers:

{
  "instances": [
    {
      "weather": [
        "sunny",
        72,
        0.22
      ]
    }
  ]
}

Output format

Your model must return online predictions in the following format:

{
  "predictions": [
    {
      "YOUR_LABEL_KEY": [
        <label (string)>,
        <additional label (string)>,
        ...
      ],
      "YOUR_SCORE_KEY": [
        <score (number)>,
        <additional score (number)>,
        ...
      ],
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}

The array of labels and the array of scores must have the same length. If your model performs single label classification, there must be one label and one score. If your model performs multilabel classification, each label in the label array must correspond to the score at the same index in the score array.

Scores are required. They represent the confidence or probability your machine learning model assigns to its predictions. Typically, models select a label by comparing the score calculated for a particular input to a classification threshold.

Classification output example

The following example shows an online prediction response body for classification:

{
  "predictions": [
    {
      "sentiments": [
        "happy"
      ],
      "confidence": [
        0.8
      ]
    }
  ]
}

Image object detection

Input format

Your model must receive online prediction requests in the following format:

{
  "instances": [
    {
      "yourDataKey": <image input (string)>,
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}

The <image input (string)> can be either of the following:

  • A base64-encoded image.
  • The URI of an image stored in Cloud Storage. To ensure Data Labeling Service has permission to read the image, the Cloud Storage bucket should be in the same Google Cloud project where you are creating your evaluation job.
Base64-encoded example

The following example shows an online prediction request body with a base64-encoded image:

{
  "instances": [
    {
      "image_bytes": {
        "b64": "iVBORw0KGgoAAAANSUhEUgAAAAYAAAAGCAYAAADgzO9IAAAAhUlEQVR4AWOAgZeONnHvHcXiGJDBqyDTXa+dVC888oy51F9+eRdY8NdWwYz/RyT//znEsAjEt277+syt5VMJw989DM/+H2MI/L8tVBQk4d38xcWp7ctLhi97ZCZ0rXV6yLA4b6dH59sjTq3fnji1fp4AsWS5j7PXstRg+/b3gU7N351AQgA8+jkf43sjaQAAAABJRU5ErkJggg=="
      }
    }
  ]
}

Note that the input data is nested in the image_bytes field in this example. If your model accepts prediction input like this, make sure to specify this nested structure when you create your evaluation job.

Cloud Storage reference example

The following example shows an online prediction request body with a reference to an image in Cloud Storage:

{
  "instances": [
    {
      "image_path": "gs://cloud-samples-data/datalabeling/image/flower_1.jpeg"
    }
  ]
}

Output format

Your model must return online predictions in the following format:

{
  "predictions": [
    {
      "yourBoundingBoxKey": [
        {
          "top_left": {
            "x": <left coordinate for first box>,
            "y": <top coordinate for first box>
          },
          "bottom_right": {
            "x": <right coordinate for first box>,
            "y": <bottom coordinate for first box>
          }
        },
        ...
      ],
      "YOUR_LABEL_KEY": [
        <label for first box>,
        ...
      ],
      "YOUR_SCORE_KEY": [
        <score for first box>,
        ...
      ],
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}

The array of bounding boxes, the array of labels, and the array of scores must all have the same length. Each entry in the bounding box array must be labeled by a label at the same index in the label array and correspond to a score at the same index in the score array. For example, yourBoundingBoxKey[0] is labeled by yourLabelKey[0] and has score yourScoreKey[0].

Scores are required. They represent the confidence or probability your machine learning model assigns to its predictions. Typically, models select a label by comparing the score calculated for a particular input to a classification threshold.
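The index correspondence described above can be checked and applied with a small helper. The key names here follow the hummingbird example elsewhere in this section and are only placeholders:

```python
def pair_detections(prediction: dict, box_key: str, label_key: str,
                    score_key: str) -> list:
    """Zip boxes, labels, and scores after verifying the arrays line up."""
    boxes = prediction[box_key]
    labels = prediction[label_key]
    scores = prediction[score_key]
    if not (len(boxes) == len(labels) == len(scores)):
        raise ValueError("box, label, and score arrays must all have the same length")
    return list(zip(boxes, labels, scores))

prediction = {
    "bird_locations": [{"top_left": {"x": 53, "y": 22},
                        "bottom_right": {"x": 98, "y": 150}}],
    "species": ["rufous hummingbird"],
    "probability": [0.77],
}
detections = pair_detections(prediction, "bird_locations", "species", "probability")
```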

Alternative output format

If you use the string detection_boxes for yourBoundingBoxKey, the string detection_classes for yourLabelKey, and the string detection_scores for yourScoreKey, then you can use the following format for your prediction output instead of the standard format:

{
  "predictions": [
    {
      "detection_boxes": [
        {
          "x_min": <left coordinate for first box>,
          "y_min": <top coordinate for first box>,
          "x_max": <right coordinate for first box>,
          "y_max": <bottom coordinate for first box>
        },
        ...
      ],
      "detection_classes": [
        <label for first box>,
        ...
      ],
      "detection_scores": [
        <score for first box>,
        ...
      ],
      <optional additional key>: <any JSON data type>,
      ...
    },
    ...
  ]
}
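If your model emits the alternative format, a single box can be converted back to the standard top_left/bottom_right structure with a sketch like this (coordinate values are illustrative):

```python
def to_standard_box(alt_box: dict) -> dict:
    """Convert an x_min/y_min/x_max/y_max box to the standard nested format."""
    return {
        "top_left": {"x": alt_box["x_min"], "y": alt_box["y_min"]},
        "bottom_right": {"x": alt_box["x_max"], "y": alt_box["y_max"]},
    }

box = to_standard_box({"x_min": 53, "y_min": 22, "x_max": 98, "y_max": 150})
```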
Object detection output example

The following example shows an online prediction response body for object detection:

{
  "predictions": [
    {
      "bird_locations": [
        {
          "top_left": {
            "x": 53,
            "y": 22
          },
          "bottom_right": {
            "x": 98,
            "y": 150
          }
        }
      ],
      "species": [
        "rufous hummingbird"
      ],
      "probability": [
        0.77
      ]
    }
  ]
}

Enable APIs

You must enable several Google Cloud APIs before you can use continuous evaluation. The following steps assume that you have already enabled the AI Platform Training and Prediction API in the same project when you deployed a model version to AI Platform Prediction.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Data Labeling Service and BigQuery APIs.

    Enable the APIs

What's next

Read the guide to creating an evaluation job to begin using continuous evaluation.