Interpret prediction results from video object tracking models

After requesting a prediction, Vertex AI returns results based on your model's objective. Predictions from an object tracking model return time and locations of objects to track, according to your own defined labels. The model assigns a confidence score to each prediction, which communicates how confident your model accurately identified and tracked an object. The higher the number, the higher the model's confidence in the correctness of the prediction.

Example batch prediction output

The following sample is the predicted result for a model that tracks cats and dogs in a video. Each result includes a label (cat or dog) for the object being tracked, a time segment that specifies when and for how long the object is being tracked, and a bounding box that describes the location of the object.

{
  "instance": {
   "content": "gs://bucket/video.mp4",
    "mimeType": "video/mp4",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s"
  }
  "prediction": [{
    "id": "1",
    "displayName": "cat",
    "timeSegmentStart": "1.2s",
    "timeSegmentEnd": "3.4s",
    "frames": [{
      "timeOffset": "1.2s",
      "xMin": 0.1,
      "xMax": 0.2,
      "yMin": 0.3,
      "yMax": 0.4
    }, {
      "timeOffset": "3.4s",
      "xMin": 0.2,
      "xMax": 0.3,
      "yMin": 0.4,
      "yMax": 0.5,
    }],
    "confidence": 0.7
  }, {
    "id": "1",
    "displayName": "cat",
    "timeSegmentStart": "4.8s",
    "timeSegmentEnd": "4.8s",
    "frames": [{
      "timeOffset": "4.8s",
      "xMin": 0.2,
      "xMax": 0.3,
      "yMin": 0.4,
      "yMax": 0.5,
    }],
    "confidence": 0.6
  }, {
    "id": "2",
    "displayName": "dog",
    "timeSegmentStart": "1.2s",
    "timeSegmentEnd": "3.4s",
    "frames": [{
      "timeOffset": "1.2s",
      "xMin": 0.1,
      "xMax": 0.2,
      "yMin": 0.3,
      "yMax": 0.4
    }, {
      "timeOffset": "3.4s",
      "xMin": 0.2,
      "xMax": 0.3,
      "yMin": 0.4,
      "yMax": 0.5,
    }],
    "confidence": 0.5
  }]
}