Try Gemini 1.5 Pro, our most advanced multimodal model in Vertex AI, and see what you can build with a 1M token context window.

Object tracking

Object tracking tracks multiple objects detected in an input video or in video segments. It returns labels (tags) for the detected entities along with the location of each entity in the frame.

Object tracking differs from label detection. Label detection provides labels for the entire frame, without bounding boxes. Object tracking detects individual objects and, for each one, provides a label along with a bounding box that describes its location in the frame. For example, a video of vehicles crossing an intersection may produce labels such as "car", "truck", "bike", "tires", "lights", "window", and so on. Each label includes a series of bounding boxes showing the location of the object in the frame. Each bounding box also has an associated time segment with a time offset (timestamp), measured from the beginning of the video. The annotation also contains additional entity information, including an entity ID that you can use to find more information about that entity in the Google Knowledge Graph Search API.
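As a rough illustration of the annotation structure described above, the following Python sketch walks one object annotation and converts each bounding box to pixel coordinates. It assumes the JSON shape of the v1 REST response as commonly documented (`entity`, `frames`, `normalizedBoundingBox`, `timeOffset`); those field names do not appear on this page, so verify them against the API reference. The sample values, including the entity ID, are hand-written placeholders and not real API output.

```python
from typing import Any, Dict, List, Tuple

# Hand-written placeholder shaped like one "objectAnnotations" entry
# (assumed field names; values are illustrative, not real API output).
SAMPLE_ANNOTATION: Dict[str, Any] = {
    "entity": {"entityId": "/m/placeholder", "description": "car"},
    "confidence": 0.9,
    "segment": {"startTimeOffset": "0s", "endTimeOffset": "1.500s"},
    "frames": [
        {
            "normalizedBoundingBox": {"left": 0.25, "top": 0.5, "right": 0.75, "bottom": 1.0},
            "timeOffset": "0s",
        },
        {
            # "left" is omitted here: zero-valued fields may be absent in the JSON.
            "normalizedBoundingBox": {"top": 0.5, "right": 0.5, "bottom": 1.0},
            "timeOffset": "1.500s",
        },
    ],
}


def parse_offset_seconds(offset: str) -> float:
    """Convert a duration string such as "1.500s" to seconds."""
    return float(offset.rstrip("s"))


def to_pixel_box(box: Dict[str, float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box (0.0 to 1.0) to (left, top, right, bottom) pixels."""
    left = box.get("left", 0.0) * width
    top = box.get("top", 0.0) * height
    right = box.get("right", 0.0) * width
    bottom = box.get("bottom", 0.0) * height
    return (round(left), round(top), round(right), round(bottom))


def summarize(annotation: Dict[str, Any], width: int, height: int) -> List[Tuple[str, float, Tuple[int, int, int, int]]]:
    """Return (label, seconds from video start, pixel box) for every tracked frame."""
    label = annotation.get("entity", {}).get("description", "")
    rows = []
    for frame in annotation.get("frames", []):
        seconds = parse_offset_seconds(frame.get("timeOffset", "0s"))
        box = to_pixel_box(frame.get("normalizedBoundingBox", {}), width, height)
        rows.append((label, seconds, box))
    return rows


for row in summarize(SAMPLE_ANNOTATION, width=1280, height=720):
    print(row)
```

The bounding boxes are treated as normalized to the frame size, so the same annotation can be drawn on any resolution of the video by changing `width` and `height`.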

To make an object tracking request, call the annotate method and specify OBJECT_TRACKING in the features field.
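A minimal sketch of such a request body in Python, using only the standard library. The `OBJECT_TRACKING` feature name and the `features` field come from the text above; the endpoint URL and the `inputUri` field are assumptions based on the v1 REST API, and the Cloud Storage path is a placeholder. The sketch only builds and prints the JSON payload. Authentication and the HTTP call itself are left out.

```python
import json

# Assumed v1 REST endpoint for the annotate method; check the API reference.
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_object_tracking_request(input_uri: str) -> str:
    """Build the JSON body for an annotate call that requests object tracking."""
    body = {
        "inputUri": input_uri,            # assumed field name for the video location
        "features": ["OBJECT_TRACKING"],  # feature named in the documentation above
    }
    return json.dumps(body, indent=2)


# Placeholder path: replace with a video in your own Cloud Storage bucket.
payload = build_object_tracking_request("gs://your-bucket/your-video.mp4")
print(f"POST {ANNOTATE_URL}")
print(payload)
```

To run the request, send the payload as an authenticated POST to the endpoint. The annotate method starts a long-running operation, so results are retrieved by polling that operation, not read from the initial response.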

Check out the Video Intelligence API visualizer to see this feature in action.

For an example, see the Object Tracking and Shot Change Detection tutorial.