Object tracking tracks multiple objects detected in an input video or in video segments. It returns labels (tags) for the detected entities, along with the location of each entity in the frame.
Object tracking differs from label detection. Label detection provides labels for the entire frame, without bounding boxes. Object tracking detects individual objects and provides, for each object, a label along with a bounding box that describes its location in the frame. For example, a video of vehicles crossing an intersection may produce labels such as "car", "truck", "bike", "tires", "lights", "window", and so on. Each label includes a series of bounding boxes showing the location of the object in the frame. Each bounding box also has an associated time segment with a time offset (timestamp) measured from the beginning of the video. The annotation also contains additional entity information, including an entity ID that you can use to find more information about that entity in the Google Knowledge Graph Search API.
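
To make the structure described above concrete, the following Python sketch walks one object annotation and converts each bounding box to pixel coordinates. It is a minimal illustration, not the API client: it assumes the annotation has already been decoded into a dictionary whose field names (entity, entityId, description, frames, timeOffset, normalizedBoundingBox) follow the JSON form of the response, and that box coordinates are normalized to the range 0 to 1. The helper names, the sample data, and the entity ID value are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackedBox:
    """One bounding box for a tracked object, in pixel coordinates."""
    label: str
    entity_id: str
    time_offset_s: float
    left: int
    top: int
    right: int
    bottom: int


def parse_offset(offset: str) -> float:
    """Convert a duration string such as '1.5s' to seconds."""
    return float(offset.rstrip("s"))


def to_pixel_boxes(annotation: dict, width: int, height: int) -> List[TrackedBox]:
    """Flatten one object annotation into per-frame pixel boxes.

    Assumes normalized coordinates in [0, 1]; a missing coordinate
    is treated as 0.0.
    """
    entity = annotation["entity"]
    boxes = []
    for frame in annotation["frames"]:
        box = frame["normalizedBoundingBox"]
        boxes.append(
            TrackedBox(
                label=entity["description"],
                entity_id=entity.get("entityId", ""),
                time_offset_s=parse_offset(frame["timeOffset"]),
                left=round(box.get("left", 0.0) * width),
                top=round(box.get("top", 0.0) * height),
                right=round(box.get("right", 0.0) * width),
                bottom=round(box.get("bottom", 0.0) * height),
            )
        )
    return boxes


# Hand-written sample data (not real API output); the entity ID is a placeholder.
SAMPLE_ANNOTATION = {
    "entity": {"entityId": "/m/example", "description": "car"},
    "segment": {"startTimeOffset": "0s", "endTimeOffset": "0.5s"},
    "frames": [
        {
            "timeOffset": "0s",
            "normalizedBoundingBox": {"left": 0.25, "top": 0.5, "right": 0.75, "bottom": 1.0},
        },
        {
            "timeOffset": "0.5s",
            "normalizedBoundingBox": {"left": 0.5, "top": 0.5, "right": 1.0, "bottom": 1.0},
        },
    ],
}

if __name__ == "__main__":
    for tracked in to_pixel_boxes(SAMPLE_ANNOTATION, width=1280, height=720):
        print(tracked)
```

Keeping the label and entity ID on every box makes it straightforward to group boxes by object afterward, or to pass the entity ID to a separate Knowledge Graph lookup.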