The Video Intelligence API can identify entities shown in video footage using the LABEL_DETECTION feature and annotate these entities with labels (tags). This feature identifies objects, locations, activities, animal species, products, and more.
Label detection differs from object tracking: rather than following individual objects with bounding boxes, label detection provides labels for the frame as a whole (without bounding boxes).
For example, for a video of a train at a crossing, the Video Intelligence API returns labels such as "train", "transportation", "railroad crossing", and so on. Each label includes a time segment with time offsets (timestamps), measured from the beginning of the video, marking when the entity appears. Each annotation also contains additional information, including an entity ID that you can use to look up more information about the entity in the Google Knowledge Graph Search API.
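As an illustrative sketch (not an official sample), the response can be traversed as shown below. The hand-written dict mirrors the documented REST response fields (`annotationResults`, `segmentLabelAnnotations`, `entity.entityId`, per-segment time offsets), but the specific values are invented for the example:

```python
# Illustrative only: a hand-written dict mirroring the shape of a
# LABEL_DETECTION response; the values here are invented.
response = {
    "annotationResults": [{
        "segmentLabelAnnotations": [{
            "entity": {
                "entityId": "/m/07jdr",   # Knowledge Graph entity ID
                "description": "train",
                "languageCode": "en-US",
            },
            "segments": [{
                "segment": {
                    "startTimeOffset": "0s",
                    "endTimeOffset": "37.2s",
                },
                "confidence": 0.98,
            }],
        }],
    }],
}

def summarize_labels(response):
    """Flatten label annotations into (description, entity_id, start, end) tuples."""
    rows = []
    for result in response["annotationResults"]:
        for annotation in result.get("segmentLabelAnnotations", []):
            entity = annotation["entity"]
            for seg in annotation["segments"]:
                segment = seg["segment"]
                rows.append((
                    entity["description"],
                    entity["entityId"],
                    segment["startTimeOffset"],
                    segment["endTimeOffset"],
                ))
    return rows

print(summarize_labels(response))
```

The entity ID (`/m/07jdr` above) is the key you would pass to the Knowledge Graph Search API to retrieve more information about the entity.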
Each entity returned can also include associated category entities in the categoryEntities field. For example, the "Terrier" entity label has a category of "Dog". Category entities form a hierarchy: the "Dog" category is a child of the "Mammal" category. For a list of the common category entities that the Video Intelligence API uses, see
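A short sketch of how category entities appear on an annotation and how to read them. The field name `categoryEntities` matches the response described above; the entity IDs are placeholders invented for the example:

```python
# Illustrative annotation for a "terrier" label; categoryEntities lists the
# broader categories the entity belongs to (IDs are placeholders).
annotation = {
    "entity": {"entityId": "/m/example1", "description": "terrier"},
    "categoryEntities": [
        {"entityId": "/m/example2", "description": "dog"},
    ],
}

def category_names(annotation):
    """Return the descriptions of an annotation's category entities, if any."""
    return [cat["description"] for cat in annotation.get("categoryEntities", [])]

print(category_names(annotation))  # ['dog']
```

Note that each annotation carries only its immediate categories; the broader hierarchy (for example, "Dog" being a child of "Mammal") comes from the separate list of common category entities.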
Label detection operates at three levels:
- Segment level:
You can select specific segments of a video to analyze by specifying beginning and ending timestamps for annotation (see VideoSegment). Entities are then identified and labeled within each segment. If no segments are specified, the whole video is treated as one segment.
- Shot level:
Shots (also known as scenes) are automatically detected within every segment (or within the whole video). Entities are then identified and labeled within each shot. For details, see Shot change detection.
- Frame level:
Entities are identified and labeled within individual frames, sampled at a rate of one frame per second.
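The levels above can be sketched as a request body. This is an illustrative example, not an official sample: the field names (`videoContext.segments`, `labelDetectionConfig.labelDetectionMode`) follow the public REST API, while the input URI and timestamps are placeholders:

```python
import json

# Sketch of an annotate request body for label detection (placeholder URI
# and timestamps; field names follow the public REST API).
request_body = {
    "inputUri": "gs://my-bucket/my-video.mp4",  # placeholder Cloud Storage URI
    "features": ["LABEL_DETECTION"],
    "videoContext": {
        # Segment level: restrict analysis to user-selected segments.
        # Omit "segments" to treat the whole video as one segment.
        "segments": [
            {"startTimeOffset": "0s", "endTimeOffset": "30s"},
        ],
        "labelDetectionConfig": {
            # Shot vs. frame level: SHOT_MODE, FRAME_MODE,
            # or SHOT_AND_FRAME_MODE for both.
            "labelDetectionMode": "SHOT_AND_FRAME_MODE",
        },
    },
}

print(json.dumps(request_body, indent=2))
```

With `SHOT_AND_FRAME_MODE`, the response would contain both shot-level annotations and frame-level annotations sampled at one frame per second.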