With this experiment, customers can get labels for every frame in many related videos while labeling the frames of as few as one video. Our system propagates the labels from the labeled video(s) to all the other videos.
Inputs and outputs:
- Users provide:
  - Many videos/clips of the same action (players pitching a baseball, a machine performing an action on a manufacturing line, etc.)
  - Labels for each frame for a few of the videos
- Users receive: labels for each frame in all the related videos
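How the propagation works internally isn't covered here. Purely to illustrate the input/output contract above, here is a minimal sketch that assumes every frame has already been turned into a feature vector (an embedding) by some model, and then gives each frame of an unlabeled video the label of its nearest labeled frame. This nearest-neighbor rule is a stand-in chosen for simplicity, not the experiment's method; the function `propagate_labels`, the toy 2-D embeddings, and the labels `wind_up` and `release` are hypothetical.

```python
import math
from typing import List, Sequence


def propagate_labels(
    labeled_emb: Sequence[Sequence[float]],
    labeled_labels: Sequence[str],
    unlabeled_emb: Sequence[Sequence[float]],
) -> List[str]:
    """Hypothetical helper: copy to each unlabeled frame the label of the
    closest labeled frame (Euclidean distance in embedding space)."""
    if len(labeled_emb) != len(labeled_labels):
        raise ValueError("need exactly one label per labeled frame")
    result = []
    for frame in unlabeled_emb:
        # Index of the labeled frame nearest to this frame.
        best = min(range(len(labeled_emb)),
                   key=lambda i: math.dist(frame, labeled_emb[i]))
        result.append(labeled_labels[best])
    return result


# Toy example: assumed 2-D "embeddings" for a labeled and an unlabeled clip.
labeled = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
labels = ["wind_up", "wind_up", "release", "release"]
unlabeled = [[0.05, 0.02], [0.9, 1.1], [1.2, 0.95]]
print(propagate_labels(labeled, labels, unlabeled))
# → ['wind_up', 'release', 'release']
```

Plain nearest-neighbor matching treats frames independently and ignores their order within the video, so treat it only as a picture of what goes in and what comes out, not of how well a real propagator performs.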
Industries and functions:
Use cases may include, but are not limited to, the following industries: manufacturing, retail, entertainment, and sports analytics. A prototypical use case is to build a dataset of densely labeled videos, where every frame carries a label. Customers are free to use the labels and videos for tasks that typically involve supervised learning, such as action recognition or action progress tracking. For example, they could use these labels to train models with the Google Cloud Video API.
This experiment is most useful when customers have many clips/videos of one action but find it difficult to label all frames in all of them. Typical use cases:
- Large numbers of videos: if an average video has 100 frames, a few hundred videos already add up to tens of thousands of frames, and labeling each of them in a fine-grained way takes considerable effort.
- Fine-grained tasks for which it is difficult to write labeling instructions that yield good-quality labels. Say we are labeling videos of people pitching a baseball and want to track the pitchers as they perform this action. Because the task is so fine-grained, annotators need detailed instructions, and several redundant annotators are needed to ensure label quality.
As part of the application to participate in this experiment, we will ask you about your use case, data types, and/or other relevant questions to ensure that the experiment is a good fit for you.
What data do I need?
Data and label types:
This experiment is designed to help users label related videos in a fine-grained way. The resulting densely labeled dataset gives users the flexibility to choose the model and training technique for any task they have in mind.
- All videos should show the same action, or at least be closely related to each other.
- Videos should not contain periodic motion throughout.
  - Small periodic motions interspersed through the video are okay, but videos of a horse galloping or someone chewing food are not.
  - Clipping a video to a single period of the motion is also acceptable. For example, if a video shows a person doing many jumping jacks and the user clips it to just one jumping-jack cycle, we can still propagate labels.
- Per-frame labels should be provided for at least one video.
  - The more videos with per-frame labels, the better the label propagation.
  - Labels typically occur in contiguous chunks, so users may instead provide sparse timestamps for them (for example, 0-5 s: label 1; 5-15 s: label 2).
- Video length should not exceed 2 minutes.
- At least 50 unlabeled videos are required to learn a good label propagator.
- Video format: MP4
- Label format: JSON
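The exact JSON schema for labels isn't specified here, so the following is a minimal sketch under an assumed schema: the field names `video`, `fps`, `duration_s`, `segments`, `start_s`, `end_s`, and `label`, and the helpers `frame_index`, `segments_to_frame_labels`, and `check_video`, are all hypothetical. It expands sparse timestamp ranges into one label per frame (treating each range as half-open, so the frame exactly at 5 s gets label 2) and checks a video against the MP4 format and 2-minute limits listed above.

```python
import json
import math
from typing import List, Optional

MAX_DURATION_S = 120  # from the requirement above: videos must not exceed 2 minutes


def frame_index(t_s: float, fps: float) -> int:
    """First frame whose timestamp (frame / fps) is >= t_s."""
    # Rounding first guards against float noise such as 0.1 * 30 = 3.0000000000000004.
    return math.ceil(round(t_s * fps, 6))


def segments_to_frame_labels(segments: List[dict], fps: float,
                             num_frames: int) -> List[Optional[str]]:
    """Expand half-open [start_s, end_s) segments into one label per frame.

    Frames not covered by any segment stay None (unlabeled).
    """
    frame_labels: List[Optional[str]] = [None] * num_frames
    for seg in segments:
        start = max(0, frame_index(seg["start_s"], fps))
        end = min(num_frames, frame_index(seg["end_s"], fps))
        for f in range(start, end):
            frame_labels[f] = seg["label"]
    return frame_labels


def check_video(path: str, duration_s: float) -> List[str]:
    """Return problems with one video against the stated format and length limits."""
    problems = []
    if not path.lower().endswith(".mp4"):
        problems.append(f"{path}: expected an MP4 file")
    if duration_s > MAX_DURATION_S:
        problems.append(f"{path}: {duration_s} s exceeds the {MAX_DURATION_S} s limit")
    return problems


# Hypothetical label file encoding the sparse-timestamp example from the list above.
example = json.loads("""
{"video": "pitch_001.mp4", "fps": 30, "duration_s": 15,
 "segments": [{"start_s": 0, "end_s": 5, "label": "label 1"},
              {"start_s": 5, "end_s": 15, "label": "label 2"}]}
""")
num_frames = int(example["duration_s"] * example["fps"])  # 450 frames
labels = segments_to_frame_labels(example["segments"], example["fps"], num_frames)
print(labels.count("label 1"), labels.count("label 2"))  # → 150 300
print(check_video(example["video"], example["duration_s"]))  # → []
```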
What skills do I need?
As with all AI Workshop experiments, successful users are likely to be comfortable with core AI concepts and skills, both to deploy the experiment technology and to interact with our AI researchers and engineers.
In particular, users of this experiment should:
- Be able to edit videos and provide data as specified
- Be able to use the output labels to solve a meaningful business challenge