Label Error Detection for Images

High-quality labels for training data are critical for model-building success, because mislabeled or otherwise low-quality items in the data set can degrade model performance. However, most approaches to checking data quality are manual, tedious, and time-consuming.

With this experiment, customers bring their labeled training images, and we run them through a series of quality check tools. We then return a list of recommendations for improving the labels.


Intended use

Inputs and outputs:

  • Users provide: Labeled images used to train an image classifier
  • Users receive: A CSV file listing potentially mislabeled training data items

Industries and functions:

Use cases center on image-based challenges. The experiment may be helpful whenever there is uncertainty about the quality of image labels in a training data set. Likely users are model builders who would like to accelerate their data preparation and quality checking.

Technical challenges:

This experiment will be most helpful when users believe that image labels may be noisy.

As part of the application to participate in this experiment, we will ask you about your use case, data types, and/or other relevant questions to ensure that the experiment is a good fit for you.

What data do I need?

Data and label types: This experiment has been designed to help customers accelerate quality checks for images.

  • It is likely to be effective with natural images (i.e., those that can be captured with a camera in the real world).
  • It may not be effective with highly unusual or abstract image types, such as specialized medical images and scans.

We do not accept images or labels that contain personally identifiable information (e.g., names or email addresses).

Specifications:

  • Data specs
    • Users must provide at least 5 images for each class the classifier is trained on. There is no restriction on the number of classes.
    • Labels should be reasonably balanced. Avoid extreme cases such as highly unbalanced data sets (e.g., one label accounting for 80% or more of the data).
    • Images must be stored in JPEG or PNG format.
    • Maximum image size: 2000x2000 pixels
  • Label specs
    • Image labels should be provided in CSV format
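
Before applying, it may help to check a data set against the specifications above. The following Python sketch is one possible way to do that, not part of the experiment itself. It assumes a label CSV with two hypothetical columns, "image_path" and "label" (the actual CSV layout is not specified here), and it takes image dimensions as an optional argument rather than reading the image files.

```python
import csv
import io
from collections import Counter
from pathlib import Path

# Thresholds taken from the specifications above.
MIN_IMAGES_PER_CLASS = 5
MAX_CLASS_SHARE = 0.80      # one label at 80% or more counts as unbalanced
MAX_SIDE_PIXELS = 2000      # maximum image size is 2000x2000
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_labels(csv_text, sizes=None):
    """Return a list of human-readable problems found in a label CSV.

    csv_text: CSV content with the assumed columns "image_path" and "label".
    sizes: optional dict mapping image_path -> (width, height) in pixels.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []

    # Per-class checks: minimum count and class balance.
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    for label, n in sorted(counts.items()):
        if n < MIN_IMAGES_PER_CLASS:
            problems.append(f"class '{label}' has only {n} images")
        if total and n / total >= MAX_CLASS_SHARE:
            problems.append(f"class '{label}' makes up {n / total:.0%} of the data")

    # Per-image checks: file format (by extension) and pixel dimensions.
    for row in rows:
        path = row["image_path"]
        if Path(path).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{path}: not a JPEG or PNG file")
        if sizes and path in sizes:
            width, height = sizes[path]
            if width > MAX_SIDE_PIXELS or height > MAX_SIDE_PIXELS:
                problems.append(f"{path}: {width}x{height} exceeds 2000x2000")
    return problems
```

Calling check_labels with the text of a label file returns an empty list when none of these checks fail. The format check looks only at file extensions, so it will not catch a file whose contents do not match its name.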

What skills do I need?

As with all AI Workshop experiments, successful users are likely to be comfortable with core AI concepts and skills, both to deploy the experiment technology and to work with our AI researchers and engineers.