Image Classification with Confidence Scores

AI systems are far more useful when we know how much we can rely on a given prediction. This experiment produces an AI model that returns both the predicted class and a well-calibrated confidence score for that prediction.

Intended use

Problem types: To deploy AI models effectively in decision-making, it is very helpful to know how confident a model is in its prediction. A confidence score lets you quantify how difficult each input is to decide, diagnose when the predictor becomes less effective because of data distribution shifts, or determine when to route a prediction to human experts because the test sample is an outlier.

This experiment uses a new method for calculating confidence scores. Many existing approaches to confidence scoring for deep neural networks are overly optimistic, reporting high confidence even for incorrect predictions. In contrast, this experiment's novel learning framework and architecture generate well-calibrated confidence scores by comparing challenging inputs against similar prototypes whose classification is known.
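The experiment's actual learning framework is private, but the general flavor of prototype-based confidence can be illustrated with a toy sketch: embed an input, measure its distance to one known-class prototype per class, and turn those distances into a normalized score. Everything below (the embedding space, Euclidean distance, the softmax temperature) is an illustrative assumption, not the experiment's method.

```python
import numpy as np

def prototype_confidence(embedding, prototypes, temperature=1.0):
    """Toy prototype-based scoring: compare an input embedding with one
    prototype per class and turn (negative) distances into a softmax score.

    embedding: (d,) vector for the query image.
    prototypes: (n_classes, d) matrix, one known-class prototype per row.
    Returns (predicted_class, confidence).
    """
    dists = np.linalg.norm(prototypes - embedding, axis=1)  # distance to each prototype
    logits = -dists / temperature                           # closer prototype -> higher logit
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                    # softmax over classes
    cls = int(np.argmax(probs))
    return cls, float(probs[cls])

# A query near one prototype scores confidently; an equidistant query does not.
protos = np.array([[0.0, 0.0], [4.0, 4.0]])
print(prototype_confidence(np.array([0.2, -0.1]), protos))  # near class 0 -> high confidence
print(prototype_confidence(np.array([2.0, 2.0]), protos))   # equidistant  -> ~0.5 confidence
```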

This can illuminate the tradeoff between overall model accuracy and the number of predictions the model makes. The confidence score can be used to abstain from predicting when it does not exceed a chosen threshold (e.g., 0.9); by predicting on fewer cases, the model's accuracy on the cases it does predict can be significantly improved. At a confidence threshold of 0.0, the model predicts in 100% of cases, with low accuracy; at a threshold of 0.9, it predicts for fewer cases, with high accuracy on those predicted cases.
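To make the threshold tradeoff concrete, here is a minimal sketch of selective prediction; the (class, confidence) pairs, helper name, and sample data are illustrative, not part of the experiment's API:

```python
from typing import List, Tuple

def selective_metrics(
    scored_preds: List[Tuple[int, float]],  # (predicted_class, confidence) pairs
    labels: List[int],                      # ground-truth classes
    threshold: float,                       # minimum confidence to keep a prediction
) -> Tuple[float, float]:
    """Keep only predictions at or above the threshold; report coverage and accuracy."""
    kept = [(pred, y) for (pred, conf), y in zip(scored_preds, labels)
            if conf >= threshold]
    coverage = len(kept) / len(labels)
    accuracy = (sum(pred == y for pred, y in kept) / len(kept)
                if kept else float("nan"))
    return coverage, accuracy

# Illustrative data: the low-confidence predictions happen to be the wrong ones.
preds = [(1, 0.97), (0, 0.55), (2, 0.91), (1, 0.42)]
truth = [1, 2, 2, 0]
for t in (0.0, 0.5, 0.9):
    cov, acc = selective_metrics(preds, truth, t)
    print(f"threshold={t:.1f}  coverage={cov:.2f}  accuracy={acc:.2f}")
```

Sweeping the threshold this way traces the coverage-versus-accuracy curve described above: higher thresholds mean fewer, but more reliable, predictions.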

Inputs and outputs:

  • Users provide: Labeled training and evaluation image datasets.
  • Users receive: API access to a private AI model composed of a convolutional neural network classifier paired with the confidence scorer. Feeding prediction items into this API returns both the predicted class and a robust confidence score, as sketched below.
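The served model is private, so its exact endpoint and payload schema are not public. The sketch below assumes a hypothetical HTTPS endpoint and a JSON response with predicted_class and confidence fields, purely to show the shape of the interaction:

```python
import base64

import requests  # assumption: plain HTTPS access; the real service may use Google Cloud client libraries

API_URL = "https://example.com/v1/models/image-classifier:predict"  # hypothetical endpoint
API_TOKEN = "YOUR_ACCESS_TOKEN"                                     # hypothetical credential

def classify_image(path: str) -> tuple:
    """Send one image and return (predicted_class, confidence).

    The request/response shape is assumed, not documented: the image is posted
    as base64 and the response is JSON like {"predicted_class": 3, "confidence": 0.94}.
    """
    with open(path, "rb") as f:
        payload = {"image_bytes": base64.b64encode(f.read()).decode("ascii")}
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    body = resp.json()
    return body["predicted_class"], body["confidence"]

predicted_class, confidence = classify_image("query.jpg")
if confidence >= 0.9:   # act only on confident predictions...
    print("class:", predicted_class)
else:                   # ...and route uncertain cases to a human expert
    print("low confidence; deferring to human review")
```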

What data do I need?

Data and label types: This experiment works with any image data (containing legally allowed content), as long as the images are well labeled. Classes can include specific objects within the image (such as dog or flower) as well as characteristics of the image (such as color, shape, texture, sentiment, or quality).

Data specifications:

  • Users must provide at least 1000 images for each class that needs to be trained. There is no restriction on the number of classes.
  • Each image may be assigned to only one class.
  • The class labels should be integers between 0 and N-1 (where N is the number of output classes).
  • Images must be stored in one of the following formats: BMP, GIF, JPEG, or PNG.
  • Maximum image size: 3 MP.
  • Image labels should be provided in CSV format (see the validation sketch after this list).
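Before uploading, it can help to sanity-check a dataset against these specifications. The sketch below assumes a simple two-column labels file (filename,label) with no header row, interprets the 3 MP limit as three million pixels, and uses the Pillow library; the real schema is defined in the private documentation:

```python
import csv
from collections import Counter
from pathlib import Path

from PIL import Image  # pip install pillow

ALLOWED_SUFFIXES = {".bmp", ".gif", ".jpeg", ".jpg", ".png"}
MAX_PIXELS = 3_000_000          # assumed reading of the 3 MP limit
MIN_IMAGES_PER_CLASS = 1000

def check_dataset(image_dir: str, labels_csv: str) -> None:
    # Assumed layout: one "filename,label" row per image, no header row.
    with open(labels_csv, newline="") as f:
        rows = [(name, int(label)) for name, label in csv.reader(f)]

    # Labels must be integers 0..N-1 for N output classes.
    labels = [label for _, label in rows]
    n_classes = max(labels) + 1
    assert set(labels) <= set(range(n_classes)), "labels must be 0..N-1"

    # At least 1000 images per class; no restriction on the number of classes.
    counts = Counter(labels)
    for cls in range(n_classes):
        assert counts[cls] >= MIN_IMAGES_PER_CLASS, (
            f"class {cls} has only {counts[cls]} images")

    # Each image may be assigned to only one class.
    names = [name for name, _ in rows]
    assert len(names) == len(set(names)), "an image is assigned to multiple classes"

    # Allowed formats and the 3 MP size limit.
    for name in names:
        path = Path(image_dir) / name
        assert path.suffix.lower() in ALLOWED_SUFFIXES, f"{name}: unsupported format"
        with Image.open(path) as img:
            w, h = img.size
            assert w * h <= MAX_PIXELS, f"{name}: {w}x{h} exceeds 3 MP"

check_dataset("images/", "labels.csv")
```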

What skills do I need?

As with all AI Workshop experiments, successful users are likely to have a general understanding of core AI concepts and the skills needed both to deploy the experiment technology and to interact with our AI researchers and engineers. We also expect familiarity with accessing Google Cloud AI Platform APIs to run the model.