Automated Image Captions and Descriptions

This experiment is designed to help you get fluent, accurate, and grammatically correct descriptions of images. A prototypical use case is accelerating the process of labeling images for accessibility, media retrieval, and clustering purposes. Other use cases include creating indexes for media & entertainment storage, cataloging, and indexing. These use cases are not constrained to specific industries or functions.


The technology has gone through some initial reviews (e.g. for fairness), but is still experimental and does not come with performance SLAs. For more details, please see the AI Workshop homepage.

Inputs and outputs:

Users provide: an image that includes an object of interest at an acceptable dimension (min 150x150 px)

Users receive one of the following:

  • Label - One or more words describing an image
  • Caption - Fluent natural caption for an image
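
The two output types above might be consumed like this. The JSON field names (`labels`, `caption`) are assumptions for illustration; the real response schema is in the private documentation.

```python
import json

# A hypothetical service response -- the field names here are illustrative,
# not the documented schema.
sample_response = json.loads(
    '{"labels": ["dog", "grass"], "caption": "A dog running on grass."}'
)

labels = sample_response.get("labels", [])    # one or more words per image
caption = sample_response.get("caption", "")  # fluent natural-language caption
```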

What data do I need?

This experiment works with any image data (containing legally-allowed content). It works best with images that are complete, in focus, and clear.

Data specifications:

  • Users must provide at least 1 image with each service call
  • Allowed image formats: JPEG, PNG
  • Maximum image size: 3 MP
  • Image dimensions: at least 150x150 px
  • Image aspect ratio: maximum 2.5
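
The specifications above can be pre-checked on the client side before a call is made. This is a minimal sketch: the function name is illustrative, and the 3 MP cap is interpreted here as 3,000,000 pixels.

```python
MIN_DIM = 150            # minimum width/height in pixels
MAX_PIXELS = 3_000_000   # "3 MP" interpreted as megapixels (assumption)
MAX_ASPECT = 2.5         # maximum long-side / short-side ratio
ALLOWED_FORMATS = {"JPEG", "PNG"}

def check_image_spec(width, height, fmt):
    """Return a list of spec violations; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append("format %s not in %s" % (fmt, sorted(ALLOWED_FORMATS)))
    if min(width, height) < MIN_DIM:
        problems.append("dimensions %dx%d below %dpx minimum" % (width, height, MIN_DIM))
    if width * height > MAX_PIXELS:
        problems.append("%d pixels exceeds %d-pixel cap" % (width * height, MAX_PIXELS))
    if max(width, height) / min(width, height) > MAX_ASPECT:
        problems.append("aspect ratio exceeds %.1f" % MAX_ASPECT)
    return problems
```

A rejected image yields human-readable reasons, so a batch labeling pipeline can skip or resize offending files before spending a service call on them.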

What skills do I need?

Successful users are likely to:

  • Understand the HTTP request/response model
  • Have a basic understanding of core AI concepts in order to understand the output of the service
  • Be familiar with accessing Google APIs
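
For readers checking themselves against the skills above, this is what assembling a call might look like. The endpoint URL, field names, and feature identifiers are assumptions, not the documented API; the actual contract is in the private documentation, and an authenticated request would also carry an OAuth 2.0 bearer token in the Authorization header.

```python
import base64
import json

def build_request_body(image_bytes):
    """Encode an image into a hypothetical JSON request body.

    Images travel inside JSON as base64 text; "LABEL" and "CAPTION" are
    placeholder feature names standing in for the real ones.
    """
    payload = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": ["LABEL", "CAPTION"],
    }
    return json.dumps(payload)

# The resulting string would be POSTed to the service endpoint,
# e.g. with urllib.request from the standard library.
body = build_request_body(b"\x89PNG\r\n...")
```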