Summarizing Multiple Short Texts

This experiment allows customers to train a “summarizer”: a model that ingests short pieces of text (like restaurant reviews) and outputs a canonical summary of all of those pieces of text (like a single summary review of that restaurant).


Intended use

Inputs and outputs:

  • Users provide: A large number of short text items (e.g. reviews), each aligned to a topic (e.g. a restaurant).
  • Users receive: Access to a Summarizer (via API) that takes in groups of short text items related to a single topic and generates a “canonical” summary of that topic. Note that text items and topics used in the training phase can also be run through the Summarizer tool.
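As an illustration of what "aligned to a topic" means in practice, the following minimal sketch groups short text items by topic. The function name `group_by_topic`, the `(topic, text)` pair layout, and the sample reviews are assumptions made for this example; they are not part of the experiment's API.

```python
from collections import defaultdict


def group_by_topic(items):
    """Group (topic, text) pairs into a dict of {topic: [text, ...]}.

    The (topic, text) pair layout is a hypothetical choice for this sketch.
    """
    groups = defaultdict(list)
    for topic, text in items:
        groups[topic].append(text)
    return dict(groups)


# Hypothetical sample data: three reviews covering two restaurants.
reviews = [
    ("cafe_a", "Great espresso, slow service."),
    ("cafe_b", "Cozy spot with good pastries."),
    ("cafe_a", "Friendly staff and strong coffee."),
]
grouped = group_by_topic(reviews)
```

Each value in `grouped` is then one group of short text items about a single topic, which is the shape of input the Summarizer is described as taking.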

Industries and functions:

This experiment may be helpful when you have a collection of short documents about a topic and want a summary of that topic written in the same form as those documents.

For example, you may have reviews about products, and want to summarize the reviews for a product with a canonical review.

Technical challenges:

The model has only been tested on short documents of fewer than 200 words each. It scales well with the number of such documents, and we tested with up to 16.
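A quick pre-check against these tested limits might look like the sketch below. It assumes that words are counted by splitting on whitespace and that the figure of 16 refers to the number of documents in one group; neither detail is specified here, and the names `check_group`, `MAX_WORDS_PER_DOC`, and `MAX_DOCS_TESTED` are hypothetical.

```python
MAX_WORDS_PER_DOC = 200  # documents were tested at fewer than 200 words each
MAX_DOCS_TESTED = 16     # assumed to be the largest tested group size


def check_group(docs):
    """Return a list of warnings for one group of documents on a single topic."""
    warnings = []
    if len(docs) > MAX_DOCS_TESTED:
        warnings.append(
            f"{len(docs)} documents exceeds the tested maximum of {MAX_DOCS_TESTED}"
        )
    for i, doc in enumerate(docs):
        # Assumption: a "word" is any whitespace-separated token.
        n_words = len(doc.split())
        if n_words >= MAX_WORDS_PER_DOC:
            warnings.append(
                f"document {i} has {n_words} words "
                f"(tested range is under {MAX_WORDS_PER_DOC})"
            )
    return warnings
```

An empty list means the group falls inside the tested range. A non-empty list does not mean the model will fail, only that the input is outside what has been tested.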

Model training performs better with more instances, where the number of instances is the number of topics (e.g. products).

The model is trained to output language that is similar to the training documents. However, the output may be of lower language quality, and it may contain factual mistakes or hallucinations.

If data given at inference differs significantly from the nature of the training data, expect poor performance.

As part of the application to participate in this experiment, we will ask you about your use case, data types, and/or other relevant questions to ensure that the experiment is a good fit for you.

What data do I need?

Data and label types:

Each instance consists of a collection of short documents about a topic. We expect that model training performs best with many thousands of such instances; the more the better.

Specifications:

  • Data specs
    • Ideally thousands of instances or topics (e.g. restaurants)
    • Text items stored in JSON format
    • Each instance should have a unique ID
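The specifications above could be met with a layout like the one sketched below. The exact schema is not given here, so the field names `id`, `topic`, and `documents`, the helper `serialize_instances`, and the choice of a single JSON array are all assumptions made for illustration.

```python
import json


def serialize_instances(instances):
    """Serialize instances to a JSON string, rejecting duplicate IDs.

    Each instance is a dict with hypothetical keys "id", "topic", "documents".
    """
    seen = set()
    for inst in instances:
        if inst["id"] in seen:
            raise ValueError(f"duplicate instance id: {inst['id']}")
        seen.add(inst["id"])
    return json.dumps(instances, ensure_ascii=False, indent=2)


# Hypothetical sample: one instance per topic, each with a unique ID.
instances = [
    {
        "id": "restaurant-0001",
        "topic": "cafe_a",
        "documents": [
            "Great espresso, slow service.",
            "Friendly staff and strong coffee.",
        ],
    },
    {
        "id": "restaurant-0002",
        "topic": "cafe_b",
        "documents": ["Cozy spot with good pastries."],
    },
]
payload = serialize_instances(instances)
```

Checking for duplicate IDs before serializing keeps a malformed dataset from being written out silently.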

What skills do I need?

As with all AI Workshop experiments, successful users are likely to be savvy with core AI concepts and skills in order to both deploy the experiment technology and interact with our AI researchers and engineers.

Users of this experiment should also be familiar with or comfortable accessing Google APIs.