Build a document summarizer in the Google Cloud console

You can use Document AI to create a summarizer processor that summarizes the content of documents. You can customize the length and format of the output.

The following sample JSON shows the resulting summary entity:

{
  "type": "summary",
  "mentionText": " Superconductivity is a phenomenon in which a material conducts electricity with no resistance. It was discovered in 1911 by Dutch physicist Heike Kamerlingh Onnes. In 1986, a new class of materials was discovered that can superconduct at much higher temperatures. These materials are called high-temperature superconductors. They have the potential to revolutionize the way we use electricity. However, high-temperature superconductors are still very expensive to produce. Scientists are working on ways to make them more affordable.",
  "normalizedValue": {
    "text": " Superconductivity is a phenomenon in which a material conducts electricity with no resistance. It was discovered in 1911 by Dutch physicist Heike Kamerlingh Onnes. In 1986, a new class of materials was discovered that can superconduct at much higher temperatures. These materials are called high-temperature superconductors. They have the potential to revolutionize the way we use electricity. However, high-temperature superconductors are still very expensive to produce. Scientists are working on ways to make them more affordable."
  }
}
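In a processed document, an entity like the one above sits in the Document's `entities` array. The following minimal Python sketch pulls out every `summary` entity from a Document JSON string and collapses its whitespace (note the leading space in `mentionText` above). It uses only the standard library; the function name `extract_summaries` is hypothetical, and the sketch assumes the camelCase field names shown in the sample.

```python
import json


def extract_summaries(document_json: str) -> list[str]:
    """Return the text of every entity of type "summary" in a Document JSON string.

    Hypothetical helper: assumes a top-level "entities" array whose items use
    the "type", "mentionText", and "normalizedValue" fields shown above.
    """
    document = json.loads(document_json)
    summaries = []
    for entity in document.get("entities", []):
        if entity.get("type") != "summary":
            continue
        # Prefer the normalized text, falling back to the raw mention text.
        text = entity.get("normalizedValue", {}).get("text") or entity.get("mentionText", "")
        # Collapse runs of whitespace and drop the leading space seen in the sample.
        summaries.append(" ".join(text.split()))
    return summaries


sample = json.dumps({
    "entities": [
        {
            "type": "summary",
            "mentionText": " Superconductivity is a phenomenon in which a material conducts electricity with no resistance.",
        }
    ]
})
print(extract_summaries(sample))
```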

Procedure

In this quickstart, you create a document summarizer processor, upload a sample document for processing, and create a custom processor version to adjust the summary structure.




Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Document AI and Cloud Storage APIs.

Create a summarizer processor

Use the Google Cloud console to create a summarizer processor. For more information, see Creating and managing processors.

  1. In the Google Cloud console, in the Document AI section, go to the Workbench page.

  2. For Summarizer, select Create processor.

  3. In the Create processor menu, enter a name for your processor, such as quickstart-summarizer.

  4. Select the region closest to you.

  5. Select Create.

Your processor has now been created.
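The region you pick in step 4 determines both the processor's resource name and the regional API endpoint you call later. The following Python sketch shows the two string formats, assuming the `projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID` naming and the `LOCATION-documentai.googleapis.com` endpoint pattern that Document AI uses for locations such as `us` and `eu`. The function names and example IDs are hypothetical; use the processor ID shown for your own processor in the console.

```python
def processor_resource_name(project_id: str, location: str, processor_id: str) -> str:
    """Build a Document AI processor resource name (hypothetical helper)."""
    # Reject empty values and embedded slashes, which would corrupt the path.
    for label, value in (
        ("project_id", project_id),
        ("location", location),
        ("processor_id", processor_id),
    ):
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def regional_endpoint(location: str) -> str:
    """Return the Document AI API endpoint for a location such as "us" or "eu"."""
    return f"{location}-documentai.googleapis.com"


# Hypothetical project and processor IDs, for illustration only.
print(processor_resource_name("my-project", "us", "1234567890abcdef"))
print(regional_endpoint("us"))
```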

Test the processor

You are on the Processor overview page of the processor you just created.

  1. Select the Customize & build tab to experiment with the processor.

  2. Download a sample document: a PDF file containing the Wikipedia page for Superconductivity.

  3. Select Upload Test Document and select the document you just downloaded.

  4. You are now on the Summary page, where you can view the OCR-detected text and the document summary.

  5. Adjust the Length and Format settings to Moderate and Bulleted respectively, then select Rewrite and observe the results.

  6. Go back to the Customize & build tab.
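When you later retrieve a Bulleted summary through the API, the summary arrives as a single text string, and your application may want it as a list of items. A minimal Python sketch follows, assuming each bullet starts on its own line with a marker such as `•`, `-`, or `*`. The exact marker the summarizer emits is an assumption, so check it against your own output; `split_bullets` is a hypothetical name.

```python
import re

# Assumed bullet markers: "•" (U+2022), "-", or "*", followed by whitespace.
_BULLET = re.compile(r"^\s*[-*\u2022]\s+")


def split_bullets(summary: str) -> list[str]:
    """Split a bulleted summary string into items (hypothetical helper)."""
    items: list[str] = []
    for line in summary.splitlines():
        if _BULLET.match(line):
            # Start a new item, dropping the marker.
            items.append(_BULLET.sub("", line, count=1).strip())
        elif line.strip() and items:
            # Treat an unmarked line as a continuation of the previous bullet.
            items[-1] = f"{items[-1]} {line.strip()}"
        # Unmarked text before the first bullet is ignored.
    if not items and summary.strip():
        # No bullets found (for example, a non-bulleted format): return one item.
        return [" ".join(summary.split())]
    return items


print(split_bullets("• Superconductivity means zero electrical resistance.\n• It was discovered in 1911."))
```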

Deploy a processor version

If you want to use specific summarization settings when processing documents with the API, create a processor version for those settings.

  1. The Summarization settings are prefilled with the last values you used on the Summary page.

  2. Select Create New Version to create a processor version with the specified Summarization settings.

  3. Enter a name for the processor version, such as quickstart-moderate-bulleted, and select Create Version.

  4. Go to the Deploy & Use tab to view the deployment status. Deployment takes a few minutes.

  5. When the version is deployed, you can set it as the Default version, or you can provide the version ID when processing documents with the API.

  6. To use the Document AI API:
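A minimal Python sketch of processing a PDF with a specific processor version follows. It assumes the `google-cloud-documentai` client library (`DocumentProcessorServiceClient.process_document`) and Application Default Credentials; the library is imported only when no client is passed in, so the request-building logic can be exercised with a stand-in client. The function names and IDs are hypothetical. To use the Default version instead, pass the processor resource name without the `/processorVersions/...` suffix.

```python
from typing import Any, Optional


def processor_version_name(project_id: str, location: str, processor_id: str, version_id: str) -> str:
    """Build a processor version resource name (hypothetical helper)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version_id}"
    )


def summarize_pdf(
    pdf_bytes: bytes,
    project_id: str,
    location: str,
    processor_id: str,
    version_id: str,
    client: Optional[Any] = None,
) -> list[str]:
    """Send a PDF to a processor version and return the text of its summary entities."""
    if client is None:
        # Imported lazily so the function can be tested without the library installed.
        from google.api_core.client_options import ClientOptions
        from google.cloud import documentai

        client = documentai.DocumentProcessorServiceClient(
            client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
        )
    # GAPIC clients accept a plain dict in place of a ProcessRequest message.
    request = {
        "name": processor_version_name(project_id, location, processor_id, version_id),
        "raw_document": {"content": pdf_bytes, "mime_type": "application/pdf"},
    }
    result = client.process_document(request=request)
    # The proto field "type" is exposed as "type_" in the Python client.
    return [
        " ".join(entity.mention_text.split())
        for entity in result.document.entities
        if entity.type_ == "summary"
    ]


# Example usage (requires credentials and a deployed version; IDs are hypothetical):
# with open("superconductivity.pdf", "rb") as f:
#     print(summarize_pdf(f.read(), "my-project", "us", "PROCESSOR_ID", "VERSION_ID"))
```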

You have successfully used Document AI to extract text from a document and summarize it.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, use the Google Cloud console to delete your processor, and delete your project if you no longer need it.
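The console is the simplest route; if you prefer to clean up from code, the following Python sketch deletes the processor through the API. It assumes the `google-cloud-documentai` client library, whose `delete_processor` method starts a long-running operation. The helper name, the IDs, and the 300-second timeout are hypothetical. Deleting the project itself is a separate step.

```python
from typing import Any, Optional


def delete_processor(
    project_id: str,
    location: str,
    processor_id: str,
    client: Optional[Any] = None,
    timeout: float = 300.0,
) -> str:
    """Delete a Document AI processor and wait for the operation to finish."""
    if client is None:
        # Imported lazily so the function can be exercised with a stand-in client.
        from google.api_core.client_options import ClientOptions
        from google.cloud import documentai

        client = documentai.DocumentProcessorServiceClient(
            client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
        )
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    operation = client.delete_processor(name=name)
    # Block until the long-running delete operation completes (or the timeout expires).
    operation.result(timeout=timeout)
    return name


# Example usage (IDs are hypothetical):
# delete_processor("my-project", "us", "PROCESSOR_ID")
```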

What's next