Enrichment

Document AI uses Enterprise Knowledge Graph to normalize and enrich entity extraction results (for supported fields). For example, the addresses 123 Main St Apt 1 and 123 Main street # 1 could be normalized to the same standardized address.

For each supported field, Document AI also returns a normalizedValue in addition to the raw extracted field, normalizing the literal text. This contains the data in a standardized format to reduce post-processing.

Most data belongs to one of the following categories:

  • Money
  • Date
  • Timestamp
  • Address
  • Boolean
  • Integer
  • Float

Sample response

The enriched values can be found in the entities.normalizedValue field as shown in the following truncated sample:

{
  "entities": [
    {
      "textAnchor": {
        "textSegments": [ ... ],
        "content": "Google Singapore"
      },
      "type": "employer_name",
      "mentionText": "Google Singapore",
      "confidence": 0.69933707,
      "pageAnchor": {
        "pageRefs": [
          {
            "boundingPoly": {
              "normalizedVertices": [ ... ]
            }
          }
        ]
      },
      "id": "9",
      "normalizedValue": {
        "text": "Google Asia Pacific, Singapore"
      }
    }
  ]
}

In the sample, the original employer_name "Google Singapore" has been normalized to "Google Asia Pacific, Singapore".

In the Google Cloud console, the enriched and normalized fields are annotated with G. For example:

enrichment
Sample normalized field shown in the web application.

Supported processors

Here are the processors and fields that support entity enrichment.

Processors Enriched fields

Bank Statement Parser

Category Pretrained
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Full processor details Detailed entry
  • bank_address
  • bank_name

W2 Parser

Category Pretrained
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Full processor details Detailed entry
  • EmployerNameAndAddress
  • EIN

Pay Slip Parser

Category Pretrained
Solution type Lending
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Full processor details Detailed entry
  • employer_address
  • employer_name

Expense Parser

Category Pretrained
Solution type Procurement
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Full processor details Detailed entry
  • supplier_address
  • supplier_name
  • supplier_phone

Invoice Parser

Category Pretrained
Solution type Procurement
Functions OCR, Entity Extraction
Release stage General availability
Access status Public
Full processor details Detailed entry
  • supplier_address
  • supplier_name
  • supplier_phone