Enrichment

Document AI uses Enterprise Knowledge Graph to normalize and enrich entity extraction results (for supported fields). For example, the addresses 123 Main St Apt 1 and 123 Main street # 1 could be normalized to the same standardized address.

For each supported field, Document AI returns a normalizedValue alongside the raw extracted text. The normalizedValue holds the same data in a standardized format, which reduces the post-processing you need to do.

Most normalized values belong to one of the following categories:

Money
Date
Timestamp
Address
Boolean
Integer
Float
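
As an illustration of how these categories might be consumed, the following Python sketch picks the typed value out of a normalizedValue object. The JSON key names (moneyValue, dateValue, datetimeValue, addressValue, booleanValue, integerValue, floatValue) are assumptions based on the NormalizedValue message in the Document AI API and are not stated on this page; the helper name structured_value and the sample date are hypothetical.

```python
# Assumed mapping from each category to its JSON key under normalizedValue.
# These key names are not given on this page; verify them against the
# API reference before relying on them.
CATEGORY_KEYS = {
    "Money": "moneyValue",
    "Date": "dateValue",
    "Timestamp": "datetimeValue",
    "Address": "addressValue",
    "Boolean": "booleanValue",
    "Integer": "integerValue",
    "Float": "floatValue",
}


def structured_value(normalized_value: dict):
    """Return (category, value) for the first typed key that is present.

    Falls back to ("Text", <text>) when only the plain text form exists.
    """
    for category, key in CATEGORY_KEYS.items():
        if key in normalized_value:
            return category, normalized_value[key]
    return "Text", normalized_value.get("text")


# Hypothetical normalizedValue for a date field.
sample = {"text": "2021-01-02", "dateValue": {"year": 2021, "month": 1, "day": 2}}
category, value = structured_value(sample)
print(category, value)
```

Checking for a typed key first and falling back to text keeps the caller independent of which category a given field belongs to.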

Sample response

The enriched values can be found in the entities.normalizedValue field as shown in the following truncated sample:

{
  "entities": [
    {
      "textAnchor": {
        "textSegments": [ ... ],
        "content": "Google Singapore"
      },
      "type": "employer_name",
      "mentionText": "Google Singapore",
      "confidence": 0.69933707,
      "pageAnchor": {
        "pageRefs": [
          {
            "boundingPoly": {
              "normalizedVertices": [ ... ]
            }
          }
        ]
      },
      "id": "9",
      "normalizedValue": {
        "text": "Google Asia Pacific, Singapore"
      }
    }
  ]
}

In the sample, the original employer_name "Google Singapore" has been normalized to "Google Asia Pacific, Singapore".
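
A minimal Python sketch of reading the enriched value from such a response, using only the standard library. The helper name enriched_entities is hypothetical, and the second entity in the input (type hypothetical_field) is invented here solely to show what happens when no normalizedValue is returned.

```python
import json

# Truncated response mirroring the sample (anchors omitted for brevity).
# The second entity is hypothetical and has no normalizedValue.
RESPONSE = """
{
  "entities": [
    {
      "type": "employer_name",
      "mentionText": "Google Singapore",
      "confidence": 0.69933707,
      "id": "9",
      "normalizedValue": {"text": "Google Asia Pacific, Singapore"}
    },
    {
      "type": "hypothetical_field",
      "mentionText": "some raw text",
      "id": "10"
    }
  ]
}
"""


def enriched_entities(document: dict) -> list:
    """Pair each entity's raw text with its normalized text, if any."""
    rows = []
    for entity in document.get("entities", []):
        normalized = entity.get("normalizedValue", {}).get("text")
        rows.append({
            "type": entity.get("type"),
            "raw": entity.get("mentionText"),
            "normalized": normalized,
            "enriched": normalized is not None,
        })
    return rows


rows = enriched_entities(json.loads(RESPONSE))
for row in rows:
    print(row)
```

Treating a missing normalizedValue as "not enriched" rather than as an error lets the same loop handle both supported and unsupported fields.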

In the Google Cloud console, the enriched and normalized fields are annotated with G.

Supported processors

The following processors and fields support entity enrichment.

Bank Statement Parser

Category	Pretrained
Solution type	Lending
Functions	OCR, Entity Extraction
Release stage	General availability
Access status	Public
Enriched fields	bank_address, bank_name

W2 Parser

Category	Pretrained
Solution type	Lending
Functions	OCR, Entity Extraction
Release stage	General availability
Access status	Public
Enriched fields	EmployerNameAndAddress, EIN

Pay Slip Parser

Category	Pretrained
Solution type	Lending
Functions	OCR, Entity Extraction
Release stage	General availability
Access status	Public
Enriched fields	employer_address, employer_name

Expense Parser

Category	Pretrained
Solution type	Procurement
Functions	OCR, Entity Extraction
Release stage	General availability
Access status	Public
Enriched fields	supplier_address, supplier_name, supplier_phone

Invoice Parser

Category	Pretrained
Solution type	Procurement
Functions	OCR, Entity Extraction
Release stage	General availability
Access status	Public
Enriched fields	supplier_address, supplier_name, supplier_phone
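
The processor and field pairs listed in this section can be captured in a small lookup, sketched here in Python. The field names are copied from this section; the names ENRICHED_FIELDS and supports_enrichment are hypothetical, and the lookup is a client-side convenience, not part of the Document AI API.

```python
# Enriched fields per processor, as listed in this section.
ENRICHED_FIELDS = {
    "Bank Statement Parser": {"bank_address", "bank_name"},
    "W2 Parser": {"EmployerNameAndAddress", "EIN"},
    "Pay Slip Parser": {"employer_address", "employer_name"},
    "Expense Parser": {"supplier_address", "supplier_name", "supplier_phone"},
    "Invoice Parser": {"supplier_address", "supplier_name", "supplier_phone"},
}


def supports_enrichment(processor: str, field: str) -> bool:
    """Return True if the field is listed as enriched for the processor."""
    return field in ENRICHED_FIELDS.get(processor, set())


print(supports_enrichment("Invoice Parser", "supplier_phone"))
print(supports_enrichment("Pay Slip Parser", "supplier_phone"))
```

A check like this can help decide whether to expect a normalizedValue on a given entity before reading it.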

Setup

Normalization