Enrichment
Document AI uses Enterprise Knowledge Graph to normalize and
enrich entity extraction results (for supported fields). For example, the addresses
123 Main St Apt 1
and 123 Main street # 1
could be normalized to the same
standardized address.
For each supported field, Document AI also returns a normalizedValue
in addition to the raw extracted field, normalizing the literal text.
This contains the data in a standardized format to reduce post-processing.
Most data belongs to one of the following categories:
- Money
- Date
- Timestamp
- Address
- Boolean
- Integer
- Float
Sample response
The enriched values can be found in the
entities.normalizedValue
field as shown in the following truncated sample:
{
"entities": [
{
"textAnchor": {
"textSegments": [ ... ],
"content": "Google Singapore"
},
"type": "employer_name",
"mentionText": "Google Singapore",
"confidence": 0.69933707,
"pageAnchor": {
"pageRefs": [
{
"boundingPoly": {
"normalizedVertices": [ ... ]
}
}
]
},
"id": "9",
"normalizedValue": {
"text": "Google Asia Pacific, Singapore"
}
}
]
}
In the sample, the original employer_name
"Google Singapore" has been
normalized to "Google Asia Pacific, Singapore".
In the Google Cloud console, the enriched and normalized fields are annotated with G. For example:
Supported processors
Here are the processors and fields that support entity enrichment.
Processors | Enriched fields | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bank Statement Parser
|
|
||||||||||||
W2 Parser
|
|
||||||||||||
Pay Slip Parser
|
|
||||||||||||
Expense Parser
|
|
||||||||||||
Invoice Parser
|
|