To convert extracted entities into Document AI Warehouse properties, set or update the schema.
Before you set a schema with mappings, find out which Document AI processor types you use, along with their schemas and entity types. The pipeline flattens nested entities, so you must also create mappings for the child entities.
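To see which entity types need a mapping, it can help to list every type that appears in a processor's output, including nested ones. The following is a minimal sketch, not the pipeline's actual implementation. It assumes entities are plain dictionaries shaped like Document AI JSON output, where child entities sit under a "properties" key and already carry their full type name (for example line_item/amount). The helper name flatten_entities and the sample data are hypothetical.

```python
from typing import Iterable, Iterator


def flatten_entities(entities: Iterable[dict]) -> Iterator[dict]:
    """Yield each entity followed by all of its nested child entities."""
    for entity in entities:
        yield entity
        # Assumption: child entities are stored under the "properties" key.
        yield from flatten_entities(entity.get("properties", []))


# Hypothetical invoice output with one nested child entity.
sample_entities = [
    {"type": "line_item", "properties": [{"type": "line_item/amount"}]},
    {"type": "total_amount"},
]

# Collect the distinct entity types that would each need a mapping.
entity_types = sorted({e["type"] for e in flatten_entities(sample_entities)})
print(entity_types)
```

Each type in the resulting list is a candidate for a property definition or a schema_sources entry.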
For example, the INVOICE_PROCESSOR processor has the following entity types:
line_item
line_item/amount
total_amount
The following example schema defines two properties, one of which uses schema_sources mappings:
{
  "property_definitions": [
    {
      "name": "line_item",
      "display_name": "line_item",
      "is_searchable": true,
      "is_filterable": true,
      "text_type_options": {}
    },
    {
      "name": "my_new_receiver_name",
      "display_name": "my_new_receiver_name",
      "is_searchable": true,
      "is_filterable": true,
      "text_type_options": {},
      "schema_sources": [
        {
          "name": "receiver_name_in_invoice",
          "processor_type": "INVOICE_PROCESSOR"
        },
        {
          "name": "receiver_name_in_w2",
          "processor_type": "FORM_W2_PROCESSOR"
        }
      ]
    }
  ]
}
To keep the property name the same as the entity type, use the entity type
name directly as the property name, such as line_item in the example above.
To convert all entities of type receiver_name_in_invoice from the invoice
processor and of type receiver_name_in_w2 from the Form W-2 processor to the
new name my_new_receiver_name, add the mappings in the schema_sources field,
as in the example above. After the conversion, use my_new_receiver_name for
searching and filtering. Property names and schema_sources names must be
unique.
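The mapping and uniqueness rules can be checked locally before the schema is submitted. The following is a minimal sketch under stated assumptions: the schema is held as a plain Python dictionary trimmed to the fields relevant here, schema_sources names are treated as unique across the whole schema, and the helper name entity_to_property_map is hypothetical rather than part of any Document AI Warehouse client library.

```python
def entity_to_property_map(schema: dict) -> dict:
    """Return {entity type: property name}, raising ValueError on duplicates."""
    mapping = {}
    property_names = set()
    for prop in schema["property_definitions"]:
        name = prop["name"]
        if name in property_names:
            raise ValueError(f"duplicate property name: {name}")
        property_names.add(name)
        # Without schema_sources, the entity type keeps its own name.
        sources = prop.get("schema_sources") or [{"name": name}]
        for source in sources:
            if source["name"] in mapping:
                raise ValueError(f"duplicate schema source name: {source['name']}")
            mapping[source["name"]] = name
    return mapping


# The example schema above, reduced to names and mappings.
example_schema = {
    "property_definitions": [
        {"name": "line_item"},
        {
            "name": "my_new_receiver_name",
            "schema_sources": [
                {"name": "receiver_name_in_invoice", "processor_type": "INVOICE_PROCESSOR"},
                {"name": "receiver_name_in_w2", "processor_type": "FORM_W2_PROCESSOR"},
            ],
        },
    ]
}

mapping = entity_to_property_map(example_schema)
print(mapping["receiver_name_in_w2"])
```

In this sketch, both receiver entity types resolve to my_new_receiver_name, which is the name to use for searching and filtering after conversion.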