Set schemas with mapping

To convert extracted entities to Document AI Warehouse properties, set or update the schema.

Before you set a schema with mapping, find out which Document AI processor types you use, along with their schemas and entity types. The pipeline flattens nested entities, so you also need to create mappings for the child entities.
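To make the flattening concrete, here is a minimal Python sketch. It is not the pipeline's implementation. It assumes each entity is a plain dict with a "type", a "mention_text", and an optional "properties" list of child entities, and that child types already carry the parent prefix (for example line_item/amount). The function name flatten_entities and the sample data are hypothetical.

```python
from typing import Iterator


def flatten_entities(entities: list[dict]) -> Iterator[tuple[str, str]]:
    """Yield (entity_type, text) for every entity and every nested child."""
    for entity in entities:
        yield entity["type"], entity.get("mention_text", "")
        # Assumption: children are stored under "properties"; recurse so
        # they end up at the same level as their parent.
        yield from flatten_entities(entity.get("properties", []))


# Hypothetical extraction result with one nested entity.
sample = [
    {
        "type": "line_item",
        "mention_text": "Widget 2 x 5.00",
        "properties": [
            {"type": "line_item/amount", "mention_text": "10.00"},
        ],
    },
    {"type": "total_amount", "mention_text": "10.00"},
]

flat_types = [entity_type for entity_type, _ in flatten_entities(sample)]
print(flat_types)
```

Every type that appears in the flattened list, parent or child, is a candidate for its own property definition or mapping.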

For example, the processor INVOICE_PROCESSOR has the following entity types:

  • line_item
  • line_item/amount
  • total_amount

A schema with mapping can look like the following:

{
  "property_definitions": [
    {
      "name": "line_item",
      "display_name": "line_item",
      "is_searchable": true,
      "is_filterable": true,
      "text_type_options": {}
    },
    {
      "name": "my_new_receiver_name",
      "display_name": "my_new_receiver_name",
      "is_searchable": true,
      "is_filterable": true,
      "text_type_options": {},
      "schema_sources": [
        {
          "name": "receiver_name_in_invoice",
          "processor_type": "INVOICE_PROCESSOR"
        },
        {
          "name": "receiver_name_in_w2",
          "processor_type": "FORM_W2_PROCESSOR"
        }
      ]
    }
  ]
}

To keep a property name the same as the entity type, use the entity type directly as the name, as with line_item in the example above. To convert all entities of type receiver_name_in_invoice from the invoice processor and of type receiver_name_in_w2 from the Form W2 processor to a new name, my_new_receiver_name, add the mappings in the schema_sources field, as in the example above. After the conversion, use my_new_receiver_name for searching and filtering. Property names and schema_sources names should be unique.
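The mapping rules above can be checked locally before a schema is submitted. The following Python sketch is an illustration under stated assumptions, not Document AI Warehouse code: build_source_map and resolve_property are hypothetical helpers, the schema dict is trimmed to the fields that matter here, and a source counts as a duplicate when both its processor_type and its name repeat (a stricter reading would require the names alone to be unique).

```python
# Trimmed copy of the example schema; only fields used below are kept.
schema = {
    "property_definitions": [
        {"name": "line_item"},
        {
            "name": "my_new_receiver_name",
            "schema_sources": [
                {"name": "receiver_name_in_invoice",
                 "processor_type": "INVOICE_PROCESSOR"},
                {"name": "receiver_name_in_w2",
                 "processor_type": "FORM_W2_PROCESSOR"},
            ],
        },
    ]
}


def build_source_map(schema: dict) -> dict[tuple[str, str], str]:
    """Map (processor_type, entity_type) to a property name, rejecting duplicates."""
    source_map: dict[tuple[str, str], str] = {}
    seen_names: set[str] = set()
    for prop in schema["property_definitions"]:
        name = prop["name"]
        if name in seen_names:
            raise ValueError(f"duplicate property name: {name}")
        seen_names.add(name)
        for source in prop.get("schema_sources", []):
            key = (source["processor_type"], source["name"])
            if key in source_map:
                raise ValueError(f"duplicate schema source: {key}")
            source_map[key] = name
    return source_map


def resolve_property(schema, source_map, processor_type, entity_type):
    """Return the property name for an entity, or None if nothing matches."""
    mapped = source_map.get((processor_type, entity_type))
    if mapped is not None:
        return mapped
    # Assumption: with no explicit mapping, an entity matches a property
    # whose name equals the entity type.
    names = {prop["name"] for prop in schema["property_definitions"]}
    return entity_type if entity_type in names else None


source_map = build_source_map(schema)
print(resolve_property(schema, source_map,
                       "INVOICE_PROCESSOR", "receiver_name_in_invoice"))
```

In this sketch, line_item resolves to itself because it has no schema_sources entry, while both receiver entity types resolve to my_new_receiver_name.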