Custom extractor overview
The custom extractor extracts entities from documents of a particular type. For example, it can extract the items on a menu, or the name and contact information from a resume.
Overview
The goal of the custom extractor is to let Document AI users build custom entity extraction solutions for new document types that have no pre-trained processor. The custom extractor combines layout-aware deep learning models (used by the generative AI and custom model methods) with template-based models.
Which training method should I use?
The custom extractor supports a wide range of use cases through three training methods.
Training method | Document examples | Document layout variation | Free-form text or paragraphs | Training documents needed for production-ready quality (depends on variability)
---|---|---|---|---
Fine-tuning and foundation model (generative AI) | Contract, terms of service, invoice, bank statement, bill of lading, payslip | High to low (preferred) | High | Medium: 0-50+ documents
Custom model | Similar forms with layout variation across years or vendors (for example, W-9) | Low to medium | Low | High: 10-100+ documents
Template | Tax forms with a fixed layout (for example, Forms 941 and 709) | None | Low | Low: 3 documents
Because foundation models typically require fewer training documents, they're recommended as the first option for any document type with a variable layout.
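The table and the recommendation above can be read as a simple decision rule. The following Python sketch encodes that rule so the trade-offs are explicit. `LayoutVariation` and `rank_training_methods` are hypothetical names used only for illustration; they are not part of the Document AI API or its client libraries. The sketch also simplifies the "free-form text or paragraphs" column to a yes/no flag.

```python
from enum import IntEnum
from typing import List


class LayoutVariation(IntEnum):
    """Hypothetical scale for layout variation across documents of one type."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def rank_training_methods(variation: LayoutVariation, free_form_text: bool) -> List[str]:
    """Return candidate training methods, most preferred first.

    Hypothetical helper that mirrors the comparison table; it does not call
    any Document AI service.
    """
    # Lots of free-form text or paragraphs: only generative AI is rated "High".
    if free_form_text:
        return ["generative AI"]
    # Fixed layout (for example, tax forms): a template needs about 3 documents.
    if variation == LayoutVariation.NONE:
        return ["template"]
    # Highly variable layouts exceed what the custom model is rated for.
    if variation == LayoutVariation.HIGH:
        return ["generative AI"]
    # Low to medium variation: foundation models are recommended first,
    # with a custom model (10-100+ documents) as the alternative.
    return ["generative AI", "custom model"]


print(rank_training_methods(LayoutVariation.NONE, free_form_text=False))
print(rank_training_methods(LayoutVariation.MEDIUM, free_form_text=False))
```

Returning an ordered list rather than a single answer reflects that the custom model remains a valid choice for low to medium layout variation, even though foundation models are the suggested starting point.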