Document AI overview
This document is a guide to the fundamental concepts of using Document AI. You should read this page before proceeding to any other documentation or quickstarts.
Automate document processing workflows
Businesses all over the world rely heavily on documents to store and convey information. This information often needs to be digitized for it to become useful; however, this is usually accomplished through time-intensive, manual processes.
For example:
- Digitizing books for e-readers
- Filling out medical intake forms at doctor's offices
- Submitting expense reports based on receipts and invoices
- Authenticating identity based on ID cards
- Approving loans based on income information from tax forms
- Understanding contracts for key business agreements
Each of these workflows involves getting text from documents, then understanding how that text corresponds to the data needed. However, each document type has a different structure and layout, and the most important information can vary depending on the specific use case.
Document AI components
Document AI is a document understanding platform that takes unstructured data from documents and transforms it into structured data, making it easier to understand, analyze, and consume.
Document AI uses machine learning and Google Cloud to help you create scalable, end-to-end, cloud-based document processing applications.
Using Document AI, you can:
- Pre-process documents with image quality detection and deskewing
- Extract text and layout information from document files
- Identify key-value pairs in structured forms
- Split and classify documents by type
- Extract and normalize entities
- Label and review documents
- Store, search, organize, govern, and analyze documents and metadata
This diagram illustrates all of the key document processing steps that are supported by Document AI and how they can connect to each other.
Processor
A Document AI processor is an interface between the document file and a machine learning model that performs document processing actions. Processors can be used to classify, split, parse, or analyze a document.
Each Google Cloud project needs to create its own processor instances.
Processors fit into one of the following categories:
- General - Pre-built processors for compatibility with most documents
- Specialized - Pre-built processors for specific document types
  - Procurement - Documents used for purchases and payments, such as invoices and receipts
  - Identity - Documents used for identity verification
  - Lending - Documents used for mortgage loans
  - Contract - Extract and understand entities from business contracts
- Custom - User-created processors for custom documents and use cases
Within each category, there are multiple processor types. Each type is designed for a specific task such as Optical Character Recognition (OCR), form parsing, splitting, classification or entity extraction for specific document types.
Refer to the Full processor and detail list for information about all available processor types for Document AI.
Which processor should I use?
Use the following general guidelines to decide which processor type suits a specific application:
Use Case | Processor Type |
---|---|
Extract text and layout information from documents | Document OCR |
Extract tables or key-value pairs from a structured form in a document | Form Parser |
Analyze the scanned image quality of a document | Document OCR or Intelligent Document Quality |
Split or classify documents that have a specialized splitter/classifier processor | Specialized splitter/classifier processor that matches the document type |
Extract entities from a document that has a corresponding specialized processor | Specialized processor that matches the document type. Use Uptraining to improve accuracy or extract additional entities |
Split documents that don't have a specialized splitter processor | Create a Custom Document Splitter |
Classify documents that don't have a specialized classifier processor | Create a Custom Document Classifier |
Extract entities from a custom document that meets the custom processor criteria | Create a Custom Document Extractor |
Extract entities from a custom document that does not meet the custom processor criteria | Use Document OCR to extract the text and create an AutoML model for entity extraction |
This diagram helps determine which processor will work best for each use case.
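The guidelines in the table can also be read as a small decision procedure. The following is a minimal sketch of that reading in Python. The `UseCase` fields and the `recommend_processor` function are hypothetical names chosen for illustration and are not part of any Document AI API. The returned strings simply restate the table rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Hypothetical description of a document processing need."""
    task: str  # one of: "ocr", "form", "quality", "split", "classify", "extract"
    has_specialized_processor: bool = False  # a specialized processor matches the document type
    meets_custom_criteria: bool = True       # the document meets the custom processor criteria


def recommend_processor(use_case: UseCase) -> str:
    """Return the processor type suggested by the guideline table."""
    # Tasks whose recommendation does not depend on the document type.
    fixed = {
        "ocr": "Document OCR",
        "form": "Form Parser",
        "quality": "Document OCR or Intelligent Document Quality",
    }
    if use_case.task in fixed:
        return fixed[use_case.task]

    if use_case.task in ("split", "classify"):
        if use_case.has_specialized_processor:
            return "Specialized splitter/classifier processor"
        if use_case.task == "split":
            return "Custom Document Splitter"
        return "Custom Document Classifier"

    if use_case.task == "extract":
        if use_case.has_specialized_processor:
            return "Specialized processor (optionally with Uptraining)"
        if use_case.meets_custom_criteria:
            return "Custom Document Extractor"
        return "Document OCR plus an AutoML entity extraction model"

    raise ValueError(f"Unknown task: {use_case.task!r}")


if __name__ == "__main__":
    # Example: an invoice that has a matching specialized processor.
    print(recommend_processor(UseCase("extract", has_specialized_processor=True)))
```

The specialized-processor check comes first for splitting, classification, and extraction because the table recommends custom processors only when no specialized processor matches the document type.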
Using Document AI processors
Here are the major steps to use Document AI to start processing documents:
1. Choose a processor that is suitable for your use case.
   For complete information on each processor, see the Full processor and detail list.
2. Create a processor using the Cloud console or the Document AI API.
   Document AI creates a prediction endpoint where you can send your documents.
   For detailed instructions, see Creating a processor.
3. Send your document(s) for processing.
   Document AI processes the document(s) and returns one or more Document objects, which contain the extracted, structured information.
   For detailed instructions, see Sending a processing request and Handle the processing response.
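The send-and-receive step can be sketched without any client library. The example below assumes the REST v1 `process` method, a regional endpoint of the form `{location}-documentai.googleapis.com`, and a request body that carries the file as a base64-encoded `rawDocument`. The helper names, the project and processor IDs, and the sample response are all placeholders invented for illustration. Authentication (an OAuth bearer token) and the actual HTTP call are deliberately left out, so treat this as an outline rather than a complete client.

```python
import base64
from typing import Dict, List, Tuple


def build_process_request(
    project_id: str,
    location: str,
    processor_id: str,
    content: bytes,
    mime_type: str,
) -> Tuple[str, Dict]:
    """Build the URL and JSON body for a processing request (nothing is sent)."""
    # Resource name of the processor created in the previous step.
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    url = f"https://{location}-documentai.googleapis.com/v1/{name}:process"
    body = {
        "rawDocument": {
            # File bytes must be base64-encoded to travel inside JSON.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, body


def summarize_entities(response: Dict) -> List[Tuple[str, str, float]]:
    """Pull (type, mention text, confidence) from a process response dict."""
    document = response.get("document", {})
    return [
        (e.get("type", ""), e.get("mentionText", ""), e.get("confidence", 0.0))
        for e in document.get("entities", [])
    ]


if __name__ == "__main__":
    # Placeholder IDs; replace with values from your own project.
    url, body = build_process_request(
        "my-project", "us", "my-processor-id", b"%PDF", "application/pdf"
    )
    print(url)

    # Made-up response shaped like a Document with one extracted entity.
    sample_response = {
        "document": {
            "text": "Total: 42.00",
            "entities": [
                {"type": "total_amount", "mentionText": "42.00", "confidence": 0.97}
            ],
        }
    }
    for entity in summarize_entities(sample_response):
        print(entity)
```

Keeping request construction separate from response parsing makes both halves easy to check offline, before any credentials or network access are involved.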
Related Features and Products
- Human-in-the-Loop (HITL) - Human verification and corrections to ensure accuracy of data extracted by Document AI processors for use in critical business applications.
- Enterprise Knowledge Graph - Enrich data with real world entities and connections.
- Document AI Warehouse - Store, search, organize, govern and analyze documents and their structured metadata.
- Related Products