
Document AI overview

This document is a guide to the fundamental concepts of using Document AI. You should read this page before proceeding to any other documentation or quickstarts.

Automate document processing workflows

Businesses all over the world rely heavily on documents to store and convey information. This information often needs to be digitized before it becomes useful, but digitization is usually a time-intensive, manual process.

For example:

  • Digitizing books for e-readers
  • Filling out medical intake forms at doctor's offices
  • Submitting expense reports based on receipts and invoices
  • Authenticating identity based on ID cards
  • Approving loans based on income information from tax forms
  • Understanding contracts for key business agreements

Each of these workflows involves extracting text from documents and then working out how that text corresponds to the data needed. However, each document type has a different structure and layout, and the most important information can vary depending on the specific use case.

Document AI components

Document AI is a document understanding platform that takes unstructured data from documents and transforms it into structured data, making it easier to understand, analyze, and consume.

Document AI uses machine learning and Google Cloud to help you create scalable, end-to-end, cloud-based document processing applications.

Using Document AI, you can:

  • Pre-process documents with image quality detection and deskewing
  • Extract text and layout information from document files
  • Identify key-value pairs in structured forms
  • Split and classify documents by type
  • Extract and normalize entities
  • Label and review documents
  • Store, search, organize, govern, and analyze documents and metadata

This diagram illustrates all of the key document processing steps that are supported by Document AI and how they can connect to each other.

Processor

A Document AI processor is an interface between a document file and a machine learning model that performs document processing actions. Processors can be used to classify, split, parse, or analyze a document.

Each Google Cloud project needs to create its own processor instances.
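Because each project owns its own processor instances, a processor is addressed by a resource name that includes the project it belongs to. The following is a minimal sketch, assuming the v1 resource-name layout `projects/{project}/locations/{location}/processors/{processor_id}`; the `ProcessorName` class, the project names, and the processor IDs are hypothetical illustrations.

```python
import re
from dataclasses import dataclass

# Assumed layout: projects/{project}/locations/{location}/processors/{processor_id}
_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor_id>[^/]+)$"
)


@dataclass(frozen=True)
class ProcessorName:
    """Hypothetical helper identifying one processor instance in one project."""

    project: str
    location: str
    processor_id: str

    def __str__(self) -> str:
        return f"projects/{self.project}/locations/{self.location}/processors/{self.processor_id}"

    @classmethod
    def parse(cls, name: str) -> "ProcessorName":
        # Reject anything that does not follow the assumed layout.
        match = _NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"not a processor resource name: {name!r}")
        return cls(**match.groupdict())


# Two projects using the same kind of processor still own separate instances.
ocr_a = ProcessorName("project-a", "us", "1234567890abcdef")
ocr_b = ProcessorName("project-b", "us", "fedcba0987654321")
print(ocr_a)  # projects/project-a/locations/us/processors/1234567890abcdef
```

Keeping the project in the name makes it explicit that a processor created in one project is a different resource from a similar processor in another.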

Processors fit into one of the following categories:

  • General - Pre-built processors that work with most documents
  • Specialized - Pre-built processors for specific document types
    • Procurement - Documents used for purchases and payments, such as invoices and receipts
    • Identity - Documents used for identity verification
    • Lending - Documents used for mortgage loans
    • Contract - Business contracts, from which key entities can be extracted and understood
  • Custom - User-created processors for custom documents and use cases
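The category hierarchy above can be modeled as a small data structure. This is a minimal sketch: `ProcessorCategory` and `SPECIALIZED_GROUPS` are hypothetical names, not part of any Document AI library, and they encode only what this page states, namely that General and Specialized processors are pre-built while Custom processors are user-created.

```python
from enum import Enum


class ProcessorCategory(Enum):
    """Processor categories as described on this page (hypothetical model)."""

    GENERAL = "General"
    SPECIALIZED = "Specialized"
    CUSTOM = "Custom"

    @property
    def is_prebuilt(self) -> bool:
        # General and Specialized processors are pre-built; Custom ones are user-created.
        return self is not ProcessorCategory.CUSTOM


# Specialized processors are further grouped by the kind of document they target.
SPECIALIZED_GROUPS = {
    "Procurement": "Documents used for purchases and payments, such as invoices and receipts",
    "Identity": "Documents used for identity verification",
    "Lending": "Documents used for mortgage loans",
    "Contract": "Business contracts",
}

for category in ProcessorCategory:
    print(category.value, "pre-built" if category.is_prebuilt else "user-created")
```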

Within each category, there are multiple processor types. Each type is designed for a specific task such as Optical Character Recognition (OCR), form parsing, splitting, classification or entity extraction for specific document types.

Refer to the Full processor and detail list for information about all available processor types for Document AI.

Which processor should I use?

Use the following general guidelines to decide which processor type to use for a specific application:

  • To extract text and layout information from documents, use Document OCR.
  • To extract tables or key-value pairs from a structured form in a document, use Form Parser.
  • To analyze the scanned image quality of a document, use Document OCR or Intelligent Document Quality.
  • To split or classify documents that have a specialized splitter/classifier processor, use the specialized splitter/classifier processor that matches the document type.
  • To extract entities from a document that has a corresponding specialized processor, use the specialized processor that matches the document type. Use Uptraining to improve accuracy or extract additional entities.
  • To extract entities from a custom document that meets the custom processor criteria, create a Custom Document Extractor.
  • To extract entities from a custom document that does not meet the custom processor criteria, use Document OCR to extract the text and create an AutoML model for entity extraction.
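The use-case guidelines above amount to a small decision procedure. The following sketch encodes them in Python; `UseCase` and `recommend_processor` are hypothetical names, and the returned strings are the processor names used on this page, not API type identifiers. The guidelines only cover splitting or classification when a matching specialized processor exists, so the sketch raises an error in the other case instead of guessing.

```python
from enum import Enum, auto


class UseCase(Enum):
    """Use cases from the processor-selection guidelines (hypothetical names)."""

    EXTRACT_TEXT_AND_LAYOUT = auto()
    EXTRACT_TABLES_OR_KEY_VALUE_PAIRS = auto()
    ANALYZE_SCAN_QUALITY = auto()
    SPLIT_OR_CLASSIFY = auto()
    EXTRACT_ENTITIES = auto()


def recommend_processor(
    use_case: UseCase,
    has_specialized_processor: bool = False,
    meets_custom_criteria: bool = False,
) -> str:
    """Return the recommended processor for a use case, following the guidelines."""
    if use_case is UseCase.EXTRACT_TEXT_AND_LAYOUT:
        return "Document OCR"
    if use_case is UseCase.EXTRACT_TABLES_OR_KEY_VALUE_PAIRS:
        return "Form Parser"
    if use_case is UseCase.ANALYZE_SCAN_QUALITY:
        return "Document OCR or Intelligent Document Quality"
    if use_case is UseCase.SPLIT_OR_CLASSIFY:
        if has_specialized_processor:
            return "Specialized splitter/classifier matching the document type"
        raise ValueError("the guidelines do not cover this case")
    # Entity extraction: prefer a specialized processor, then a custom extractor.
    if has_specialized_processor:
        return "Specialized processor matching the document type (uptrain if needed)"
    if meets_custom_criteria:
        return "Custom Document Extractor"
    return "Document OCR plus an AutoML entity extraction model"


print(recommend_processor(UseCase.EXTRACT_ENTITIES, meets_custom_criteria=True))
```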

This diagram helps determine which processor will work best for each use case.

Using Document AI processors

Here are the major steps to use Document AI to start processing documents:

  1. Choose a processor that is suitable for your use case.

    For complete information on each processor, see the Full processor and detail list.

  2. Create a processor using the Cloud console or the Document AI API.

    Document AI creates a prediction endpoint where you can send your documents.

    For detailed instructions, see Creating a processor.

  3. Send your document(s) for processing.

    Document AI processes the document(s) and returns one or more Document objects, which contain the extracted, structured information.

    For detailed instructions, see Sending a processing request and Handle the processing response.
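Steps 2 and 3 map onto two REST calls and one response object. The following is a minimal sketch, not the official client library: it only builds request URLs and JSON bodies, and reads a process response that has already been decoded into a dict. It assumes the regional v1 endpoint `https://{location}-documentai.googleapis.com/v1`, the `processors` create and `:process` methods, and the JSON field names `rawDocument`, `mimeType`, `entities`, `mentionText`, `textAnchor`, `textSegments`, `startIndex`, and `endIndex`. The processor type `OCR_PROCESSOR`, the entity type `total_amount`, and the sample response are hypothetical. Authentication and actually sending the requests are left out.

```python
import base64
import json

# Assumed regional REST endpoint for the Document AI v1 API.
API_ROOT = "https://{location}-documentai.googleapis.com/v1"


def build_create_request(project: str, location: str, processor_type: str, display_name: str):
    """Step 2: return (url, JSON body) that would create a processor instance."""
    root = API_ROOT.format(location=location)
    url = f"{root}/projects/{project}/locations/{location}/processors"
    body = {"type": processor_type, "displayName": display_name}
    return url, json.dumps(body)


def build_process_request(processor_name: str, content: bytes, mime_type: str):
    """Step 3: return (url, JSON body) that would send one document inline."""
    # processor_name looks like projects/{p}/locations/{loc}/processors/{id}.
    location = processor_name.split("/")[3]
    url = f"{API_ROOT.format(location=location)}/{processor_name}:process"
    body = {
        "rawDocument": {
            # Inline file bytes travel base64-encoded inside the JSON body.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, json.dumps(body)


def anchor_text(document: dict, anchor: dict) -> str:
    """Resolve a textAnchor against document["text"].

    Indices are int64 values, which JSON encodes as strings; a zero startIndex may be omitted.
    """
    text = document.get("text", "")
    return "".join(
        text[int(seg.get("startIndex", 0)) : int(seg["endIndex"])]
        for seg in anchor.get("textSegments", [])
    )


def extract_entities(response: dict) -> list:
    """Flatten the returned Document into (type, text, confidence) tuples."""
    document = response.get("document", {})
    results = []
    for entity in document.get("entities", []):
        # Prefer mentionText; fall back to resolving the text anchor.
        value = entity.get("mentionText") or anchor_text(document, entity.get("textAnchor", {}))
        results.append((entity.get("type", ""), value, entity.get("confidence", 0.0)))
    return results


# Hypothetical, hand-written response used only to exercise the parser.
sample_response = {
    "document": {
        "text": "Invoice total: 42.00\n",
        "entities": [
            {
                "type": "total_amount",
                "confidence": 0.9,
                "textAnchor": {"textSegments": [{"startIndex": "15", "endIndex": "20"}]},
            }
        ],
    }
}

print(extract_entities(sample_response))  # [('total_amount', '42.00', 0.9)]
```

The Document AI client libraries wrap these calls for you; the sketch is only meant to show what the create and process requests carry and how extracted entities point back into the document text.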