
Document AI overview

This document is a guide to the fundamental concepts of using Document AI. You should read this page before proceeding to any other documentation or quickstarts.

Automate document processing workflows

Businesses all over the world rely heavily on documents to store and convey information. This information often needs to be digitized for it to become useful; however, this is usually accomplished through time-intensive, manual processes.

For example:

  • Digitizing books for e-readers
  • Filling out medical intake forms at doctor's offices
  • Submitting expense reports based on receipts and invoices
  • Authenticating identity based on ID cards
  • Approving loans based on income information from tax forms
  • Understanding contracts for key business agreements

Each of these workflows involves extracting text from documents and then determining how that text maps to the data needed. However, each document type has a different structure and layout, and the most important information can vary depending on the specific use case.

Document AI components

Document AI is a document understanding platform that takes unstructured data from documents and transforms it into structured data, making it easier to understand, analyze, and consume.

Document AI uses machine learning and Google Cloud to help you create scalable, end-to-end, cloud-based document processing applications.

Using Document AI, you can:

  • Pre-process documents with image quality detection and deskewing
  • Extract text and layout information from document files
  • Identify key-value pairs in structured forms
  • Split and classify documents by type
  • Extract and analyze entities
  • Label and review documents

Processor

A Document AI processor is an interface between the document file and a machine learning model that performs document processing actions. Processors can be used to classify, split, parse, or analyze a document.

Each Google Cloud project needs to create its own processor instances to use Document AI.

Processors fit into one of the following categories:

  • General - Pre-built processors for compatibility with most documents
  • Specialized - Pre-built processors for specific document types
    • Procurement - Documents used for purchases and payments, such as invoices and receipts
    • Identity - Documents used for identity verification
    • Lending - Documents used for mortgage loans
    • Contract - Extract and understand entities from business contracts

    Within each category, there are multiple processor types. Each type is designed for a specific task such as Optical Character Recognition (OCR), form parsing, splitting, classification or entity extraction for specific document types.

    Refer to the Full processor and detail list for information about all available processor types for Document AI.

    Which processor should I use?

    To decide what processor type to use for a specific application, here are some general guidelines:

    Use case | Processor type
    Extract text and layout information from documents | Document OCR Processor
    Extract tables or key-value pairs from a structured form in a document | Form Parser Processor
    Analyze the scanned image quality of a document | Intelligent Document Quality Processor
    Split or classify documents that have a specialized splitter/classifier processor | Specialized splitter/classifier processor that matches the document type
    Extract entities from a document that has a corresponding specialized processor | Specialized processor that matches the document type
    Extract entities from a document that does not have a corresponding specialized processor | Document OCR Processor to extract the text, plus an AutoML model for entity extraction
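The guidelines in the table above can be read as a small decision procedure. The following is a minimal sketch of that logic in Python; the task keys (`text_and_layout`, `extract_entities`, and so on) and the `suggest_processor` helper are hypothetical names chosen for illustration and are not part of Document AI.

```python
# Hypothetical task keys mapped to the general processors named in the table.
GENERAL_PROCESSORS = {
    "text_and_layout": "Document OCR Processor",
    "tables_or_key_value_pairs": "Form Parser Processor",
    "image_quality": "Intelligent Document Quality Processor",
}


def suggest_processor(task, has_specialized_processor=False):
    """Return the processor guideline for a task, following the table above."""
    if task in GENERAL_PROCESSORS:
        return GENERAL_PROCESSORS[task]
    if task == "split_or_classify" and has_specialized_processor:
        return "Specialized splitter/classifier processor that matches the document type"
    if task == "extract_entities":
        if has_specialized_processor:
            return "Specialized processor that matches the document type"
        # No specialized processor: OCR for the text, AutoML for the entities.
        return "Document OCR Processor, then an AutoML model for entity extraction"
    # The table gives no guideline for any other combination.
    raise ValueError(f"No guideline for task {task!r}")


print(suggest_processor("tables_or_key_value_pairs"))
print(suggest_processor("extract_entities", has_specialized_processor=False))
```

The helper raises an error for combinations the table does not cover, rather than guessing a processor.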

    Using Document AI processors

    Here are the major steps to use Document AI to start processing documents:

    1. Choose a processor that is suitable for your use case.

      For complete information on each processor, see the Full processor and detail list.

    2. Create a processor using the Cloud console or the Document AI API.

      Document AI creates a prediction endpoint where you can send your documents.

      For detailed instructions, see Creating a processor.

    3. Send your document(s) for processing.

      Document AI processes the document(s) and returns one or more document objects, which contain the extracted, structured information.

      For detailed instructions, see Sending a processing request.
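As a rough illustration of step 3, the sketch below builds the URL and JSON body for a synchronous processing request and summarizes a response, using only the Python standard library. It assumes the v1 REST endpoint pattern and the `rawDocument`, `document`, `text`, and `entities` field names; verify these against the API reference before relying on them. The project ID, location, processor ID, and sample response are made-up placeholders, and the request is built but never sent.

```python
import base64
import json


def build_process_request(project_id, location, processor_id, content, mime_type):
    """Return (url, json_body) for a synchronous process call (assumed v1 REST shape)."""
    # Resource name of the processor created in step 2.
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    # Regional endpoint such as "us" or "eu" (assumed pattern).
    url = f"https://{location}-documentai.googleapis.com/v1/{name}:process"
    body = {
        "rawDocument": {
            # Raw file bytes are base64-encoded inside the JSON payload.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, json.dumps(body)


def summarize_response(response):
    """Return the text length and (type, mention text) pairs from a response dict."""
    document = response.get("document", {})
    entities = [
        (entity.get("type"), entity.get("mentionText"))
        for entity in document.get("entities", [])
    ]
    return len(document.get("text", "")), entities


# Placeholder identifiers and bytes, for illustration only.
url, body = build_process_request(
    "my-project", "us", "my-processor-id", b"%PDF-1.4 placeholder", "application/pdf"
)
print(url)

# Hand-written sample response, not real service output.
sample = {
    "document": {
        "text": "Total: 12.50",
        "entities": [{"type": "total_amount", "mentionText": "12.50"}],
    }
}
print(summarize_response(sample))
```

Sending the request also requires an authenticated HTTP call (for example, an OAuth 2.0 bearer token in the `Authorization` header), which is omitted here. The client libraries handle authentication and request construction for you.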