Form Parser

Form Parser v2.0 is a document processing model that extracts key-value pairs, tables, selection marks, generic fields, and text to augment and automate data extraction.

Data-extraction features

Form Parser's features encompass:

  • Key-value pairs (KVPs): A KVP is a pair of linked items in a document: a label (the key) and its corresponding data (the value). You can use KVPs directly or build custom logic on top of them to extract structured information. One common use is extracting data from templated forms.

  • Generic entities: Parse 11 different fields from documents out of the box. These include:

    • email
    • phone
    • url
    • date_time
    • address
    • person
    • organization
    • quantity
    • price
    • id
    • page_number

  • Text and layout: Use Google's latest OCR engine to extract text and layout information. This includes text from images and, in v2.1 only, embedded text from digital PDFs.

  • Tables: Detect and extract tables from images and PDFs.

  • Checkboxes: A high-quality detector that extracts checkboxes from images and PDFs as KVPs.
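The features above all surface in the Document object that the processor returns. The following is a minimal sketch, not an official sample, of reading KVPs, checkboxes, and generic entities from the JSON form of that response. It assumes the standard Document AI JSON field names (`pages[].formFields`, `fieldName`, `fieldValue`, `textAnchor.textSegments`, `valueType`, `entities[].mentionText`); the helper names and the sample document are hypothetical and written by hand for illustration.

```python
def anchor_text(full_text, layout):
    """Join the slices of the document text that a layout's textAnchor points at."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # Indexes arrive as strings in JSON, and startIndex is omitted when it is 0.
    return "".join(
        full_text[int(seg.get("startIndex", 0)):int(seg.get("endIndex", 0))]
        for seg in segments
    ).strip()


def extract_form_fields(document):
    """Return one dict per detected KVP or checkbox, in page order."""
    full_text = document.get("text", "")
    fields = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            fields.append({
                "page": page.get("pageNumber"),
                "key": anchor_text(full_text, field.get("fieldName", {})),
                "value": anchor_text(full_text, field.get("fieldValue", {})),
                # Set for checkboxes, e.g. "filled_checkbox"; empty for text values.
                "value_type": field.get("valueType", ""),
            })
    return fields


def iter_entities(entities):
    """Yield (type, mention text) for each entity, descending into nested properties."""
    for entity in entities:
        yield entity.get("type", ""), entity.get("mentionText", "")
        yield from iter_entities(entity.get("properties", []))


# Hand-written stand-in for a processor response (assumption, not real output).
SAMPLE_DOCUMENT = {
    "text": "Name: Jane Doe\nSubscribe: \n",
    "pages": [{
        "pageNumber": 1,
        "formFields": [
            {
                "fieldName": {"textAnchor": {"textSegments": [{"endIndex": "5"}]}},
                "fieldValue": {"textAnchor": {"textSegments": [
                    {"startIndex": "6", "endIndex": "14"}]}},
            },
            {
                "fieldName": {"textAnchor": {"textSegments": [
                    {"startIndex": "15", "endIndex": "25"}]}},
                "fieldValue": {},
                "valueType": "filled_checkbox",
            },
        ],
    }],
    "entities": [{"type": "person", "mentionText": "Jane Doe"}],
}

form_fields = extract_form_fields(SAMPLE_DOCUMENT)
entities = list(iter_entities(SAMPLE_DOCUMENT["entities"]))
```

Keeping `value_type` next to each pair lets downstream code tell a checkbox apart from a text value that happens to be empty.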

Languages and regions

  • Form Parser v2.0 supports over 200 languages. Learn more.
  • We provide feature support in eight regions. Learn more.

Model versions

The following processor versions are compatible with this feature. For more information, see Managing processor versions.

Version ID | Release channel | Description
pretrained-form-parser-v1.0-2020-09-23 | Stable | Legacy version. For best quality and the full feature set, use Form Parser v2.0.
pretrained-form-parser-v2.0-2022-11-10 | Stable | Recommended version. Supports generic entities and more than 200 languages, and includes upgraded table, KVP, and checkbox models.
pretrained-form-parser-v2.1-2023-06-26 | Release Candidate | Public Preview version. Same model as v2.0, with native text extraction from digital PDF files enabled.
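To send a request to a specific version rather than the processor's default, you address the version by its resource name. The following sketch builds that name and the matching REST `:process` URL from the version IDs listed above. The project, location, and processor values in the usage lines are placeholders, and the function names are hypothetical.

```python
# Version IDs and release channels, copied from the table above.
FORM_PARSER_VERSIONS = {
    "pretrained-form-parser-v1.0-2020-09-23": "Stable",
    "pretrained-form-parser-v2.0-2022-11-10": "Stable",
    "pretrained-form-parser-v2.1-2023-06-26": "Release Candidate",
}


def processor_version_name(project_id, location, processor_id, version_id):
    """Build the full resource name of one Form Parser processor version."""
    if version_id not in FORM_PARSER_VERSIONS:
        raise ValueError(f"unknown Form Parser version: {version_id}")
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version_id}"
    )


def process_url(resource_name, location):
    """Build the regional REST URL for a synchronous process request."""
    return f"https://{location}-documentai.googleapis.com/v1/{resource_name}:process"


# Placeholder identifiers (assumptions for illustration only).
name = processor_version_name(
    "my-project", "us", "my-processor-id", "pretrained-form-parser-v2.0-2022-11-10"
)
url = process_url(name, "us")
```

Pinning an explicit version ID keeps results stable if the processor's default version changes later.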

Limitations

  • TIFF files that use the older JPEG compression scheme are unsupported. This is the type of JPEG encapsulation defined by the TIFF version 6.0 specification.

  • The checkbox model doesn't support parsing radio buttons. Some detected checkboxes might not have corresponding keys.

  • The model doesn't reliably parse a KVP with an unfilled value, such as a field on a blank form.

  • KVP parsing quality may be lower for documents in certain languages than for documents in Latin languages.
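The TIFF limitation can be checked before upload. The following is a rough sketch of such a pre-flight check, assuming the unsupported scheme corresponds to Compression tag value 6, which is the JPEG encapsulation the TIFF 6.0 specification defines. It reads the Compression tag (259) of each image in a classic TIFF file and does not handle BigTIFF. The function names are hypothetical.

```python
import struct

COMPRESSION_TAG = 259   # TIFF "Compression" tag
OLD_STYLE_JPEG = 6      # JPEG encapsulation defined in TIFF 6.0


def tiff_compressions(data):
    """Return the Compression value of every image (IFD) in a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"    # little-endian file
    elif data[:2] == b"MM":
        endian = ">"    # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")

    values, seen = [], set()
    while offset and offset not in seen:   # the seen set guards against IFD loops
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        compression = 1                    # TIFF default when the tag is absent
        for i in range(count):
            entry = offset + 2 + 12 * i    # each IFD entry is 12 bytes
            (tag,) = struct.unpack_from(endian + "H", data, entry)
            if tag == COMPRESSION_TAG:
                # A single SHORT value sits at the start of the 4-byte value field.
                (compression,) = struct.unpack_from(endian + "H", data, entry + 8)
        values.append(compression)
        # The offset of the next IFD follows the last entry; 0 ends the chain.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return values


def uses_old_style_jpeg(data):
    """True if any image in the file uses the TIFF 6.0 JPEG scheme."""
    return OLD_STYLE_JPEG in tiff_compressions(data)
```

A caller could read a file with `open(path, "rb").read()`, pass the bytes to `uses_old_style_jpeg`, and re-encode the file with another compression before sending it when the check returns true. Truncated files raise `struct.error`, which the caller should treat as an invalid input.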

Next steps

Get started by visiting Process documents with Form Parser.