Form Parser
Form Parser v2.0 is a document processing model that extracts key-value pairs, tables, selection marks, generic entities, and text so that you can augment and automate data extraction.
Data-extraction features
Form Parser provides the following features:
Key-value pairs (KVPs): These are sets of two items within a document—a label or key and its corresponding data (a value). You can directly use KVPs or build custom logic to extract structured information. One use for KVPs is to extract data from templated forms.
Generic entities: Parse 11 different fields from documents out of the box. These are:
email
phone
url
date_time
address
person
organization
quantity
price
id
page_number
Text and layout: Use Google's latest OCR engine to extract text and layout information. This includes text from images and, in v2.1 only, embedded text from digital PDFs.
Tables: Detect and extract tables from images and PDFs.
Checkboxes: A high-quality checkbox detector that extracts checkboxes from images and PDFs as KVPs.
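As an illustration of how KVPs and checkboxes might be read from a processed document, here is a minimal sketch that works on the document as a plain Python dictionary. It assumes the response has been serialized to JSON using the camelCase names of the Document AI `Document` format (`pages`, `formFields`, `fieldName`, `fieldValue`, `valueType`, `textAnchor`, `textSegments`) and that checkbox fields carry a `valueType` of `filled_checkbox` or `unfilled_checkbox`. The helper names and the `sample` document are hypothetical and exist only for this example.

```python
from typing import Any, Dict, List

# Assumed valueType values that mark a form field as a checkbox.
CHECKBOX_TYPES = {"filled_checkbox", "unfilled_checkbox"}


def anchor_text(layout: Dict[str, Any], full_text: str) -> str:
    """Return the text that a layout's textAnchor segments point at."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for seg in segments:
        # JSON carries 64-bit indices as strings; a missing startIndex means 0.
        start = int(seg.get("startIndex", 0))
        end = int(seg.get("endIndex", 0))
        parts.append(full_text[start:end])
    return "".join(parts).strip()


def extract_form_fields(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten every page's form fields into key/value records."""
    full_text = document.get("text", "")
    results = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            value_type = field.get("valueType", "")
            item = {
                "key": anchor_text(field.get("fieldName", {}), full_text),
                "value": anchor_text(field.get("fieldValue", {}), full_text),
                "is_checkbox": value_type in CHECKBOX_TYPES,
            }
            if item["is_checkbox"]:
                item["checked"] = value_type == "filled_checkbox"
            results.append(item)
    return results


# Hypothetical document, hand-written for this example.
sample = {
    "text": "Name: Ada Lovelace\nSubscribe\n",
    "pages": [{
        "formFields": [
            {
                "fieldName": {"textAnchor": {"textSegments": [{"endIndex": "5"}]}},
                "fieldValue": {"textAnchor": {"textSegments": [
                    {"startIndex": "6", "endIndex": "18"}]}},
            },
            {
                "fieldName": {"textAnchor": {"textSegments": [
                    {"startIndex": "19", "endIndex": "28"}]}},
                "fieldValue": {},
                "valueType": "filled_checkbox",
            },
        ],
    }],
}

for record in extract_form_fields(sample):
    print(record)
```

Because every key and value is resolved through offsets into the document's `text`, the same `anchor_text` helper could also be applied to other layout elements that carry a text anchor, such as table cells.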
Languages and regions
- Form Parser v2.0 supports more than 200 languages.
- Feature support is available in eight regions.
Model versions
The following processor versions are compatible with this feature. For more information, see Managing processor versions.
Version ID | Release channel | Description |
---|---|---|
pretrained-form-parser-v1.0-2020-09-23 | Stable | Legacy version. For best quality and the full feature set, use Form Parser v2.0. |
pretrained-form-parser-v2.0-2022-11-10 | Stable | Recommended version. Supports generic entities and includes upgraded table, KVP, and checkbox models, as well as more than 200 languages. |
pretrained-form-parser-v2.1-2023-06-26 | Release Candidate | Public Preview version. Same model as v2.0, with native text extraction from digital PDF files enabled. |
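To pin a request to one of these versions, the version ID is typically appended to the processor's resource name. The sketch below builds such a name; it assumes the `projects/.../locations/.../processors/.../processorVersions/...` resource-name pattern, and the project, location, and processor IDs shown are placeholders. The `FORM_PARSER_VERSIONS` mapping and the `processor_version_name` helper are hypothetical names introduced for this example.

```python
# Version IDs and release channels copied from the table above.
FORM_PARSER_VERSIONS = {
    "pretrained-form-parser-v1.0-2020-09-23": "Stable",
    "pretrained-form-parser-v2.0-2022-11-10": "Stable",
    "pretrained-form-parser-v2.1-2023-06-26": "Release Candidate",
}


def processor_version_name(
    project_id: str, location: str, processor_id: str, version_id: str
) -> str:
    """Build a processor-version resource name (assumed pattern)."""
    if version_id not in FORM_PARSER_VERSIONS:
        raise ValueError(f"Unknown Form Parser version: {version_id}")
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version_id}"
    )


# Placeholder IDs; substitute your own values.
name = processor_version_name(
    "my-project", "us", "my-processor-id",
    "pretrained-form-parser-v2.0-2022-11-10",
)
print(name)
```

Validating the version ID against a fixed list is a design choice for this sketch: it catches typos early, but the list must be updated by hand when new versions are released.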
Limitations
TIFF files that use the earlier style of JPEG compression, that is, the type of JPEG encapsulation defined by the TIFF version 6.0 specification, are unsupported.
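One way to screen for such files before upload is to inspect the TIFF Compression tag. The following is a rough sketch under stated assumptions, not part of Form Parser: it treats Compression value 6 (the JPEG scheme from the TIFF 6.0 specification) as the unsupported case, reads only the first image file directory of a classic TIFF, and does not handle BigTIFF or later pages. The function names are hypothetical.

```python
import struct
from typing import Optional

COMPRESSION_TAG = 259   # TIFF "Compression" tag
OLD_STYLE_JPEG = 6      # JPEG scheme defined in the TIFF 6.0 specification


def first_ifd_compression(data: bytes) -> Optional[int]:
    """Return the Compression value of the first IFD, or None if not a classic TIFF."""
    if len(data) < 8:
        return None
    # "II" marks little-endian files, "MM" big-endian files.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        return None
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42 or ifd_offset + 2 > len(data):
        return None
    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        if entry + 12 > len(data):
            return None
        tag, _field_type, _count = struct.unpack_from(order + "HHI", data, entry)
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack_from(order + "H", data, entry + 8)
            return value
    return 1  # tag absent: TIFF 6.0 defaults to 1 (no compression)


def uses_old_style_jpeg(data: bytes) -> Optional[bool]:
    """True if the first IFD uses old-style JPEG; None if the data is not a classic TIFF."""
    compression = first_ifd_compression(data)
    if compression is None:
        return None
    return compression == OLD_STYLE_JPEG
```

A caller could read a file with `open(path, "rb").read()` and re-encode the image with another compression scheme when `uses_old_style_jpeg` returns `True`.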
The checkbox model doesn't support parsing radio buttons. Some detected checkboxes might not have corresponding keys.
The model doesn't reliably parse a KVP whose value is unfilled, such as a field on a blank form.
KVP parsing quality on documents in certain languages may be lower than on documents in Latin languages.
Next steps
Get started by visiting Process documents with Form Parser.