Developers & Practitioners

Diving into your documents with DocAI

May 4, 2021
https://storage.googleapis.com/gweb-cloudblog-publish/images/docAI.max-1300x1300.jpeg
Anu Srivastava

Senior Developer Programs Engineer

Lukas Rutishauser

Staff Software Engineer

Prefer to listen? Check out this episode on the Google Cloud Reader podcast.

We recently announced the general availability (GA) of the Document AI Platform, Google's solution for automating and validating documents to streamline document workflows. Important business data is not always readily available in computer-readable formats. Much of it sits in what we consider dark formats, such as PDFs, handwritten forms, and images.

The platform is a console for document processing where customers can quickly access all parsers, tools, and solutions. Lending DocAI and Procurement DocAI are now also in GA. These workflow solutions are built on our specialized parsers, with models for common enterprise document types such as tax forms, invoices, receipts, and more.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Screenshot_2021-04-03_at_21.57.30.max-1200x1200.png

So why use it? Your business is most likely sitting on a treasure trove of unstructured data, or maybe you have document workflows that require several manual steps. DocAI can help you programmatically extract data for gathering insights with data analytics and help automate tedious and error-prone tasks. Use one of our client libraries to ingest your documents and produce structured data in our new unified document format.

Unified document format

The unified document format (document.proto) is the protocol used to represent all metadata about a document in a standardized, universal format. It is an efficient standoff format, in which the content is kept separate from the annotations. This gives full flexibility to losslessly represent any annotation or attribute of a document or its content, whether annotated by a human or by an algorithm.
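To make the standoff idea concrete, here is a minimal sketch that resolves an annotation back to the text it points at. It assumes the JSON encoding of a document, where the content lives in a single `text` field and annotations carry a `textAnchor` with `textSegments` offsets. The sample document, the `total_amount` entity type, and the `anchor_text` helper are hand-written for illustration and are not output from the service.

```python
# Hypothetical, hand-written fragment in the JSON encoding of a document.
# The sample text is plain ASCII, so offsets are easy to check by eye.
document = {
    "text": "Invoice 42\nTotal: $10.00\n",
    "entities": [
        {
            "type": "total_amount",  # illustrative type name
            "textAnchor": {
                "textSegments": [{"startIndex": "18", "endIndex": "24"}]
            },
        }
    ],
}


def anchor_text(doc: dict, anchor: dict) -> str:
    """Return the characters of doc["text"] that a text anchor points at."""
    parts = []
    for segment in anchor.get("textSegments", []):
        # 64-bit integers are encoded as strings in protobuf JSON, and a
        # start index of zero is omitted, so default to 0 and convert.
        start = int(segment.get("startIndex", 0))
        end = int(segment.get("endIndex", 0))
        parts.append(doc["text"][start:end])
    return "".join(parts)


for entity in document["entities"]:
    print(entity["type"], "->", anchor_text(document, entity["textAnchor"]))
# prints: total_amount -> $10.00
```

Because the annotation stores only offsets, any number of annotations can refer to the same span without copying or altering the content.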

It was created to make building document-based workflow applications easy across tools, components, platforms, and languages inside and outside of DocAI. It is based on protocol buffers, which allow efficient, flexible encodings, typically binary or JSON.

The format currently supports rich OCR output as well as extracted entities, so let's dive in.

https://storage.googleapis.com/gweb-cloudblog-publish/images/unnamed_3_HFN0zdr.max-600x600.png

Document representation - read it

The form parsers return the raw representation of the document content. In many documents, the layout structure is as important as the actual text. Layout elements come in several types: tokens, lines, paragraphs, blocks, form fields, tables, and visual elements. The format represents this rich OCR output in a hierarchical structure. You can use each layout element's bounding polygon coordinates to detect and highlight tokens in a UI.
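As a rough sketch of the highlighting step, the snippet below turns a token's bounding polygon into a pixel rectangle that a UI could draw. It assumes the JSON encoding of a page, with `dimension` holding the page size and each token's `layout.boundingPoly.normalizedVertices` holding coordinates between 0 and 1. The sample page and the `to_pixel_box` helper are hypothetical.

```python
def to_pixel_box(layout: dict, page_width: float, page_height: float) -> tuple:
    """Return (left, top, right, bottom) in pixels for a layout element."""
    vertices = layout["boundingPoly"]["normalizedVertices"]
    # Coordinates equal to zero may be omitted in JSON, so default to 0.0.
    xs = [vertex.get("x", 0.0) * page_width for vertex in vertices]
    ys = [vertex.get("y", 0.0) * page_height for vertex in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical page with a single token, written by hand for illustration.
page = {
    "dimension": {"width": 1000, "height": 800, "unit": "pixels"},
    "tokens": [
        {
            "layout": {
                "boundingPoly": {
                    "normalizedVertices": [
                        {"x": 0.25, "y": 0.125},
                        {"x": 0.5, "y": 0.125},
                        {"x": 0.5, "y": 0.25},
                        {"x": 0.25, "y": 0.25},
                    ]
                }
            }
        }
    ],
}

width = page["dimension"]["width"]
height = page["dimension"]["height"]
for token in page["tokens"]:
    print(to_pixel_box(token["layout"], width, height))
# prints: (250.0, 100.0, 500.0, 200.0)
```

Taking the minimum and maximum of the vertices yields an axis-aligned rectangle, which is a simplification if the scanned page is rotated or skewed.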

We've drafted a set of notebooks to help you quickly get started with the service. I'll walk through a sample document with our general specialized form parser notebook.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Invoice-Shortest.gif

Extracted data - understand it

Here is where the core of the structured data appears. If you're processing a generic form, DocAI will extract the relevant key-value pairs. If you're using one of our specialized parsers for a form type such as an invoice, receipt, or utility statement, the extracted data will be mapped onto a predefined schema.

To help you with your document processing journey, we also provide tools for classifying and splitting multi-page, multi-form packets. For example, a large mortgage packet may contain W2s, W9s, payslips, and other forms that need to be classified and separated. The classifier labels the document or entity type, and the splitter identifies where the logical boundaries of each form type start and end.
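One way to consume a splitter result is sketched below. It assumes that each detected sub-document arrives as an entity whose `type` is the predicted form type and whose `pageAnchor.pageRefs` lists the zero-based pages it covers. The type names, the sample packet, and the `split_ranges` helper are hypothetical.

```python
def split_ranges(doc: dict) -> list:
    """Return (form_type, first_page, last_page) for each detected form."""
    ranges = []
    for entity in doc.get("entities", []):
        refs = entity.get("pageAnchor", {}).get("pageRefs", [])
        # Page numbers are 64-bit integers, encoded as strings in JSON,
        # and page 0 is omitted, so default to 0 and convert.
        pages = sorted(int(ref.get("page", 0)) for ref in refs)
        if pages:
            ranges.append((entity.get("type", ""), pages[0], pages[-1]))
    return ranges


# Hypothetical three-page packet: a two-page W2 followed by a payslip.
packet = {
    "entities": [
        {
            "type": "w2",
            "confidence": 0.97,
            "pageAnchor": {"pageRefs": [{}, {"page": "1"}]},
        },
        {
            "type": "payslip",
            "confidence": 0.91,
            "pageAnchor": {"pageRefs": [{"page": "2"}]},
        },
    ]
}

for form_type, first, last in split_ranges(packet):
    print(f"{form_type}: pages {first}-{last}")
# prints:
# w2: pages 0-1
# payslip: pages 2-2
```

The resulting page ranges can then drive a PDF library of your choice to cut the packet into separate files.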

Extraction

Not only do you get the "questions and answers" from your document, you also get entity normalization and confidence scores. In our specialized parsers, if a field is a monetary or date type, the API also provides an appropriate entity type. This makes it much easier to integrate with other systems or with a database that has strict schema types.
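The sketch below shows how normalized values might be converted into native Python types before writing to a strictly typed store. It assumes the JSON encoding of an entity, where `normalizedValue` may carry a `dateValue` (year, month, day) or a `moneyValue` (currency code, whole units, and nanos). The entity type names and the `to_python_value` helper are hypothetical.

```python
import datetime
from decimal import Decimal


def to_python_value(entity: dict):
    """Convert an entity's normalized value to a native Python type."""
    normalized = entity.get("normalizedValue", {})
    if "dateValue" in normalized:
        date = normalized["dateValue"]
        year, month, day = date.get("year", 0), date.get("month", 0), date.get("day", 0)
        # Only build a date when all three parts are present and non-zero.
        if year and month and day:
            return datetime.date(year, month, day)
    if "moneyValue" in normalized:
        money = normalized["moneyValue"]
        # Whole units are a 64-bit integer (a string in JSON);
        # nanos hold the fractional part in billionths.
        amount = Decimal(int(money.get("units", 0)))
        amount += Decimal(money.get("nanos", 0)) / Decimal(10**9)
        return (money.get("currencyCode", ""), amount)
    # Fall back to the normalized text, then to the raw mention.
    return normalized.get("text", entity.get("mentionText", ""))


# Hypothetical entities, written by hand for illustration.
due_date = {
    "type": "due_date",
    "mentionText": "May 4, 2021",
    "normalizedValue": {
        "text": "2021-05-04",
        "dateValue": {"year": 2021, "month": 5, "day": 4},
    },
}
total = {
    "type": "total_amount",
    "mentionText": "$10.50",
    "normalizedValue": {
        "text": "10.50 USD",
        "moneyValue": {"currencyCode": "USD", "units": "10", "nanos": 500000000},
    },
}

print(to_python_value(due_date))  # prints: 2021-05-04
print(to_python_value(total))
```

Using `Decimal` instead of `float` avoids rounding surprises when the amount is later stored or summed.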

For data assurance, we provide a score between 0 and 1 that indicates the platform's confidence in that entity. On a generic form, you can inspect the confidence scores for both the keys and their associated values.
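A minimal sketch of reading keys, values, and their confidence scores from a generic form follows. It assumes the JSON encoding of a document, where each page lists `formFields` and each field has a `fieldName` and a `fieldValue`, both carrying a `textAnchor` and a `confidence`. The sample form, its scores, and the helper names are made up for illustration.

```python
def anchor_text(doc: dict, layout: dict) -> str:
    """Return the text that a layout's text anchor points at."""
    parts = []
    for segment in layout.get("textAnchor", {}).get("textSegments", []):
        # Offsets are strings in JSON, and a zero start index is omitted.
        start = int(segment.get("startIndex", 0))
        end = int(segment.get("endIndex", 0))
        parts.append(doc["text"][start:end])
    return "".join(parts).strip()


def form_fields(doc: dict) -> list:
    """Return (key, key_confidence, value, value_confidence) per form field."""
    rows = []
    for page in doc.get("pages", []):
        for field in page.get("formFields", []):
            name = field.get("fieldName", {})
            value = field.get("fieldValue", {})
            rows.append((
                anchor_text(doc, name), name.get("confidence", 0.0),
                anchor_text(doc, value), value.get("confidence", 0.0),
            ))
    return rows


def layout(start: int, end: int, confidence: float) -> dict:
    """Build a hypothetical layout dict for the hand-written sample below."""
    return {
        "textAnchor": {"textSegments": [{"startIndex": str(start), "endIndex": str(end)}]},
        "confidence": confidence,
    }


# Hypothetical parsed form, written by hand for illustration.
form = {
    "text": "Name: Ada\nDate: 2021-05-04\n",
    "pages": [{
        "formFields": [
            {"fieldName": layout(0, 5, 0.99), "fieldValue": layout(6, 9, 0.95)},
            {"fieldName": layout(10, 15, 0.98), "fieldValue": layout(16, 26, 0.62)},
        ]
    }],
}

for key, key_conf, value, value_conf in form_fields(form):
    print(f"{key} ({key_conf:.2f}) {value} ({value_conf:.2f})")
# prints:
# Name: (0.99) Ada (0.95)
# Date: (0.98) 2021-05-04 (0.62)
```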

https://storage.googleapis.com/gweb-cloudblog-publish/images/Screenshot_2021-04-29_at_19.59.39_Y7eCLqy.max-900x900.png

We understand that accuracy is critical for business processes, so you can use Human-in-the-Loop AI to incorporate a customizable human review workflow with trusted reviewers within your own or partner organizations. You can configure human review to trigger if the whole document or specific fields do not meet a confidence score of your choosing. Including human participation in ML processes allows AI and humans to work together for the best possible results for customers.
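The triggering rule can be illustrated with a small client-side sketch. This is not how Human-in-the-Loop AI is configured on the platform; it only shows the logic of comparing each field's confidence against a default threshold, with optional stricter thresholds for specific fields. The function name, field names, and threshold values are all hypothetical.

```python
def fields_needing_review(entities: list, default_threshold: float,
                          per_field: dict = None) -> list:
    """Return the types of entities whose confidence is below their threshold."""
    per_field = per_field or {}
    flagged = []
    for entity in entities:
        field_type = entity.get("type", "")
        # A field-specific threshold, when given, overrides the default.
        threshold = per_field.get(field_type, default_threshold)
        if entity.get("confidence", 0.0) < threshold:
            flagged.append(field_type)
    return flagged


# Hypothetical extraction results, written by hand for illustration.
entities = [
    {"type": "supplier_name", "confidence": 0.98},
    {"type": "total_amount", "confidence": 0.93},
    {"type": "due_date", "confidence": 0.55},
]

# Review anything under 0.7, but hold the total to a stricter 0.95.
print(fields_needing_review(entities, 0.7, {"total_amount": 0.95}))
# prints: ['total_amount', 'due_date']
```

A document-level rule could reuse the same function by sending the whole document to review whenever the returned list is non-empty.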

Last but not least, making it useful is up to you! We hope we have inspired you to try out Document AI in your app or service. By using the platform, you can build tools that reduce manual steps to prevent human errors, integrate other Google services for robust data processing, or track document changes for an audit. You can head over to the DocAI Platform in the Google Cloud console or try out one of our codelabs.
