AI & Machine Learning

Document AI adds three new capabilities to its OCR engine

December 21, 2022
https://storage.googleapis.com/gweb-cloudblog-publish/images/aiml2022_PO1vxqJ.max-2500x2500.jpg
Steve Z.

Product Manager

Devaki Kulkarni

Product Manager

Documents are indispensable parts of our professional and personal lives. They give us crucial insights that help us become more efficient, organize and optimize information, and even stay competitive. But as documents become more complex, and as the variety of document types continues to expand, it has become increasingly challenging for people and businesses to sift through the ocean of bits and bytes to extract actionable insights.

This is where Google Cloud’s Document AI comes in. It is a unified, AI-powered suite for understanding and organizing documents. Document AI consists of Document AI Workbench (state-of-the-art custom ML platform), Document AI Warehouse (managed service with document storage and analytics capabilities), and a rich set of pre-trained document processors. Underpinning these services is the ability to extract text accurately from various types of documents with a world-class Document Optical Character Recognition (OCR) engine.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_OCR_engine_122122.max-1500x1500.jpg

Google Cloud’s Document AI OCR takes an unstructured document as input and extracts its text and layout (such as paragraphs and lines). Covering over 200 languages, Document AI OCR is powered by state-of-the-art machine learning models developed by Google Cloud and Google Research teams.

Today, we are pleased to announce three new OCR features in Public Preview that can further enhance your document processing workflows. 

1. Assess page-level quality of documents with Intelligent Document Quality (IDQ) 

With Document AI OCR, Google Cloud customers and partners can programmatically extract key document characteristics – word frequency distributions, relative positioning of line items, dominant language of the input document, etc. – as critical inputs to their downstream business logic. Today, we are adding another important document assessment signal to this toolbox: Intelligent Document Quality (IDQ) scores. 

IDQ provides page-level quality metrics in the following eight dimensions:

  1. Blurriness 

  2. Level of optical noise 

  3. Darkness

  4. Faintness

  5. Presence of smaller-than-usual fonts

  6. Document getting cut off

  7. Text spans getting cut off

  8. Glare due to lighting conditions

Being able to discern the optical quality of documents helps assess which documents must be processed differently based on their quality, making the overall document processing pipeline more efficient. For example, Gary Lewis, Managing Director of lending and deposit solutions at Jack Henry, noted, “Google’s Document AI technology, enriched with Intelligent Document Quality (IDQ) signals, will help businesses to automate the data capture of invoices and payments when sending to our factoring customers for purchasing. This creates internal efficiencies, reduces risk for the factor/lender, and gets financing into the hands of cash-constrained businesses quickly.”
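As a rough illustration of how page-level defect signals might feed downstream logic, the sketch below keeps only the defects whose confidence clears a cutoff. The `Defect` class, the defect labels, the 0.0 to 1.0 confidence scale, and the 0.5 cutoff are all assumptions made for this example; they are not taken from the Document AI response format.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    # Hypothetical container for one IDQ dimension reported on a page.
    kind: str          # e.g. "blurry", "glare" (illustrative labels only)
    confidence: float  # assumed scale: 0.0 to 1.0, higher = more likely present


def flag_defects(defects, min_confidence=0.5):
    """Return the kinds of defects at or above the cutoff, most likely first."""
    hits = [d for d in defects if d.confidence >= min_confidence]
    hits.sort(key=lambda d: d.confidence, reverse=True)
    return [d.kind for d in hits]


page_defects = [
    Defect("blurry", 0.82),
    Defect("glare", 0.10),
    Defect("text_cutoff", 0.61),
]
print(flag_defects(page_defects))  # ['blurry', 'text_cutoff']
```

A pipeline could use the returned list to decide, for instance, whether to ask for a rescan or to apply a different processing path.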

Overall, document quality metrics pave the way for more intelligent routing of documents for downstream analytics. The reference workflow below uses document quality scores to split and classify documents before sending them to either the pre-built Form Parser (in the case of high document quality) or a Custom Document Extractor trained specifically on lower-quality datasets.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_OCR_engine_122122.max-1400x1400.jpg
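The routing step in this reference workflow can be sketched in a few lines. This is a minimal sketch under stated assumptions: page quality scores are taken to lie between 0.0 and 1.0 with higher meaning better, the 0.5 threshold is arbitrary, routing on the worst page is one possible policy among several, and the returned route names are placeholders rather than real processor identifiers.

```python
def route_document(page_quality_scores, threshold=0.5):
    """Pick a downstream processor from per-page quality scores.

    Routes on the worst page so that a single poor scan sends the whole
    document to the extractor trained on lower-quality data.
    """
    if not page_quality_scores:
        raise ValueError("document has no pages")
    worst = min(page_quality_scores)
    # Placeholder route names, not actual processor IDs.
    return "form_parser" if worst >= threshold else "custom_extractor"


print(route_document([0.93, 0.88]))  # form_parser
print(route_document([0.93, 0.31]))  # custom_extractor
```

Averaging the page scores instead of taking the minimum would be more forgiving of a single bad page; which policy fits depends on how sensitive the downstream extractor is.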

2. Process digital PDF documents with confidence using built-in support

The PDF format is popular in various business applications such as procurement (invoices, purchase orders), lending (W-2 forms, paystubs), and contracts (leasing or mortgage agreements). PDF documents can be image-based (e.g., a scanned driver’s license) or digital, in which case you can hover over, highlight, and copy and paste the embedded text just as you would in a Google Docs or Microsoft Word document.

We are happy to announce digital PDF support in Document AI OCR. The digital PDF feature extracts text and symbols exactly as they appear in the source document, which helps the OCR engine perform well in complex visual scenarios such as rotated text, extreme font sizes or styles, and partially hidden text.
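For readers who call the service over REST, the sketch below assembles a JSON request body that asks for digital (native) PDF parsing. The field names (`rawDocument`, `processOptions`, `ocrConfig`, `enableNativePdfParsing`) reflect our understanding of the v1 `process` method and are not stated in this post; how the feature is enabled during Public Preview may differ, so treat the layout as an assumption and check the API reference before relying on it. The helper name `build_process_request` is hypothetical.

```python
import base64
import json


def build_process_request(pdf_bytes, native_pdf=True):
    """Build a JSON-serializable request body for an inline PDF.

    Field names are assumptions based on the v1 REST API; verify them
    against the current Document AI reference.
    """
    return {
        "rawDocument": {
            # Inline document content is sent as base64-encoded text.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        "processOptions": {
            "ocrConfig": {"enableNativePdfParsing": native_pdf},
        },
    }


body = build_process_request(b"%PDF-1.7 placeholder bytes")
print(json.dumps(body["processOptions"]))
```

The placeholder bytes stand in for a real file read with `open(path, "rb").read()`.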

Discussing the importance and prevalence of PDF documents in banking and finance (e.g., bank statements and mortgage agreements), Ritesh Biswas, Director, Google Cloud Practice at PwC, said, “The Document AI OCR solution from Google Cloud, especially its support for digital PDF input formats, has enabled PwC to bring digital transformation to the global financial services industry.”

3. “Freeze” model characteristics with OCR versioning

As a fully managed cloud-based service, Document AI OCR regularly upgrades the underlying AI/ML models to maintain its world-class accuracy across over 200 languages and scripts. These model upgrades, while providing new features and enhancements, may occasionally lead to changes in OCR behavior compared to an earlier version. 

Today, we are launching OCR versioning, which enables users to pin to a specific historical OCR model version. These “frozen” model versions give our customers and partners peace of mind by ensuring consistent OCR behavior. For industries with rigorous compliance requirements, staying on the same model version also minimizes the need and effort to recertify stacks between releases. According to Jagadheeswaran Kathirvel, Senior Principal Architect at Mr. Cooper, “Having consistent OCR behavior is mission-critical to our business workflows. We value Google Cloud’s OCR versioning capability that enables our products to pin to a specific OCR version for an extended period of time.”

With OCR versioning, you have the full flexibility to select the versioning option that best fits your business needs.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_OCR_engine_122122.max-2000x2000.jpg
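In practice, pinning usually means addressing a specific processor version rather than the processor's default. The sketch below builds such a resource name. The path layout follows the usual Google Cloud resource-name pattern as we understand it for Document AI, and the project, processor ID, and version strings are placeholders; the post itself does not list version identifiers, so look up the real ones in the documentation.

```python
def processor_version_name(project, location, processor_id, version):
    """Return the resource name for one pinned processor version.

    The path layout is an assumption based on the v1 API; all argument
    values used below are placeholders.
    """
    for part in (project, location, processor_id, version):
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return (
        f"projects/{project}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version}"
    )


pinned = processor_version_name("my-project", "us", "abc123", "my-frozen-version")
print(pinned)
```

Keeping the version string in configuration rather than in code makes it easier to move to a newer frozen version once it has been recertified.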

Getting Started on Document AI OCR

Learn more about the new OCR features and tutorials in the Document AI Documentation or try it directly in your browser (no coding required). For more details on what’s new with Document AI, don’t forget to check out our breakout session from Google Cloud Next 2022.
