
Document AI offers the ability to search and store documents efficiently with Document AI Warehouse

April 28, 2023
Kiran Bellare

Head of Product Mgmt

Ever tried to find that one document where you saved meeting notes but forgot to put it in your notes folder? Just the thought of it is painful. Now imagine how this dilemma manifests at enterprise scale: say you work for a bank that processes thousands of mortgage applications each day, and you can't find the one form you are looking for. It’s the proverbial needle-in-a-haystack feeling.

As enterprises seek to generate business value by controlling, managing, and analyzing their data, documents can prove particularly difficult to process, even when they are part of core processes like customer service forms. This challenge can lead to delays, service lags, and frustrating gaps in meeting stakeholder expectations. It doesn't have to be this way.  

Last fall, at Google Cloud Next ’22, we expanded the capabilities of our Document AI agent with the launch of Document AI Warehouse, a fully managed cloud-native service to search, store, and govern documents and their manually tagged and AI-extracted data. Document AI Warehouse applies the best of Google Search and AI technologies to documents. It also provides a policy engine to support document validation, mailroom automation, compliance, archive management, and other workflows.

In this blog post, we’ll dive into these new Document AI capabilities, including how they can help users find the information they need to do their jobs faster and help organizations do more with the data in their documents, from contracts, invoices, and HR records to bills of lading, engineering spec sheets, and more.

Exploring Document AI Warehouse capabilities  

Document AI Warehouse provides a number of features which offer advantages over both traditional on-premises repositories and cloud solutions, including:

1. A single API to manage documents and their data, with no infrastructure management required. Document AI Warehouse can handle both structured (e.g., forms, invoices) and unstructured (e.g., contracts, research papers) content, along with properties (i.e., metadata) that range from AI-extracted data to manually assigned tags (e.g., account numbers, loan IDs, document types). It can extract data from documents using Document AI processors. All of this is managed in a single platform with a single CRUD API, so there’s no need to spin up, stitch together, maintain, and scale a separate file store, database, and search engine.

2. Powerful search functionality for quickly finding content in a haystack of millions of documents. Document AI Warehouse provides a simple UI and Search API to explore, view, search, filter, and drill into documents. With its semantic search (supporting stemming, synonyms, abbreviations, and misspellings) and faceted, full-text search of documents, users can get to the right documents within seconds, speeding up operations and transactions. Admins can also set up custom synonyms, including industry-specific terms or company-specific acronyms, so users can find content using familiar words. The search facets report the document volume for each facet value (a.k.a. the “search histogram”). Finally, the Document Explorer UI display is user-configurable to show the relevant columns, so users can spot key information within documents without having to open each one.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Document_AI_hFbydXx.max-1800x1800.jpg
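To make the “search histogram” idea concrete, here is a minimal sketch of facet counting and drill-down over an in-memory list. It does not call the Document AI Warehouse Search API; the document records, the property names (`doc_type`, `year`), and both helper functions are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory documents; the property names are illustrative only.
DOCUMENTS = [
    {"id": "doc-1", "doc_type": "invoice", "year": 2022},
    {"id": "doc-2", "doc_type": "invoice", "year": 2023},
    {"id": "doc-3", "doc_type": "contract", "year": 2023},
]


def facet_histogram(documents, facet):
    """Count how many documents fall under each value of one facet."""
    return Counter(doc[facet] for doc in documents if facet in doc)


def filter_by_facet(documents, facet, value):
    """Drill down: keep only documents whose facet equals the given value."""
    return [doc for doc in documents if doc.get(facet) == value]


# Document volume per document type.
histogram = facet_histogram(DOCUMENTS, "doc_type")

# Narrow to invoices first, then to invoices from 2023.
invoices = filter_by_facet(DOCUMENTS, "doc_type", "invoice")
invoices_2023 = filter_by_facet(invoices, "year", 2023)
```

A real search service would compute these counts server-side over an index; the sketch only shows what the numbers next to each facet value represent.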

3. Integrated DocAI Pipelines. Scalable, reliable pipelines classify and extract data from thousands to millions of documents in bulk, inside or outside Warehouse. Warehouse provides the pipelines and the UI to trigger, monitor, and retry DocAI processing.

4. Flexible content organization. Documents are cataloged into one or more hierarchical folders, based on application (e.g., a bank statement is placed in a KYC folder, loan folder, bank account folder, etc.), without replicating the document. These folders have their own independent properties and access controls to support application objects and semantics (e.g., LoanID, Loan Underwriting Date). Users can also limit their search within a folder hierarchy and drill down into folders of interest.
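One way to picture cataloging a document into several folders without replicating it is to have folders hold document IDs rather than copies. The sketch below assumes exactly that; the `Folder` class, its fields, and the sample names are hypothetical and are not the Warehouse data model.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    name: str
    properties: dict = field(default_factory=dict)   # folder-level properties, e.g. LoanID
    document_ids: set = field(default_factory=set)   # references to documents, not copies
    subfolders: list = field(default_factory=list)

    def all_document_ids(self):
        """Collect document IDs in this folder and every folder beneath it."""
        ids = set(self.document_ids)
        for sub in self.subfolders:
            ids |= sub.all_document_ids()
        return ids


# One stored document, keyed by ID.
documents = {"stmt-001": {"title": "Bank statement, March"}}

# The same statement is cataloged under two folders by reference.
kyc = Folder("KYC", document_ids={"stmt-001"})
loan = Folder("Loan 42", properties={"LoanID": "42"}, document_ids={"stmt-001"})
root = Folder("Customer A", subfolders=[kyc, loan])

# Searching within the hierarchy still finds a single document.
found = root.all_document_ids()
```

Because each folder carries its own `properties`, a loan folder can hold loan-specific fields without touching the document record itself.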

5. Rich policies and trigger notifications that integrate with workflows. The product supports a policy engine and conditional triggers and notifications that can integrate with document processing applications and workflows. The conditions can represent business logic on the extracted or tagged data in documents, e.g., “Invoice.Amount > $1000” => PubSub.Notification(“Over-billing alert”).

These conditions and notifications can be triggered when documents are created, updated, or deleted. These policies can be used in applications and workflows such as:

  • Mailroom automation

  • Document validation, e.g., check for Passport.ExpiryDate within six months 

  • Exception management and approval workflows

  • Archive management
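The condition-then-notify pattern described above can be sketched in a few lines. This is an illustration only, not the Warehouse policy engine: the `Rule` class, the `evaluate` and `expires_within` helpers, the flat property names, and the callback standing in for a Pub/Sub notification are all assumptions. "Within six months" is approximated as 183 days, and already-expired passports also match.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # business logic over document properties
    message: str


def evaluate(rules, document, notify):
    """Run every rule against one document and notify on each match."""
    fired = []
    for rule in rules:
        if rule.condition(document):
            notify(rule.message, document)
            fired.append(rule.name)
    return fired


def expires_within(days, today):
    """Build a condition that is true when ExpiryDate is at most `days` after `today`."""
    def condition(doc):
        expiry = doc.get("ExpiryDate")
        return expiry is not None and expiry <= today + timedelta(days=days)
    return condition


TODAY = date(2023, 4, 28)
RULES = [
    Rule("over-billing", lambda d: d.get("Amount", 0) > 1000, "Over-billing alert"),
    Rule("passport-expiry", expires_within(183, TODAY), "Passport expires soon"),
]

# A list stands in for the notification channel.
sent = []
fired = evaluate(RULES, {"type": "Invoice", "Amount": 2500},
                 lambda msg, doc: sent.append(msg))
```

In a real deployment the same evaluation would run on the create, update, or delete event, and `notify` would publish to whatever messaging system the workflow listens on.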

Also, the Document Explorer interface lets users track the progress of documents through a workflow pipeline (as shown below), enabling a human to inspect and manage failures and stalled documents in the workflow pipeline.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Document_AI_l8vXbMk.max-1500x1500.jpg

6. Strong governance and controls to manage content access. Fine-grained access control at the document and folder levels can be assigned to users and groups to view, edit, and manage (e.g., share, delete) documents. Document AI Warehouse integrates with Cloud Identity (IAM) and corporate directories so that users and groups can be easily provisioned. Users or groups can also be federated and synced into Cloud Identity from an enterprise LDAP / identity provider such as Azure AD, Active Directory, and Keycloak, or from Google Workspace accounts.

Document AI Warehouse supports four roles (Creator, Viewer, Editor, Manager), which can be assigned to users and groups at the document level or at the project level (across all documents). In addition, policies can be used to assign default ACLs to documents based on document type (i.e., schema) or other properties of the document. This gives the admin flexibility to control how documents are accessed and shared.
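As a rough illustration of how project-level and document-level role assignments could combine, consider the sketch below. The four role names come from the text above, but the permissions attached to each role, the lookup tables, and the `is_allowed` function are assumptions for illustration and do not describe the product's actual IAM behavior.

```python
# Assumed role-to-permission mapping; the article names the roles, not their exact rights.
ROLE_PERMISSIONS = {
    "Creator": {"create"},
    "Viewer": {"view"},
    "Editor": {"view", "edit"},
    "Manager": {"view", "edit", "manage"},
}

# Hypothetical assignments.
project_roles = {"alice": "Manager", "bob": "Viewer"}   # apply across all documents
document_roles = {"doc-7": {"bob": "Editor"}}            # grants on a single document


def is_allowed(user, action, doc_id):
    """Allow the action if the project-level or the document-level role grants it."""
    roles = [project_roles.get(user), document_roles.get(doc_id, {}).get(user)]
    return any(action in ROLE_PERMISSIONS[role] for role in roles if role is not None)


# bob can edit doc-7 through his document-level grant, but only view other documents.
can_edit_doc7 = is_allowed("bob", "edit", "doc-7")
can_edit_doc8 = is_allowed("bob", "edit", "doc-8")
```

The sketch treats grants as additive, so a document-level role can widen what a project-level role allows but never narrows it; that, too, is an assumption.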

7. Elastic, cloud-scale document platform. Document AI Warehouse can manage large repositories of current and archived documents, and it can elastically scale with the business, without imposing infrastructure management responsibilities or hardware CapEx/OpEx costs. It supports searching, storing, and updating up to hundreds of millions of documents, with high ingest throughput of up to hundreds of transactions per second (within the organization’s Google Cloud quota limits).

What can Document AI Warehouse be used for?

Document AI Warehouse is a horizontal service, and we have seen early adopters apply the product to a broad spectrum of applications and document types, as shown below (with the more popular use cases marked in blue). Document AI Warehouse supports PDFs, Text and OfficeX formats, and we plan to support more formats in the future.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Document_AI_5bzPD6s.max-1000x1000.jpg

Rocket Mortgage partnered with the DocAI team at Google Cloud to develop a one-of-a-kind Fintech knowledgebase. With this partnership and Document AI Warehouse, the company launched a first-of-its-kind search solution that has delivered nearly 2 million answers to questions from mortgage brokers, loan officers, and real estate agents through its 'Pathfinder by Rocket' product.

Getting started

To learn more or get started with Document AI Warehouse, visit our webpage, walk through the Quick Start Guide, and sign up today. The product comes with a $50 free trial per account, which covers a basic proof-of-concept implementation with up to a few thousand documents and/or API calls, depending on usage.
