Document AI Workbench

Create custom machine learning models, specific to your business, that extract data from, classify, and split documents. With Document AI Workbench, you can automate business processes by training machine learning models for documents like insurance forms, bills of lading, and more.

  • Create your own custom model with one click 

  • Uptrain existing models to improve performance

  • Extract structured data from, classify, and split documents 

  • Extract data from and summarize documents using generative AI 

  • New Google Cloud customers get $300 in free credits to fully explore


Achieve higher document processing accuracy

Experience fewer errors and enjoy faster and more accurate document processing workflows with custom models that are trained on your business documents.

Process a wide range of document types

Now you don't need to depend on pretrained models for your document processing needs. You can work with a wide range of documents that are not supported by Google’s pretrained models.

Train models without machine learning skills

With Document AI Workbench, even users without extensive machine learning skills can start training or uptraining models through a friendly user interface.

Key features

Uptrain existing models to improve performance

Uptraining means that you begin with a pretrained ML model, and then train this model with your own data to improve its accuracy for your organization’s documents. Pretrained models offer a base model relevant to your document type and help automatically label data so you can build production-ready models faster. Document AI generates evaluation metrics, such as precision and recall, to help you determine the predictive performance of your models. These evaluation metrics are generated by comparing the entities returned by the models (the predictions) against the annotations in the test documents.
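
To make the metric definitions concrete, here is a minimal Python sketch of how precision and recall can be computed from predictions and test annotations. It assumes an entity counts as correct only when its document, type, and text all match an annotation exactly. The `Entity` type, the sample values, and the exact-match rule are illustrative assumptions, not Document AI's actual evaluation logic.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical record for one extracted or annotated entity."""
    doc_id: str
    entity_type: str
    text: str


def precision_recall(
    predictions: Iterable[Entity], annotations: Iterable[Entity]
) -> Tuple[float, float]:
    """Compare predicted entities against test annotations.

    Assumption: a prediction is a true positive only on an exact match
    of document, entity type, and text.
    """
    predicted = set(predictions)
    expected = set(annotations)
    true_positives = len(predicted & expected)
    # Precision: share of predictions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of annotations that the model found.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up test annotations for a single W-2 document.
annotations = [
    Entity("w2-001", "employer_name", "Acme Corp"),
    Entity("w2-001", "wages", "52000.00"),
    Entity("w2-001", "employee_ssn", "123-45-6789"),
    Entity("w2-001", "tax_year", "2022"),
]

# Made-up model output: two correct entities, one wrong value, one missed.
predictions = [
    Entity("w2-001", "employer_name", "Acme Corp"),
    Entity("w2-001", "wages", "52000.00"),
    Entity("w2-001", "employee_ssn", "123-45-6780"),
]

precision, recall = precision_recall(predictions, annotations)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example two of the three predictions are correct and two of the four annotations are found, so precision is higher than recall. A model that returns few but accurate entities shows this pattern.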

Create custom extractors for your documents with generative AI

Create custom extractors that are specific to your documents, and are trained and evaluated with your data for higher accuracy and performance. Once you create an initial model, use it to auto-label documents to train a production-ready model faster. With a deployed model, extract structured data from documents to automate processes and unearth insights. Use generative AI to quickly improve custom models through prompts. Use this feature to deploy new models or auto-label datasets, saving time and cost. For example, you can quickly add a new field by prompting a foundation model to extract it, instead of labeling documents and training a new model. You can use the same approach to auto-label new datasets.

Create custom classifier models for your documents

Create custom classifiers that identify a document type from a set of document classes. Classifying document types helps businesses save time, effort, and money. For example, you can validate whether users submit the right documents within an application. You can also classify documents to automate downstream processes, such as choosing the best model to extract data from a document.
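
The routing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the extractor names in `EXTRACTOR_BY_CLASS`, the `route_document` helper, and the 0.8 confidence threshold are all hypothetical and do not come from Document AI.

```python
from typing import Collection

# Hypothetical mapping from a classifier's predicted class to the
# extractor that should process that kind of document.
EXTRACTOR_BY_CLASS = {
    "w2": "w2-extractor",
    "paystub": "paystub-extractor",
    "bank_statement": "bank-statement-extractor",
}


def route_document(
    predicted_class: str,
    confidence: float,
    expected_classes: Collection[str],
    min_confidence: float = 0.8,  # assumed threshold, tune for your data
) -> str:
    """Decide the next step for a document from its classification."""
    # Low-confidence predictions go to a person instead of a model.
    if confidence < min_confidence:
        return "manual-review"
    # Validate that the user submitted a document type the application expects.
    if predicted_class not in expected_classes:
        return "reject-wrong-document"
    # Pick the extractor for this document type, if one is configured.
    return EXTRACTOR_BY_CLASS.get(predicted_class, "manual-review")


# An application step that only accepts W-2s and paystubs.
accepted = {"w2", "paystub"}
print(route_document("w2", 0.97, accepted))
print(route_document("bank_statement", 0.95, accepted))
print(route_document("paystub", 0.41, accepted))
```

Keeping the mapping in one table means that adding a new document type is a one-line change rather than a new branch in the code.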

Create custom splitter models for your documents

Create custom splitters to split and classify multiple documents within a single file. A custom splitter helps users sort and classify documents so they can validate whether they have all the needed documents from an applicant. For example, a custom splitter can classify and identify the page numbers of a driver’s license, paystub, W-2, and bank statement (and others) within a single file. Individually classified documents also enable businesses to better automate downstream processes, such as selecting the proper storage, analysis, or processing steps (for example, data extraction) based on the document type.
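
As a rough sketch of how splitter output might be used, the Python below checks an applicant's file for missing documents and groups page numbers by document type. The `Segment` type, its field names, and the sample data are hypothetical and do not reflect the actual response format of a Document AI splitter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical splitter result: one classified page range (1-based, inclusive)."""
    doc_type: str
    first_page: int
    last_page: int


def missing_documents(segments: Iterable[Segment], required: Iterable[str]) -> List[str]:
    """Return the required document types that were not found in the file."""
    found = {segment.doc_type for segment in segments}
    return sorted(set(required) - found)


def pages_by_type(segments: Iterable[Segment]) -> Dict[str, List[int]]:
    """Group page numbers by document type for downstream processing."""
    pages: Dict[str, List[int]] = {}
    for segment in segments:
        page_range = range(segment.first_page, segment.last_page + 1)
        pages.setdefault(segment.doc_type, []).extend(page_range)
    return pages


# Made-up result for a four-page applicant file.
segments = [
    Segment("drivers_license", 1, 1),
    Segment("paystub", 2, 3),
    Segment("w2", 4, 4),
]
required = ["drivers_license", "paystub", "w2", "bank_statement"]

print(missing_documents(segments, required))
print(pages_by_type(segments))
```

Here the bank statement is reported as missing, and the page lists can be passed to whichever storage or extraction step handles each document type.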

Create document summaries with generative AI

Use the document summarizer to create summaries for both short and long documents. You can customize summaries to your preference. For example, you can decide whether you want your summaries to be long or short, or written as paragraphs or bullet points. No model training is required to use this function. However, you can fine-tune the underlying models to improve the quality of the generated summaries.

Document AI Workbench is helping us expand document automation more quickly and effectively. By using this new product, we have been able to train our own document parser models in a fraction of the time and with less resources. We feel this will help us realize important operational improvements for our business and help us serve our customers much better.

Daniel Ordaz Palacios, Global Head Business Process & Operations, BBVA




Create a custom extractor

Learn how to use Document AI Workbench to create and train a custom extractor, using W-2 (US tax form) documents as an example.

Create a custom classifier

Create custom classifiers that identify documents from a user-defined set of classes. 

Create a custom splitter

Custom splitters can split and classify multiple documents within a single file. Your custom splitter is trained and evaluated with your data.

Uptrain a specialized processor

Uptraining means that you begin with a pretrained model, and then train this model with your own data to improve its accuracy. Find out how in this guide. 

Create a labeled dataset

A labeled dataset of documents is required to train, uptrain, or evaluate an ML model. Learn how to create a dataset, import documents, and define a schema.

Label documents

Learn how to apply labels from your model schema to imported documents in your dataset.

Train or uptrain ML models

See how you can train a new custom document processing model from scratch or uptrain an existing ML model for document processing tasks specific to your needs.

Evaluate model performance

An evaluation is automatically run whenever you train or uptrain a model. See how to run a manual evaluation to get updated metrics after modifying the test set.

Use cases

Bring your own data to create ML models

Workbench can handle documents with printed or handwritten text, tables and other nested entities, checkboxes, and more. It can use document images whether they were professionally scanned or captured in a quick photo. You can import data in multiple formats, such as PDFs, common image formats, and JSON documents.

Create and evaluate ML models for free

Instead of having to pay to spin up servers and wait while models are trained, you can create and evaluate ML models for free. You simply pay as you go once processors are deployed and used to extract data from documents.

Train a model

With one click, train a model by uptraining or from scratch. If you are working with a document type similar in layout and schema to one that an existing document processor handles, uptrain the relevant processor to get accurate results faster. If no relevant processor is available for the document you’re trying to process, create a model from scratch.


Pay for what you use

With Document AI Workbench, you pay only for hosting and prediction; there is no cost for importing data or training.


Document AI Workbench partners

Get help with implementing Document AI Workbench with these partners.

  • Deloitte
  • TCS
  • Quantiphi
  • SpringML
  • Accenture
  • PwC
  • Searce
  • Pandera