Document AI Workbench
Extract data and classify documents by creating custom ML models that are specific to your business needs. With Document AI Workbench, you can automate business processes by training or uptraining ML models on your own documents, such as invoices, bills of lading, and tax forms.
Create your own custom model with one click
Uptrain existing models to improve performance
Evaluate, iterate, and choose the best model
Extract structured data and classify document types
Achieve higher document processing accuracy
Experience fewer errors and enjoy faster and more accurate document processing workflows with custom models that are trained on your business documents.
Process a wide range of document types
You no longer need to depend solely on pretrained models for your document processing needs. You can work with a wide range of document types that Google’s pretrained models do not support.
Train models without machine learning skills
With Document AI Workbench, even users without extensive machine learning skills can start training or uptraining models through a friendly user interface.
Uptrain existing models to improve performance
Uptraining means that you begin with a prebuilt ML model, and then train this model with your own data to improve its accuracy for your organization’s documents. Prebuilt models provide a base relevant to your document type and help label data automatically, so you can build production-ready models faster.
Create Custom Document Extractor models for your documents
Create Custom Document Extractors (CDE) that are specific to your documents, including elements such as tables and checkboxes, and that are trained and evaluated with your data for higher accuracy and performance. Once you create an initial model, use it to auto-label documents so you can train a production-ready model faster. With a deployed model, extract structured data from documents to automate processes and unearth insights.
Create Custom Document Classifier models for your documents
Create Custom Document Classifiers (CDC) that identify a document type from a user-defined set of classes. Classifying document types helps businesses save time, effort, and money. For example, you can validate whether users submit the right documents within an application. You can also classify documents to automate downstream processes, such as choosing the best model to extract data from a document.
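The downstream routing described above could look something like the following sketch. The class names, extractor names, and the 0.8 confidence threshold are all hypothetical values chosen for illustration; they are not part of Document AI Workbench.

```python
# Hypothetical mapping from a document class to the extractor that handles it.
EXTRACTORS = {
    "invoice": "invoice-extractor",
    "w2": "w2-extractor",
    "bill_of_lading": "bol-extractor",
}


def choose_extractor(predicted_class, confidence, threshold=0.8):
    """Return the extractor for a classified document, or None for manual review."""
    if confidence < threshold:
        # Low-confidence classification: do not route automatically.
        return None
    # Also returns None when the class has no registered extractor.
    return EXTRACTORS.get(predicted_class)


print(choose_extractor("w2", 0.95))       # confident, known class
print(choose_extractor("invoice", 0.55))  # below the threshold
```

Returning None for both low-confidence and unknown classes keeps the caller simple: anything that is not routed automatically goes to a person for review.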
Manage and label datasets to prepare for training
A labeled dataset of documents is required to train, uptrain, or evaluate an ML model. With Document AI Workbench, you can apply labels from your model schema to imported documents in your dataset. If an existing version of your model is available, you can use it to get a head start via automatic labeling. You can also assign the labeling of your documents to a team of labeling specialists, either within your organization or at a third party, and manage their work.
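To make the relationship between a schema and the labels applied to a document more concrete, here is a minimal sketch of a consistency check. The schema fields, the Label type, and both helper functions are hypothetical and only illustrate the idea; they do not reflect how Document AI Workbench stores schemas or labels.

```python
from dataclasses import dataclass

# Hypothetical schema: label name -> expected value type.
SCHEMA = {"employer_name": "text", "wages": "money", "tax_year": "number"}


@dataclass
class Label:
    name: str
    text: str


def unknown_labels(labels, schema=SCHEMA):
    """Return the names of applied labels that the schema does not define."""
    return sorted({label.name for label in labels if label.name not in schema})


def missing_labels(labels, schema=SCHEMA):
    """Return the schema labels that were not applied to the document."""
    applied = {label.name for label in labels}
    return sorted(set(schema) - applied)


# One labeled document with a misspelled label name ("wage").
labels = [Label("employer_name", "Acme Corp"), Label("wage", "52000.00")]
print(unknown_labels(labels))
print(missing_labels(labels))
```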
Train, evaluate, and deploy models to production
Easily train and deploy your custom ML models for document processing. Document AI generates evaluation metrics, such as precision and recall, to help you determine the predictive performance of your models. These evaluation metrics are generated by comparing the entities returned by the models (the predictions) against the annotations in the test documents.
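As a rough illustration of how such metrics can be derived, the sketch below compares predicted entities with test annotations using exact matching on (type, text) pairs. The matching rule, the function name, and the sample values are assumptions made for illustration; Document AI's own evaluation may match entities differently.

```python
from collections import Counter


def precision_recall(predictions, annotations):
    """Compare predicted entities with ground-truth annotations.

    Both arguments are lists of (entity_type, text) pairs. Exact matching
    is an illustrative assumption, not Document AI's actual rule.
    """
    predicted = Counter(predictions)
    expected = Counter(annotations)
    # Multiset intersection: each annotation can be matched at most once.
    true_positives = sum((predicted & expected).values())
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(annotations) if annotations else 0.0
    return precision, recall


# Hypothetical entities for a single test document.
annotations = [
    ("employer_name", "Acme Corp"),
    ("wages", "52000.00"),
    ("ssn", "123-45-6789"),
]
predictions = [
    ("employer_name", "Acme Corp"),
    ("wages", "52000.00"),
    ("ssn", "123-45-6780"),  # wrong last digit: counts as a miss
    ("state", "CA"),         # not annotated: counts against precision
]

precision, recall = precision_recall(predictions, annotations)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four predictions match an annotation, and two of the three annotations are recovered, so precision is lower than recall.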
"Document AI Workbench is helping us expand document automation more quickly and effectively. By using this new product, we have been able to train our own document parser models in a fraction of the time and with less resources. We feel this will help us realize important operational improvements for our business and help us serve our customers much better."
Daniel Ordaz Palacios, Global Head Business Process & Operations
Create a Custom Document Extractor
Learn how to use Document AI Workbench to create and train a Custom Document Extractor that processes W-2 (US tax form) documents (as an example).
Create a Custom Document Classifier
Create Custom Document Classifiers that identify documents from a user-defined set of classes.
Uptrain a specialized processor
Uptraining means that you begin with a pretrained model, and then train this model with your own data to improve its accuracy. Find out how in this guide.
Create a labeled dataset
A labeled dataset of documents is required to train, uptrain, or evaluate an ML model. Learn how to create a dataset, import documents, and define a schema.
Learn how to apply labels from your model schema to imported documents in your dataset.
Train or uptrain ML models
See how you can train a new custom document processing model from scratch or uptrain an existing ML model for document processing tasks specific to your needs.
Evaluate model performance
An evaluation is automatically run whenever you train or uptrain a model. See how to run a manual evaluation to get updated metrics after modifying the test set.
Workbench can handle documents with printed or handwritten text, tables and other nested entities, checkboxes, and more. Workbench can use document images whether they were professionally scanned or captured in a quick photo. You can import data in multiple formats, such as PDFs, common image formats, and JSON documents.
Instead of having to pay to spin up servers and wait while models are trained, you can create and evaluate ML models for free. You simply pay as you go once processors are deployed and used to extract data from documents.
With one click, train a model via uptraining or from scratch. If you are working with a document type similar in layout and schema to an existing document processor, then uptrain the relevant processor to get accurate results faster. If there is no relevant processor available for the document you’re trying to process, then create a model from scratch.
Pay for what you use
With Document AI Workbench, you pay only for hosting and prediction; there is no cost for importing data or training.