Optical Character
Recognition (OCR) is a foundational technology behind the
conversion of typed, handwritten or printed text from images
into machine-encoded text.
What types of OCR does Google Cloud offer?
Google Cloud offers two types of OCR: OCR for documents
and OCR for images and videos.
Both share a foundational technology, but they serve
different purposes:
Document AI
is a document understanding platform optimized for
document processing, while
Cloud Vision
is commonly used to detect text, handwriting and a wide
range of objects in images and videos.
You can also use
other Google Cloud products
to perform OCR when you need more advanced or specialized
functionality than Document AI and Cloud Vision offer.
How does OCR work at Google Cloud?
Google Cloud powers OCR with best-in-class AI that goes
beyond traditional text recognition: it understands,
organizes and enriches data to generate
business-ready insights.
You can either use the OCR tools as a unified suite for
streamlined efficiency (for example,
Document AI),
or call the relevant
APIs directly
(available in the Google Cloud console) to integrate OCR
functionality into your applications.
All the OCR solutions mentioned above give you access to
pre-trained ML models that you can deploy right away
through an API, or uptrain to improve accuracy for your
specific needs.
You can also train your own custom models with AutoML;
no machine learning expertise is needed.
If you are looking to analyze a document or build an
automated document processing pipeline, use
Document AI.
It handles the entire workflow in one place, from
understanding documents to searching, storing, governing
and managing them alongside the extracted data.
If you want to analyze and process images, use
Cloud Vision
alongside other Google Cloud products for best results.
See the Common Uses section for details and quickstart
guides.
The first 1,000 units every month are free when you use
Cloud Vision or Document OCR, so you can try either with a
simple API call.
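As an illustration of such a call, here is a minimal sketch against the Cloud Vision REST method `images:annotate` at `https://vision.googleapis.com/v1/images:annotate`. It assumes you already have an OAuth access token (for example from `gcloud auth print-access-token`); the helper names `build_text_detection_request`, `annotate` and `full_text` are hypothetical, and only `annotate` touches the network.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_text_detection_request(image_bytes: bytes) -> dict:
    """Build an images:annotate body asking for TEXT_DETECTION on one image."""
    return {
        "requests": [
            {
                # Inline image content must be base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # DOCUMENT_TEXT_DETECTION is the alternative for dense text.
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def annotate(body: dict, access_token: str) -> dict:
    """POST the body to the Vision API and return the parsed JSON response."""
    req = urllib.request.Request(
        VISION_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def full_text(response: dict) -> str:
    """Return the full detected text of the first image, or "" if none."""
    first = (response.get("responses") or [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")
```

Keeping the body builder separate from the HTTP call lets you inspect or test a request without network access, then send it with `annotate(build_text_detection_request(data), token)`.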
Unlock insights from nuanced documents with Document AI
Through an API, Google Cloud’s Document AI
offers prebuilt processors optimized for both generic
and industry-specific documents. These processors support
text recognition and extraction in multiple languages.
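A sketch of what calling one of these processors involves, assuming the Document AI v1 REST method `processors.process` at `https://{location}-documentai.googleapis.com/v1/projects/{project}/locations/{location}/processors/{processor}:process` with an inline `rawDocument`. The project, location and processor ID values are placeholders, and `build_process_request` is a hypothetical helper; authentication works the same way as for any other Google Cloud REST call.

```python
import base64


def build_process_request(
    project: str, location: str, processor_id: str, content: bytes, mime_type: str
) -> tuple[str, dict]:
    """Return (url, body) for a Document AI v1 processors.process call."""
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/processors/{processor_id}:process"
    )
    body = {
        "rawDocument": {
            # The document bytes are sent inline, base64-encoded.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return url, body
```

For example, `build_process_request("my-project", "us", "abc123", pdf_bytes, "application/pdf")` yields the URL and JSON body to POST; the response contains a `document` object with the recognized text and any extracted entities or form fields.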
To run a basic document processing pipeline
(image on the right), your monthly cost
would be $71.87.
You can check the usage assumptions made to
arrive at this number in the
pricing calculator.
This pipeline centers on a Cloud Function that runs
whenever a new file is uploaded to Cloud Storage. The
function sends the file to a Document AI form processor
and then saves the form data detected in that file to
BigQuery.
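The pipeline's core transformation, turning a form processor's output into rows for BigQuery, might look like the sketch below. It assumes the JSON form of a Document AI `Document`: a top-level `text`, and `pages[].formFields[]` whose `fieldName` and `fieldValue` layouts carry `textAnchor.textSegments` that index into `text`. The row schema (`file`, `page`, `field`, `value`, `confidence`) and both function names are hypothetical; the Cloud Function trigger and the BigQuery write are omitted.

```python
def anchor_text(document: dict, layout: dict) -> str:
    """Return the text a layout's textAnchor points to in document["text"]."""
    text = document.get("text", "")
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # int64 indices arrive as JSON strings; a zero startIndex is usually omitted.
    parts = [
        text[int(seg.get("startIndex", 0)) : int(seg.get("endIndex", 0))]
        for seg in segments
    ]
    return "".join(parts).strip()


def form_rows(document: dict, source_file: str) -> list[dict]:
    """Flatten every detected form field into one row (hypothetical schema)."""
    rows = []
    for index, page in enumerate(document.get("pages", []), start=1):
        page_number = page.get("pageNumber", index)
        for field in page.get("formFields", []):
            value_layout = field.get("fieldValue", {})
            rows.append(
                {
                    "file": source_file,
                    "page": page_number,
                    "field": anchor_text(document, field.get("fieldName", {})),
                    "value": anchor_text(document, value_layout),
                    "confidence": value_layout.get("confidence"),
                }
            )
    return rows
```

The resulting list of dictionaries could then be streamed to a BigQuery table, for example with the `google-cloud-bigquery` client's `insert_rows_json`, which is not shown here.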
Mr. Cooper uses Google AI to speed up mortgage processing
Mr. Cooper is one of the largest home loan
servicers in the country, focused on delivering
a variety of servicing and lending products,
services and technologies to homeowners.
They built a container-based document
processing pipeline with a modular
architecture on Google’s OCR technology stack
and achieved these results:
- Over 95% accuracy for critical
documents.
- Peak throughput of 4,000 pages/min and
an average throughput of 2,000 pages/min.
- Increased document processing efficiency
by 400%.
Use Cloud Vision API and AutoML to tag and process images
Image tagging is also referred to as image
labeling.
The
Cloud Vision API
can identify and label general objects,
landmarks, locations, logos, activities,
animal species, products, and more in an
image. Once images are tagged with the
detected labels, image search, processing and
management are easier to automate.
If you need targeted custom
labels, use
Cloud AutoML
to train a custom ML model.
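To make the tagging step concrete, the sketch below turns `LABEL_DETECTION` results into a simple tag index for image search. It assumes the Vision API JSON response shape, in which each entry of `responses[]` carries `labelAnnotations[]` with a `description` and a `score` between 0 and 1. The 0.7 threshold, the lowercase normalization and the function name are illustrative choices, not recommended values.

```python
from collections import defaultdict


def build_tag_index(
    image_uris: list[str], responses: list[dict], min_score: float = 0.7
) -> dict[str, set[str]]:
    """Map each lowercased label to the set of image URIs tagged with it.

    `responses` must be in the same order as `image_uris`, as returned by a
    batched images:annotate call.
    """
    if len(image_uris) != len(responses):
        raise ValueError("need exactly one response per image")
    index: dict[str, set[str]] = defaultdict(set)
    for uri, response in zip(image_uris, responses):
        for label in response.get("labelAnnotations", []):
            # Keep only labels the model is reasonably confident about.
            if label.get("score", 0.0) >= min_score:
                index[label["description"].lower()].add(uri)
    return dict(index)
```

A search for "dog" then becomes a dictionary lookup, `index.get("dog", set())`, instead of a scan over every image.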
AutoML helps scientists predict and track shoreline changes
Using Cloud AutoML, researchers at
Texas A&M University trained a custom
multi-label model on a dataset of
10,458 shoreline images in
24 compute hours. The model helped the
researchers predict and track shoreline
changes with an
average accuracy of 95.2%.
AutoML gave the team added flexibility: it let them
train advanced models on the training images, inspect
the data and analyze the results through an intuitive
UI, and serve the model at scale through an API.
With the
Cloud Vision
API, you can detect and extract text and
handwriting from images in multiple languages.
It also offers
multi-region support, which lets you
specify continent-level data storage and OCR processing.
You can get immediate results for a
small number of images (up to 16 per request),
or
batch process
a larger number of images (up to 2,000 per
request) asynchronously and retrieve the results later.
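The per-request limits above translate into a simple batching rule. The sketch splits a list of Cloud Storage image URIs into synchronous `images:annotate` bodies of at most 16 images, or asynchronous `images:asyncBatchAnnotate` bodies of at most 2,000 images that write results to Cloud Storage, and picks a continent-level endpoint. The `eu-vision.googleapis.com` and `us-vision.googleapis.com` hostnames are our understanding of the multi-region endpoints and should be checked against the current Vision documentation; the output bucket prefix and all helper names are hypothetical.

```python
SYNC_LIMIT = 16     # images per images:annotate request (synchronous)
ASYNC_LIMIT = 2000  # images per images:asyncBatchAnnotate request

ENDPOINTS = {
    "global": "https://vision.googleapis.com",
    "eu": "https://eu-vision.googleapis.com",  # assumed multi-region hostname
    "us": "https://us-vision.googleapis.com",  # assumed multi-region hostname
}


def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive slices of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def image_request(uri: str) -> dict:
    """One per-image request: dense text and handwriting OCR on a stored image."""
    return {
        "image": {"source": {"imageUri": uri}},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }


def plan_requests(
    uris: list[str],
    region: str = "global",
    asynchronous: bool = False,
    output_prefix: str = "gs://my-bucket/ocr-output/",  # hypothetical bucket
) -> tuple[str, list[dict]]:
    """Return the method URL and the list of request bodies to send."""
    base = ENDPOINTS[region]
    if asynchronous:
        url = f"{base}/v1/images:asyncBatchAnnotate"
        bodies = [
            {
                "requests": [image_request(u) for u in batch],
                # Async results are written as JSON files under this prefix.
                "outputConfig": {"gcsDestination": {"uri": f"{output_prefix}batch-{n}/"}},
            }
            for n, batch in enumerate(chunk(uris, ASYNC_LIMIT))
        ]
    else:
        url = f"{base}/v1/images:annotate"
        bodies = [
            {"requests": [image_request(u) for u in batch]}
            for batch in chunk(uris, SYNC_LIMIT)
        ]
    return url, bodies
```

With 40 images, the synchronous plan produces three requests of 16, 16 and 8 images; with 4,500 images, the asynchronous plan produces three long-running operations of 2,000, 2,000 and 500.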