Enterprise Document OCR

You can use Enterprise Document OCR as part of Document AI to detect and extract text and layout information from various documents. With configurable features, you can tailor the system to meet specific document-processing requirements.

Uses

You can use Enterprise Document OCR for tasks such as algorithm-based or machine-learning-based data entry, and for improving and verifying data accuracy. You can also use Enterprise Document OCR to handle tasks like the following:

  • Digitizing text: Extract text and layout data from documents for search, rules-based document-processing pipelines, or custom-model creation.
  • Building large language model (LLM) applications: Combine an LLM's contextual understanding with OCR's text and layout extraction to automate question answering, unlock insights from data, and streamline workflows.
  • Archiving: Digitize paper documents into machine-readable text to improve document accessibility.

Choosing the best OCR for your use case

Each option below lists its product and solution, followed by a description and its typical use case:

  • Enterprise Document OCR (Document AI)
    Description: Specialized model for document use cases. Advanced features include image-quality score, language hints, and rotation correction.
    Use case: Recommended when extracting text from documents, including PDFs, scanned documents as images, and Microsoft DocX files.

  • OCR add-ons (Document AI)
    Description: Premium features for specific requirements. Only compatible with Enterprise Document OCR version 2.0 and later.
    Use case: When you need to detect and recognize math formulas, receive font-style information, or enable checkbox extraction.

  • Text detection (Cloud Vision API)
    Description: Globally available REST API based on the Google Cloud standard OCR model. Default quota of 1,800 requests per minute.
    Use case: General text-extraction use cases that require low latency and high capacity.

  • Google Distributed Cloud Virtual (Cloud Vision OCR)
    Description: Google Cloud Marketplace application that can be deployed as a container to any GKE cluster using GKE Enterprise.
    Use case: Meeting data residency or compliance requirements.

Detection and extraction

Enterprise Document OCR can detect blocks, paragraphs, lines, words, and symbols in PDFs and images, and can deskew documents for better accuracy.

Supported layout detection and extraction attributes:

  • Printed text: Default
  • Handwriting: Default
  • Paragraph: Default
  • Block: Default
  • Line: Default
  • Word: Default
  • Symbol-level: Configurable
  • Page number: Default
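In the JSON response, layout elements such as blocks, paragraphs, lines, and tokens (the word level) do not repeat their text. Instead, each element's layout carries a text anchor whose segments index into the top-level document text. The sketch below, assuming the field names of the Document AI v1 REST `Document` object (`text`, `pages`, `layout.textAnchor.textSegments`), rebuilds the text of each element at one level. The helper names and the sample document are hypothetical and hand-written, not real API output.

```python
def layout_text(layout: dict, full_text: str) -> str:
    """Return the text a layout element points to via its text anchor."""
    parts = []
    for seg in layout.get("textAnchor", {}).get("textSegments", []):
        # int64 fields arrive as JSON strings; an omitted startIndex means 0.
        start = int(seg.get("startIndex", 0))
        end = int(seg["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts)


def iter_elements(document: dict, level: str):
    """Yield (page_number, text) for each element at one layout level,
    such as "blocks", "paragraphs", "lines", or "tokens"."""
    full_text = document.get("text", "")
    for page in document.get("pages", []):
        for element in page.get(level, []):
            yield page.get("pageNumber"), layout_text(element.get("layout", {}), full_text)


# Hand-written fragment for illustration only, not real API output.
sample = {
    "text": "Invoice 42\nTotal: $10\n",
    "pages": [{
        "pageNumber": 1,
        "lines": [
            {"layout": {"textAnchor": {"textSegments": [{"endIndex": "11"}]}}},
            {"layout": {"textAnchor": {"textSegments": [
                {"startIndex": "11", "endIndex": "22"}]}}},
        ],
    }],
}

for page_number, text in iter_elements(sample, "lines"):
    print(page_number, repr(text))
```

Resolving every level through the same anchor function keeps one source of truth for the text, which is why the response stores offsets rather than copies.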

Configurable Enterprise Document OCR features include the following:

  • Extract embedded or native text from digital PDFs: This feature extracts text and symbols exactly as they appear in the source documents, even for rotated text, extreme font sizes or styles, and partially hidden text.

  • Rotation correction: Use Enterprise Document OCR to preprocess document images to correct rotation issues that can affect extraction quality or processing.

  • Image-quality score: Receive quality metrics that can help with document routing. Image-quality score provides you with page-level quality metrics in eight dimensions, including blurriness, the presence of smaller-than-usual fonts, and glare.

  • Specify page range: Specify which pages of an input document to process with OCR. This avoids spending money and processing time on unneeded pages.

  • Language detection: Detects the languages used in the extracted text.

  • Language and handwriting hints: Improve accuracy by providing the OCR model a language or handwriting hint based on the known characteristics of your dataset.
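Several of these features are switched on per request. The following is a minimal sketch that only builds (and does not send) a JSON body for the v1 REST `process` method, assuming the `processOptions.ocrConfig` field names as I understand the public API reference (`enableNativePdfParsing`, `enableImageQualityScores`, `enableSymbol`, `hints.languageHints`) and the 1-based `individualPageSelector.pages` list for page ranges. The helper `build_ocr_request` is hypothetical; verify field names against the current API reference before use.

```python
import base64
import json


def build_ocr_request(content: bytes, mime_type: str, *, pages=None,
                      language_hints=None, native_pdf=True,
                      quality_scores=True, symbols=False) -> dict:
    """Build a JSON body for a Document AI v1 `:process` call (sketch)."""
    ocr_config = {
        "enableNativePdfParsing": native_pdf,        # use embedded PDF text
        "enableImageQualityScores": quality_scores,  # page-level quality metrics
        "enableSymbol": symbols,                     # symbol-level output
    }
    if language_hints:
        ocr_config["hints"] = {"languageHints": list(language_hints)}
    process_options = {"ocrConfig": ocr_config}
    if pages:
        # Only the listed pages (1-based) are processed.
        process_options["individualPageSelector"] = {"pages": list(pages)}
    return {
        "rawDocument": {
            # bytes fields are base64-encoded in the JSON representation
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        },
        "processOptions": process_options,
    }


body = build_ocr_request(b"%PDF-1.7 ...", "application/pdf",
                         pages=[1, 3], language_hints=["en", "de"])
print(json.dumps(body, indent=2))
```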

To learn how to enable OCR configurations, see Enable OCR configurations.
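The image-quality score and language detection results can drive document routing. This sketch assumes each page in the JSON response carries `imageQualityScores.qualityScore`, a `detectedDefects` list of `{type, confidence}` entries, and `detectedLanguages` entries with a `languageCode`; the defect type string in the sample and the 0.5 threshold are illustrative assumptions, not recommended values. `route_pages` is a hypothetical helper.

```python
def route_pages(document: dict, min_quality: float = 0.5):
    """Split pages into (accepted, needs_review) by page-level quality score.

    min_quality is an illustrative threshold, not a recommended value.
    """
    accepted, needs_review = [], []
    for page in document.get("pages", []):
        scores = page.get("imageQualityScores", {})
        # A missing score is treated as 0.0, so such pages go to review.
        quality = scores.get("qualityScore", 0.0)
        record = {
            "page": page.get("pageNumber"),
            "quality": quality,
            "defects": [d.get("type") for d in scores.get("detectedDefects", [])],
            "languages": [lang.get("languageCode")
                          for lang in page.get("detectedLanguages", [])],
        }
        (accepted if quality >= min_quality else needs_review).append(record)
    return accepted, needs_review


# Hand-written sample, not real API output.
sample = {"pages": [
    {"pageNumber": 1,
     "imageQualityScores": {"qualityScore": 0.92},
     "detectedLanguages": [{"languageCode": "en", "confidence": 0.98}]},
    {"pageNumber": 2,
     "imageQualityScores": {"qualityScore": 0.31, "detectedDefects": [
         {"type": "quality/defect_blurry", "confidence": 0.87}]},
     "detectedLanguages": [{"languageCode": "en", "confidence": 0.95}]},
]}

ok, review = route_pages(sample)
```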

OCR add-ons

Enterprise Document OCR offers optional analysis capabilities that you can enable on individual processing requests as needed.

The following add-on capabilities are available for the pretrained-ocr-v2.0-2023-06-02 version:

  • Math OCR: Identify and extract formulas from documents in LaTeX format.
  • Checkbox extraction: Detect checkboxes and extract their status (marked or unmarked) in the Enterprise Document OCR response.
  • Font style detection: Identify word-level font properties including font type, font style, handwriting, weight, and color.

To learn how to enable the listed add-ons, see Enable OCR add ons.
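Once add-ons are enabled, their results appear alongside the regular layout. This sketch assumes checkbox and formula results arrive as page `visualElements` whose `type` is `filled_checkbox`, `unfilled_checkbox`, or `math_formula`, and that font-style results appear as `styleInfo` on page `tokens`; these type strings and field names are assumptions to check against real responses. `summarize_add_ons` and the sample are hypothetical.

```python
def summarize_add_ons(document: dict) -> list:
    """Collect checkbox counts, math formulas, and bold tokens per page."""
    text = document.get("text", "")

    def anchored(layout: dict) -> str:
        # Resolve a layout's text anchor against the document text.
        return "".join(
            text[int(s.get("startIndex", 0)):int(s["endIndex"])]
            for s in layout.get("textAnchor", {}).get("textSegments", []))

    summary = []
    for page in document.get("pages", []):
        elements = page.get("visualElements", [])
        summary.append({
            "page": page.get("pageNumber"),
            "checked": sum(e.get("type") == "filled_checkbox" for e in elements),
            "unchecked": sum(e.get("type") == "unfilled_checkbox" for e in elements),
            "formulas": [anchored(e.get("layout", {})) for e in elements
                         if e.get("type") == "math_formula"],
            "bold_tokens": [anchored(t.get("layout", {})) for t in page.get("tokens", [])
                            if t.get("styleInfo", {}).get("bold")],
        })
    return summary


# Hand-written sample, not real API output.
doc = {
    "text": "E = mc^2 Total",
    "pages": [{
        "pageNumber": 1,
        "visualElements": [
            {"type": "filled_checkbox", "layout": {}},
            {"type": "unfilled_checkbox", "layout": {}},
            {"type": "math_formula",
             "layout": {"textAnchor": {"textSegments": [{"endIndex": "8"}]}}},
        ],
        "tokens": [
            {"layout": {"textAnchor": {"textSegments": [
                {"startIndex": "9", "endIndex": "14"}]}},
             "styleInfo": {"bold": True}},
        ],
    }],
}
s = summarize_add_ons(doc)
```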

Languages and regions

Enterprise Document OCR identifies and extracts printed text in more than 200 languages and handwritten text in more than 50 languages. For more information, see Language support.

This feature is supported in eight regions. For details, see Regional and multi-regional support.
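Requests are sent to a regional endpoint. The sketch below builds the REST URL for the v1 `process` method, assuming the host follows the `{location}-documentai.googleapis.com` pattern (for example `us` or `eu`) and that appending `processorVersions/{version}` pins a request to one processor version, while omitting it uses the processor's default version. `process_url` and the project and processor IDs are hypothetical placeholders.

```python
from typing import Optional


def process_url(project: str, location: str, processor: str,
                version: Optional[str] = None) -> str:
    """Build the regional REST URL for a Document AI v1 `:process` call.

    Passing a version ID pins the request to that processor version;
    omitting it targets the processor's default version.
    """
    host = f"https://{location}-documentai.googleapis.com"
    name = f"projects/{project}/locations/{location}/processors/{processor}"
    if version:
        name += f"/processorVersions/{version}"
    return f"{host}/v1/{name}:process"


url = process_url("my-project", "eu", "abc123")
pinned = process_url("my-project", "us", "abc123", "pretrained-ocr-v2.0-2023-06-02")
```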

Supported file formats

Enterprise Document OCR supports PDF, GIF, TIFF, JPEG, PNG, BMP, and WebP file formats. For more information, see Supported files.

Enterprise Document OCR also supports DocX files of up to 15 pages for synchronous processing and up to 30 pages for asynchronous processing. DocX support is in private preview. To request access, submit the DocX Support Request Form.
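Each request declares a MIME type for its content. This sketch maps the file extensions of the formats listed above to their standard MIME types and checks the DocX page limits from the text. The helper names are hypothetical, and the caller is assumed to already know the DocX page count, since the sketch does not parse DocX files.

```python
from pathlib import Path

# Standard MIME types for the supported formats.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".gif": "image/gif",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}

# DocX page limits stated above (DocX support is in private preview).
DOCX_MAX_PAGES = {"sync": 15, "async": 30}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {suffix or path}")
    return MIME_TYPES[suffix]


def check_docx_pages(page_count: int, mode: str) -> None:
    """Raise if a DocX file exceeds the page limit for "sync" or "async"."""
    limit = DOCX_MAX_PAGES[mode]
    if page_count > limit:
        raise ValueError(f"DocX has {page_count} pages; {mode} limit is {limit}")
```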

Advanced versioning

Advanced versioning is in Preview. Upgrades to the underlying AI/ML OCR models might change OCR behavior. If strict consistency is required, use a frozen model version to pin behavior to a legacy OCR model for up to 18 months. This ensures that the same image always produces the same OCR result. For the available versions, see Processor versions.

Processor versions

The following processor versions are compatible with this feature. For more information, see Managing processor versions.

  • pretrained-ocr-v1.0-2020-09-23 (Stable): Production-ready OCR engine. Includes access to all features.
  • pretrained-ocr-v1.1-2022-09-12 (Release Candidate): Not recommended for use. This version will be deprecated.
  • pretrained-ocr-v1.2-2022-11-10 (Release Candidate): Frozen model version of v1.0: the model files, configurations, and binaries of a version snapshot, frozen in a container image for up to 18 months. Public Preview.
  • pretrained-ocr-v2.0-2023-06-02 (Release Candidate): Upgraded model specialized for document use cases. Includes access to all configurable features. Compatible with OCR add-ons.
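The version list can be encoded as data so that a client picks a version from its requirements. This is a minimal sketch: the flags are transcribed from the versions listed above, while `choose_version` and its preference for the Stable channel, then the newest listed ID, are hypothetical choices for illustration rather than official guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorVersion:
    version_id: str
    channel: str
    frozen: bool = False           # pinned snapshot for strict consistency
    add_ons: bool = False          # compatible with OCR add-ons
    not_recommended: bool = False  # slated for deprecation


# Transcribed from the processor version list.
VERSIONS = [
    ProcessorVersion("pretrained-ocr-v1.0-2020-09-23", "Stable"),
    ProcessorVersion("pretrained-ocr-v1.1-2022-09-12", "Release Candidate",
                     not_recommended=True),
    ProcessorVersion("pretrained-ocr-v1.2-2022-11-10", "Release Candidate",
                     frozen=True),
    ProcessorVersion("pretrained-ocr-v2.0-2023-06-02", "Release Candidate",
                     add_ons=True),
]


def choose_version(need_add_ons: bool = False,
                   strict_consistency: bool = False) -> str:
    """Pick a version ID that satisfies the given requirements."""
    candidates = [v for v in VERSIONS if not v.not_recommended]
    if need_add_ons:
        candidates = [v for v in candidates if v.add_ons]
    if strict_consistency:
        candidates = [v for v in candidates if v.frozen]
    if not candidates:
        raise ValueError("no listed version satisfies these requirements")
    # Heuristic: prefer the Stable channel, then the lexically newest ID.
    candidates.sort(key=lambda v: (v.channel == "Stable", v.version_id),
                    reverse=True)
    return candidates[0].version_id
```

With these rules, needing add-ons selects v2.0 and strict consistency selects the frozen v1.2; no listed version satisfies both at once.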

Next steps