Class OcrConfig (2.22.0)

OcrConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Config for Document OCR.

Attributes

Name Description
hints google.cloud.documentai_v1beta3.types.OcrConfig.Hints
Hints for the OCR model.
enable_native_pdf_parsing bool
Enables special handling for PDFs with existing text information. Results in better text extraction quality in such PDF inputs.
enable_image_quality_scores bool
Enables intelligent document quality scores after OCR. These scores can help diagnose why OCR responses are of poor quality for a given input. Adds latency comparable to regular OCR to the process call.
advanced_ocr_options MutableSequence[str]
A list of advanced OCR options to further fine-tune OCR behavior. Current valid values are: legacy_layout (a heuristic layout detection algorithm that serves as an alternative to the current ML-based layout detection algorithm). Customers can choose the most suitable layout algorithm for their situation.
enable_symbol bool
Includes symbol level OCR information if set to true.
compute_style_info bool
Turns on the font identification model and returns font style information. Deprecated; use PremiumFeatures.compute_style_info instead.
disable_character_boxes_detection bool
Turns off the character box detector in the OCR engine. Character box detection is enabled by default in OCR 2.0 (and later) processors.
premium_features google.cloud.documentai_v1beta3.types.OcrConfig.PremiumFeatures
Configurations for premium OCR features.
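
The attributes above can be collected into a plain dict and passed as the mapping argument shown in the constructor signature. The following is a minimal sketch, not part of the library: build_ocr_config_mapping and VALID_ADVANCED_OCR_OPTIONS are hypothetical names introduced for illustration, and the validation only reflects the single advanced_ocr_options value listed in this reference.

```python
from typing import Any, Dict, Iterable

# Assumption: mirrors the only value this reference lists as currently valid.
VALID_ADVANCED_OCR_OPTIONS = {"legacy_layout"}


def build_ocr_config_mapping(
    *,
    enable_native_pdf_parsing: bool = False,
    enable_image_quality_scores: bool = False,
    enable_symbol: bool = False,
    advanced_ocr_options: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a dict keyed by OcrConfig attribute names (hypothetical helper)."""
    options = list(advanced_ocr_options)
    # Reject option strings that this reference does not document.
    unknown = [opt for opt in options if opt not in VALID_ADVANCED_OCR_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported advanced_ocr_options: {unknown}")
    return {
        "enable_native_pdf_parsing": enable_native_pdf_parsing,
        "enable_image_quality_scores": enable_image_quality_scores,
        "enable_symbol": enable_symbol,
        "advanced_ocr_options": options,
    }


mapping = build_ocr_config_mapping(
    enable_native_pdf_parsing=True,
    advanced_ocr_options=["legacy_layout"],
)

# With google-cloud-documentai installed, the dict can be handed to the
# constructor as its `mapping` argument:
#   from google.cloud.documentai_v1beta3.types import OcrConfig
#   ocr_config = OcrConfig(mapping)
```

Keeping the configuration as a plain dict until the last step makes it easy to validate or log before the message object is created. Note that enable_image_quality_scores is left off by default here because of the extra latency described above.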

Classes

Hints

Hints(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Hints for the OCR engine.

PremiumFeatures

PremiumFeatures(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Configurations for premium OCR features.
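
Because the top-level compute_style_info attribute is deprecated in favor of PremiumFeatures.compute_style_info, existing configuration dicts may need to be migrated. The sketch below shows one way to do that on a plain dict before it is passed to OcrConfig; migrate_compute_style_info is a hypothetical helper, not a library function, and it assumes the nested key is named premium_features as in the attribute list.

```python
from typing import Any, Dict


def migrate_compute_style_info(mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with compute_style_info moved under premium_features."""
    migrated = dict(mapping)
    if "compute_style_info" in migrated:
        flag = migrated.pop("compute_style_info")
        premium = dict(migrated.get("premium_features", {}))
        # Keep an explicit premium_features value if one is already set.
        premium.setdefault("compute_style_info", flag)
        migrated["premium_features"] = premium
    return migrated


old_style = {"enable_symbol": True, "compute_style_info": True}
new_style = migrate_compute_style_info(old_style)
```

The helper copies its input, so the original dict is left unchanged and can still be inspected after the migration.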