OcrConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Config for Document OCR.
Attributes

| Name | Type | Description |
|---|---|---|
| hints | google.cloud.documentai_v1.types.OcrConfig.Hints | Hints for the OCR model. |
| enable_native_pdf_parsing | bool | Enables special handling for PDFs with existing text information. Results in better text extraction quality for such PDF inputs. |
| enable_image_quality_scores | bool | Enables intelligent document quality scores after OCR. Can help diagnose why OCR responses are of poor quality for a given input. Adds latency comparable to regular OCR to the process call. |
| advanced_ocr_options | MutableSequence[str] | A list of advanced OCR options to further fine-tune OCR behavior. Current valid values are: legacy_layout, a heuristics-based layout detection algorithm that serves as an alternative to the current ML-based layout detection algorithm. Customers can choose the layout algorithm best suited to their situation. |
| enable_symbol | bool | Includes symbol-level OCR information if set to true. |
| compute_style_info | bool | Turns on the font identification model and returns font style information. Deprecated; use PremiumFeatures.compute_style_info instead. |
| disable_character_boxes_detection | bool | Turns off the character box detector in the OCR engine. Character box detection is enabled by default in OCR 2.0 (and later) processors. |
| premium_features | google.cloud.documentai_v1.types.OcrConfig.PremiumFeatures | Configurations for premium OCR features. |
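As an illustration of the attributes above, the following sketch assembles the field values as a plain dict and moves the deprecated top-level compute_style_info flag under premium_features, as the deprecation note suggests. The helper name migrate_style_info and the variable ocr_fields are hypothetical and not part of the library. The final step assumes the google-cloud-documentai package is installed and is skipped otherwise.

```python
from typing import Any, Dict


def migrate_style_info(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `fields` with the deprecated
    top-level compute_style_info flag moved under premium_features."""
    migrated = dict(fields)
    if "compute_style_info" in migrated:
        flag = migrated.pop("compute_style_info")
        premium = dict(migrated.get("premium_features", {}))
        # An explicit premium_features value wins over the deprecated flag.
        premium.setdefault("compute_style_info", flag)
        migrated["premium_features"] = premium
    return migrated


ocr_fields = migrate_style_info(
    {
        "enable_native_pdf_parsing": True,
        "enable_symbol": True,
        "advanced_ocr_options": ["legacy_layout"],
        "compute_style_info": True,  # deprecated spelling, migrated above
    }
)
print(ocr_fields)

try:
    # Assumption: the google-cloud-documentai package is available.
    from google.cloud.documentai_v1.types import OcrConfig
except ImportError:
    OcrConfig = None

if OcrConfig is not None:
    # The mapping argument accepts a plain dict; the nested dict is
    # converted into an OcrConfig.PremiumFeatures message.
    ocr_config = OcrConfig(ocr_fields)
    print(ocr_config.premium_features.compute_style_info)
```

The same configuration can also be written with keyword arguments, for example OcrConfig(enable_symbol=True, premium_features=OcrConfig.PremiumFeatures(compute_style_info=True)). The dict form is used here only so that the migration step can be shown without the library.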
Classes
Hints
Hints(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Hints for the OCR engine.
PremiumFeatures
PremiumFeatures(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Configurations for premium OCR features.