Class OcrConfig (2.17.0)

OcrConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Config for Document OCR.

Attributes

Name (type)
Description
hints google.cloud.documentai_v1.types.OcrConfig.Hints
Hints for the OCR model.
enable_native_pdf_parsing bool
Enables special handling for PDFs with existing text information. Results in better text extraction quality for such PDF inputs.
enable_image_quality_scores bool
Enables intelligent document quality scores after OCR. Can help diagnose why OCR responses are of poor quality for a given input. Adds latency to the process call comparable to that of regular OCR.
advanced_ocr_options MutableSequence[str]
A list of advanced OCR options to further fine-tune OCR behavior. Current valid values are:
- legacy_layout: a heuristic layout detection algorithm that serves as an alternative to the current ML-based layout detection algorithm. Customers can choose the layout algorithm that best suits their situation.
enable_symbol bool
Includes symbol-level OCR information if set to true.
compute_style_info bool
Turns on the font identification model and returns font style information.

Classes

Hints

Hints(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Hints for the OCR engine.