Class Page (2.5.0)

Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A page in a Document.

Attributes

Name Description
page_number int
1-based index for current Page in a parent Document. Useful when a page is taken out of a Document for individual processing.
image google.cloud.documentai_v1.types.Document.Page.Image
Rendered image for this page. This image is preprocessed to remove any skew, rotation, and distortions such that the annotation bounding boxes can be upright and axis-aligned.
transforms MutableSequence[google.cloud.documentai_v1.types.Document.Page.Matrix]
Transformation matrices that were applied to the original document image to produce Page.image.
dimension google.cloud.documentai_v1.types.Document.Page.Dimension
Physical dimension of the page.
layout google.cloud.documentai_v1.types.Document.Page.Layout
Layout for the page.
detected_languages MutableSequence[google.cloud.documentai_v1.types.Document.Page.DetectedLanguage]
A list of detected languages together with confidence.
blocks MutableSequence[google.cloud.documentai_v1.types.Document.Page.Block]
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
paragraphs MutableSequence[google.cloud.documentai_v1.types.Document.Page.Paragraph]
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.
lines MutableSequence[google.cloud.documentai_v1.types.Document.Page.Line]
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.
tokens MutableSequence[google.cloud.documentai_v1.types.Document.Page.Token]
A list of visually detected tokens on the page.
visual_elements MutableSequence[google.cloud.documentai_v1.types.Document.Page.VisualElement]
A list of detected non-text visual elements on the page, such as checkboxes and signatures.
tables MutableSequence[google.cloud.documentai_v1.types.Document.Page.Table]
A list of visually detected tables on the page.
form_fields MutableSequence[google.cloud.documentai_v1.types.Document.Page.FormField]
A list of visually detected form fields on the page.
symbols MutableSequence[google.cloud.documentai_v1.types.Document.Page.Symbol]
A list of visually detected symbols on the page.
detected_barcodes MutableSequence[google.cloud.documentai_v1.types.Document.Page.DetectedBarcode]
A list of detected barcodes.
image_quality_scores google.cloud.documentai_v1.types.Document.Page.ImageQualityScores
Image quality scores for the page image.
provenance google.cloud.documentai_v1.types.Document.Provenance
The history of this page.
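
A minimal sketch of how `page_number` and `detected_languages` might be read once a page has been separated from its Document. It uses `types.SimpleNamespace` stand-ins so it runs without the client library; with the real library you would pass `document.pages` instead. The helper names `find_page` and `top_language` are hypothetical, and the `language_code` and `confidence` field names on `DetectedLanguage` are assumptions that this page does not spell out.

```python
from types import SimpleNamespace


def find_page(pages, page_number):
    """Return the page whose 1-based page_number matches, or None."""
    for page in pages:
        if page.page_number == page_number:
            return page
    return None


def top_language(page):
    """Return (language_code, confidence) for the most confident language, or None."""
    if not page.detected_languages:
        return None
    best = max(page.detected_languages, key=lambda lang: lang.confidence)
    return best.language_code, best.confidence


# Stand-in objects shaped like Page and DetectedLanguage (assumed field names).
pages = [
    SimpleNamespace(
        page_number=1,
        detected_languages=[
            SimpleNamespace(language_code="en", confidence=0.9),
            SimpleNamespace(language_code="fr", confidence=0.1),
        ],
    ),
    SimpleNamespace(page_number=2, detected_languages=[]),
]

first = find_page(pages, 1)
print(top_language(first))
```

Matching on `page_number` rather than on list position keeps the lookup correct even when only a subset of pages is held in memory.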

Classes

Block

Block(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

DetectedBarcode

DetectedBarcode(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A detected barcode.

DetectedLanguage

DetectedLanguage(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Detected language for a structural component.

Dimension

Dimension(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Dimension for the page.

FormField

FormField(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A form field detected on the page.

Image

Image(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Rendered image contents for this page.

ImageQualityScores

ImageQualityScores(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Image quality scores for the page image.

Layout

Layout(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Visual element describing a layout unit on a page.
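
A layout unit usually refers back to a span of the parent Document's text rather than carrying the text itself. The sketch below shows one way to resolve that reference. It assumes the layout exposes `text_anchor.text_segments`, each with `start_index` and `end_index` offsets into the document text; those field names are not described in this section, and `layout_to_text` is a hypothetical helper. Stand-in objects are used so the example runs without the client library.

```python
from types import SimpleNamespace


def layout_to_text(layout, document_text):
    """Concatenate the document text slices referenced by a layout's segments."""
    return "".join(
        document_text[int(segment.start_index):int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )


# Stand-ins for Document.text and a Layout with two text segments.
document_text = "Invoice 42\nTotal: 10\n"
layout = SimpleNamespace(
    text_anchor=SimpleNamespace(
        text_segments=[
            SimpleNamespace(start_index=0, end_index=8),
            SimpleNamespace(start_index=11, end_index=17),
        ]
    )
)

print(repr(layout_to_text(layout, document_text)))
```

The same helper would apply to any element that carries a layout, such as a block, paragraph, line, or token.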

Line

Line(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A collection of tokens that a human would perceive as a line. A line does not cross column boundaries and can be horizontal, vertical, etc.

Matrix

Matrix(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Representation of a transformation matrix, intended to be compatible with the OpenCV format for image manipulation.
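
As a rough illustration of what OpenCV compatibility could mean in practice, the sketch below decodes a matrix into nested Python lists using only the standard library. It assumes the message exposes `rows`, `cols`, `type_` (an OpenCV type code) and `data` (raw bytes in row-major order); these field names and the little-endian byte order are assumptions, and `matrix_to_rows` is a hypothetical helper. Only single-channel matrices are handled.

```python
import struct
from types import SimpleNamespace

# OpenCV depth codes (type & 7) mapped to struct format characters.
_DEPTH_TO_STRUCT = {0: "B", 1: "b", 2: "H", 3: "h", 4: "i", 5: "f", 6: "d"}


def matrix_to_rows(matrix, byte_order="<"):
    """Decode a single-channel OpenCV-style matrix into a list of rows."""
    depth = matrix.type_ & 7
    channels = (matrix.type_ >> 3) + 1
    if channels != 1:
        raise ValueError("only single-channel matrices are handled in this sketch")
    if depth not in _DEPTH_TO_STRUCT:
        raise ValueError(f"unsupported OpenCV depth code: {depth}")
    count = matrix.rows * matrix.cols
    fmt = f"{byte_order}{count}{_DEPTH_TO_STRUCT[depth]}"
    values = struct.unpack(fmt, matrix.data)
    return [
        list(values[r * matrix.cols:(r + 1) * matrix.cols])
        for r in range(matrix.rows)
    ]


# Stand-in for a 2x3 affine identity transform stored as CV_64FC1 (type code 6).
identity = SimpleNamespace(
    rows=2,
    cols=3,
    type_=6,
    data=struct.pack("<6d", 1, 0, 0, 0, 1, 0),
)

rows = matrix_to_rows(identity)
print(rows)
```

If NumPy and OpenCV are available, the decoded rows could be converted to an array and passed to an OpenCV warp function, though that step is outside this sketch.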

Paragraph

Paragraph(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A collection of lines that a human would perceive as a paragraph.

Symbol

Symbol(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A detected symbol.

Table

Table(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A table representation similar to HTML table structure.
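
To make the HTML analogy concrete, here is a sketch that renders a table-shaped object as HTML markup. It assumes the table exposes `header_rows` and `body_rows`, that each row has `cells`, and that each cell has `row_span` and `col_span`; these field names are not listed in this section. The `table_to_html` helper and the `cell_text` callback are hypothetical. In real use the callback would resolve the cell's text from the parent Document, while here the stand-in cells carry a plain `text` attribute.

```python
from html import escape
from types import SimpleNamespace


def table_to_html(table, cell_text):
    """Render header and body rows as an HTML table string."""

    def render_rows(rows, tag):
        rendered = []
        for row in rows:
            cells = []
            for cell in row.cells:
                attrs = ""
                if cell.row_span > 1:
                    attrs += f' rowspan="{cell.row_span}"'
                if cell.col_span > 1:
                    attrs += f' colspan="{cell.col_span}"'
                cells.append(f"<{tag}{attrs}>{escape(cell_text(cell))}</{tag}>")
            rendered.append("<tr>" + "".join(cells) + "</tr>")
        return rendered

    lines = ["<table>"]
    lines += render_rows(table.header_rows, "th")
    lines += render_rows(table.body_rows, "td")
    lines.append("</table>")
    return "\n".join(lines)


def make_cell(text, row_span=1, col_span=1):
    """Build a stand-in cell; the text attribute is for illustration only."""
    return SimpleNamespace(text=text, row_span=row_span, col_span=col_span)


table = SimpleNamespace(
    header_rows=[SimpleNamespace(cells=[make_cell("Item"), make_cell("Price")])],
    body_rows=[SimpleNamespace(cells=[make_cell("Tea & milk", col_span=2)])],
)

markup = table_to_html(table, lambda cell: cell.text)
print(markup)
```

Passing the text lookup in as a callback keeps the rendering logic independent of how cell text is stored.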

Token

Token(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A detected token.

VisualElement

VisualElement(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Detected non-text visual elements on the page, such as checkboxes and signatures.