Class Page (0.3.0)

Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A page in a Document.

Attributes

Name Description
page_number int
1-based index of this Page in its parent Document. Useful when a page is taken out of a Document for individual processing.
image .document.Document.Page.Image
Rendered image for this page. The image is preprocessed to remove any skew, rotation, and distortion so that annotation bounding boxes are upright and axis-aligned.
transforms Sequence[.document.Document.Page.Matrix]
Transformation matrices that were applied to the original document image to produce Page.image.
dimension .document.Document.Page.Dimension
Physical dimension of the page.
layout .document.Document.Page.Layout
Layout for the page.
detected_languages Sequence[.document.Document.Page.DetectedLanguage]
A list of detected languages, each with a confidence score.
blocks Sequence[.document.Document.Page.Block]
A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.
paragraphs Sequence[.document.Document.Page.Paragraph]
A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.
lines Sequence[.document.Document.Page.Line]
A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.
tokens Sequence[.document.Document.Page.Token]
A list of visually detected tokens on the page.
visual_elements Sequence[.document.Document.Page.VisualElement]
A list of detected non-text visual elements on the page, e.g. checkboxes and signatures.
tables Sequence[.document.Document.Page.Table]
A list of visually detected tables on the page.
form_fields Sequence[.document.Document.Page.FormField]
A list of visually detected form fields on the page.
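
Because page_number is 1-based, code that handles pages individually should look pages up by that number rather than by list position. The following is a minimal sketch of that idea; PageStub is a hypothetical stand-in that models only page_number and is not the library's Page class.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PageStub:
    """Hypothetical stand-in for Page; only page_number is modeled."""
    page_number: int


def index_by_page_number(pages: Iterable[PageStub]) -> Dict[int, PageStub]:
    """Map each 1-based page_number to its page."""
    return {page.page_number: page for page in pages}


# Pages processed individually may come back in any order.
pages = [PageStub(page_number=3), PageStub(page_number=1), PageStub(page_number=2)]

by_number = index_by_page_number(pages)
first_page = by_number[1]  # the first page is number 1, not 0

# Restore document order from the stored page numbers.
ordered = sorted(pages, key=lambda page: page.page_number)
```

With a real Document, the same lookup would presumably read page_number from each Page instead of from the stub.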

Classes

Block

Block(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

DetectedLanguage

DetectedLanguage(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Detected language for a structural component.
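
Since each detected language comes with a confidence, a common step is to pick the most confident one above some threshold. The sketch below illustrates this under assumptions: DetectedLanguageStub and its field names (language_code, confidence) are hypothetical stand-ins, as this page does not list the fields of DetectedLanguage.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectedLanguageStub:
    """Hypothetical stand-in; the field names are assumptions."""
    language_code: str
    confidence: float


def most_confident(
    languages: Sequence[DetectedLanguageStub],
    min_confidence: float = 0.0,
) -> Optional[DetectedLanguageStub]:
    """Return the highest-confidence language at or above the threshold, or None."""
    candidates = [lang for lang in languages if lang.confidence >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda lang: lang.confidence)


detected = [
    DetectedLanguageStub(language_code="en", confidence=0.92),
    DetectedLanguageStub(language_code="fr", confidence=0.05),
]
best = most_confident(detected, min_confidence=0.5)
```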

Dimension

Dimension(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Dimension for the page.

FormField

FormField(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A form field detected on the page.

Image

Image(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Rendered image contents for this page.

Layout

Layout(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Visual element describing a layout unit on a page.

Line

Line(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A collection of tokens that a human would perceive as a line. A line does not cross column boundaries and can be horizontal, vertical, etc.

Matrix

Matrix(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Representation of a transformation matrix, intended to be compatible with the OpenCV format for image manipulation.
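
To illustrate how a sequence of transformation matrices might be applied, here is a small pure-Python sketch. It assumes each matrix is a 2x3 affine transform stored as row-major floats; MatrixStub and its fields are hypothetical stand-ins, since this page does not describe how Matrix stores its data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class MatrixStub:
    """Hypothetical stand-in: a rows x cols matrix with row-major values."""
    rows: int
    cols: int
    values: List[float]

    def at(self, row: int, col: int) -> float:
        return self.values[row * self.cols + col]


def apply_affine(matrix: MatrixStub, point: Point) -> Point:
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to (x, y)."""
    if (matrix.rows, matrix.cols) != (2, 3):
        raise ValueError("expected a 2x3 affine matrix")
    x, y = point
    return (
        matrix.at(0, 0) * x + matrix.at(0, 1) * y + matrix.at(0, 2),
        matrix.at(1, 0) * x + matrix.at(1, 1) * y + matrix.at(1, 2),
    )


def apply_all(transforms: Sequence[MatrixStub], point: Point) -> Point:
    """Apply the transforms in list order (an assumption about ordering)."""
    for matrix in transforms:
        point = apply_affine(matrix, point)
    return point


rotate_90 = MatrixStub(rows=2, cols=3, values=[0.0, -1.0, 0.0, 1.0, 0.0, 0.0])
shift_x = MatrixStub(rows=2, cols=3, values=[1.0, 0.0, 10.0, 0.0, 1.0, 0.0])

mapped = apply_all([rotate_90, shift_x], (2.0, 3.0))
```

In practice, a library such as OpenCV or NumPy would normally do this arithmetic; the pure-Python version only makes the mapping explicit.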

Paragraph

Paragraph(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A collection of lines that a human would perceive as a paragraph.

Table

Table(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A table representation similar to HTML table structure.
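
The HTML analogy can be made concrete with a short sketch that renders a table made of header rows and body rows, where cells may span several rows or columns. TableStub, CellStub, and their field names are hypothetical stand-ins chosen for illustration; this page does not list the fields of Table.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class CellStub:
    """Hypothetical stand-in for a table cell."""
    text: str
    row_span: int = 1
    col_span: int = 1


@dataclass
class TableStub:
    """Hypothetical stand-in: rows are lists of cells, as in HTML."""
    header_rows: List[List[CellStub]] = field(default_factory=list)
    body_rows: List[List[CellStub]] = field(default_factory=list)


def render_row(cells: List[CellStub], tag: str) -> str:
    """Render one row, emitting rowspan/colspan only when greater than 1."""
    parts = []
    for cell in cells:
        attrs = ""
        if cell.row_span > 1:
            attrs += f' rowspan="{cell.row_span}"'
        if cell.col_span > 1:
            attrs += f' colspan="{cell.col_span}"'
        parts.append(f"<{tag}{attrs}>{escape(cell.text)}</{tag}>")
    return "<tr>" + "".join(parts) + "</tr>"


def to_html(table: TableStub) -> str:
    head = "".join(render_row(row, "th") for row in table.header_rows)
    body = "".join(render_row(row, "td") for row in table.body_rows)
    return f"<table><thead>{head}</thead><tbody>{body}</tbody></table>"


table = TableStub(
    header_rows=[[CellStub("Item"), CellStub("Price")]],
    body_rows=[
        [CellStub("Tea"), CellStub("3")],
        [CellStub("No further items", col_span=2)],
    ],
)
html_text = to_html(table)
```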

Token

Token(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A detected token.

VisualElement

VisualElement(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A detected non-text visual element on the page, e.g. a checkbox or signature.