Module page (0.14.0a0)

Wrappers for Document AI Page type.

Classes

Block

Block(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Block.

FormField

FormField(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.FormField,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.FormField.

Line

Line(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Line.

MathFormula

MathFormula(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.VisualElement with type math_formula. See https://cloud.google.com/document-ai/docs/process-documents-ocr#math_ocr

Page

Page(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page,
    _document_text: str,
)

Represents a wrapped documentai.Document.Page.

Paragraph

Paragraph(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Paragraph.

Symbol

Symbol(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Symbol. See https://cloud.google.com/document-ai/docs/process-documents-ocr#enable_symbols

Table

Table(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Table,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Table.

Token

Token(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Token.

_BasePageElement

_BasePageElement(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Base class for representing a wrapped Document AI Page element (Symbol, Token, Line, Paragraph, Block).
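The constructor signatures above share one shape: each wrapper takes the raw Document AI object plus a reference to its parent Page, and Page itself takes the full document text. The sketch below illustrates that pattern with plain dataclasses. RawElement, PageSketch, ElementSketch and the text property are hypothetical stand-ins introduced for illustration only; they are not the toolbox classes and make no claim about the attributes those classes expose.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import List, Tuple


@dataclass
class RawElement:
    """Stand-in for a raw Document AI element: index ranges into the text."""

    segments: List[Tuple[int, int]]


@dataclass
class PageSketch:
    """Stand-in for the Page wrapper: raw elements plus the document text."""

    raw_elements: List[RawElement]
    _document_text: str


@dataclass
class ElementSketch:
    """Stand-in for a wrapped element that keeps a back-reference to its page."""

    documentai_object: RawElement
    _page: PageSketch

    @cached_property
    def text(self) -> str:
        # Resolve the text lazily through the parent page's document text.
        full_text = self._page._document_text
        return "".join(full_text[s:e] for s, e in self.documentai_object.segments)


page = PageSketch([RawElement([(0, 5)]), RawElement([(6, 11)])], "Hello world")
elements = [ElementSketch(raw, page) for raw in page.raw_elements]
print([element.text for element in elements])
```

Keeping the back-reference on each element means the element itself only needs to store index ranges; the text lives once, on the page.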

Module Functions

_get_hocr_bounding_box

_get_hocr_bounding_box(
    element_with_layout: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    page_dimension: google.cloud.documentai_v1.types.document.Document.Page.Dimension,
) -> typing.Optional[str]

Returns an hOCR bounding box string.

Parameters
Name Description
element_with_layout ElementWithLayout

Required. An element with layout fields.

page_dimension documentai.Document.Page.Dimension

Required. Page dimension.

Returns
Type Description
Optional[str] hOCR bounding box string.
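
For orientation, an hOCR bounding box is a string of the form "bbox x0 y0 x1 y1" in pixel coordinates. The sketch below shows one way such a string could be derived from normalized vertices and page dimensions. The function name hocr_bbox, the plain tuples standing in for the Document AI types, and the rounding rule are all assumptions made for illustration; this is not the library's implementation.

```python
from typing import Optional, Sequence, Tuple


def hocr_bbox(
    normalized_vertices: Sequence[Tuple[float, float]],
    page_width: float,
    page_height: float,
) -> Optional[str]:
    """Hypothetical stand-in: format normalized vertices as an hOCR bbox."""
    if not normalized_vertices:
        # Mirror the Optional[str] return type: no geometry, no bbox.
        return None
    # Scale each normalized (0..1) coordinate to pixels, rounding half up.
    xs = [int(x * page_width + 0.5) for x, _ in normalized_vertices]
    ys = [int(y * page_height + 0.5) for _, y in normalized_vertices]
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


vertices = [(0.1, 0.2), (0.5, 0.2), (0.5, 0.4), (0.1, 0.4)]
print(hocr_bbox(vertices, page_width=1000, page_height=2000))
# prints: bbox 100 400 500 800
```

Taking the minimum and maximum over all vertices keeps the sketch independent of vertex order.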

_text_from_layout

_text_from_layout(
    layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str
) -> str

Returns the text of a single layout element.

Parameters
Name Description
layout documentai.Document.Page.Layout

Required. An element with layout fields.

text str

Required. UTF-8 encoded text in reading order of the documentai.Document containing the layout element.

Returns
Type Description
str Text from a single element.
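
In Document AI, a layout's text anchor carries text segments whose start and end indices point into the full document text, which is why this function needs the text argument. The sketch below shows the slicing idea with (start, end) tuples in place of the real segment objects. The name text_from_segments and the tuple representation are assumptions for illustration, not the library's code.

```python
from typing import Sequence, Tuple


def text_from_segments(segments: Sequence[Tuple[int, int]], text: str) -> str:
    """Hypothetical stand-in: join the slices of text named by each segment."""
    # Each segment is a half-open (start_index, end_index) range into the full text.
    return "".join(text[start:end] for start, end in segments)


document_text = "Invoice 42\nTotal: 10 USD\n"
print(repr(text_from_segments([(0, 10)], document_text)))
# prints: 'Invoice 42'
print(repr(text_from_segments([(0, 7), (11, 17)], document_text)))
# prints: 'InvoiceTotal:'
```

An element with several segments yields their concatenation, in segment order, with nothing inserted between them.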

_trim_text

_trim_text(text: str) -> str

Removes extra whitespace characters (blanks, newlines, tabs, etc.) from text.

Parameter
Name Description
text str

Required. UTF-8 encoded text in reading order from the document.

Returns
Type Description
str Text without trailing spaces/newlines.
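
The description leaves open exactly which whitespace is removed. The sketch below shows one plausible reading: strip surrounding whitespace and replace interior newlines with spaces. The name trim_text and this exact behavior are assumptions for illustration, so check the library source before relying on it.

```python
def trim_text(text: str) -> str:
    """Hypothetical stand-in: drop surrounding whitespace, flatten newlines."""
    # strip() removes leading and trailing blanks, tabs and newlines;
    # interior newlines are then replaced by single spaces.
    return text.strip().replace("\n", " ")


print(repr(trim_text("  Total:\n10 USD \n")))
# prints: 'Total: 10 USD'
```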