Module page (0.13.0a0)

Wrappers for Document AI Page type.

Classes

Block

Block(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Block.

FormField

FormField(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.FormField,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.FormField.

Line

Line(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Line.

MathFormula

MathFormula(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.VisualElement with type math_formula. See https://cloud.google.com/document-ai/docs/process-documents-ocr#math_ocr

Page

Page(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page,
    _document_text: str,
)

Represents a wrapped documentai.Document.Page.

Paragraph

Paragraph(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Paragraph.

Symbol

Symbol(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Symbol. See https://cloud.google.com/document-ai/docs/process-documents-ocr#enable_symbols

Table

Table(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Table,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Table.

Token

Token(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Token.

_BasePageElement

_BasePageElement(
    documentai_object: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Base class for representing a wrapped Document AI Page element (Symbol, Token, Line, Paragraph, Block).

Module Functions

_get_children_of_element

_get_children_of_element(
    element: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    children: typing.List[
        typing.Union[
            google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
            google.cloud.documentai_v1.types.document.Document.Page,
            google.cloud.documentai_v1.types.document.Document.Page.Token,
            google.cloud.documentai_v1.types.document.Document.Page.Block,
            google.cloud.documentai_v1.types.document.Document.Page.Symbol,
        ]
    ],
) -> typing.List[
    typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ]
]

Returns a list of children inside element.

Parameters
Name | Description
element (ElementWithLayout) | Required. An element in a page.
children (List[ElementWithLayout]) | Required. List of wrapped children.

Returns
Type | Description
List[ElementWithLayout] | A list of wrapped children that are inside an element.
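
The reference does not say how "inside" is decided. The following is a minimal sketch of one plausible approach, assuming containment is judged by character offsets into the document text. `Span` and `children_of` are hypothetical names used only for illustration and are not part of the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for an element's text range: [start, end) offsets."""

    name: str
    start: int
    end: int


def children_of(element: Span, children: List[Span]) -> List[Span]:
    # Keep only the children whose offsets lie fully within the parent's range.
    return [c for c in children if c.start >= element.start and c.end <= element.end]


paragraph = Span("paragraph", 0, 11)
tokens = [Span("Hello", 0, 5), Span("world", 6, 11), Span("Next", 12, 16)]
inside = children_of(paragraph, tokens)
print([t.name for t in inside])
```

In this sketch the third token starts after the paragraph ends, so it is filtered out.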

_get_hocr_bounding_box

_get_hocr_bounding_box(
    element_with_layout: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    page_dimension: google.cloud.documentai_v1.types.document.Document.Page.Dimension,
) -> typing.Optional[str]

Returns an hOCR bounding box string.

Parameters
Name | Description
element_with_layout (ElementWithLayout) | Required. An element with layout fields.
page_dimension (documentai.Document.Page.Dimension) | Required. Page dimension.

Returns
Type | Description
Optional[str] | hOCR bounding box string.
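
In hOCR, a bounding box is written as `bbox x0 y0 x1 y1` in pixel coordinates. The sketch below shows how such a string could be derived, assuming the element carries normalized vertices (values between 0 and 1) that are scaled by the page dimension. `Dimension` and `hocr_bounding_box` are hypothetical stand-ins, and the library's own handling of rounding or missing vertices may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Dimension:
    """Hypothetical stand-in for a page dimension in pixels."""

    width: float
    height: float


def hocr_bounding_box(
    normalized_vertices: List[Tuple[float, float]], page_dimension: Dimension
) -> Optional[str]:
    # Without vertices there is no box to describe.
    if not normalized_vertices:
        return None
    xs = [x for x, _ in normalized_vertices]
    ys = [y for _, y in normalized_vertices]
    # Scale the normalized extremes to pixels and truncate to integers.
    min_x = int(min(xs) * page_dimension.width)
    min_y = int(min(ys) * page_dimension.height)
    max_x = int(max(xs) * page_dimension.width)
    max_y = int(max(ys) * page_dimension.height)
    return f"bbox {min_x} {min_y} {max_x} {max_y}"


vertices = [(0.25, 0.125), (0.75, 0.125), (0.75, 0.5), (0.25, 0.5)]
box = hocr_bounding_box(vertices, Dimension(width=1000, height=2000))
print(box)  # bbox 250 250 750 1000
```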

_table_rows_from_documentai_table_rows

_table_rows_from_documentai_table_rows(
    table_rows: typing.List[
        google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow
    ],
    text: str,
) -> typing.List[typing.List[str]]

Returns a list of rows from table_rows.

Parameters
Name | Description
table_rows (List[documentai.Document.Page.Table.TableRow]) | Required. A list of documentai.Document.Page.Table.TableRow.
text (str) | Required. UTF-8 encoded text in reading order from the document.

Returns
Type | Description
List[List[str]] | A list of table rows.
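
As a rough illustration of the return shape, the sketch below turns rows of cells into a list of lists of strings. It assumes each cell is described by character offsets into the document text and that surrounding whitespace is stripped from each cell. The `Segment`, `Cell`, and `Row` aliases and the `table_rows` function are hypothetical and simplify the real `TableRow` message.

```python
from typing import List, Tuple

# Hypothetical simplifications: a cell is a list of (start, end) offsets.
Segment = Tuple[int, int]
Cell = List[Segment]
Row = List[Cell]


def table_rows(rows: List[Row], text: str) -> List[List[str]]:
    # For every cell, join the referenced slices of the document text
    # and strip surrounding whitespace.
    return [
        ["".join(text[start:end] for start, end in cell).strip() for cell in row]
        for row in rows
    ]


document_text = "Item Qty\nPen 2\n"
rows = [
    [[(0, 5)], [(5, 9)]],    # header row: "Item ", "Qty\n"
    [[(9, 13)], [(13, 15)]],  # body row: "Pen ", "2\n"
]
result = table_rows(rows, document_text)
print(result)  # [['Item', 'Qty'], ['Pen', '2']]
```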

_text_from_layout

_text_from_layout(
    layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str
) -> str

Returns the text of a single layout element.

Parameters
Name | Description
layout (documentai.Document.Page.Layout) | Required. An element with layout fields.
text (str) | Required. UTF-8 encoded text in reading order of the documentai.Document containing the layout element.

Returns
Type | Description
str | Text from a single element.
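
A Document AI layout points into the document text through a text anchor made of segments, each with a `start_index` and an `end_index`. The sketch below shows how text could be recovered from such segments. The dataclasses are local stand-ins that mirror those field names, and `text_from_anchor` is a hypothetical helper, not the library function itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextSegment:
    """Stand-in for one anchor segment: [start_index, end_index) offsets."""

    start_index: int
    end_index: int


@dataclass
class TextAnchor:
    """Stand-in for a layout's text anchor."""

    text_segments: List[TextSegment] = field(default_factory=list)


def text_from_anchor(anchor: TextAnchor, text: str) -> str:
    # Concatenate the slices of the full document text named by each segment.
    return "".join(
        text[seg.start_index : seg.end_index] for seg in anchor.text_segments
    )


document_text = "Invoice 42\nTotal: 10\n"
anchor = TextAnchor([TextSegment(0, 8), TextSegment(18, 20)])
extracted = text_from_anchor(anchor, document_text)
print(repr(extracted))  # 'Invoice 10'
```

An anchor with no segments yields an empty string in this sketch.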

_trim_text

_trim_text(text: str) -> str

Removes extra whitespace characters (blanks, newlines, tabs, etc.) from text.

Parameter
Name | Description
text (str) | Required. UTF-8 encoded text in reading order from the document.

Returns
Type | Description
str | Text without trailing spaces/newlines.
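
The description leaves some room for interpretation. The sketch below shows one reading, assuming that leading and trailing whitespace is stripped and interior newlines are replaced with spaces. `trim_text` is a hypothetical name, and the library's actual behavior may differ.

```python
def trim_text(text: str) -> str:
    # Drop whitespace at both ends, then flatten interior newlines to spaces.
    return text.strip().replace("\n", " ")


trimmed = trim_text("  Total:\n10 \n")
print(repr(trimmed))  # 'Total: 10'
```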