Module page (0.10.0a0)

Wrappers for Document AI Page type.

Classes

Block

Block(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Block,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Block.

FormField

FormField(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.FormField,
    document_text: dataclasses.InitVar[str],
)

Represents a wrapped documentai.Document.Page.FormField.
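
The `dataclasses.InitVar[str]` annotation in the signature above marks `document_text` as an init-only argument: it is handed to `__post_init__` and is not stored on the instance. The sketch below illustrates that mechanism with a hypothetical stand-in class. `FormFieldSketch`, its span fields, and the way it slices the text are assumptions for illustration, not the library's implementation.

```python
from dataclasses import InitVar, dataclass, field
from typing import Tuple


@dataclass
class FormFieldSketch:
    """Hypothetical stand-in for a wrapper that takes an init-only document_text."""

    # Stand-ins for the wrapped object: (start, end) offsets into the document text.
    name_span: Tuple[int, int]
    value_span: Tuple[int, int]
    # Init-only argument: passed to __post_init__, never stored as an attribute.
    document_text: InitVar[str]

    # Derived fields, filled in by __post_init__ rather than by the caller.
    field_name: str = field(init=False)
    field_value: str = field(init=False)

    def __post_init__(self, document_text: str) -> None:
        self.field_name = document_text[slice(*self.name_span)].strip()
        self.field_value = document_text[slice(*self.value_span)].strip()


form_field = FormFieldSketch((0, 6), (6, 14), document_text="Name: Jane Doe\n")
print(form_field.field_name, form_field.field_value)
```

Because the full document text is only needed while the derived fields are computed, keeping it init-only avoids holding a copy of it on every wrapper instance.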

Line

Line(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Line,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Line.

Page

Page(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page,
    document_text: str,
)

Represents a wrapped documentai.Document.Page.

Paragraph

Paragraph(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Paragraph.

Table

Table(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Table,
    document_text: dataclasses.InitVar[str],
)

Represents a wrapped documentai.Document.Page.Table.

Token

Token(
    documentai_object: google.cloud.documentai_v1.types.document.Document.Page.Token,
    _page: google.cloud.documentai_toolbox.wrappers.page.Page,
)

Represents a wrapped documentai.Document.Page.Token.

Module Functions

_get_children_of_element

_get_children_of_element(
    element: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    children: typing.Union[
        typing.List[google.cloud.documentai_toolbox.wrappers.page.Paragraph],
        typing.List[google.cloud.documentai_toolbox.wrappers.page.Block],
        typing.List[google.cloud.documentai_toolbox.wrappers.page.Token],
        typing.List[google.cloud.documentai_toolbox.wrappers.page.Line],
    ],
) -> typing.List[
    typing.Union[
        google.cloud.documentai_toolbox.wrappers.page.Block,
        google.cloud.documentai_toolbox.wrappers.page.Paragraph,
        google.cloud.documentai_toolbox.wrappers.page.Line,
        google.cloud.documentai_toolbox.wrappers.page.Token,
    ]
]

Returns a list of children inside element.

Parameters
Name (Type): Description

element (ElementWithLayout): Required. An element in a page.

children (ChildrenElements): Required. List of wrapped children.

Returns
Type: Description

List[Union[Block, Paragraph, Line, Token]]: A list of wrapped children that are inside an element.
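
One way to decide whether a child is "inside" an element is to compare text offsets: a child belongs to the element when its text span falls within the element's span. The sketch below assumes that criterion and uses a hypothetical `ElementSketch` stand-in instead of the real Document AI types. The library's actual containment test is not stated on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElementSketch:
    """Hypothetical stand-in for a layout element and its text span."""

    label: str
    start_index: int  # offset of the first character in the document text
    end_index: int    # offset one past the last character


def get_children_sketch(
    element: ElementSketch, children: List[ElementSketch]
) -> List[ElementSketch]:
    """Keep the children whose text span lies within the element's span (assumed rule)."""
    return [
        child
        for child in children
        if child.start_index >= element.start_index
        and child.end_index <= element.end_index
    ]


paragraph = ElementSketch("paragraph", 0, 20)
tokens = [
    ElementSketch("token-a", 0, 5),
    ElementSketch("token-b", 6, 20),
    ElementSketch("token-c", 21, 30),  # starts after the paragraph ends
]
inside = get_children_sketch(paragraph, tokens)
print([token.label for token in inside])
```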

_get_hocr_bounding_box

_get_hocr_bounding_box(
    element_with_layout: typing.Union[
        google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
        google.cloud.documentai_v1.types.document.Document.Page,
        google.cloud.documentai_v1.types.document.Document.Page.Token,
        google.cloud.documentai_v1.types.document.Document.Page.Block,
        google.cloud.documentai_v1.types.document.Document.Page.Symbol,
    ],
    page_dimension: google.cloud.documentai_v1.types.document.Document.Page.Dimension,
) -> str

Returns an hOCR bounding box string.

Parameters
Name (Type): Description

element_with_layout (ElementWithLayout): Required. An element with layout fields.

page_dimension (documentai.Document.Page.Dimension): Required. Page dimension.

Returns
Type: Description

str: hOCR bounding box string.
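
In hOCR, a bounding box is written as `bbox x0 y0 x1 y1` in pixel coordinates. The sketch below shows one way such a string could be derived from normalized vertices (values between 0 and 1) and a page dimension. `DimensionSketch`, the tuple-based vertices, and the rounding rule are assumptions for illustration and may differ from what the library does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DimensionSketch:
    """Hypothetical stand-in for a page dimension in pixels."""

    width: float
    height: float


def hocr_bounding_box_sketch(
    normalized_vertices: List[Tuple[float, float]], dimension: DimensionSketch
) -> str:
    """Scale normalized (x, y) vertices to pixels and format an hOCR bbox property."""
    xs = [round(x * dimension.width) for x, _ in normalized_vertices]
    ys = [round(y * dimension.height) for _, y in normalized_vertices]
    # hOCR order: left, top, right, bottom.
    return f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"


vertices = [(0.1, 0.2), (0.5, 0.2), (0.5, 0.4), (0.1, 0.4)]
bbox = hocr_bounding_box_sketch(vertices, DimensionSketch(width=1000, height=2000))
print(bbox)  # bbox 100 400 500 800
```

Taking the minimum and maximum over all vertices keeps the result well-formed even if the vertices are not listed in clockwise order.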

_table_rows_from_documentai_table_rows

_table_rows_from_documentai_table_rows(
    table_rows: typing.List[
        google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow
    ],
    text: str,
) -> typing.List[typing.List[str]]

Returns a list of rows from table_rows.

Parameters
Name (Type): Description

table_rows (List[documentai.Document.Page.Table.TableRow]): Required. A list of documentai.Document.Page.Table.TableRow objects.

text (str): Required. UTF-8 encoded text in reading order from the document.

Returns
Type: Description

List[List[str]]: A list of table rows.
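
The return type suggests that each table row becomes a list of cell strings taken from the document text. The sketch below illustrates that shape with a hypothetical `RowSketch` whose cells are plain `(start, end)` offsets. The real `TableRow` objects carry layout information instead, and the whitespace stripping shown here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowSketch:
    """Hypothetical stand-in for a table row: one (start, end) text span per cell."""

    cells: List[Tuple[int, int]]


def table_rows_sketch(table_rows: List[RowSketch], text: str) -> List[List[str]]:
    """Turn each row into a list of cell strings sliced from the document text."""
    return [
        [text[start:end].strip() for start, end in row.cells]
        for row in table_rows
    ]


document_text = "Item Price\nPen 2\n"
rows = [
    RowSketch(cells=[(0, 5), (5, 11)]),    # "Item ", "Price\n"
    RowSketch(cells=[(11, 15), (15, 17)]), # "Pen ", "2\n"
]
table = table_rows_sketch(rows, document_text)
print(table)  # [['Item', 'Price'], ['Pen', '2']]
```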

_text_from_layout

_text_from_layout(
    layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str
) -> str

Returns the text from a single layout element.

Parameters
Name (Type): Description

layout (documentai.Document.Page.Layout): Required. The layout of a single element.

text (str): Required. UTF-8 encoded text in reading order from the document.

Returns
Type: Description

str: Text from a single element.
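
A Document AI layout refers to the document text through a text anchor made of segments, each with a `start_index` and an `end_index`. The sketch below shows how text could be recovered from such segments. `TextSegmentSketch` is a hypothetical stand-in, and concatenating all segments in order is an assumption about the behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextSegmentSketch:
    """Hypothetical stand-in for one text segment of a layout's text anchor."""

    start_index: int
    end_index: int


def text_from_segments_sketch(segments: List[TextSegmentSketch], text: str) -> str:
    """Concatenate the slices of the document text that the segments point at."""
    return "".join(
        text[segment.start_index:segment.end_index] for segment in segments
    )


document_text = "Hello world\n"
segments = [TextSegmentSketch(0, 5), TextSegmentSketch(5, 11)]
extracted = text_from_segments_sketch(segments, document_text)
print(extracted)  # Hello world
```

An element with no segments yields an empty string under this sketch.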

_trim_text

_trim_text(text: str) -> str

Removes extra whitespace characters (spaces, newlines, tabs, etc.) from text.

Parameter
Name (Type): Description

text (str): Required. UTF-8 encoded text in reading order from the document.

Returns
Type: Description

str: Text without trailing spaces/newlines.
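
The description leaves open exactly which whitespace is removed. The sketch below shows one plausible reading: drop leading and trailing whitespace and collapse internal runs of spaces, tabs, and newlines into single spaces. `trim_text_sketch` is a hypothetical name, and the collapsing of internal whitespace is an assumption that the library may not share.

```python
def trim_text_sketch(text: str) -> str:
    """Strip the ends and collapse runs of whitespace into single spaces (assumed rule)."""
    # str.split() with no arguments splits on any whitespace run and drops empty pieces.
    return " ".join(text.split())


trimmed = trim_text_sketch("  Total:\t10 \n")
print(repr(trimmed))  # 'Total: 10'
```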