Module page (0.2.0a0)

Wrappers for Document AI Page type.

Classes

FormField

FormField(
    documentai_formfield: google.cloud.documentai_v1.types.document.Document.Page.FormField,
    field_name: str,
    field_value: str,
)

Represents a wrapped documentai.Document.Page.FormField.

Line

Line(
    documentai_line: google.cloud.documentai_v1.types.document.Document.Page.Line,
    text: str,
)

Represents a wrapped documentai.Document.Page.Line.

Page

Page(
    documentai_page: google.cloud.documentai_v1.types.document.Document.Page, text: str
)

Represents a wrapped documentai.Document.Page.

Attribute of type List[str]: Required. A list of visually detected text lines on the page. Each line is a collection of tokens that a human would perceive as a line.

Paragraph

Paragraph(
    documentai_paragraph: google.cloud.documentai_v1.types.document.Document.Page.Paragraph,
    text: str,
)

Represents a wrapped documentai.Document.Page.Paragraph.

Table

Table(
    documentai_table: google.cloud.documentai_v1.types.document.Document.Page.Table,
    body_rows: List[List[str]],
    header_rows: List[List[str]],
)

Represents a wrapped documentai.Document.Page.Table.

Functions

_get_form_fields

_get_form_fields(
    form_fields: List[
        google.cloud.documentai_v1.types.document.Document.Page.FormField
    ],
    text: str,
)

Returns a list of FormField.

Parameters

form_fields (List[documentai.Document.Page.FormField]): Required. A list of documentai.Document.Page.FormField objects.
text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

List[FormField]: A list of FormFields.
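
The mapping from raw form fields to wrapped ones can be illustrated with a minimal sketch. It does not import the library: `Layout`, `RawFormField`, `FormFieldSketch`, `layout_text`, and `get_form_fields_sketch` are hypothetical stand-ins, the layout is simplified to a list of (start, end) offsets, and stripping whitespace from the extracted text is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layout:
    # Simplified stand-in for documentai.Document.Page.Layout; the real type
    # stores offsets under text_anchor.text_segments.
    segments: List[Tuple[int, int]]


@dataclass
class RawFormField:
    # Stand-in for documentai.Document.Page.FormField.
    field_name: Layout
    field_value: Layout


@dataclass
class FormFieldSketch:
    # Mirrors the constructor arguments of the FormField wrapper.
    documentai_formfield: RawFormField
    field_name: str
    field_value: str


def layout_text(layout: Layout, text: str) -> str:
    # Join the referenced slices of the document text (stripping is an assumption).
    return "".join(text[start:end] for start, end in layout.segments).strip()


def get_form_fields_sketch(
    form_fields: List[RawFormField], text: str
) -> List[FormFieldSketch]:
    return [
        FormFieldSketch(
            documentai_formfield=raw,
            field_name=layout_text(raw.field_name, text),
            field_value=layout_text(raw.field_value, text),
        )
        for raw in form_fields
    ]


document_text = "Name: Ada\nCity: London\n"
raw_fields = [
    RawFormField(Layout([(0, 5)]), Layout([(6, 10)])),
    RawFormField(Layout([(10, 15)]), Layout([(16, 23)])),
]
fields = get_form_fields_sketch(raw_fields, document_text)
print([(f.field_name, f.field_value) for f in fields])
```

Each wrapper keeps a reference to the original object next to the extracted strings, matching the three constructor arguments listed for FormField.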

_get_lines

_get_lines(
    lines: List[google.cloud.documentai_v1.types.document.Document.Page.Line], text: str
)

Returns a list of Line.

Parameters

lines (List[documentai.Document.Page.Line]): Required. A list of documentai.Document.Page.Line objects.
text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

List[Line]: A list of Lines.
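
As a rough illustration of what wrapping lines involves, the sketch below pairs each raw line with the slice of document text its layout points to. The classes `Layout`, `RawLine`, and `LineSketch` and the function `get_lines_sketch` are hypothetical stand-ins written for this example, not the library's own code, and the layout is reduced to a list of (start, end) offsets.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layout:
    # Simplified stand-in for documentai.Document.Page.Layout; the real type
    # stores offsets under text_anchor.text_segments.
    segments: List[Tuple[int, int]]


@dataclass
class RawLine:
    # Stand-in for documentai.Document.Page.Line.
    layout: Layout


@dataclass
class LineSketch:
    # Mirrors the constructor arguments of the Line wrapper.
    documentai_line: RawLine
    text: str


def get_lines_sketch(lines: List[RawLine], text: str) -> List[LineSketch]:
    wrapped = []
    for line in lines:
        # Slice the full document text with the offsets stored in the layout.
        line_text = "".join(text[start:end] for start, end in line.layout.segments)
        wrapped.append(LineSketch(documentai_line=line, text=line_text))
    return wrapped


document_text = "first line\nsecond line\n"
raw_lines = [RawLine(Layout([(0, 11)])), RawLine(Layout([(11, 23)]))]
lines = get_lines_sketch(raw_lines, document_text)
print([line.text for line in lines])
```

The same pattern would presumably carry over to paragraphs, since both functions take a list of page elements plus the full document text.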

_get_paragraphs

_get_paragraphs(
    paragraphs: List[google.cloud.documentai_v1.types.document.Document.Page.Paragraph],
    text: str,
)

Returns a list of Paragraph.

Parameters

paragraphs (List[documentai.Document.Page.Paragraph]): Required. A list of documentai.Document.Page.Paragraph objects.
text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

List[Paragraph]: A list of Paragraphs.

_table_rows_from_documentai_table_rows

_table_rows_from_documentai_table_rows(
    table_rows: List[
        google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow
    ],
    text: str,
)

Returns a list of rows from table_rows.

Parameters

table_rows (List[documentai.Document.Page.Table.TableRow]): Required. A list of documentai.Document.Page.Table.TableRow objects.
text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

List[str]: A list of table rows.

_table_wrapper_from_documentai_table

_table_wrapper_from_documentai_table(
    documentai_table: google.cloud.documentai_v1.types.document.Document.Page.Table,
    text: str,
)

Returns a Table.

Parameters

documentai_table (documentai.Document.Page.Table): Required. A documentai.Document.Page.Table.
text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

Table: A Table.
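
To make the table conversion concrete, here is a small sketch of how header and body rows might be turned into nested lists of cell text, matching the List[List[str]] types in the Table constructor. All names (`Cell`, `Row`, `RawTable`, `TableSketch`, `rows_to_text`, `table_wrapper_sketch`) are hypothetical, and cells are simplified to (start, end) offsets into the document text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    # Simplified stand-in for a table cell: offsets into the document text.
    segments: List[Tuple[int, int]]


@dataclass
class Row:
    # Stand-in for documentai.Document.Page.Table.TableRow.
    cells: List[Cell]


@dataclass
class RawTable:
    # Stand-in for documentai.Document.Page.Table.
    header_rows: List[Row]
    body_rows: List[Row]


@dataclass
class TableSketch:
    # Mirrors the constructor arguments of the Table wrapper.
    documentai_table: RawTable
    body_rows: List[List[str]]
    header_rows: List[List[str]]


def rows_to_text(rows: List[Row], text: str) -> List[List[str]]:
    # One inner list per row, one string per cell.
    return [
        ["".join(text[start:end] for start, end in cell.segments) for cell in row.cells]
        for row in rows
    ]


def table_wrapper_sketch(table: RawTable, text: str) -> TableSketch:
    return TableSketch(
        documentai_table=table,
        body_rows=rows_to_text(table.body_rows, text),
        header_rows=rows_to_text(table.header_rows, text),
    )


document_text = "Item Qty Pen 2 "
raw_table = RawTable(
    header_rows=[Row([Cell([(0, 4)]), Cell([(5, 8)])])],
    body_rows=[Row([Cell([(9, 12)]), Cell([(13, 14)])])],
)
table = table_wrapper_sketch(raw_table, document_text)
print(table.header_rows, table.body_rows)
```

Keeping header and body rows separate lets a caller decide later whether to treat the first header row as column names.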

_text_from_layout

_text_from_layout(
    layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str
)

Returns the text of a single layout element.

Parameters

layout (documentai.Document.Page.Layout): Required. An element with layout fields.
text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

str: Text from a single element.
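
A Document AI layout does not hold text itself; it points into the document text through the start and end offsets of its text anchor segments. The sketch below shows that lookup with stand-in dataclasses that mirror the field names `text_anchor`, `text_segments`, `start_index`, and `end_index`. The function name `text_from_layout_sketch` is hypothetical and the code is an illustration, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextSegment:
    # Stand-in for documentai.Document.TextAnchor.TextSegment.
    start_index: int = 0
    end_index: int = 0


@dataclass
class TextAnchor:
    # Stand-in for documentai.Document.TextAnchor.
    text_segments: List[TextSegment] = field(default_factory=list)


@dataclass
class Layout:
    # Stand-in for documentai.Document.Page.Layout.
    text_anchor: TextAnchor = field(default_factory=TextAnchor)


def text_from_layout_sketch(layout: Layout, text: str) -> str:
    """Concatenate the slices of `text` referenced by the layout's segments."""
    return "".join(
        text[segment.start_index:segment.end_index]
        for segment in layout.text_anchor.text_segments
    )


document_text = "Invoice 42\nTotal: 10 USD\n"
layout = Layout(TextAnchor([TextSegment(0, 10), TextSegment(11, 24)]))
extracted = text_from_layout_sketch(layout, document_text)
print(extracted)
```

A layout with no segments yields an empty string, which is a convenient default for elements that carry no text.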

_trim_text

_trim_text(text: str)

Removes extra whitespace characters (blank, newline, tab, etc.) from text.

Parameter

text (str): Required. UTF-8 encoded text in reading order from the document.

Returns

str: Text without trailing spaces/newlines.
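
One plausible reading of this description is sketched below: strip whitespace from both ends, then replace interior newlines with spaces. The function `trim_text_sketch` is hypothetical, and the exact rules the library applies are an assumption here.

```python
def trim_text_sketch(text: str) -> str:
    """Strip surrounding whitespace and turn interior newlines into spaces."""
    return text.strip().replace("\n", " ")


trimmed = trim_text_sketch("  Total:\n10 USD\t\n")
print(repr(trimmed))
```

Under this reading, interior tabs and repeated spaces would be left untouched; only the ends of the string and newline characters are affected.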