Client for Google Cloud Documentai API

class google.cloud.documentai_v1beta2.AutoMlParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Parameters to control AutoML model prediction behavior.

model()

Resource name of the AutoML model.

Format: projects/{project-id}/locations/{location-id}/models/{model-id}.
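Because `model` is a plain string, it can help to build and check it before sending a request. The sketch below uses two hypothetical helpers, `automl_model_name` and `parse_automl_model_name`, that follow only the format documented above. The project, location, and model IDs are placeholders.

```python
import re

# Hypothetical pattern derived from the documented format:
# projects/{project-id}/locations/{location-id}/models/{model-id}
_MODEL_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/models/(?P<model>[^/]+)$"
)


def automl_model_name(project: str, location: str, model: str) -> str:
    """Build an AutoML model resource name in the documented format."""
    return f"projects/{project}/locations/{location}/models/{model}"


def parse_automl_model_name(name: str) -> dict:
    """Split a resource name into its parts; raise ValueError if malformed."""
    match = _MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"not an AutoML model resource name: {name!r}")
    return match.groupdict()


name = automl_model_name("my-project", "us-central1", "TCN123")
print(name)  # projects/my-project/locations/us-central1/models/TCN123
print(parse_automl_model_name(name))
```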

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.BatchProcessDocumentsRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the [Document] format.

requests()

Required. Individual requests for each document.

  • Type

    Sequence[ProcessDocumentRequest]

parent()

Target project and location for the call.

Format: projects/{project-id}/locations/{location-id}.

If no location is specified, a region will be chosen automatically.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.BatchProcessDocumentsResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.

responses()

Responses for each individual document.

  • Type

    Sequence[ProcessDocumentResponse]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.BoundingPoly(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A bounding polygon for the detected image annotation.

vertices()

The bounding polygon vertices.

  • Type

    Sequence[Vertex]

normalized_vertices()

The bounding polygon normalized vertices.

  • Type

    Sequence[NormalizedVertex]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
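A common next step is drawing these polygons on a rendered page. The sketch below assumes that normalized vertices hold `x`/`y` fractions of the page size in the range [0, 1] and that you scale them by the page's `Dimension` width and height. `NormalizedVertex` here is a hypothetical stand-in dataclass, not the library type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NormalizedVertex:
    # Hypothetical stand-in for the NormalizedVertex message. x and y are
    # assumed to be fractions of the page width and height in [0, 1].
    x: float = 0.0
    y: float = 0.0


def to_pixels(vertices: List[NormalizedVertex], width: float, height: float) -> List[Tuple[int, int]]:
    """Scale normalized vertices by the page size and round to whole pixels."""
    return [(round(v.x * width), round(v.y * height)) for v in vertices]


box = [
    NormalizedVertex(0.1, 0.2),
    NormalizedVertex(0.5, 0.2),
    NormalizedVertex(0.5, 0.25),
    NormalizedVertex(0.1, 0.25),
]
print(to_pixels(box, width=1700, height=2200))
# [(170, 440), (850, 440), (850, 550), (170, 550)]
```

With a real response you would pass `layout.bounding_poly.normalized_vertices` together with `page.dimension.width` and `page.dimension.height`.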

class google.cloud.documentai_v1beta2.Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.

uri()

Currently supports Google Cloud Storage URI of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs for more info.
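When a document is referenced by `uri`, it can be useful to validate the `gs://bucket_name/object_name` shape up front. The hypothetical helper `split_gcs_uri` below is a sketch that checks only the form described here; it does not verify that the bucket or object exists.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://bucket_name/object_name into (bucket_name, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object name.
    bucket, sep, obj = uri[len(prefix):].partition("/")
    if not bucket or not sep or not obj:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, obj


print(split_gcs_uri("gs://my-bucket/invoices/2020/a.pdf"))
# ('my-bucket', 'invoices/2020/a.pdf')
```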

content()

Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.
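The bytes-versus-base64 distinction matters only when you build or read the JSON form yourself; with the Python client you assign raw `bytes` to `content`. This sketch, using only the standard library, shows the round trip a JSON payload goes through. The PDF bytes are a placeholder.

```python
import base64
import json

# Placeholder standing in for the raw bytes of a real PDF file.
pdf_bytes = b"%PDF-1.4 example bytes"

# In JSON, a bytes field such as `content` is carried as base64 text.
encoded = base64.b64encode(pdf_bytes).decode("ascii")
payload = json.dumps({"content": encoded})

# Decoding reverses the transformation and recovers the original bytes.
decoded = base64.b64decode(json.loads(payload)["content"])
print(decoded == pdf_bytes)  # True
```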

mime_type()

An IANA-published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.

text()

UTF-8 encoded text in reading order from the document.

text_styles()

Styles for the [Document.text][google.cloud.documentai.v1beta2.Document.text].

  • Type

    Sequence[Style]

pages()

Visual page layout for the [Document][google.cloud.documentai.v1beta2.Document].

  • Type

    Sequence[Page]

entities()

A list of entities detected on [Document.text][google.cloud.documentai.v1beta2.Document.text]. For document shards, entities in this list may cross shard boundaries.

  • Type

    Sequence[Entity]

entity_relations()

Relationship among [Document.entities][google.cloud.documentai.v1beta2.Document.entities].

  • Type

    Sequence[EntityRelation]

shard_info()

Information about the sharding if this document is a sharded part of a larger document. If the document is not sharded, this message is not specified.

  • Type

    ShardInfo

labels()

[Label][google.cloud.documentai.v1beta2.Document.Label]s for this document.

  • Type

    Sequence[Label]

error()

Any error that occurred while processing this document.

  • Type

    Status

class Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A phrase in the text that is a known entity type, such as a person, an organization, or a location.

text_anchor()

Provenance of the entity. Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].

  • Type

    TextAnchor

type()

Entity type from a schema, e.g. Address.

mention_text()

Text value in the document, e.g. 1600 Amphitheatre Pkwy.

mention_id()

Deprecated. Use the id field instead.

confidence()

Optional. Confidence of detected Schema entity. Range [0, 1].

page_anchor()

Optional. Represents the provenance of this entity with respect to the location on the page where it was found.

  • Type

    PageAnchor

id()

Optional. Canonical id. This will be a unique value in the entity list for this document.

bounding_poly_for_demo_frontend()

Optional. Temporary field to store the bounding poly for short-term POCs. Used by the frontend only. Do not use before you talk to ybo@ and lukasr@.

  • Type

    BoundingPoly

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
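Since `confidence` is optional and ranges over [0, 1], downstream code often filters on it. The sketch below groups mention texts by `type` and drops detections below a threshold. `Entity` is a hypothetical dataclass mirroring the field names above, and the 0.5 cutoff is an arbitrary example. An unset proto3 float reads as 0.0, so entities without a confidence are dropped by any positive threshold.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Entity:
    # Hypothetical stand-in exposing the same field names as Document.Entity.
    type: str
    mention_text: str
    confidence: float = 0.0


def group_entities(entities: Iterable[Entity], min_confidence: float = 0.5) -> Dict[str, List[str]]:
    """Group mention texts by entity type, dropping low-confidence detections."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity.confidence >= min_confidence:
            grouped[entity.type].append(entity.mention_text)
    return dict(grouped)


entities = [
    Entity("Address", "1600 Amphitheatre Pkwy", 0.92),
    Entity("Address", "Mountain View, CA", 0.41),
    Entity("Person", "Ada Lovelace", 0.88),
]
print(group_entities(entities))
# {'Address': ['1600 Amphitheatre Pkwy'], 'Person': ['Ada Lovelace']}
```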

class EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Relationship between [Entities][google.cloud.documentai.v1beta2.Document.Entity].

subject_id()

Subject entity id.

object_id()

Object entity id.

relation()

Relationship description.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
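`subject_id` and `object_id` are ids rather than entity objects, so relations are usually resolved through a lookup table. This sketch assumes they match the `id` field of entries in `Document.entities`. The dataclasses and the `located_at` relation string are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entity:
    # Hypothetical stand-in: only the fields this sketch needs.
    id: str
    type: str
    mention_text: str


@dataclass
class EntityRelation:
    # Hypothetical stand-in mirroring the documented field names.
    subject_id: str
    object_id: str
    relation: str


def resolve_relations(entities: List[Entity], relations: List[EntityRelation]) -> List[Tuple[str, str, str]]:
    """Turn id-based relations into (subject text, relation, object text) triples."""
    by_id: Dict[str, Entity] = {e.id: e for e in entities}
    triples = []
    for rel in relations:
        subject = by_id.get(rel.subject_id)
        obj = by_id.get(rel.object_id)
        if subject is None or obj is None:
            continue  # skip relations whose ids are missing from this entity list
        triples.append((subject.mention_text, rel.relation, obj.mention_text))
    return triples


entities = [
    Entity("e1", "Organization", "Acme Corp"),
    Entity("e2", "Address", "1600 Amphitheatre Pkwy"),
]
relations = [
    EntityRelation("e1", "e2", "located_at"),
    EntityRelation("e1", "e9", "located_at"),  # dangling id, ignored
]
print(resolve_relations(entities, relations))
# [('Acme Corp', 'located_at', '1600 Amphitheatre Pkwy')]
```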

class Label(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Label attaches schema information and/or other metadata to segments within a [Document][google.cloud.documentai.v1beta2.Document]. Multiple [Label][google.cloud.documentai.v1beta2.Document.Label]s on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.

automl_model()

If the label was generated by an AutoML model, this field stores the full resource name of that model.

Format: projects/{project-id}/locations/{location-id}/models/{model-id}

name()

Name of the label. When the label is generated from an AutoML Text Classification model, this field represents the name of the category.

confidence()

Confidence score between 0 and 1 for label assignment.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A page in a [Document][google.cloud.documentai.v1beta2.Document].

page_number()

1-based index for current [Page][google.cloud.documentai.v1beta2.Document.Page] in a parent [Document][google.cloud.documentai.v1beta2.Document]. Useful when a page is taken out of a [Document][google.cloud.documentai.v1beta2.Document] for individual processing.

dimension()

Physical dimension of the page.

  • Type

    Dimension

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for the page.

  • Type

    Layout

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

blocks()

A list of visually detected text blocks on the page. A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

  • Type

    Sequence[Block]

paragraphs()

A list of visually detected text paragraphs on the page. A collection of lines that a human would perceive as a paragraph.

  • Type

    Sequence[Paragraph]

lines()

A list of visually detected text lines on the page. A collection of tokens that a human would perceive as a line.

  • Type

    Sequence[Line]

tokens()

A list of visually detected tokens on the page.

  • Type

    Sequence[Token]

visual_elements()

A list of detected non-text visual elements on the page, such as checkboxes and signatures.

  • Type

    Sequence[VisualElement]

tables()

A list of visually detected tables on the page.

  • Type

    Sequence[Table]

form_fields()

A list of visually detected form fields on the page.

  • Type

    Sequence[FormField]

class Block(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A block has a set of lines (collected into paragraphs) that have a common line-spacing and orientation.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Block][google.cloud.documentai.v1beta2.Document.Page.Block].

  • Type

    Layout

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class DetectedLanguage(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Detected language for a structural component.

language_code()

The BCP-47 language code, such as “en-US” or “sr-Latn”. For more information, see http://www.unicode.org/reports/tr35/#Unicode_locale_identifier.

confidence()

Confidence of detected language. Range [0, 1].

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
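When a structural element carries several `DetectedLanguage` entries, a simple policy is to take the one with the highest `confidence`. The sketch below does that with a hypothetical stand-in dataclass; ties go to the first entry, and an empty list yields `None`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectedLanguage:
    # Hypothetical stand-in mirroring the documented field names.
    language_code: str
    confidence: float = 0.0


def primary_language(detected: Sequence[DetectedLanguage]) -> Optional[str]:
    """Return the language code with the highest confidence, or None if empty."""
    if not detected:
        return None
    # max() returns the first of several equal maxima, so ties keep list order.
    return max(detected, key=lambda lang: lang.confidence).language_code


langs = [DetectedLanguage("en-US", 0.71), DetectedLanguage("sr-Latn", 0.22)]
print(primary_language(langs))  # en-US
```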

class Dimension(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Dimension for the page.

width()

Page width.

height()

Page height.

unit()

Dimension unit.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class FormField(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A form field detected on the page.

field_name()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta2.Document.Page.FormField] name. e.g. Address, Email, Grand total, Phone number, etc.

  • Type

    Layout

field_value()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for the [FormField][google.cloud.documentai.v1beta2.Document.Page.FormField] value.

  • Type

    Layout

name_detected_languages()

A list of detected languages for name together with confidence.

  • Type

    Sequence[DetectedLanguage]

value_detected_languages()

A list of detected languages for value together with confidence.

  • Type

    Sequence[DetectedLanguage]

value_type()

If the value is non-textual, this field represents the type. Current valid values are:

  • blank (indicates that field_value is normal text)

  • “unfilled_checkbox”

  • “filled_checkbox”

  • Type

    str
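The three documented `value_type` values map naturally onto Python values. This hypothetical `form_field_value` helper assumes you have already extracted the value's text from `field_value`. It raises on any other `value_type` so that newly introduced types are noticed rather than silently treated as text.

```python
from typing import Union


def form_field_value(value_type: str, value_text: str) -> Union[str, bool]:
    """Convert a form field's value to a Python value based on value_type.

    Hypothetical helper: `value_text` is assumed to be the text already
    resolved from the field's `field_value` layout.
    """
    if value_type == "":
        # Blank value_type: the field value is ordinary text.
        return value_text
    if value_type == "filled_checkbox":
        return True
    if value_type == "unfilled_checkbox":
        return False
    # Fail loudly on anything not documented above.
    raise ValueError(f"unrecognized value_type: {value_type!r}")


print(form_field_value("", "Jane Doe"))  # Jane Doe
print(form_field_value("filled_checkbox", "X"))  # True
```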

corrected_key_text()

An internal field, created for Labeling UI to export key text.

corrected_value_text()

An internal field, created for Labeling UI to export value text.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class Layout(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Visual element describing a layout unit on a page.

text_anchor()

Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].

  • Type

    TextAnchor

confidence()

Confidence of the current [Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] within context of the object this layout is for. e.g. confidence can be for a single token, a table, a visual element, etc. depending on context. Range [0, 1].

bounding_poly()

The bounding polygon for the [Layout][google.cloud.documentai.v1beta2.Document.Page.Layout].

  • Type

    BoundingPoly

orientation()

Detected orientation for the [Layout][google.cloud.documentai.v1beta2.Document.Page.Layout].

  • Type

    Orientation

id()

Optional. This is the identifier used by referencing [PageAnchor][google.cloud.documentai.v1beta2.Document.PageAnchor]s.

class Orientation(value)

Detected human reading orientation.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class Line(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A collection of tokens that a human would perceive as a line. A line does not cross column boundaries and can be horizontal, vertical, etc.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Line][google.cloud.documentai.v1beta2.Document.Page.Line].

  • Type

    Layout

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class Paragraph(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A collection of lines that a human would perceive as a paragraph.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Paragraph][google.cloud.documentai.v1beta2.Document.Page.Paragraph].

  • Type

    Layout

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class Table(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A table representation similar to an HTML table structure.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Table][google.cloud.documentai.v1beta2.Document.Page.Table].

  • Type

    Layout

header_rows()

Header rows of the table.

  • Type

    Sequence[TableRow]

body_rows()

Body rows of the table.

  • Type

    Sequence[TableRow]

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

class TableCell(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A cell representation inside the table.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [TableCell][google.cloud.documentai.v1beta2.Document.Page.Table.TableCell].

  • Type

    Layout

row_span()

How many rows this cell spans.

col_span()

How many columns this cell spans.

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class TableRow(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A row of table cells.

cells()

Cells that make up this row.

  • Type

    Sequence[TableCell]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
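Because cells can span several rows or columns, turning `header_rows` and `body_rows` into a rectangular grid needs a little bookkeeping. The sketch below fills a grid slot by slot, skipping slots already claimed by a cell spanning down from an earlier row, and copies a spanning cell's text into every slot it covers. `Cell` is a hypothetical stand-in that assumes each cell's text has already been resolved from its `layout`. An unset span reads as 0 in proto3, so it is treated as 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Cell:
    # Hypothetical stand-in for Table.TableCell; `text` is assumed to be
    # the already-resolved text of the cell's layout.
    text: str
    row_span: int = 1
    col_span: int = 1


def rows_to_grid(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Lay cells out on a grid, copying spanning cells into every slot they cover."""
    occupied: Dict[Tuple[int, int], str] = {}
    width = 0
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip slots already filled by a cell spanning down from above.
            while (r, c) in occupied:
                c += 1
            rows_spanned = max(cell.row_span, 1)  # unset (0) means 1
            cols_spanned = max(cell.col_span, 1)
            for dr in range(rows_spanned):
                for dc in range(cols_spanned):
                    occupied[(r + dr, c + dc)] = cell.text
            c += cols_spanned
            width = max(width, c)
    height = max((r for r, _ in occupied), default=-1) + 1
    # Slots no cell covers stay None.
    return [[occupied.get((r, c)) for c in range(width)] for r in range(height)]


header = [[Cell("Item"), Cell("Qty"), Cell("Price")]]
body = [
    [Cell("Widget", row_span=2), Cell("2"), Cell("$4")],
    [Cell("3"), Cell("$6")],
    [Cell("Total", col_span=2), Cell("$10")],
]
for row in rows_to_grid(header + body):
    print(row)
```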

class Token(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A detected token.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [Token][google.cloud.documentai.v1beta2.Document.Page.Token].

  • Type

    Layout

detected_break()

Detected break at the end of a [Token][google.cloud.documentai.v1beta2.Document.Page.Token].

  • Type

    DetectedBreak

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

class DetectedBreak(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Detected break at the end of a [Token][google.cloud.documentai.v1beta2.Document.Page.Token].

type()

Detected break type.

  • Type

    Type

class Type(value)

Enum to denote the type of break found.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class VisualElement(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Detected non-text visual elements on the page, such as checkboxes and signatures.

layout()

[Layout][google.cloud.documentai.v1beta2.Document.Page.Layout] for [VisualElement][google.cloud.documentai.v1beta2.Document.Page.VisualElement].

  • Type

    Layout

type()

Type of the [VisualElement][google.cloud.documentai.v1beta2.Document.Page.VisualElement].

detected_languages()

A list of detected languages together with confidence.

  • Type

    Sequence[DetectedLanguage]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Reference to elements in [Document.pages][google.cloud.documentai.v1beta2.Document.pages].

page_refs()

One or more references to visual page elements.

  • Type

    Sequence[PageRef]

class PageRef(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Represents a weak reference to a page element within a document.

page()

Required. Index into the [Document.pages][google.cloud.documentai.v1beta2.Document.pages] element.

layout_type()

Optional. The type of the layout element that is being referenced. If not specified, the whole page is assumed to be referenced.

  • Type

    LayoutType

layout_id()

Optional. The [Page.Layout.id][google.cloud.documentai.v1beta2.Document.Page.Layout.id] on the page that this element references. If layout_type is specified, this id must also be specified.

class LayoutType(value)

The type of layout that is being referenced.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
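Note the two numbering schemes: `Page.page_number` is 1-based, while `PageRef.page` is an index into the `Document.pages` list. The sketch below assumes that index is 0-based, as Python list indices are, and uses hypothetical dataclasses in place of the real messages. An out-of-range index raises `IndexError`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    # Hypothetical stand-in: 1-based, as documented for Page.page_number.
    page_number: int


@dataclass
class PageRef:
    # Hypothetical stand-in: assumed to be a 0-based index into Document.pages.
    page: int = 0


def referenced_page_numbers(pages: List[Page], refs: List[PageRef]) -> List[int]:
    """Return the 1-based page numbers that a PageAnchor's page_refs point at."""
    return [pages[ref.page].page_number for ref in refs]


pages = [Page(1), Page(2), Page(3)]
print(referenced_page_numbers(pages, [PageRef(0), PageRef(2)]))  # [1, 3]
```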

class ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.

shard_index()

The 0-based index of this shard.

shard_count()

Total number of shards.

text_offset()

The index of the first character in [Document.text][google.cloud.documentai.v1beta2.Document.text] in the overall document global text.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

text_anchor()

Text anchor indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].

  • Type

    TextAnchor

color()

Text color.

  • Type

    Color

background_color()

Text background color.

  • Type

    Color

font_weight()

Font weight. Possible values are normal, bold, bolder, and lighter. https://www.w3schools.com/cssref/pr_font_weight.asp

text_style()

Text style. Possible values are normal, italic, and oblique. https://www.w3schools.com/cssref/pr_font_font-style.asp

text_decoration()

Text decoration. Follows CSS standard. https://www.w3schools.com/cssref/pr_text_text-decoration.asp

font_size()

Font size.

  • Type

    FontSize

class FontSize(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Font size with unit.

size()

Font size for the text.

unit()

Unit for the font size. Follows CSS naming (in, px, pt, etc.).

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Text reference indexing into the [Document.text][google.cloud.documentai.v1beta2.Document.text].

text_segments()

The text segments from the [Document.text][google.cloud.documentai.v1beta2.Document.text].

  • Type

    Sequence[TextSegment]

class TextSegment(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A text segment in the [Document.text][google.cloud.documentai.v1beta2.Document.text]. The indices may be out of bounds, which indicates that the text extends into another document shard for large sharded documents. See [ShardInfo.text_offset][google.cloud.documentai.v1beta2.Document.ShardInfo.text_offset].

start_index()

[TextSegment][google.cloud.documentai.v1beta2.Document.TextAnchor.TextSegment] start UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta2.Document.text].

end_index()

[TextSegment][google.cloud.documentai.v1beta2.Document.TextAnchor.TextSegment] half-open end UTF-8 char index in the [Document.text][google.cloud.documentai.v1beta2.Document.text].

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.
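Many text-bearing fields (for example on `Entity`, `Layout`, and `Style`) point into `Document.text` through a `TextAnchor`, so resolving an anchor is a frequent first step. The sketch below concatenates each half-open `[start_index, end_index)` slice. It assumes the indices count characters of the Python string; if they turn out to count UTF-8 bytes for your data, slice `text.encode("utf-8")` instead. Python slicing silently clamps out-of-range indices, so text that extends into another shard is truncated rather than raising. The dataclasses are hypothetical stand-ins for the message types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextSegment:
    # Hypothetical stand-in mirroring the documented field names.
    start_index: int = 0
    end_index: int = 0


@dataclass
class TextAnchor:
    text_segments: List[TextSegment] = field(default_factory=list)


def anchor_text(document_text: str, anchor: TextAnchor) -> str:
    """Concatenate the half-open [start_index, end_index) slices of the text.

    Assumes the indices are character (code point) offsets into document_text.
    """
    return "".join(
        document_text[seg.start_index:seg.end_index] for seg in anchor.text_segments
    )


text = "Invoice\nTotal: $10\n"
print(anchor_text(text, TextAnchor([TextSegment(8, 18)])))  # Total: $10
```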

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.DocumentUnderstandingServiceClient(*, credentials: Optional[google.auth.credentials.Credentials] = None, transport: Optional[Union[str, google.cloud.documentai_v1beta2.services.document_understanding_service.transports.base.DocumentUnderstandingServiceTransport]] = None, client_options: google.api_core.client_options.ClientOptions = ClientOptions: {'api_endpoint': 'us-documentai.googleapis.com', 'client_cert_source': None, 'client_encrypted_cert_source': None, 'quota_project_id': None, 'credentials_file': None, 'scopes': None, 'api_key': None, 'api_audience': None})

Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, and translation.

Instantiate the document understanding service client.

  • Parameters

    • credentials (Optional[google.auth.credentials.Credentials]) – The authorization credentials to attach to requests. These credentials identify the application to the service; if none are specified, the client will attempt to ascertain the credentials from the environment.

    • transport (Union[str, DocumentUnderstandingServiceTransport]) – The transport to use. If set to None, a transport is chosen automatically.

    • client_options (ClientOptions) – Custom options for the client.

batch_process_documents(request: Optional[google.cloud.documentai_v1beta2.types.document_understanding.BatchProcessDocumentsRequest] = None, *, requests: Optional[Sequence[google.cloud.documentai_v1beta2.types.document_understanding.ProcessDocumentRequest]] = None, retry: google.api_core.retry.Retry = <_MethodDefault._DEFAULT_VALUE:

Long-running operation (LRO) endpoint to batch process many documents. The output is written to Cloud Storage as JSON in the Document format.

  • Parameters

    • request (BatchProcessDocumentsRequest) – The request object. Request to batch process documents as an asynchronous operation. The output is written to Cloud Storage as JSON in the Document format.

    • requests (Sequence[ProcessDocumentRequest]) – Required. Individual requests for each document. This corresponds to the requests field on the request instance; if request is provided, this should not be set.

    • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

    • timeout (float) – The timeout for this request.

    • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.

  • Returns

    An object representing a long-running operation.

    The result type for the operation will be BatchProcessDocumentsResponse: the response to a batch document processing request. This is returned in the LRO Operation after the operation is complete.

  • Return type

    Operation
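As a rough sketch of how these pieces fit together, the hypothetical helper below assembles a BatchProcessDocumentsRequest as a plain dict with one ProcessDocumentRequest per input file. It assumes proto-plus messages accept such a mapping as their first argument (the mapping parameter in the class signatures). Giving each document its own output folder (doc-0/, doc-1/ and so on) is an illustrative choice, not an API requirement.

```python
from typing import Any, Dict, List


def build_batch_request(
    project_id: str,
    location_id: str,
    input_uris: List[str],
    output_prefix: str,
    mime_type: str = "application/pdf",
) -> Dict[str, Any]:
    """Hypothetical helper: a BatchProcessDocumentsRequest expressed as a dict."""
    parent = f"projects/{project_id}/locations/{location_id}"
    requests = []
    for index, uri in enumerate(input_uris):
        requests.append(
            {
                "input_config": {"gcs_source": {"uri": uri}, "mime_type": mime_type},
                # output_config is needed for batch requests; one folder per
                # document keeps shard files from different inputs apart.
                "output_config": {
                    "gcs_destination": {"uri": f"{output_prefix.rstrip('/')}/doc-{index}/"}
                },
            }
        )
    return {"parent": parent, "requests": requests}
```

With the real client, the dict would presumably be wrapped as documentai_v1beta2.BatchProcessDocumentsRequest(request_dict) and passed to client.batch_process_documents(request=...); calling result() on the returned Operation blocks until the batch finishes and yields a BatchProcessDocumentsResponse.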

classmethod from_service_account_file(filename: str, *args, **kwargs)

Creates an instance of this client using the provided credentials file.

  • Parameters

    • filename (str) – The path to the service account private key JSON file.

    • args – Additional arguments to pass to the constructor.

    • kwargs – Additional arguments to pass to the constructor.

  • Returns

    The constructed client.

  • Return type

    DocumentUnderstandingServiceClient

classmethod from_service_account_json(filename: str, *args, **kwargs)

Creates an instance of this client using the provided credentials file. This is an alias of from_service_account_file.

  • Parameters

    • filename (str) – The path to the service account private key JSON file.

    • args – Additional arguments to pass to the constructor.

    • kwargs – Additional arguments to pass to the constructor.

  • Returns

    The constructed client.

  • Return type

    DocumentUnderstandingServiceClient

process_document(request: Optional[google.cloud.documentai_v1beta2.types.document_understanding.ProcessDocumentRequest] = None, *, retry: google.api_core.retry.Retry = _MethodDefault._DEFAULT_VALUE, …)

Processes a single document.

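  • Parameters

    • request (ProcessDocumentRequest) – The request object. Request to process one document.

    • retry (google.api_core.retry.Retry) – Designation of what errors, if any, should be retried.

    • timeout (float) – The timeout for this request.

    • metadata (Sequence[Tuple[str, str]]) – Strings which should be sent along with the request as metadata.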

  • Returns

    Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.

  • Return type

    Document
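For the synchronous call, a document can be sent inline through InputConfig.contents instead of being read from Cloud Storage. The sketch below is a hypothetical helper that picks one of the MIME types listed under InputConfig.mime_type from the file name's extension; the extension table is an assumption for illustration, and the caller supplies the raw bytes.

```python
from pathlib import PurePath
from typing import Any, Dict

# Assumed extension-to-MIME mapping for the non-AutoML types InputConfig lists.
MIME_BY_SUFFIX = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".gif": "image/gif",
}


def build_inline_request(
    data: bytes, filename: str, project_id: str, location_id: str
) -> Dict[str, Any]:
    """Hypothetical helper: a ProcessDocumentRequest dict with inline contents."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return {
        # parent is only used by the ProcessDocument method.
        "parent": f"projects/{project_id}/locations/{location_id}",
        "input_config": {
            "contents": data,  # inline bytes work only for synchronous calls
            "mime_type": MIME_BY_SUFFIX[suffix],
        },
        "document_type": "general",  # the documented default, spelled out
    }
```

Passing documentai_v1beta2.ProcessDocumentRequest(request_dict) to client.process_document(request=...) would then return a Document.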

class google.cloud.documentai_v1beta2.EntityExtractionParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Parameters to control entity extraction behavior.

enabled()

Whether to enable entity extraction.

model_version()

Model version of the entity extraction. Default is “builtin/stable”. Specify “builtin/latest” for the latest model.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.FormExtractionParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Parameters to control form extraction behavior.

enabled()

Whether to enable form extraction.

key_value_pair_hints()

Users can provide pairs of (key text, value type) to improve the parsing result.

For example, if a document has a field called “Date” that holds a date value and a field called “Amount” that may hold either a currency value (e.g., “$500.00”) or a simple number value (e.g., “20”), you could use the following hints: [{"key": "Date", "value_types": ["DATE"]}, {"key": "Amount", "value_types": ["PRICE", "NUMBER"]}]

If the value type is unknown but you want to provide hints for the keys, you can leave the value_types field empty, for example: {"key": "Date", "value_types": []}

  • Type

    Sequence[KeyValuePairHint]

model_version()

Model version of the form extraction system. Default is “builtin/stable”. Specify “builtin/latest” for the latest model. For custom form models, specify: “custom/{model_name}”. Model name format is “bucket_name/path/to/modeldir” corresponding to “gs://bucket_name/path/to/modeldir” where annotated examples are stored.
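The hint structure described for key_value_pair_hints maps naturally onto a Python dict of key text to value types. The hypothetical helper below turns such a dict into a form_extraction_params mapping, reusing the Date/Amount example from the key_value_pair_hints description; it assumes the mapping form of the message is what gets embedded in a ProcessDocumentRequest.

```python
from typing import Any, Dict, Mapping, Sequence


def build_form_params(
    hints: Mapping[str, Sequence[str]],
    model_version: str = "builtin/stable",
) -> Dict[str, Any]:
    """Hypothetical helper: {key text: value types} -> form_extraction_params dict."""
    return {
        "enabled": True,
        "key_value_pair_hints": [
            # An empty value_types list means "hint the key, type unknown".
            {"key": key, "value_types": list(value_types)}
            for key, value_types in hints.items()
        ],
        "model_version": model_version,
    }


params = build_form_params({"Date": ["DATE"], "Amount": ["PRICE", "NUMBER"], "Due": []})
```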

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.GcsDestination(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The Google Cloud Storage location where the output file will be written.

uri()

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.GcsSource(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The Google Cloud Storage location where the input file will be read from.

uri()

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.InputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The desired input location and metadata.

gcs_source()

The Google Cloud Storage location to read the input from. This must be a single file.

  • Type

    GcsSource

contents()

Content in bytes. Note: As with all bytes fields, protocol buffer messages use a pure binary representation, whereas JSON representations use base64.

This field only works with the synchronous ProcessDocument method.

mime_type()

Required. MIME type of the input. Currently supported MIME types are application/pdf, image/tiff, and image/gif. In addition, the application/json type is supported for requests that set the ProcessDocumentRequest.automl_params field. The JSON file must be in Document format.
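The rules spread across the InputConfig fields can be gathered into a single client-side check. The hypothetical helper below enforces them on a plain dict: one input source (assumed here to be either gcs_source or contents, not both), inline contents only for synchronous calls, and application/json only together with automl_params. It is a sketch; the service has the final say on what it accepts.

```python
from typing import Any, Dict, Optional

SUPPORTED_MIME_TYPES = {"application/pdf", "image/tiff", "image/gif"}


def make_input_config(
    mime_type: str,
    *,
    gcs_uri: Optional[str] = None,
    contents: Optional[bytes] = None,
    synchronous: bool = True,
    uses_automl: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and build an InputConfig dict."""
    # Assumption: gcs_source and contents are alternative sources.
    if (gcs_uri is None) == (contents is None):
        raise ValueError("set exactly one of gcs_uri or contents")
    if contents is not None and not synchronous:
        raise ValueError("contents only works with synchronous ProcessDocument")
    # application/json (a Document JSON file) is accepted only with automl_params.
    allowed = SUPPORTED_MIME_TYPES | ({"application/json"} if uses_automl else set())
    if mime_type not in allowed:
        raise ValueError(f"unsupported mime_type: {mime_type!r}")
    if gcs_uri is not None:
        return {"gcs_source": {"uri": gcs_uri}, "mime_type": mime_type}
    return {"contents": contents, "mime_type": mime_type}
```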

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.KeyValuePairHint(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A user-provided hint for a key-value pair.

key()

The key text for the hint.

value_types()

Type of the value. This is case-insensitive and can be one of: ADDRESS, LOCATION, ORGANIZATION, PERSON, PHONE_NUMBER, ID, NUMBER, EMAIL, PRICE, TERMS, DATE, NAME. Types not in this list are ignored.

  • Type

    Sequence[str]
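Because value_types matching is case-insensitive and unrecognized types are silently ignored, it can help to check hints locally before sending them. This hypothetical helper splits a list into the types the service should recognize (canonicalized to upper case and de-duplicated) and the ones it would ignore, based on the list in the value_types description.

```python
from typing import Iterable, List, Tuple

# The value types listed for KeyValuePairHint.value_types.
KNOWN_VALUE_TYPES = frozenset({
    "ADDRESS", "LOCATION", "ORGANIZATION", "PERSON", "PHONE_NUMBER", "ID",
    "NUMBER", "EMAIL", "PRICE", "TERMS", "DATE", "NAME",
})


def split_value_types(value_types: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (recognized, ignored) value types."""
    recognized: List[str] = []
    ignored: List[str] = []
    for value_type in value_types:
        canonical = value_type.upper()  # matching is case-insensitive
        if canonical not in KNOWN_VALUE_TYPES:
            ignored.append(value_type)  # the service would ignore this one
        elif canonical not in recognized:
            recognized.append(canonical)
    return recognized, ignored
```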

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.NormalizedVertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A vertex represents a 2D point in the image. NOTE: the normalized vertex coordinates are relative to the original image and range from 0 to 1.

x()

X coordinate.

y()

Y coordinate.
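Since normalized coordinates are fractions of the original image, converting between them and pixel-scale Vertex coordinates is just a scaling by the image size. The sketch assumes x scales with image width and y with height; the function names are hypothetical.

```python
from typing import Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[float, float]:
    """Hypothetical helper: NormalizedVertex (0..1) -> Vertex-style pixel coords."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    return x * width, y * height


def to_normalized(x: float, y: float, width: int, height: int) -> Tuple[float, float]:
    """Hypothetical helper: pixel coords -> NormalizedVertex-style fractions."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return x / width, y / height
```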

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.OcrParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Parameters to control Optical Character Recognition (OCR) behavior.

language_hints()

List of languages to use for OCR. In most cases, an empty value yields the best results since it enables automatic language detection. For languages based on the Latin alphabet, setting language_hints is not needed. In rare cases, when the language of the text in the image is known, setting a hint will help get better results (although it will be a significant hindrance if the hint is wrong). Document processing returns an error if one or more of the specified languages is not one of the supported languages.

  • Type

    Sequence[str]

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.OperationMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Contains metadata for the BatchProcessDocuments operation.

state()

The state of the current batch processing.

  • Type

    State

state_message()

A message providing more details about the current state of processing.

create_time()

The creation time of the operation.

  • Type

    Timestamp

update_time()

The last update time of the operation.

  • Type

    Timestamp

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.OutputConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The desired output location and metadata.

gcs_destination()

The Google Cloud Storage location to write the output to.

  • Type

    GcsDestination

pages_per_shard()

The maximum number of pages to include in each output Document shard JSON file on Google Cloud Storage.

The valid range is [1, 100]. If not specified, the default value is 20.

For example, one PDF file with 100 pages produces 100 parsed pages. If pages_per_shard = 20, then 5 Document shard JSON files, each containing 20 parsed pages, are written under the prefix OutputConfig.gcs_destination.uri with the suffix pages-x-to-y.json, where x and y are 1-indexed page numbers.

Example GCS outputs with 157 pages and pages_per_shard = 50:

  • pages-001-to-050.json

  • pages-051-to-100.json

  • pages-101-to-150.json

  • pages-151-to-157.json
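The shard naming can be reproduced locally, for example to know which objects to expect under the output prefix. In this sketch the zero-padding width is inferred from the single 157-page example (three digits, matching the number of digits in the page count); that padding rule is an assumption, as is the helper name.

```python
from typing import List


def shard_names(page_count: int, pages_per_shard: int = 20) -> List[str]:
    """Hypothetical helper: expected shard file suffixes for a parsed document."""
    if not 1 <= pages_per_shard <= 100:
        raise ValueError("pages_per_shard must be in [1, 100]")
    # Assumption: page numbers are zero-padded to the digits of page_count.
    width = len(str(page_count))
    names = []
    for start in range(1, page_count + 1, pages_per_shard):
        end = min(start + pages_per_shard - 1, page_count)  # last shard may be short
        names.append(f"pages-{start:0{width}d}-to-{end:0{width}d}.json")
    return names
```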

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.ProcessDocumentRequest(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Request to process one document.

parent()

Target project and location to make a call.

Format: projects/{project-id}/locations/{location-id}.

If no location is specified, a region will be chosen automatically. This field is only populated when used in the ProcessDocument method.

input_config()

Required. Information about the input file.

  • Type

    InputConfig

output_config()

Optional. The desired output location. This field is only needed in BatchProcessDocumentsRequest.

  • Type

    OutputConfig

document_type()

Specifies a known document type for deeper structure detection. Valid values are currently “general” and “invoice”. If not provided, “general” is used as the default. If any other value is given, the request is rejected.

table_extraction_params()

Controls table extraction behavior. If not specified, the system will decide reasonable defaults.

  • Type

    TableExtractionParams

form_extraction_params()

Controls form extraction behavior. If not specified, the system will decide reasonable defaults.

  • Type

    FormExtractionParams

entity_extraction_params()

Controls entity extraction behavior. If not specified, the system will decide reasonable defaults.

  • Type

    EntityExtractionParams

ocr_params()

Controls OCR behavior. If not specified, the system will decide reasonable defaults.

  • Type

    OcrParams

automl_params()

Controls AutoML model prediction behavior. AutoMlParams cannot be used together with other Params.

  • Type

    AutoMlParams
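The constraints stated for ProcessDocumentRequest (required input_config, the currently valid document_type values, and automl_params not mixing with the other params) can be checked before a call is made. The sketch below is a hypothetical client-side validator over the dict form of the request; it mirrors only the rules written in the field descriptions.

```python
from typing import Any, Mapping

VALID_DOCUMENT_TYPES = {"general", "invoice"}
# Params that the automl_params description says cannot be combined with AutoML.
NON_AUTOML_PARAMS = (
    "table_extraction_params",
    "form_extraction_params",
    "entity_extraction_params",
    "ocr_params",
)


def check_request(request: Mapping[str, Any]) -> None:
    """Hypothetical client-side check of a ProcessDocumentRequest dict."""
    if not request.get("input_config"):
        raise ValueError("input_config is required")
    # An unset or empty document_type falls back to "general".
    document_type = request.get("document_type") or "general"
    if document_type not in VALID_DOCUMENT_TYPES:
        raise ValueError(f"document_type {document_type!r} would be rejected")
    if request.get("automl_params") and any(request.get(name) for name in NON_AUTOML_PARAMS):
        raise ValueError("automl_params cannot be used together with other params")
```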

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.ProcessDocumentResponse(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Response to a single document processing request.

input_config()

Information about the input file. This is the same as the corresponding input config in the request.

  • Type

    InputConfig

output_config()

The output location of the parsed responses. The responses are written to this location as JSON-serialized Document objects.

  • Type

    OutputConfig

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.TableBoundHint(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A hint for a table bounding box on the page for table parsing.

page_number()

Optional. The page number, in multi-page inputs, that this hint applies to. If not provided, this hint applies to all pages. This value is 1-based.

bounding_box()

Bounding box hint for a table on this page. The coordinates must be normalized to [0,1] and the bounding box must be an axis-aligned rectangle.

  • Type

    BoundingPoly

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.TableExtractionParams(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Parameters to control table extraction behavior.

enabled()

Whether to enable table extraction.

table_bound_hints()

Optional. Table bounding box hints for complex cases in which the algorithm cannot locate the table(s) on its own.

  • Type

    Sequence[TableBoundHint]

header_hints()

Optional. Table header hints. The extraction will bias towards producing these terms as table headers, which may improve accuracy.

  • Type

    Sequence[str]

model_version()

Model version of the table extraction system. Default is “builtin/stable”. Specify “builtin/latest” for the latest model.
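A table_extraction_params mapping that combines a TableBoundHint with header hints might be built as below. Both helpers are hypothetical. The layout of BoundingPoly is not described in this section, so storing the four corners of the axis-aligned rectangle in a normalized_vertices list is an assumption, as is the clockwise corner order.

```python
from typing import Any, Dict, Optional, Sequence


def table_bound_hint(
    x_min: float, y_min: float, x_max: float, y_max: float,
    page_number: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: an axis-aligned, normalized TableBoundHint dict."""
    if not (0.0 <= x_min < x_max <= 1.0 and 0.0 <= y_min < y_max <= 1.0):
        raise ValueError("need 0 <= min < max <= 1 on both axes")
    hint: Dict[str, Any] = {
        "bounding_box": {
            # Assumption: BoundingPoly holds normalized corners in normalized_vertices.
            "normalized_vertices": [
                {"x": x_min, "y": y_min},
                {"x": x_max, "y": y_min},
                {"x": x_max, "y": y_max},
                {"x": x_min, "y": y_max},
            ]
        }
    }
    if page_number is not None:
        if page_number < 1:
            raise ValueError("page_number is 1-based")
        hint["page_number"] = page_number  # omit to apply the hint to all pages
    return hint


def table_params(
    bound_hints: Sequence[Dict[str, Any]] = (),
    header_hints: Sequence[str] = (),
    model_version: str = "builtin/stable",
) -> Dict[str, Any]:
    """Hypothetical helper: a table_extraction_params dict."""
    return {
        "enabled": True,
        "table_bound_hints": list(bound_hints),
        "header_hints": list(header_hints),
        "model_version": model_version,
    }
```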

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.

class google.cloud.documentai_v1beta2.Vertex(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A vertex represents a 2D point in the image. NOTE: the vertex coordinates are in the same scale as the original image.

x()

X coordinate.

y()

Y coordinate.

__delattr__(key)

Delete the value on the given field.

This is generally equivalent to setting a falsy value.

__eq__(other)

Return True if the messages are equal, False otherwise.

__ne__(other)

Return True if the messages are unequal, False otherwise.

__setattr__(key, value)

Set the value on the given field.

For well-known protocol buffer types which are marshalled, either the protocol buffer object or the Python equivalent is accepted.