Class Document (1.5.0)

Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

Attributes

Name Type
Description
uri str
Optional. Currently supports Google Cloud Storage URI of the form ``gs://bucket_name/object_name``. Object versioning is not supported. See Google Cloud Storage Request URIs.
content bytes
Optional. Inline document content, represented as a stream of bytes. Note: As with all ``bytes`` fields, protocol buffers use a pure binary representation, whereas JSON representations use base64. This field is a member of the oneof ``source``.
mime_type str
An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml.
text str
Optional. UTF-8 encoded text in reading order from the document.
text_styles Sequence[google.cloud.documentai_v1.types.Document.Style]
Styles for the Document.text.
pages Sequence[google.cloud.documentai_v1.types.Document.Page]
Visual page layout for the Document.
entities Sequence[google.cloud.documentai_v1.types.Document.Entity]
A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries.
entity_relations Sequence[google.cloud.documentai_v1.types.Document.EntityRelation]
Relationships among Document.entities.
text_changes Sequence[google.cloud.documentai_v1.types.Document.TextChange]
A list of text corrections made to Document.text. This is usually used for annotating corrections to OCR mistakes. Text changes for a given revision may not overlap with each other.
shard_info google.cloud.documentai_v1.types.Document.ShardInfo
Information about the sharding if this document is a shard of a larger document. If the document is not sharded, this message is not specified.
error google.rpc.status_pb2.Status
Any error that occurred while processing this document.
revisions Sequence[google.cloud.documentai_v1.types.Document.Revision]
Revision history of this document.
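
The note on ``content`` above says that JSON representations carry bytes as base64. The following standard-library sketch shows that round trip. The camelCase key ``mimeType`` and the sample bytes are assumptions for illustration only; no Document AI call is made.

```python
import base64
import json

# Raw bytes as they would be held in the ``content`` field.
raw = b"%PDF-1.7 placeholder bytes"

# In a JSON representation, bytes are carried as base64 text.
json_payload = json.dumps(
    {
        "content": base64.b64encode(raw).decode("ascii"),
        "mimeType": "application/pdf",  # assumed JSON key name
    }
)

# Decoding the JSON string restores the original bytes.
decoded = base64.b64decode(json.loads(json_payload)["content"])
print(decoded == raw)
```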

Inheritance

builtins.object > proto.message.Message > Document

Classes

Entity

Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)

An entity that could be a phrase in the text or a property that belongs to the document. It is a known entity type, such as a person, an organization, or a location.

EntityRelation

EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Relationship between Entities.

Page

Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)

A page in a Document.

PageAnchor

PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)

References the visual context of the entity in Document.pages. Page anchors can be cross-page, consist of multiple bounding polygons, and optionally reference specific layout element types.

Provenance

Provenance(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Structure to identify provenance relationships between annotations in different revisions.

Revision

Revision(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Contains past or forward revisions of this document.

This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields for details.

ShardInfo

ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)

For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
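
As a rough sketch of how shards might be put back in order, the helper below sorts shard documents and joins their text. The field names ``shard_index`` and ``shard_count`` are not listed on this page and are assumptions here, as is the idea that concatenating shard text in index order yields the full text. ``merge_shard_text`` is a hypothetical helper, and ``SimpleNamespace`` objects stand in for real Document instances.

```python
from types import SimpleNamespace


def merge_shard_text(shards):
    """Join the text of document shards in shard order (hypothetical helper)."""
    ordered = sorted(shards, key=lambda d: d.shard_info.shard_index)
    # Assumed field: every shard reports the total number of shards.
    expected = ordered[0].shard_info.shard_count if ordered else 0
    if len(ordered) != expected:
        raise ValueError(f"expected {expected} shards, got {len(ordered)}")
    return "".join(d.text for d in ordered)


# Stand-ins for two shards, deliberately listed out of order.
shards = [
    SimpleNamespace(
        text="second part",
        shard_info=SimpleNamespace(shard_index=1, shard_count=2),
    ),
    SimpleNamespace(
        text="first part, ",
        shard_info=SimpleNamespace(shard_index=0, shard_count=2),
    ),
]
merged = merge_shard_text(shards)
print(merged)
```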

Style

Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Annotation for common text style attributes. This adheres to CSS conventions as much as possible.

TextAnchor

TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Text reference indexing into the Document.text.
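
To make "indexing into the Document.text" concrete, here is a minimal sketch that resolves an anchor to a string. It assumes a text anchor exposes ``text_segments`` whose items carry ``start_index`` and ``end_index``; those names do not appear on this page. It also assumes the offsets can be used directly as Python string indices. ``anchor_text`` is a hypothetical helper, and ``SimpleNamespace`` objects stand in for real message instances.

```python
from types import SimpleNamespace


def anchor_text(document_text, text_anchor):
    """Concatenate the slices of document_text covered by the anchor."""
    return "".join(
        document_text[int(seg.start_index):int(seg.end_index)]
        for seg in text_anchor.text_segments
    )


# Stand-in for Document.text.
text = "Invoice 42\nTotal: 10 USD\n"

# Stand-in anchor with one segment covering the amount on the second line.
amount_anchor = SimpleNamespace(
    text_segments=[SimpleNamespace(start_index=18, end_index=24)]
)
amount = anchor_text(text, amount_anchor)
print(amount)
```

An anchor with several segments yields the segments joined in the order they are listed.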

TextChange

TextChange(mapping=None, *, ignore_unknown_fields=False, **kwargs)

This message is used for text changes, also known as OCR corrections.