Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Document represents the canonical document resource in Document Understanding AI. It is an interchange format that provides insights into documents and allows for collaboration between users and Document Understanding AI to iterate and optimize for quality.
Attributes

Name | Type | Description |
---|---|---|
uri | str | Currently supports Google Cloud Storage URIs of the form gs://bucket_name/object_name. Object versioning is not supported. See Google Cloud Storage Request URIs. |
content | bytes | Inline document content, represented as a stream of bytes. Note: As with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64. |
mime_type | str | An IANA published MIME type (also referred to as media type). For more information, see https://www.iana.org/assignments/media-types/media-types.xhtml. |
text | str | UTF-8 encoded text in reading order from the document. |
text_styles | Sequence[google.cloud.documentai_v1beta2.types.Document.Style] | Styles for the Document.text. |
pages | Sequence[google.cloud.documentai_v1beta2.types.Document.Page] | Visual page layout for the Document. |
entities | Sequence[google.cloud.documentai_v1beta2.types.Document.Entity] | A list of entities detected on Document.text. For document shards, entities in this list may cross shard boundaries. |
entity_relations | Sequence[google.cloud.documentai_v1beta2.types.Document.EntityRelation] | Relationships among Document.entities. |
shard_info | google.cloud.documentai_v1beta2.types.Document.ShardInfo | Information about the sharding if this document is a sharded part of a larger document. If the document is not sharded, this message is not specified. |
labels | Sequence[google.cloud.documentai_v1beta2.types.Document.Label] | Labels for this document. |
error | google.rpc.status_pb2.Status | Any error that occurred while processing this document. |
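The note on the content field (pure binary in protobuf, base64 in JSON) can be illustrated with the standard library alone. This is a minimal sketch, not a call into the Document AI client: the sample bytes and the dictionary layout are assumptions chosen for illustration.

```python
import base64
import json

# Hypothetical inline payload; any binary file content would do.
raw_content = b"%PDF-1.4 sample bytes"

# JSON cannot carry raw bytes, so the JSON form of a bytes field is base64 text.
# The dictionary layout here is illustrative only.
json_payload = json.dumps({
    "content": base64.b64encode(raw_content).decode("ascii"),
    "mime_type": "application/pdf",
})

# Decoding the JSON form recovers the exact bytes that protobuf stores directly.
decoded = base64.b64decode(json.loads(json_payload)["content"])
```

The round trip is lossless, so code that reads a JSON export should decode content before treating it as file data.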
Classes
Entity
Entity(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A phrase in the text that is a known entity type, such as a person, an organization, or location.
EntityRelation
EntityRelation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Relationship between Entities.
Label
Label(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Label attaches schema information and/or other metadata to segments within a Document. Multiple Labels on a single field can denote either different labels, different instances of the same label created at different times, or some combination of both.
Page
Page(mapping=None, *, ignore_unknown_fields=False, **kwargs)
A page in a Document.
page_number (int): 1-based index for the current Page in its parent Document. Useful when a page is taken out of a Document for individual processing.
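Because page_number is 1-based and survives extraction, a page's position in a list need not equal page_number - 1. The sketch below uses a hypothetical PageStub dataclass rather than the real Document.Page type, and find_page is an illustrative helper, not a library function.

```python
from dataclasses import dataclass


@dataclass
class PageStub:
    """Hypothetical stand-in for Document.Page; only page_number is modeled."""
    page_number: int


def find_page(pages, page_number):
    """Return the page whose 1-based page_number matches, or None."""
    for page in pages:
        if page.page_number == page_number:
            return page
    return None


# A page extracted for individual processing keeps its original number,
# so its list position (0) no longer matches page_number - 1.
extracted = [PageStub(page_number=3)]
page = find_page(extracted, 3)
```

Looking pages up by page_number instead of by list index keeps the lookup correct whether the list holds a whole document or a single extracted page.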
PageAnchor
PageAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)
References elements in Document.pages.
ShardInfo
ShardInfo(mapping=None, *, ignore_unknown_fields=False, **kwargs)
For a large document, sharding may be performed to produce several document shards. Each document shard contains this field to detail which shard it is.
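One way to picture how shards relate to the larger document is to reassemble their text in shard order. This is a sketch under stated assumptions: ShardStub is a hypothetical stand-in, and the field names shard_index and shard_count are assumed for illustration since this page does not list the fields of ShardInfo.

```python
from dataclasses import dataclass


@dataclass
class ShardStub:
    """Hypothetical stand-in for a document shard and its shard info."""
    shard_index: int  # assumed: 0-based position of this shard
    shard_count: int  # assumed: total number of shards
    text: str         # the shard's own text


def merge_shard_text(shards):
    """Concatenate shard text in shard order after checking none is missing."""
    ordered = sorted(shards, key=lambda s: s.shard_index)
    expected = ordered[0].shard_count if ordered else 0
    if [s.shard_index for s in ordered] != list(range(expected)):
        raise ValueError("missing or duplicate shards")
    return "".join(s.text for s in ordered)


# Shards may arrive out of order; sorting by index restores reading order.
shards = [ShardStub(1, 2, "world"), ShardStub(0, 2, "Hello, ")]
full_text = merge_shard_text(shards)
```

Validating the index sequence before joining avoids silently producing text with a gap when one shard is absent.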
Style
Style(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Annotation for common text style attributes. This adheres to CSS conventions as much as possible.
TextAnchor
TextAnchor(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Text reference indexing into the Document.text.
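Indexing into Document.text can be sketched as slicing the text with offset pairs. The example assumes half-open (start, end) character offsets passed as plain tuples; the actual fields of the TextAnchor message are not listed on this page, and resolve_text is a hypothetical helper.

```python
def resolve_text(document_text, segments):
    """Join the substrings selected by (start_index, end_index) pairs.

    Assumes half-open ranges: start is included, end is excluded.
    """
    return "".join(document_text[start:end] for start, end in segments)


document_text = "Invoice 42\nTotal: 10 USD\n"

# Two illustrative segments: "Invoice" and "Total".
segments = [(0, 7), (11, 16)]
snippet = resolve_text(document_text, segments)
```

Keeping only offsets in the anchor means the text is stored once in the document, and every entity or layout element points back into it.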