Class Document (0.5.1)

Document(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Defines the structure of the Content Warehouse document proto.

This message has oneof_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.

.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
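
The oneof behavior described above can be illustrated with a small pure-Python stand-in. This is only a sketch: `OneofSketch` and its methods are hypothetical names invented for this example, not part of proto-plus or the Content Warehouse client. The field groupings are taken from the attribute descriptions on this page.

```python
class OneofSketch:
    """Toy stand-in that mimics oneof semantics; NOT the proto-plus implementation."""

    # Oneof groups and their members, as listed in the attribute descriptions.
    _ONEOFS = {
        "structured_content": ("plain_text", "cloud_ai_document"),
        "raw_document": ("raw_document_path", "inline_raw_document"),
    }

    def __init__(self, **kwargs):
        self._values = {}
        for field, value in kwargs.items():
            self.set(field, value)

    def set(self, field, value):
        # Setting a oneof member first clears every member of the same group.
        for members in self._ONEOFS.values():
            if field in members:
                for member in members:
                    self._values.pop(member, None)
        self._values[field] = value

    def get(self, field):
        # Unset fields are reported as None in this sketch.
        return self._values.get(field)

    def which_oneof(self, group):
        # Return the name of the member currently set in the group, if any.
        for member in self._ONEOFS[group]:
            if member in self._values:
                return member
        return None


doc = OneofSketch(raw_document_path="gs://example-bucket/report.pdf")
doc.set("inline_raw_document", b"%PDF-1.7")
print(doc.which_oneof("raw_document"))  # prints "inline_raw_document"
print(doc.get("raw_document_path"))     # prints "None"
```

With the real message class, the practical consequence is the same: assigning raw_document_path and then inline_raw_document leaves only the latter set.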

Attributes

Name Description
name str
The resource name of the document. Format: projects/{project_number}/locations/{location}/documents/{document_id}. The name is ignored when creating a document.
reference_id str
The reference ID set by customers. Must be unique per project and location.
display_name str
Required. Display name of the document given by the user. This name will be displayed in the UI. Customers can populate this field with the name of the document. This differs from the 'title' field, which is optional and stores the top heading in the document.
title str
Title that describes the document. This can be the top heading or text that describes the document.
display_uri str
URI to display the document, for example, in the UI.
document_schema_name str
The Document schema name. Format: projects/{project_number}/locations/{location}/documentSchemas/{document_schema_id}.
plain_text str
Other document format, such as PPTX, XLSX. This field is a member of oneof_ structured_content.
cloud_ai_document google.cloud.documentai_v1.types.Document
Document AI format to save the structured content, including OCR. This field is a member of oneof_ structured_content.
structured_content_uri str
A path linked to the structured content file.
raw_document_path str
Cloud Storage path of the raw document file. This field is a member of oneof_ raw_document.
inline_raw_document bytes
Raw document content. This field is a member of oneof_ raw_document.
properties MutableSequence[google.cloud.contentwarehouse_v1.types.Property]
List of values that are user-supplied metadata.
update_time google.protobuf.timestamp_pb2.Timestamp
Output only. The time when the document was last updated.
create_time google.protobuf.timestamp_pb2.Timestamp
Output only. The time when the document was created.
raw_document_file_type google.cloud.contentwarehouse_v1.types.RawDocumentFileType
This is used when DocAI was not used to load the document and parsing/extracting is needed for the inline_raw_document. For example, if inline_raw_document is the byte representation of a PDF file, then this should be set to RAW_DOCUMENT_FILE_TYPE_PDF.
async_enabled bool
If true, makes the document visible to asynchronous policies and rules.
content_category google.cloud.contentwarehouse_v1.types.ContentCategory
Indicates the category (image, audio, video, etc.) of the original content.
text_extraction_disabled bool
If true, text extraction will not be performed.
text_extraction_enabled bool
If true, text extraction will be performed.
creator str
The user who created the document.
updater str
The user who last updated the document.
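
The name and document_schema_name attributes follow fixed resource-name formats. As a minimal sketch using only the standard library, the helpers below build and parse strings in those formats. The function names (`document_path`, `document_schema_path`, `parse_document_path`) and the sample IDs are hypothetical choices for this example and are not claimed to match any helper shipped with the client library.

```python
import re

# Format strings copied from the attribute descriptions above.
DOCUMENT_NAME = (
    "projects/{project_number}/locations/{location}/documents/{document_id}"
)
DOCUMENT_SCHEMA_NAME = (
    "projects/{project_number}/locations/{location}"
    "/documentSchemas/{document_schema_id}"
)

# Each path segment is assumed to contain no "/" characters.
_DOCUMENT_RE = re.compile(
    r"^projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/documents/(?P<document_id>[^/]+)$"
)


def document_path(project_number, location, document_id):
    """Build a document resource name."""
    return DOCUMENT_NAME.format(
        project_number=project_number,
        location=location,
        document_id=document_id,
    )


def document_schema_path(project_number, location, document_schema_id):
    """Build a document schema resource name."""
    return DOCUMENT_SCHEMA_NAME.format(
        project_number=project_number,
        location=location,
        document_schema_id=document_schema_id,
    )


def parse_document_path(name):
    """Split a document resource name into its parts; empty dict if no match."""
    match = _DOCUMENT_RE.match(name)
    return match.groupdict() if match else {}


name = document_path("123456", "us", "doc-1")
print(name)  # prints "projects/123456/locations/us/documents/doc-1"
print(parse_document_path(name)["document_id"])  # prints "doc-1"
```

Because name is ignored when creating a document, a builder like this is mainly useful for get, update, and delete calls, while document_schema_name is the value a caller would typically supply at creation time.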