Class IngestPipelineConfig (0.7.7)

IngestPipelineConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The ingestion pipeline config.

Attributes

NameDescription
document_acl_policy google.iam.v1.policy_pb2.Policy
The document level acl policy config. This refers to an Identity and Access (IAM) policy, which specifies access controls for all documents ingested by the pipeline. The role][google.iam.v1.Binding.role] and members][google.iam.v1.Binding.role] under the policy needs to be specified. The following roles are supported for document level acl control: - roles/contentwarehouse.documentAdmin - roles/contentwarehouse.documentEditor - roles/contentwarehouse.documentViewer The following members are supported for document level acl control: - user:user-email@example.com - group:group-email@example.com Note that for documents searched with LLM, only single level user or group acl check is supported.
enable_document_text_extraction bool
The document text extraction enabled flag. If the flag is set to true, DWH will perform text extraction on the raw document.
folder str
Optional. The name of the folder to which all ingested documents will be linked during ingestion process. Format is projects/{project}/locations/{location}/documents/{folder_id}
cloud_function str
The Cloud Function resource name. The Cloud Function needs to live inside consumer project and is accessible to Document AI Warehouse P4SA. Only Cloud Functions V2 is supported. Cloud function execution should complete within 5 minutes or this file ingestion may fail due to timeout. Format: https://{region}-{project_id}.cloudfunctions.net/{cloud_function} The following keys are available the request json payload. - display_name - properties - plain_text - reference_id - document_schema_name - raw_document_path - raw_document_file_type The following keys from the cloud function json response payload will be ingested to the Document AI Warehouse as part of Document proto content and/or related information. The original values will be overridden if any key is present in the response. - display_name - properties - plain_text - document_acl_policy - folder