Class GcsIngestWithDocAiProcessorsPipeline (0.7.7)

GcsIngestWithDocAiProcessorsPipeline(
    mapping=None, *, ignore_unknown_fields=False, **kwargs
)

The configuration of the Cloud Storage Ingestion with DocAI Processors pipeline.

Attributes

Name Description
input_path str
The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs://.
split_classify_processor_info google.cloud.contentwarehouse_v1.types.ProcessorInfo
The split and classify processor information. The split and classify result will be used to find a matched extract processor.
extract_processor_infos MutableSequence[google.cloud.contentwarehouse_v1.types.ProcessorInfo]
Information about the extract processors. One matching extract processor is used to process each document, chosen according to the classify processor result. If no classify processor is specified, the first extract processor is used.
processor_results_folder_path str
The Cloud Storage folder path used to store the raw results from processors. Format: gs://.
skip_ingested_documents bool
Whether to skip documents that have already been ingested. If set to true, documents in Cloud Storage whose custom metadata contains the key "status" with the value "status=ingested" are skipped during ingestion.
pipeline_config google.cloud.contentwarehouse_v1.types.IngestPipelineConfig
Optional. The config for the Cloud Storage Ingestion with DocAI Processors pipeline. It provides additional customization options to run the pipeline and can be skipped if it is not applicable.
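
The attributes above can be assembled into the `mapping` argument of the constructor. The following is a minimal sketch, not an official sample: the helper names `build_pipeline_mapping` and `to_pipeline_message`, the bucket paths, and the processor resource name are hypothetical, and the `processor_name` key inside the processor info dictionary is an assumption about the fields of `ProcessorInfo`, which this page does not list. The only validation performed is the `gs://` prefix stated in the attribute descriptions.

```python
from typing import Any, Dict, List, Optional


def build_pipeline_mapping(
    input_path: str,
    processor_results_folder_path: str,
    extract_processor_infos: List[Dict[str, Any]],
    split_classify_processor_info: Optional[Dict[str, Any]] = None,
    skip_ingested_documents: bool = False,
) -> Dict[str, Any]:
    """Build a dict keyed by the attribute names documented above."""
    # Both folder attributes are documented as Cloud Storage paths (gs://).
    for name, path in (
        ("input_path", input_path),
        ("processor_results_folder_path", processor_results_folder_path),
    ):
        if not path.startswith("gs://"):
            raise ValueError(f"{name} must start with 'gs://', got {path!r}")

    mapping: Dict[str, Any] = {
        "input_path": input_path,
        "processor_results_folder_path": processor_results_folder_path,
        "extract_processor_infos": list(extract_processor_infos),
        "skip_ingested_documents": skip_ingested_documents,
    }
    # Only include the split and classify processor when one is supplied.
    if split_classify_processor_info is not None:
        mapping["split_classify_processor_info"] = split_classify_processor_info
    return mapping


def to_pipeline_message(mapping: Dict[str, Any]):
    """Turn the mapping into the message type (needs the client library)."""
    # Imported lazily so the helper above works without the library installed.
    from google.cloud import contentwarehouse_v1

    return contentwarehouse_v1.types.GcsIngestWithDocAiProcessorsPipeline(mapping)


# Hypothetical values, for illustration only.
# "processor_name" is an assumed ProcessorInfo field name.
mapping = build_pipeline_mapping(
    input_path="gs://example-bucket/incoming",
    processor_results_folder_path="gs://example-bucket/processor-results",
    extract_processor_infos=[
        {"processor_name": "projects/123/locations/us/processors/abc"}
    ],
    skip_ingested_documents=True,
)
print(sorted(mapping))
```

With the client library installed, `to_pipeline_message(mapping)` would pass the dictionary as the `mapping` argument. Because `ignore_unknown_fields` defaults to `False`, a key that does not correspond to a real field should be rejected rather than silently dropped, so the assumed field name is worth checking against the `ProcessorInfo` reference first.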