Run a predefined pipeline.
HTTP request
POST https://contentwarehouse.googleapis.com/v1/{name}:runPipeline
Path parameters
| Parameters | |
|---|---|
| `name` | Required. The resource name which owns the resources of the pipeline. Format: `projects/{projectNumber}/locations/{location}`. |
Request body
The request body contains data with the following structure:
| JSON representation |
|---|
| `{ "requestMetadata": { object (RequestMetadata) }, "gcsIngestPipeline": { object (GcsIngestPipeline) }, "gcsIngestWithDocAiProcessorsPipeline": { object (GcsIngestWithDocAiProcessorsPipeline) }, "exportCdwPipeline": { object (ExportToCdwPipeline) }, "processWithDocAiPipeline": { object (ProcessWithDocAiPipeline) } }` |
| Fields | |
|---|---|
| `requestMetadata` | The meta information collected about the end user, used to enforce access control for the service. |
| Union field `pipeline`. The predefined pipelines. `pipeline` can be only one of the following: | |
| `gcsIngestPipeline` | Cloud Storage ingestion pipeline. |
| `gcsIngestWithDocAiProcessorsPipeline` | Use DocAI processors to process documents in Cloud Storage and ingest them to Document Warehouse. |
| `exportCdwPipeline` | Export documents from Document Warehouse to CDW for training purposes. |
| `processWithDocAiPipeline` | Use a DocAI processor to process documents in Document Warehouse, and re-ingest the updated results into Document Warehouse. |
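For illustration, the following is a minimal sketch of a request body that selects the Cloud Storage ingestion pipeline. The bucket path, project number, location, and schema ID are placeholder values, and `requestMetadata` is omitted for brevity:

```json
{
  "gcsIngestPipeline": {
    "inputPath": "gs://example-bucket/ingest-folder/",
    "schemaName": "projects/123456789/locations/us/documentSchemas/example-schema-id"
  }
}
```

This body would be POSTed to the URL above with `name` set to a value such as `projects/123456789/locations/us`.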
Response body
If successful, the response body contains an instance of Operation.
Authorization Scopes
Requires the following OAuth scope:
https://www.googleapis.com/auth/cloud-platform
For more information, see the Authentication Overview.
IAM Permissions
Requires the following IAM permission on the `name` resource:
contentwarehouse.documents.create
For more information, see the IAM documentation.
GcsIngestPipeline
The configuration of the Cloud Storage ingestion pipeline.
| JSON representation |
|---|
| `{ "inputPath": string, "schemaName": string, "processorType": string }` |
| Fields | |
|---|---|
| `inputPath` | The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs:// |
| `schemaName` | The Document Warehouse schema resource name. All documents processed by this pipeline will use this schema. Format: `projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}`. |
| `processorType` | The Doc AI processor type name. Only used when the format of ingested files is Doc AI Document proto format. |
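As a sketch, a `gcsIngestPipeline` object might look like the following. The bucket, project number, location, and schema ID are placeholders, and `processorType` is included only to illustrate the case where the ingested files are already in Doc AI Document proto format (`OCR_PROCESSOR` is just an example processor type):

```json
{
  "inputPath": "gs://example-bucket/ingest-folder/",
  "schemaName": "projects/123456789/locations/us/documentSchemas/example-schema-id",
  "processorType": "OCR_PROCESSOR"
}
```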
GcsIngestWithDocAiProcessorsPipeline
The configuration of the document classify/split and entity/key-value pair (KVP) extraction pipeline.
| JSON representation |
|---|
| `{ "inputPath": string, "splitClassifyProcessorInfo": { object (ProcessorInfo) }, "extractProcessorInfos": [ { object (ProcessorInfo) } ], "processorResultsFolderPath": string }` |
| Fields | |
|---|---|
| `inputPath` | The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs:// |
| `splitClassifyProcessorInfo` | The split and classify processor information. The split and classify result will be used to find a matched extract processor. |
| `extractProcessorInfos[]` | The extract processors information. One matched extract processor will be used to process documents based on the classify processor result. If no classify processor is specified, the first extract processor will be used. |
| `processorResultsFolderPath` | The Cloud Storage folder path used to store the raw results from processors. Format: gs:// |
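A sketch of a `gcsIngestWithDocAiProcessorsPipeline` configuration with one classifier and one extract processor follows. All resource names, bucket paths, and the `invoice` document type are hypothetical placeholders:

```json
{
  "inputPath": "gs://example-bucket/incoming/",
  "splitClassifyProcessorInfo": {
    "processorName": "projects/123456789/locations/us/processors/classify-processor-id"
  },
  "extractProcessorInfos": [
    {
      "processorName": "projects/123456789/locations/us/processors/invoice-extractor-id",
      "documentType": "invoice",
      "schemaName": "projects/123456789/locations/us/documentSchemas/invoice-schema-id"
    }
  ],
  "processorResultsFolderPath": "gs://example-bucket/processor-results/"
}
```

If the classifier were omitted, the first (and only) extract processor would be used for all documents.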
ProcessorInfo
The DocAI processor information.
| JSON representation |
|---|
| `{ "processorName": string, "documentType": string, "schemaName": string }` |
| Fields | |
|---|---|
| `processorName` | The processor resource name. Format is `projects/{project}/locations/{location}/processors/{processor}`. |
| `documentType` | The processor will process the documents with this document type. |
| `schemaName` | The Document schema resource name. All documents processed by this processor will use this schema. Format: `projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}`. |
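For example, a single `ProcessorInfo` entry might be written as follows; the processor ID, document type, and schema ID are placeholders:

```json
{
  "processorName": "projects/123456789/locations/us/processors/contract-extractor-id",
  "documentType": "contract",
  "schemaName": "projects/123456789/locations/us/documentSchemas/contract-schema-id"
}
```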
ExportToCdwPipeline
The configuration of the pipeline that exports documents from Document Warehouse to CDW.
| JSON representation |
|---|
| `{ "documents": [ string ], "exportFolderPath": string, "docAiDataset": string, "trainingSplitRatio": number }` |
| Fields | |
|---|---|
| `documents[]` | The list of all the resource names of the documents to be processed. Format: `projects/{projectNumber}/locations/{location}/documents/{documentId}`. |
| `exportFolderPath` | The Cloud Storage folder path used to store the exported documents before being sent to CDW. Format: gs:// |
| `docAiDataset` | The CDW dataset resource name. Format: `projects/{project}/locations/{location}/processors/{processor}/dataset` |
| `trainingSplitRatio` | The ratio of the training dataset split. When importing into Document AI Workbench, documents are automatically split into training and test sets according to the specified ratio. |
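A sketch of an `exportCdwPipeline` configuration follows; the document IDs, bucket path, and processor ID are placeholders, and a `trainingSplitRatio` of 0.8 illustrates an 80/20 training/test split:

```json
{
  "documents": [
    "projects/123456789/locations/us/documents/example-document-id-1",
    "projects/123456789/locations/us/documents/example-document-id-2"
  ],
  "exportFolderPath": "gs://example-bucket/cdw-export/",
  "docAiDataset": "projects/123456789/locations/us/processors/example-processor-id/dataset",
  "trainingSplitRatio": 0.8
}
```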
ProcessWithDocAiPipeline
The configuration of the pipeline that processes documents in Document Warehouse with DocAI processors.
| JSON representation |
|---|
| `{ "documents": [ string ], "exportFolderPath": string, "processorInfo": { object (ProcessorInfo) }, "processorResultsFolderPath": string }` |
| Fields | |
|---|---|
| `documents[]` | The list of all the resource names of the documents to be processed. Format: `projects/{projectNumber}/locations/{location}/documents/{documentId}`. |
| `exportFolderPath` | The Cloud Storage folder path used to store the exported documents before being sent to CDW. Format: gs:// |
| `processorInfo` | The CDW processor information. |
| `processorResultsFolderPath` | The Cloud Storage folder path used to store the raw results from processors. Format: gs:// |
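A sketch of a `processWithDocAiPipeline` configuration follows; the document ID, bucket paths, processor ID, and schema ID are placeholders:

```json
{
  "documents": [
    "projects/123456789/locations/us/documents/example-document-id"
  ],
  "exportFolderPath": "gs://example-bucket/reprocess-export/",
  "processorInfo": {
    "processorName": "projects/123456789/locations/us/processors/ocr-processor-id",
    "schemaName": "projects/123456789/locations/us/documentSchemas/example-schema-id"
  },
  "processorResultsFolderPath": "gs://example-bucket/processor-results/"
}
```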