Method: projects.locations.runPipeline

Run a predefined pipeline.

HTTP request

POST https://contentwarehouse.googleapis.com/v1/{name}:runPipeline

Path parameters

Parameters
name

string

Required. The resource name that owns the resources of the pipeline. Format: projects/{projectNumber}/locations/{location}.

Request body

The request body contains data with the following structure:

JSON representation
{
  "requestMetadata": {
    object (RequestMetadata)
  },

  // Union field pipeline can be only one of the following:
  "gcsIngestPipeline": {
    object (GcsIngestPipeline)
  },
  "gcsIngestWithDocAiProcessorsPipeline": {
    object (GcsIngestWithDocAiProcessorsPipeline)
  },
  "exportCdwPipeline": {
    object (ExportToCdwPipeline)
  },
  "processWithDocAiPipeline": {
    object (ProcessWithDocAiPipeline)
  }
  // End of list of possible types for union field pipeline.
}
Fields
requestMetadata

object (RequestMetadata)

The meta information collected about the end user, used to enforce access control for the service.

Union field pipeline. The predefined pipelines. pipeline can be only one of the following:
gcsIngestPipeline

object (GcsIngestPipeline)

Cloud Storage ingestion pipeline.

gcsIngestWithDocAiProcessorsPipeline

object (GcsIngestWithDocAiProcessorsPipeline)

Use DocAI processors to process documents in Cloud Storage and ingest them to Document Warehouse.

exportCdwPipeline

object (ExportToCdwPipeline)

Export documents from Document Warehouse to CDW for training purposes.

processWithDocAiPipeline

object (ProcessWithDocAiPipeline)

Use a DocAI processor to process documents in Document Warehouse, and re-ingest the updated results into Document Warehouse.

Response body

If successful, the response body contains an instance of Operation.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • contentwarehouse.documents.create

For more information, see the IAM documentation.
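Putting the method, request body, and authorization together, here is a minimal sketch of calling this endpoint over REST from Python's standard library. The project number, location, bucket, and schema name are placeholders, the access token is assumed to come from `gcloud auth print-access-token` or a service account, and the `userInfo.id` shape of RequestMetadata is an assumption based on other Document AI Warehouse methods, not something stated above.

```python
# Sketch: run a gcsIngestPipeline via POST .../v1/{name}:runPipeline.
# All identifiers below are placeholders.
import json
import urllib.request

API_ROOT = "https://contentwarehouse.googleapis.com/v1"

def build_run_pipeline_request(project_number, location, input_path, schema_name):
    """Return (url, body) for a runPipeline call with a gcsIngestPipeline."""
    name = f"projects/{project_number}/locations/{location}"
    body = {
        # RequestMetadata: end-user identity used for access control
        # (field shape assumed, not specified in this reference section).
        "requestMetadata": {"userInfo": {"id": "user:ingest@example.com"}},
        "gcsIngestPipeline": {
            "inputPath": input_path,
            "schemaName": schema_name,
            "skipIngestedDocuments": True,
        },
    }
    return f"{API_ROOT}/{name}:runPipeline", body

def run_pipeline(token, url, body):
    """POST the request; on success the response is a long-running Operation."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The caller needs the `https://www.googleapis.com/auth/cloud-platform` scope and `contentwarehouse.documents.create` on the `name` resource, as listed above.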

GcsIngestPipeline

The configuration of the Cloud Storage Ingestion pipeline.

JSON representation
{
  "inputPath": string,
  "schemaName": string,
  "processorType": string,
  "skipIngestedDocuments": boolean,
  "pipelineConfig": {
    object (IngestPipelineConfig)
  }
}
Fields
inputPath

string

The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs://<bucket-name>/<folder-name>.

schemaName

string

The Document Warehouse schema resource name. All documents processed by this pipeline will use this schema. Format: projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}.

processorType

string

The Doc AI processor type name. Only used when the ingested files are in the Doc AI Document proto format.

skipIngestedDocuments

boolean

Whether to skip documents that have already been ingested. If set to true, documents in Cloud Storage whose custom metadata contains the key "status" with the value "status=ingested" will be skipped during ingestion.

pipelineConfig

object (IngestPipelineConfig)

Optional. The config for the Cloud Storage Ingestion pipeline. It provides additional customization options to run the pipeline and can be skipped if it is not applicable.

IngestPipelineConfig

The ingestion pipeline config.

JSON representation
{
  "documentAclPolicy": {
    object (Policy)
  },
  "enableDocumentTextExtraction": boolean,
  "folder": string,
  "cloudFunction": string
}
Fields
documentAclPolicy

object (Policy)

The document-level ACL policy config. This refers to an Identity and Access Management (IAM) policy, which specifies access controls for all documents ingested by the pipeline. The role and members under the policy need to be specified.

The following roles are supported for document-level ACL control:

  • roles/contentwarehouse.documentAdmin
  • roles/contentwarehouse.documentEditor
  • roles/contentwarehouse.documentViewer

The following members are supported for document-level ACL control:

  • user:user-email@example.com
  • group:group-email@example.com

Note that for documents searched with an LLM, only a single-level user or group ACL check is supported.

enableDocumentTextExtraction

boolean

Whether document text extraction is enabled. If set to true, Document Warehouse will perform text extraction on the raw document.

folder

string

Optional. The name of the folder to which all ingested documents will be linked during the ingestion process. Format: projects/{project}/locations/{location}/documents/{folder_id}.

cloudFunction

string

The Cloud Function resource name. The Cloud Function must live inside the consumer project and be accessible to the Document AI Warehouse P4SA. Only Cloud Functions V2 is supported. Cloud Function execution should complete within 5 minutes, or the file ingestion may fail due to timeout. Format: https://{region}-{projectId}.cloudfunctions.net/{cloudFunction}

The following keys are available in the request JSON payload:

  • displayName
  • properties
  • plainText
  • referenceId
  • documentSchemaName
  • rawDocumentPath
  • rawDocumentFileType

The following keys from the Cloud Function JSON response payload will be ingested into Document AI Warehouse as part of the Document proto content and/or related information. The original values will be overridden if any key is present in the response:

  • displayName
  • properties
  • plainText
  • documentAclPolicy
  • folder
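The hook described above can be sketched as a plain function that receives the request payload keys and returns the override keys; in a real Cloud Functions V2 deployment this body would sit inside an HTTP handler that parses the request JSON and returns the dict as the response. The renaming rule, the folder resource name, and the function name are all illustrative, not part of the API.

```python
# Sketch of the per-document transformation a Cloud Functions V2 hook
# might apply during ingestion. Only the documented payload/response
# keys are real; the rule and folder name below are placeholders.
def transform_document(payload: dict) -> dict:
    """Map the pipeline's request payload to response-override keys."""
    overrides = {}
    raw_path = payload.get("rawDocumentPath", "")
    # Illustrative rule: tag documents stored under an "invoices/" prefix
    # and link them into a dedicated folder.
    if "/invoices/" in raw_path:
        overrides["displayName"] = "[INVOICE] " + payload.get("displayName", "")
        # Placeholder folder resource name.
        overrides["folder"] = (
            "projects/123456/locations/us/documents/invoice-folder-id"
        )
    # Keys absent from the response leave the original document values intact.
    return overrides
```

Because any key present in the response overrides the original value, a hook that returns an empty dict leaves the document unchanged.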

GcsIngestWithDocAiProcessorsPipeline

The configuration of the Cloud Storage Ingestion with DocAI Processors pipeline.

JSON representation
{
  "inputPath": string,
  "splitClassifyProcessorInfo": {
    object (ProcessorInfo)
  },
  "extractProcessorInfos": [
    {
      object (ProcessorInfo)
    }
  ],
  "processorResultsFolderPath": string,
  "skipIngestedDocuments": boolean,
  "pipelineConfig": {
    object (IngestPipelineConfig)
  }
}
Fields
inputPath

string

The input Cloud Storage folder. All files under this folder will be imported to Document Warehouse. Format: gs://<bucket-name>/<folder-name>.

splitClassifyProcessorInfo

object (ProcessorInfo)

The split and classify processor information. The split and classify result will be used to find a matched extract processor.

extractProcessorInfos[]

object (ProcessorInfo)

The extract processor information. One matching extract processor will be used to process documents based on the classify processor result. If no classify processor is specified, the first extract processor will be used.

processorResultsFolderPath

string

The Cloud Storage folder path used to store the raw results from processors. Format: gs://<bucket-name>/<folder-name>.

skipIngestedDocuments

boolean

Whether to skip documents that have already been ingested. If set to true, documents in Cloud Storage whose custom metadata contains the key "status" with the value "status=ingested" will be skipped during ingestion.

pipelineConfig

object (IngestPipelineConfig)

Optional. The config for the Cloud Storage Ingestion with DocAI Processors pipeline. It provides additional customization options to run the pipeline and can be skipped if it is not applicable.
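The fields above can be assembled into a request body as follows. This is a small builder sketch, with placeholder resource names; the split/classify and extract entries are ProcessorInfo objects as defined below.

```python
# Sketch: assemble a gcsIngestWithDocAiProcessorsPipeline request body.
def processor_info(processor_name, document_type=None, schema_name=None):
    """Build a ProcessorInfo object, omitting unset optional fields."""
    info = {"processorName": processor_name}
    if document_type is not None:
        info["documentType"] = document_type
    if schema_name is not None:
        info["schemaName"] = schema_name
    return info

def build_docai_ingest_body(input_path, classify_info, extract_infos,
                            results_folder_path):
    """Request body with the gcsIngestWithDocAiProcessorsPipeline union field."""
    return {
        "gcsIngestWithDocAiProcessorsPipeline": {
            "inputPath": input_path,
            "splitClassifyProcessorInfo": classify_info,
            "extractProcessorInfos": extract_infos,
            "processorResultsFolderPath": results_folder_path,
            "skipIngestedDocuments": False,
        }
    }
```

Per the field descriptions above, if `splitClassifyProcessorInfo` is omitted, the first entry in `extractProcessorInfos` is used for all documents.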

ProcessorInfo

The DocAI processor information.

JSON representation
{
  "processorName": string,
  "documentType": string,
  "schemaName": string
}
Fields
processorName

string

The processor resource name. Format: projects/{project}/locations/{location}/processors/{processor} or projects/{project}/locations/{location}/processors/{processor}/processorVersions/{processorVersion}.

documentType

string

The processor will process the documents with this document type.

schemaName

string

The Document schema resource name. All documents processed by this processor will use this schema. Format: projects/{projectNumber}/locations/{location}/documentSchemas/{document_schema_id}.

ExportToCdwPipeline

The configuration of the pipeline that exports documents from Document Warehouse to CDW.

JSON representation
{
  "documents": [
    string
  ],
  "exportFolderPath": string,
  "docAiDataset": string,
  "trainingSplitRatio": number
}
Fields
documents[]

string

The list of all the resource names of the documents to be processed. Format: projects/{projectNumber}/locations/{location}/documents/{documentId}.

exportFolderPath

string

The Cloud Storage folder path used to store the exported documents before being sent to CDW. Format: gs://<bucket-name>/<folder-name>.

docAiDataset

string

Optional. The CDW dataset resource name. If not set, the documents will be exported to Cloud Storage only. Format: projects/{project}/locations/{location}/processors/{processor}/dataset

trainingSplitRatio

number

Ratio of the training dataset split. When importing into Document AI Workbench, documents will be automatically split into training and test sets with the specified ratio. This field is required if docAiDataset is set.
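A builder sketch for this pipeline, with placeholder resource names, that also enforces the stated rule that trainingSplitRatio is required whenever docAiDataset is set:

```python
# Sketch: assemble an exportCdwPipeline request body.
def build_export_cdw_body(documents, export_folder_path,
                          doc_ai_dataset=None, training_split_ratio=None):
    """Request body with the exportCdwPipeline union field."""
    pipeline = {
        "documents": list(documents),
        "exportFolderPath": export_folder_path,
    }
    if doc_ai_dataset is not None:
        # Per the field docs, trainingSplitRatio is required in this case.
        if training_split_ratio is None:
            raise ValueError(
                "trainingSplitRatio is required when docAiDataset is set")
        pipeline["docAiDataset"] = doc_ai_dataset
        pipeline["trainingSplitRatio"] = training_split_ratio
    return {"exportCdwPipeline": pipeline}
```

Without `docAiDataset`, the documents are exported to the Cloud Storage folder only, as noted above.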

ProcessWithDocAiPipeline

The configuration of the pipeline that processes documents in Document Warehouse with DocAI processors.

JSON representation
{
  "documents": [
    string
  ],
  "exportFolderPath": string,
  "processorInfo": {
    object (ProcessorInfo)
  },
  "processorResultsFolderPath": string
}
Fields
documents[]

string

The list of all the resource names of the documents to be processed. Format: projects/{projectNumber}/locations/{location}/documents/{documentId}.

exportFolderPath

string

The Cloud Storage folder path used to store the exported documents before being sent to CDW. Format: gs://<bucket-name>/<folder-name>.

processorInfo

object (ProcessorInfo)

The CDW processor information.

processorResultsFolderPath

string

The Cloud Storage folder path used to store the raw results from processors. Format: gs://<bucket-name>/<folder-name>.
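For completeness, the same builder pattern for this pipeline; resource names are placeholders and `processor_info` is assumed to be a ProcessorInfo dict shaped as defined earlier in this reference:

```python
# Sketch: assemble a processWithDocAiPipeline request body.
def build_process_with_docai_body(documents, export_folder_path,
                                  processor_info, results_folder_path):
    """Request body with the processWithDocAiPipeline union field."""
    return {
        "processWithDocAiPipeline": {
            "documents": list(documents),
            "exportFolderPath": export_folder_path,
            "processorInfo": processor_info,
            "processorResultsFolderPath": results_folder_path,
        }
    }
```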