Export-to-Workbench pipeline

You can transfer documents from the Document AI Warehouse to the Document AI Workbench using the export-to-Workbench pipeline. The pipeline exports the documents to a Cloud Storage folder, then imports them to a Document AI dataset. You provide the Cloud Storage folder and the Document AI dataset.

Prerequisites

Before you begin, you need the following:

  • In the same Google Cloud project, follow the steps to create a processor.
  • Dedicate an empty Cloud Storage folder for storing exported documents.
  • On the custom processor page, click Configure Your Dataset and then Continue to initialize the dataset.

Run the pipeline

REST

curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${AUTH_TOKEN}" \
--data '{
    "name": "projects/PROJECT_NUMBER/locations/LOCATION",
    "export_cdw_pipeline": {
        "documents": [
            "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
        ],
        "export_folder_path": "gs://CLOUD_STORAGE_FOLDER",
        "doc_ai_dataset": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR/dataset",
        "training_split_ratio": RATIO
    },
    "request_metadata": {
        "user_info": {
            "id": "user:USER_EMAIL_ADDRESS"
        }
    }
}'

Specify the split between the training and test sets in the training_split_ratio field as a floating-point number. For example, for a set of 10 documents and a ratio of 0.8, 8 documents are added to the training set and the remaining 2 documents to the test set.
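The arithmetic behind the ratio can be sketched as follows. The function name split_counts is hypothetical, and two points are assumptions rather than documented behavior: that the ratio lies between 0 and 1, and that the training count is rounded to the nearest integer. How the service rounds when the product is not a whole number is not stated here.

```python
def split_counts(total_documents: int, training_split_ratio: float) -> tuple[int, int]:
    """Return (training_count, test_count) for a document set.

    Assumption: the training count is rounded to the nearest integer.
    The service's actual rounding rule is not described in this document.
    """
    # Assumption: the ratio is a fraction of the whole set.
    if not 0.0 <= training_split_ratio <= 1.0:
        raise ValueError("training_split_ratio must be between 0 and 1")
    training = round(total_documents * training_split_ratio)
    # Every document not used for training goes to the test set.
    return training, total_documents - training


print(split_counts(10, 0.8))  # (8, 2)
```

Estimating the counts before you run the pipeline is a quick way to check that the test set will not end up empty for a small batch.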

This command returns a resource name for a long-running operation. Use it to track the progress of the pipeline in the next step.
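If you prefer to build the request in code rather than by hand, a minimal Python sketch might look like the following. It mirrors the fields of the curl command above. The helper names build_export_request and run_pipeline are hypothetical, the sketch reuses one project number and location for both the documents and the dataset (as the placeholders above do), and it assumes the response is a JSON object whose name field holds the operation's resource name.

```python
import json
import urllib.request

API_ROOT = "https://contentwarehouse.googleapis.com/v1"


def build_export_request(project_number, location, document_ids,
                         export_folder_path, processor_id,
                         training_split_ratio, user_email):
    """Return (url, body) with the same fields as the curl command."""
    parent = f"projects/{project_number}/locations/{location}"
    body = {
        "name": parent,
        "export_cdw_pipeline": {
            "documents": [f"{parent}/documents/{doc}" for doc in document_ids],
            "export_folder_path": export_folder_path,
            "doc_ai_dataset": f"{parent}/processors/{processor_id}/dataset",
            "training_split_ratio": training_split_ratio,
        },
        "request_metadata": {"user_info": {"id": f"user:{user_email}"}},
    }
    return f"{API_ROOT}/{parent}:runPipeline", body


def run_pipeline(url, body, auth_token):
    """POST the request; return the operation name (assumed 'name' field)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["name"]
```

Serializing the body with json.dumps avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.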

Get long-running operation result

REST

curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
--header "Authorization: Bearer ${AUTH_TOKEN}"
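Because the pipeline runs asynchronously, you typically repeat this request until the operation finishes. The sketch below shows one way to poll. It assumes the response follows the standard long-running operation shape, with a boolean done field and an error field on failure. The function names, the 10-second interval, and the attempt limit are illustrative choices, not values from the API.

```python
import json
import time
import urllib.request


def fetch_operation(operation_name, auth_token):
    """GET the operation resource, as in the curl command above."""
    request = urllib.request.Request(
        f"https://contentwarehouse.googleapis.com/v1/{operation_name}",
        headers={"Authorization": f"Bearer {auth_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_operation(fetch, interval_seconds=10.0, max_attempts=60):
    """Call fetch() until the operation reports done, then return it."""
    for _ in range(max_attempts):
        operation = fetch()
        # Assumption: 'done' is absent or false while the pipeline runs.
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Pipeline failed: {operation['error']}")
            return operation
        time.sleep(interval_seconds)
    raise TimeoutError("Operation did not finish within the polling budget")


# Example usage (placeholders as in the curl command):
# name = "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION"
# result = wait_for_operation(lambda: fetch_operation(name, AUTH_TOKEN))
```

Passing the fetch step in as a callable keeps the polling loop independent of the HTTP client, so you can swap in a different client or a stub.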

Next step