You can transfer documents from the Document AI Warehouse to the Document AI Workbench using the export-to-Workbench pipeline. The pipeline exports the documents to a Cloud Storage folder, then imports them to a Document AI dataset. You provide the Cloud Storage folder and the Document AI dataset.
Prerequisites
Before you begin, you need the following:
- In the same Google Cloud project, follow the steps to create a processor.
- Dedicate an empty Cloud Storage folder for storing exported documents (see the sketch after this list).
- On the custom processor page, click Configure Your Dataset, and then click Continue to initialize the dataset.
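The REST commands in this guide authenticate with an access token held in the AUTH_TOKEN variable. A minimal setup sketch, assuming the gcloud CLI is installed and authenticated; the bucket name is hypothetical, and because Cloud Storage folders are just object prefixes, any unused prefix in a new bucket can serve as the empty export folder:

# Obtain an access token for the curl commands below.
export AUTH_TOKEN=$(gcloud auth print-access-token)

# Create a bucket for the export (hypothetical name and location).
# An unused prefix inside it, such as gs://my-cdw-export/exports/,
# can serve as the empty export folder.
gcloud storage buckets create gs://my-cdw-export --location=us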
Run the pipeline
REST
curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${AUTH_TOKEN}" \
  --data '{
    "name": "projects/PROJECT_NUMBER/locations/LOCATION",
    "export_cdw_pipeline": {
      "documents": [
        "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
      ],
      "export_folder_path": "gs://CLOUD_STORAGE_FOLDER",
      "doc_ai_dataset": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR/dataset",
      "training_split_ratio": RATIO
    },
    "request_metadata": {
      "user_info": {
        "id": "user:USER_EMAIL_ADDRESS"
      }
    }
  }'
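If the inline body becomes unwieldy, you can keep the same JSON in a local file and pass it by reference. A minimal variant, assuming the body above has been saved as request.json (a hypothetical filename reused in the sketches below):

curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${AUTH_TOKEN}" \
  --data @request.json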
Specify the training/test split as a floating-point number in the training_split_ratio field. For example, with a set of 10 documents and a ratio of 0.8, 8 documents are added to the training set and the remaining 2 to the test set.
This command returns a resource name for a long-running operation. Use it to track the progress of the pipeline in the next step.
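The response is a JSON object whose name field holds the operation resource. A minimal sketch for capturing it, assuming jq is installed and the request body is saved in request.json as above:

# Capture the long-running operation name from the response.
OPERATION_NAME=$(curl --silent --location --request POST \
  'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${AUTH_TOKEN}" \
  --data @request.json | jq -r '.name')
# Example value: projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION
echo "${OPERATION_NAME}"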
Get long-running operation result
REST
curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
--header "Authorization: Bearer ${AUTH_TOKEN}"
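The operation is finished when the response contains "done": true. A minimal polling sketch, assuming jq and the OPERATION_NAME variable captured earlier:

# Poll every 30 seconds until the operation reports done.
until curl --silent --location \
  "https://contentwarehouse.googleapis.com/v1/${OPERATION_NAME}" \
  --header "Authorization: Bearer ${AUTH_TOKEN}" | jq -e '.done == true' > /dev/null; do
  echo "Pipeline still running..."
  sleep 30
done
echo "Pipeline finished."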
Next step
- Go to your Document AI dataset to check the exported documents.