You can transfer documents from the Document AI Warehouse to the Document AI Workbench using the export-to-Workbench pipeline. The pipeline exports the documents to a Cloud Storage folder, then imports them to a Document AI dataset. You provide the Cloud Storage folder and the Document AI dataset.
Prerequisites
Before you begin, you need the following:
- Under the same Google Cloud project, follow the steps to create a processor.
- Dedicate an empty Cloud Storage folder for storing the exported documents (a setup sketch follows this list).
- On the custom processor page, click Configure Your Dataset and then Continue to initialize the dataset.
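The following is a minimal setup sketch for the prerequisites above, assuming the gcloud CLI is installed and authenticated; the bucket name my-export-bucket and the location are hypothetical placeholders, and AUTH_TOKEN is the variable used by the curl calls in the next steps.

# Create a dedicated, empty Cloud Storage bucket for the exported documents
# (my-export-bucket and the location are hypothetical placeholders).
gcloud storage buckets create gs://my-export-bucket --location=us

# Obtain an access token for the REST calls below.
export AUTH_TOKEN=$(gcloud auth print-access-token)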
Run the pipeline
Run the following command to start the export pipeline:

curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${AUTH_TOKEN}" \
  --data '{
    "name": "projects/PROJECT_NUMBER/locations/LOCATION",
    "export_cdw_pipeline": {
      "documents": [
        "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
      ],
      "export_folder_path": "gs://CLOUD_STORAGE_FOLDER",
      "doc_ai_dataset": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR/dataset",
      "training_split_ratio": RATIO
    },
    "request_metadata": {
      "user_info": {
        "id": "user:USER_EMAIL_ADDRESS"
      }
    }
  }'
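The documents field accepts a list, so several documents can be exported in a single call. A hedged fragment, where the document IDs are hypothetical placeholders:

"documents": [
  "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT_ID_1",
  "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT_ID_2"
]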
The training and test split ratio can be specified in the training_split_ratio field as a floating-point number. For example, for a set of 10 documents with a ratio of 0.8, 8 documents are added to the training set and the remaining 2 to the test set.
This command returns a resource name for a long-running operation. Use it to track the progress of the pipeline in the next step.
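The response follows the standard Google long-running operation shape; a sketch of what it might look like (the operation ID is illustrative):

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION"
}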
Get long-running operation result
curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
  --header "Authorization: Bearer ${AUTH_TOKEN}"
Next step
- Go to your Document AI dataset to check the exported documents.