Manual setup

This page walks through the manual setup steps.

Create a document schema

To use the connector with Document AI parsing, create a schema with a property named DocaiEntities. The required property definition, in proto form, is:

  name: DocaiEntities
  displayName: DocaiEntities
  isSearchable: true
  isFilterable: true
  mapTypeOptions: {}

Create schema via curl

See the guide on how to create a document schema.

For this use case, run the following command to create the schema:

curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/<PROJECT_NUMBER>/locations/<LOCATION>/documentSchemas' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--data '{
        "display_name":"Document Schema 1",
        "property_definitions": [
            {
                "name": "DocaiEntities",
                "display_name": "DocaiEntities",
                "is_searchable": true,
                "is_filterable": true,
                "map_type_options": {}
            }
        ]
    }'
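To confirm that the schema was created, you can list the schemas in the same project and location. The sketch below is a hedged example, not part of the original guide: it assumes the same documentSchemas endpoint also accepts GET requests for listing, and that an OAuth access token from `gcloud auth print-access-token` is used. The project number and location values are placeholders. By default it only prints the request; set DRY_RUN=0 to send it.

```shell
# Placeholder values; replace with your own project number and location.
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789}"
LOCATION="${LOCATION:-us}"

# Dry run by default: print the request; set DRY_RUN=0 to send it.
DRY_RUN="${DRY_RUN:-1}"

# Build the documentSchemas URL used by the create command above.
schemas_url() {
  echo "https://contentwarehouse.googleapis.com/v1/projects/${PROJECT_NUMBER}/locations/${LOCATION}/documentSchemas"
}

# List the schemas (assumes GET on the same endpoint returns them).
list_schemas() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "curl --request GET $(schemas_url)"
  else
    curl --silent --location --request GET "$(schemas_url)" \
      --header "Authorization: Bearer $(gcloud auth print-access-token)"
  fi
}

list_schemas
```

The response should include a schema whose display name is "Document Schema 1" if the create request succeeded.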

Create schema in UI

Setting up the UI

You can also create the schema in the preview UI. Follow the admin web application guide to set up the project.

Go to admin page

Click Create in schema management

Give schema a name

Add a DocaiEntities map property

Here is the property definition that you can copy from:

{
    "name": "DocaiEntities",
    "display_name": "DocaiEntities",
    "is_repeatable": false,
    "is_filterable": true,
    "is_searchable": true,
    "is_metadata": false,
    "is_required": false,
    "map_type_options": {}
}

Then click Done to create the schema.

If you need to add your own properties, you can do so during creation by adding more entries to the properties list.

Create a folder schema

Please refer to the guide for how to create a folder schema

Create a folder

Please refer to the guide for how to create a folder

Create a processor

Follow the steps on creating and managing processors

Manual setups for initialization

If any issues come up during the execution of the initialize.sh script, please refer to the following links for setting up individual steps.

Enabling APIs

To use the Cloud Storage-Document AI Warehouse connector, enable the following APIs:

  1. Document AI Warehouse
  2. Document AI
  3. Workflows API
  4. Cloud Functions API
  5. Cloud Tasks API
  6. Cloud Build API

Refer to the Enabling APIs guide for how to enable these APIs manually.
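The same APIs can also be enabled from the command line. The following is a minimal sketch, assuming the service names below correspond to the six APIs listed above and that `my-project` stands in for your project ID. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
# Placeholder project ID; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"

# Service names assumed to match the six APIs listed above.
APIS="contentwarehouse.googleapis.com
documentai.googleapis.com
workflows.googleapis.com
cloudfunctions.googleapis.com
cloudtasks.googleapis.com
cloudbuild.googleapis.com"

# Dry run by default: print each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Enable each API in turn so a failure is easy to attribute.
enable_apis() {
  for api in $APIS; do
    run gcloud services enable "$api" --project="$PROJECT_ID"
  done
}

enable_apis
```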

Set up a service account

Using existing service account

If you have used Document AI Warehouse before, you likely already have a service account. If you used self-provisioning for the initial setup, a service account was created for you. For example, the DocAI Warehouse UI Service Account is created during provisioning with the email:

    dw-ui-service-account@<your_project_id>.iam.gserviceaccount.com

You can reuse the existing service account; it only needs additional roles, which are listed below.

Roles required for service account

The roles required for a service account are:

  1. Cloud Functions Developer
  2. Cloud Functions Invoker
  3. Cloud Functions Viewer
  4. Cloud Tasks Admin
  5. Cloud Tasks Enqueuer
  6. Content Warehouse Admin
  7. Document AI API User
  8. Document AI Viewer
  9. Service Account User
  10. Logging Admin
  11. Storage Admin
  12. Storage Object Admin
  13. Storage Object Creator
  14. Storage Object Viewer
  15. Workflows Admin
  16. Workflows Editor
  17. Workflows Invoker
  18. Workflows Viewer

Refer to the Grant a single role guide for how to add roles to existing service accounts.

Adding roles through Google Cloud CLI

Alternatively, you can assign roles using gcloud.

    PROJECT_ID="<project ID>"
    SA="<service account name>"
    # Skip this command if you are reusing an existing service account.
    gcloud iam service-accounts create "$SA"
    # Look up the full email address of the service account.
    FULL_ID=$(gcloud iam service-accounts list --filter="email ~ ^$SA" --format='value(email)')
    # Bash array of roles to grant (no spaces around "=").
    roles=(
        "roles/cloudfunctions.developer"
        "roles/cloudfunctions.invoker"
        "roles/cloudfunctions.viewer"
        "roles/cloudtasks.admin"
        "roles/cloudtasks.enqueuer"
        "roles/contentwarehouse.admin"
        "roles/documentai.apiUser"
        "roles/documentai.viewer"
        "roles/iam.serviceAccountUser"
        "roles/logging.admin"
        "roles/storage.admin"
        "roles/storage.objectAdmin"
        "roles/storage.objectCreator"
        "roles/storage.objectViewer"
        "roles/workflows.admin"
        "roles/workflows.editor"
        "roles/workflows.invoker"
        "roles/workflows.viewer"
    )

    # Grant each role to the service account at the project level.
    for role in "${roles[@]}"; do
        echo "grant role $role to $FULL_ID"
        gcloud projects add-iam-policy-binding "$PROJECT_ID" --member="serviceAccount:$FULL_ID" --role="$role"
    done
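After granting the roles, it can help to check which roles the service account actually holds. The sketch below uses the common `get-iam-policy` flatten-and-filter pattern. The project ID is a placeholder and the service account email is assumed to be the DW UI one mentioned earlier. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
# Placeholder values; replace with your own project and service account.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_EMAIL="${SA_EMAIL:-dw-ui-service-account@${PROJECT_ID}.iam.gserviceaccount.com}"

# Dry run by default: print the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Print one role per line for the given service account.
list_granted_roles() {
  run gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${SA_EMAIL}" \
    --format="value(bindings.role)"
}

list_granted_roles
```

Comparing the output against the list of required roles shows which bindings, if any, are still missing.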

Creating and executing a workflow

Follow the steps on create and manage workflows.

The required workflows for deployment can be found in the code repository.

Be sure to configure the workflow to run as the service account set up above, for example the DW UI service account.
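As an alternative to the console, a workflow can be deployed with gcloud and bound to the service account in one step. This is only a sketch: the workflow name, source file name, and region are hypothetical placeholders (the actual workflow definitions are in the code repository), and the service account is assumed to be the DW UI one. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
# Placeholder values; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
WORKFLOW_NAME="${WORKFLOW_NAME:-my-connector-workflow}"  # hypothetical name
WORKFLOW_SOURCE="${WORKFLOW_SOURCE:-workflow.yaml}"      # hypothetical file
SA_EMAIL="${SA_EMAIL:-dw-ui-service-account@${PROJECT_ID}.iam.gserviceaccount.com}"

# Dry run by default: print the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Deploy the workflow so that it executes as the service account.
deploy_workflow() {
  run gcloud workflows deploy "$WORKFLOW_NAME" \
    --project="$PROJECT_ID" \
    --location="$REGION" \
    --source="$WORKFLOW_SOURCE" \
    --service-account="$SA_EMAIL"
}

deploy_workflow
```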

Execute a workflow

Follow the steps on execute a workflow to run the workflow created previously.
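A deployed workflow can also be executed from the command line. In this hedged sketch, the workflow name, region, and the empty JSON input are hypothetical; the real input depends on the workflow definition in the code repository. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
# Placeholder values; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
WORKFLOW_NAME="${WORKFLOW_NAME:-my-connector-workflow}"  # hypothetical name
WORKFLOW_INPUT="${WORKFLOW_INPUT:-{\}}"                  # hypothetical input: {}

# Dry run by default: print the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Start an execution and wait for it to finish.
run_workflow() {
  run gcloud workflows run "$WORKFLOW_NAME" \
    --project="$PROJECT_ID" \
    --location="$REGION" \
    --data="$WORKFLOW_INPUT"
}

run_workflow
```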

Create a Cloud Function

Follow the steps on console quickstart for Cloud Functions

The required functions for deployment can be found in the code repository.

Be sure to configure the functions to run as the service account set up above, for example the DW UI service account.

Deploy the function

Follow the steps on deploy Cloud Functions.
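Deployment can also be done with gcloud. The sketch below is illustrative only: the function name, runtime, entry point, source directory, and HTTP trigger are all assumptions, so check the functions in the code repository for the real values. The service account is assumed to be the DW UI one. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
# Placeholder values; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
FUNCTION_NAME="${FUNCTION_NAME:-my-connector-function}"  # hypothetical name
RUNTIME="${RUNTIME:-python311}"                          # assumed runtime
ENTRY_POINT="${ENTRY_POINT:-main}"                       # assumed entry point
SOURCE_DIR="${SOURCE_DIR:-.}"                            # assumed source dir
SA_EMAIL="${SA_EMAIL:-dw-ui-service-account@${PROJECT_ID}.iam.gserviceaccount.com}"

# Dry run by default: print the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Deploy an HTTP-triggered function that runs as the service account.
deploy_function() {
  run gcloud functions deploy "$FUNCTION_NAME" \
    --project="$PROJECT_ID" \
    --region="$REGION" \
    --runtime="$RUNTIME" \
    --entry-point="$ENTRY_POINT" \
    --source="$SOURCE_DIR" \
    --trigger-http \
    --service-account="$SA_EMAIL"
}

deploy_function
```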

Testing the function

Follow the steps on testing overview.

Create Cloud Tasks queues

Follow the steps on Create Cloud Tasks queues.

Two queues are required: general-files-queue and office-files-queue. For general-files-queue, set Max dispatches to 2 per second and Max concurrent dispatches to 120. For office-files-queue, set Max dispatches to 1 per second and Max concurrent dispatches to 10. The equivalent gcloud commands are:

    gcloud tasks queues create general-files-queue --max-dispatches-per-second=2 --max-concurrent-dispatches=120
    gcloud tasks queues create office-files-queue --max-dispatches-per-second=1 --max-concurrent-dispatches=10
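To check that both queues exist with the expected rate limits, you can describe them. This is a small sketch in which the project ID and queue location are placeholders. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
# Placeholder values; replace with your own project and queue location.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"

# Dry run by default: print each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Show the configuration (including rate limits) of both queues.
describe_queues() {
  for queue in general-files-queue office-files-queue; do
    run gcloud tasks queues describe "$queue" \
      --project="$PROJECT_ID" \
      --location="$REGION"
  done
}

describe_queues
```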

Viewing the logs

View logs in Cloud Workflows

Logs of workflow executions can be found in the Logs tab within each workflow.

View logs in Cloud Functions

Logs of function executions can be found in the Logs tab within each function.
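Function logs can also be read from the command line. In this sketch the function name and region are hypothetical placeholders. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
# Placeholder values; replace with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
FUNCTION_NAME="${FUNCTION_NAME:-my-connector-function}"  # hypothetical name

# Dry run by default: print the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Read the 20 most recent log entries of the function.
read_function_logs() {
  run gcloud functions logs read "$FUNCTION_NAME" \
    --project="$PROJECT_ID" \
    --region="$REGION" \
    --limit=20
}

read_function_logs
```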

View logs in Cloud Tasks

Logs of task executions can be found in the Logs tab within each queue. However, logging is disabled by default. Enable logging if you want to save logs for the task queue; after that, logs of subsequent tasks appear in the same tab.
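Queue logging can also be turned on with gcloud by setting a log sampling ratio. The sketch below assumes you want every operation logged (ratio 1.0) for both queues; the project ID and queue location are placeholders. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
# Placeholder values; replace with your own project and queue location.
PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
# Fraction of operations to log; 1.0 logs everything (an assumption here).
LOG_SAMPLING_RATIO="${LOG_SAMPLING_RATIO:-1.0}"

# Dry run by default: print each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Enable logging on both queues used by the connector.
enable_queue_logging() {
  for queue in general-files-queue office-files-queue; do
    run gcloud tasks queues update "$queue" \
      --project="$PROJECT_ID" \
      --location="$REGION" \
      --log-sampling-ratio="$LOG_SAMPLING_RATIO"
  done
}

enable_queue_logging
```

A lower ratio reduces log volume on busy queues at the cost of missing some task operations.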