This page walks through the manual setup steps.
Create a document schema
To use the connector with Document AI parsing, we need to create a schema with a property called DocaiEntities. Here is the proto definition of what is required:
name: DocaiEntities
displayName: DocaiEntities
isSearchable: true
isFilterable: true
mapTypeOptions: {}
Create schema via curl
Here is the guide for how to create a document schema.
For our use case, run the following command to create the schema:
curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/<PROJECT_NUMBER>/locations/<LOCATION>/documentSchemas' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--data '{
"display_name":"Document Schema 1",
"property_definitions": [
{
"name": "DocaiEntities",
"display_name": "DocaiEntities",
"is_searchable": true,
"is_filterable": true,
"map_type_options": {}
}
]
}'
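To confirm that the schema was created, the same documentSchemas collection can be listed with a GET request. The following is a minimal sketch, not an official procedure: list_document_schemas is a hypothetical helper name, and it assumes that an OAuth 2.0 access token from gcloud auth print-access-token is accepted for the call.

```shell
# Sketch: list the document schemas in a project to check that creation worked.
# list_document_schemas is a hypothetical helper, not part of gcloud or the connector.
list_document_schemas() {
  project_number="$1"   # numeric project number
  location="$2"         # same LOCATION value used in the create request
  curl --silent --location --request GET \
    "https://contentwarehouse.googleapis.com/v1/projects/${project_number}/locations/${location}/documentSchemas" \
    --header "Authorization: Bearer $(gcloud auth print-access-token)"
}
```

Call it as list_document_schemas <PROJECT_NUMBER> <LOCATION> and look for a schema whose property definitions include DocaiEntities.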
Create schema in UI
Setting up the UI
You can also create the schema in our preview UI. Follow the admin web application guide to set up the project.
Go to the admin page.
Click Create in schema management.
Give the schema a name.
Add a DocaiEntities map property.
Here is a property definition that you can copy:
{
"name": "DocaiEntities",
"display_name": "DocaiEntities",
"is_repeatable": false,
"is_filterable": true,
"is_searchable": true,
"is_metadata": false,
"is_required": false,
"map_type_options": {}
}
Then click Done to create the schema.
If you need to add your own properties, you can do so during creation as additional entries in properties.
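As an illustration of a schema that carries an extra property next to DocaiEntities, here is a small sketch that prints such a payload. The helper name schema_payload and the custom property are hypothetical, and giving the custom property text_type_options is an assumption about the type you want; adjust it to your data.

```shell
# Sketch: print a schema payload with DocaiEntities plus one custom text property.
# schema_payload and the custom property name are hypothetical examples.
schema_payload() {
  custom_name="$1"   # name of the extra property, chosen by you
  cat <<EOF
{
  "display_name": "Document Schema 1",
  "property_definitions": [
    {
      "name": "DocaiEntities",
      "display_name": "DocaiEntities",
      "is_searchable": true,
      "is_filterable": true,
      "map_type_options": {}
    },
    {
      "name": "${custom_name}",
      "display_name": "${custom_name}",
      "is_searchable": true,
      "is_filterable": true,
      "text_type_options": {}
    }
  ]
}
EOF
}
```

For example, schema_payload invoice_id > schema.json writes the payload to a file, which curl can then send with --data @schema.json.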
Create a folder schema
Refer to the guide for how to create a folder schema.
Create a folder
Refer to the guide for how to create a folder.
Create a processor
Follow the steps in Creating and managing processors.
Manual setups for initialization
If any issues come up while running the initialize.sh script, refer to the following links to set up the individual steps manually.
Enabling APIs
To use the Cloud Storage-Document AI Warehouse connector, you need to enable the APIs that it depends on.
Refer to the Enabling APIs guide for how to enable these APIs manually.
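APIs can also be enabled from the command line with gcloud services enable. The sketch below is an assumption, not the official list: the service names are inferred from the roles listed later on this page (Cloud Functions, Cloud Tasks, Content Warehouse, Document AI, Logging, Storage, Workflows), and enable_connector_apis is a hypothetical helper name. Check the list against the Enabling APIs guide before relying on it.

```shell
# Sketch: enable the APIs the connector is assumed to need.
# The service list is an inference from the required roles, not an official list.
enable_connector_apis() {
  project_id="$1"
  gcloud services enable \
    cloudfunctions.googleapis.com \
    cloudtasks.googleapis.com \
    contentwarehouse.googleapis.com \
    documentai.googleapis.com \
    logging.googleapis.com \
    storage.googleapis.com \
    workflows.googleapis.com \
    --project="${project_id}"
}
```

Run it as enable_connector_apis <project ID>.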
Setup a service account
Using existing service account
If you have used Document AI Warehouse before, you likely have an existing service account. If you used self-provisioning for the initial setup, a service account has already been created. For example, DocAI Warehouse UI Service Account is created during provisioning with the email:
dw-ui-service-account@<your_project_id>.iam.gserviceaccount.com
We can reuse the existing service account. It only needs to be granted additional roles, which are listed below.
Roles required for service account
The roles required for a service account are:
- Cloud Functions Developer
- Cloud Functions Invoker
- Cloud Functions Viewer
- Cloud Tasks Admin
- Cloud Tasks Enqueuer
- Content Warehouse Admin
- Document AI API User
- Document AI Viewer
- Service Account User
- Logging Admin
- Storage Admin
- Storage Object Admin
- Storage Object Creator
- Storage Object Viewer
- Workflows Admin
- Workflows Editor
- Workflows Invoker
- Workflows Viewer
Refer to the Grant a single role guide for how to add roles to existing service accounts.
Adding roles through Google Cloud CLI
Alternatively, you can assign roles using gcloud.
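PROJECT_ID=<project ID>
SA=<service account name>
# Create the service account (skip this step when reusing an existing one).
gcloud iam service-accounts create "$SA"
# Look up the full email address of the service account.
FULL_ID=$(gcloud iam service-accounts list --filter="email ~ ^$SA" --format='value(email)')
# Roles to grant. Bash array assignments must not have spaces around "=".
roles=(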
"roles/cloudfunctions.developer"
"roles/cloudfunctions.invoker"
"roles/cloudfunctions.viewer"
"roles/cloudtasks.admin"
"roles/cloudtasks.enqueuer"
"roles/contentwarehouse.admin"
"roles/documentai.apiUser"
"roles/documentai.viewer"
"roles/iam.serviceAccountUser"
"roles/logging.admin"
"roles/storage.admin"
"roles/storage.objectAdmin"
"roles/storage.objectCreator"
"roles/storage.objectViewer"
"roles/workflows.admin"
"roles/workflows.editor"
"roles/workflows.invoker"
"roles/workflows.viewer"
)
# Grant each role to the service account at the project level.
for role in "${roles[@]}"; do
  echo "grant role $role to $FULL_ID"
  gcloud projects add-iam-policy-binding "$PROJECT_ID" --member="serviceAccount:$FULL_ID" --role="$role"
done
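After granting the roles, it can be useful to check what the service account actually holds. The following sketch lists the project-level roles bound to one service account; list_service_account_roles is a hypothetical helper name.

```shell
# Sketch: print the project-level roles bound to a service account.
# list_service_account_roles is a hypothetical helper name.
list_service_account_roles() {
  project_id="$1"
  sa_email="$2"   # full email of the service account
  gcloud projects get-iam-policy "${project_id}" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${sa_email}" \
    --format="value(bindings.role)"
}
```

For example, list_service_account_roles "$PROJECT_ID" "$FULL_ID" should print one role per line, which you can compare with the list above.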
Creating and executing a workflow
Follow the steps in Create and manage workflows.
The required workflows for deployment can be found in the code repository.
Be sure to configure the workflow to use the service account for execution.
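A workflow can also be deployed from the command line with the execution service account set at deploy time. This is a sketch under assumptions: deploy_workflow is a hypothetical helper, and the workflow name, source file, and region are placeholders that you supply from the code repository and your project.

```shell
# Sketch: deploy a workflow so that it runs as the given service account.
# deploy_workflow is a hypothetical helper; all four arguments are placeholders.
deploy_workflow() {
  workflow_name="$1"   # name to give the workflow
  source_file="$2"     # workflow definition file from the code repository
  region="$3"          # region to deploy to
  sa_email="$4"        # service account used for execution
  gcloud workflows deploy "${workflow_name}" \
    --source="${source_file}" \
    --location="${region}" \
    --service-account="${sa_email}"
}
```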
Execute a workflow
Follow the steps in Execute a workflow to run the workflow created previously.
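From the command line, a deployed workflow can be executed with gcloud workflows run, which waits for the execution to finish. In this sketch, run_workflow is a hypothetical helper, and the JSON input is a placeholder because the arguments the connector workflows expect are not described here.

```shell
# Sketch: execute a deployed workflow and wait for its result.
# run_workflow is a hypothetical helper; the input JSON depends on the workflow.
run_workflow() {
  workflow_name="$1"
  region="$2"
  input_json="$3"   # JSON string passed as the workflow's runtime arguments
  gcloud workflows run "${workflow_name}" \
    --location="${region}" \
    --data="${input_json}"
}
```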
Create a Cloud Function
Follow the steps in the console quickstart for Cloud Functions.
The required functions for deployment can be found in the code repository.
Be sure to configure the functions to use the service account for execution.
Deploy the function
Follow the steps on deploy Cloud Functions.
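Deployment can also be scripted with gcloud functions deploy. The sketch below is illustrative only: deploy_function is a hypothetical helper, the runtime and entry point are passed in because they depend on the function source in the code repository, and the HTTP trigger is an assumption that may not match how the connector's functions are invoked.

```shell
# Sketch: deploy a function that runs as the given service account.
# deploy_function is a hypothetical helper; --trigger-http is an assumption.
deploy_function() {
  function_name="$1"
  region="$2"
  runtime="$3"       # runtime ID required by the function source
  entry_point="$4"   # name of the function in the source code
  source_dir="$5"    # directory containing the function source
  sa_email="$6"      # service account used for execution
  gcloud functions deploy "${function_name}" \
    --region="${region}" \
    --runtime="${runtime}" \
    --entry-point="${entry_point}" \
    --source="${source_dir}" \
    --service-account="${sa_email}" \
    --trigger-http
}
```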
Testing the function
Follow the steps on testing overview.
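For a quick smoke test from the command line, a deployed function can be invoked directly with gcloud functions call. In this sketch, call_function is a hypothetical helper and the JSON payload is a placeholder, since the request format the connector's functions expect is not described here.

```shell
# Sketch: invoke a deployed function once with a JSON payload.
# call_function is a hypothetical helper; the payload shape is up to the function.
call_function() {
  function_name="$1"
  region="$2"
  payload_json="$3"
  gcloud functions call "${function_name}" \
    --region="${region}" \
    --data="${payload_json}"
}
```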
Create Cloud Tasks Queues
Follow the steps in Create Cloud Tasks queues.
We need two queues: general-files-queue and office-files-queue. For general-files-queue, set Max dispatches to 2 and Max concurrent dispatches to 120. For office-files-queue, set Max dispatches to 1 and Max concurrent dispatches to 10.
The equivalent gcloud commands are:
gcloud tasks queues create general-files-queue --max-dispatches-per-second=2 --max-concurrent-dispatches=120
gcloud tasks queues create office-files-queue --max-dispatches-per-second=1 --max-concurrent-dispatches=10
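To check that a queue was created with the intended rate limits, its configuration can be printed with gcloud tasks queues describe. This is a small sketch; describe_queue is a hypothetical helper, and the location argument is whatever region the queues were created in.

```shell
# Sketch: print a queue's configuration, including its rate limits.
# describe_queue is a hypothetical helper name.
describe_queue() {
  queue_name="$1"
  location="$2"   # region where the queue was created
  gcloud tasks queues describe "${queue_name}" --location="${location}"
}
```

For example, describe_queue general-files-queue <location> should report the dispatch rate and concurrent dispatch limit set above.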
Viewing the logs
View logs in Cloud Workflows
Logs of workflow executions can be found in the Log tab within each workflow.
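Workflow logs can also be read from the command line through Cloud Logging. The sketch below assumes the workflow's entries carry the resource type workflows.googleapis.com/Workflow with a workflow_id label; read_workflow_logs is a hypothetical helper name.

```shell
# Sketch: read recent Cloud Logging entries for one workflow.
# read_workflow_logs is a hypothetical helper; the filter fields are an assumption.
read_workflow_logs() {
  project_id="$1"
  workflow_name="$2"
  limit="$3"   # maximum number of entries to print
  gcloud logging read \
    "resource.type=\"workflows.googleapis.com/Workflow\" AND resource.labels.workflow_id=\"${workflow_name}\"" \
    --project="${project_id}" \
    --limit="${limit}"
}
```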
View logs in Cloud Functions
Logs of function executions can be found in the Log tab within each function.
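The same function logs can be fetched from the command line with gcloud functions logs read. A minimal sketch, where read_function_logs is a hypothetical helper and the function name and region are placeholders:

```shell
# Sketch: print the most recent log lines for one function.
# read_function_logs is a hypothetical helper name.
read_function_logs() {
  function_name="$1"
  region="$2"
  limit="$3"   # maximum number of log lines to print
  gcloud functions logs read "${function_name}" \
    --region="${region}" \
    --limit="${limit}"
}
```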
View logs in Cloud Tasks
Logs of task executions can be found in the Log tab within each queue.
However, by default, logging is disabled. You need to enable logging if you want
to save logs for the task queue. After enabling it, logs of further tasks are
shown in the same tab.
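Queue logging can also be turned on from the command line by setting the queue's log sampling ratio, where 0.0 (the default) logs nothing and 1.0 logs every operation. In this sketch, enable_queue_logging is a hypothetical helper, and logging every operation is an assumption; a lower ratio reduces log volume.

```shell
# Sketch: enable Cloud Logging for a task queue by logging all operations.
# enable_queue_logging is a hypothetical helper; 1.0 means log every operation.
enable_queue_logging() {
  queue_name="$1"
  location="$2"   # region where the queue was created
  gcloud tasks queues update "${queue_name}" \
    --location="${location}" \
    --log-sampling-ratio=1.0
}
```

For example, run enable_queue_logging general-files-queue <location> and repeat for office-files-queue.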