The bulk upload feature lets you upload multiple files at once.
Prerequisites
- Grant the Document Warehouse UI service account permission to list, read, and write objects in the bucket you want to import from
- Enable the Workflows Execution API
How to use it
To use the bulk upload feature, set the experiment flag to true in the URL parameters:
https://documentwarehouse.cloud.google.com/?experiment=true
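If you build links to the UI from a script, the flag can be appended without disturbing other query parameters. The following is a minimal sketch using only the Python standard library; the helper name `with_experiment_flag` is hypothetical and not part of any Document AI Warehouse tooling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_experiment_flag(url: str) -> str:
    """Return `url` with the query parameter experiment=true set (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep any existing query parameters and overwrite/add the experiment flag.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["experiment"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_experiment_flag("https://documentwarehouse.cloud.google.com/"))
# prints https://documentwarehouse.cloud.google.com/?experiment=true
```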
After adding the flag, a "Bulk Upload" button appears. To upload into a specific Document AI Warehouse folder, navigate to that folder before clicking the "Bulk Upload" button.
The "Bulk Upload" button opens a form where you configure the bulk upload options.
Select the document schema.
Enter a valid path to the bucket you want to ingest, in the format:
gs://<bucket-name>/<folder>/<subfolder>
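To check a path before pasting it into the form, a small parser can split it into bucket and folder prefix. This is only a sketch: the regular expression is a simplified approximation of the Cloud Storage bucket naming rules, not the validation the UI itself performs, and the names `GcsPath` and `parse_gcs_path` are hypothetical.

```python
import re
from typing import NamedTuple


class GcsPath(NamedTuple):
    bucket: str
    prefix: str  # folder path inside the bucket, "" for the bucket root


# Simplified assumption: lowercase letters, digits, dots, dashes, underscores;
# must start and end with a letter or digit.
_GCS_PATH = re.compile(r"^gs://([a-z0-9][a-z0-9._-]{1,220}[a-z0-9])(?:/(.*))?$")


def parse_gcs_path(path: str) -> GcsPath:
    """Split gs://<bucket-name>/<folder>/<subfolder> into bucket and prefix."""
    match = _GCS_PATH.match(path.strip())
    if match is None:
        raise ValueError(f"not a valid gs:// path: {path!r}")
    prefix = (match.group(2) or "").strip("/")
    return GcsPath(bucket=match.group(1), prefix=prefix)


print(parse_gcs_path("gs://my-bucket/folder/subfolder"))
# prints GcsPath(bucket='my-bucket', prefix='folder/subfolder')
```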
When the bucket path is valid, the form lists the files in the bucket that will be ingested.
If you don't want to use a Document AI processor, you can click "upload" with only these inputs.
If you turn on the option to parse documents with a Document AI processor, you must also specify the Document AI processor ID and a valid path to an output bucket, where the JSON generated by the processor is saved.
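The rules above (schema and source path are always required, processor ID and output path only when the processor option is on) can be sketched as a small validation routine. The `BulkUploadOptions` class and its field names are assumptions made for illustration; they do not mirror the UI's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BulkUploadOptions:
    # All field names are hypothetical stand-ins for the form inputs.
    schema: str
    source_path: str
    use_processor: bool = False
    processor_id: Optional[str] = None
    output_path: Optional[str] = None


def validate(options: BulkUploadOptions) -> List[str]:
    """Return a list of problems; an empty list means the form can be submitted."""
    errors = []
    if not options.schema:
        errors.append("document schema is required")
    if not options.source_path.startswith("gs://"):
        errors.append("source path must start with gs://")
    if options.use_processor:
        # These two inputs are only needed when the processor option is on.
        if not options.processor_id:
            errors.append("processor ID is required when a processor is used")
        if not (options.output_path or "").startswith("gs://"):
            errors.append("output path must start with gs://")
    return errors


print(validate(BulkUploadOptions("invoice", "gs://in-bucket/docs", use_processor=True)))
```

Without a processor, the first two inputs are sufficient; with the processor option on, the same call reports the two missing fields.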
Clicking "upload" redirects you to the bulk upload status page, where you can see the current progress of the files.
While a bulk upload is in progress, an upload icon appears in the top right of the screen.
The global progress percentage is computed only after all pages in the bucket have been checked. It is the number of completed files (ingested or failed) divided by the total number of files to ingest. The UI refreshes the status every 10 seconds by requesting one page; each page contains up to 100 documents, including folders. The file queue table tracks up to 3 pages with pending files; once every file on a tracked page has completed, the next page is added to the tracking.
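The progress rule and the three-page tracking window can be sketched as follows. This is one interpretation of the behavior described above, not the UI's actual code: the status strings, the `pages` mapping, and both function names are assumptions.

```python
from typing import Dict, List, Optional

DONE_STATES = {"ingested", "failed"}  # a file counts as complete in either state
MAX_TRACKED_PAGES = 3


def global_progress(pages: Dict[int, List[str]], total_pages: Optional[int]) -> Optional[float]:
    """Percentage of completed files, or None until every page has been checked.

    `pages` maps a page index to the statuses of its files; `total_pages`
    stays None until the last page has been seen.
    """
    if total_pages is None or len(pages) < total_pages:
        return None
    statuses = [status for page in pages.values() for status in page]
    if not statuses:
        return 100.0
    done = sum(1 for status in statuses if status in DONE_STATES)
    return 100.0 * done / len(statuses)


def tracked_pages(pages: Dict[int, List[str]]) -> List[int]:
    """Indices of the first pages that still have pending files (at most 3)."""
    pending = [
        index for index in sorted(pages)
        if any(status not in DONE_STATES for status in pages[index])
    ]
    return pending[:MAX_TRACKED_PAGES]


pages = {0: ["ingested", "failed", "in_progress", "in_progress"], 1: ["in_progress"]}
print(global_progress(pages, total_pages=None))  # last page not seen yet
print(global_progress(pages, total_pages=2))     # 2 of 5 files complete
print(tracked_pages(pages))
```

Returning `None` before the last page is seen mirrors the rule that the percentage is not available until all pages have been checked.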
Example
We ingest a folder with 208 files.
The first page contains 99 files in progress. The files are added to the file queue table, and the bulk upload icon appears in the top bar.
The second page contains 100 files in progress.
The last page contains 9 more files in progress. Because the last page has now been checked, the global completion percentage is available.
The first page is checked again: 13 files are ingested and 86 are still in progress.
After repeated updates, the ingestion completes and the bulk upload icon disappears.
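The numbers in this walkthrough can be checked with a few lines of arithmetic. The sketch below assumes that, when the first page is checked again, no files on the other two pages have completed yet; the function name `percent_complete` is hypothetical.

```python
from typing import List


def percent_complete(page_sizes: List[int], completed: int) -> float:
    """Completed files (ingested or failed) as a percentage of all files."""
    total = sum(page_sizes)
    return 100.0 * completed / total


page_sizes = [99, 100, 9]  # files found on the three pages
print(sum(page_sizes))                               # prints 208
print(f"{percent_complete(page_sizes, 13):.2f}%")    # prints 6.25%
print(f"{percent_complete(page_sizes, 208):.2f}%")   # prints 100.00%
```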