Concepts

The following concepts and features are used in this product:

Concept Definition
Review The process of visually comparing the extracted field values against the actual values in the document, correcting any incorrect extractions, and adding any fields that the DocAI processors missed.
Labeler The person who reviews the extracted document. Customers can use their own workforce (bring-your-own-labeler, or BYOL) or use Google labelers for HITL Review.
Task A queue of extracted documents that labelers review. A processor generates a single task when configured for HITL Review.
Labeler Workbench The UI used by a Labeler to review documents. The UI presents documents from the queue, which the labeler can review, correct, and then either submit or reject.
  • BYOL labelers need to have a Google Workforce or Gmail account to access the labeling UI.
  • Labelers can access the Workbench through a link sent via email from the Labeling Manager upon task assignment.
Answer Time The time taken by a labeler to process a document. The Labeler Workbench tracks document submission time and presents efficiency analytics (for example, for each document a labeler reviews).
Labeling Manager One or more labeling managers are assigned to a pool of labelers, so that they can:
  • Add labelers to, or remove labelers from, labeler pools.
  • Assign tasks to, or unassign tasks from, a labeler. All tasks in the project are accessible to a Labeling Manager, who may change task assignments as task priorities change.
  • Pause tasks so that labelers can work on the next tasks assigned to them.
In the BYOL scenario, Labeling Managers are provided by the customer. When Google labelers are used, Google provides the Labeling Manager.
Labeling Manager Console The UI used by a Labeling Manager to manage labeler pools and task assignments.
Enqueued, Answered, Completed, Rejected Documents in a Task A task is a continual workflow. A document goes through the following states:
  • Enqueued - As documents are processed by the processor, they're enqueued (added) to the HITL task.
  • Answered - when a document is reviewed, corrected, and submitted by a Labeler, that labeler's review is finished and the result is saved in the customer's configured Cloud Storage bucket.
  • Completed - when the task has replication activated (multiple labelers work on each document in the task), a document is Completed once it has been answered by all Labelers. When the task has no replication (each document is reviewed by a single labeler), Answered is the same as Completed.
  • Rejected - a document may be rejected if it is invalid (different document type, forged, etc.) or of poor quality (glare, edges cut off, etc.).
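The relationship between these states can be sketched in a few lines of Python. This is only an illustration of the definitions above, not part of any product API: the names `DocState`, `document_state`, `answers`, and `replication` are hypothetical, and treating a single rejection as final is an assumption the text does not confirm.

```python
from enum import Enum


class DocState(Enum):
    ENQUEUED = "enqueued"
    ANSWERED = "answered"
    COMPLETED = "completed"
    REJECTED = "rejected"


def document_state(answers: int, replication: int = 1, rejected: bool = False) -> DocState:
    """Derive a document's state from how many labelers have submitted it.

    answers:     number of labelers who have submitted the document (hypothetical input)
    replication: number of labelers who must review each document;
                 1 means replication is not activated (hypothetical input)
    rejected:    True if a labeler rejected the document (assumed to be final)
    """
    if replication < 1 or answers < 0:
        raise ValueError("replication must be >= 1 and answers must be >= 0")
    if rejected:
        return DocState.REJECTED
    if answers == 0:
        return DocState.ENQUEUED
    if answers >= replication:
        # Without replication, the first answer also completes the document.
        return DocState.COMPLETED
    return DocState.ANSWERED
```

With the default `replication=1`, the first submission moves a document straight to Completed, which mirrors the statement that Answered is the same as Completed when replication is off.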
Single Task per Processor We do not support multiple tasks per processor. If customers need to process a single document type (invoices, for example) in different tasks, they can configure multiple processors with HITL Review.
Task Assignment vs Labeler Pools A Labeling Manager adds labelers to a pool. Once added, any labeler from the pool can be assigned to a task. Note that a "labeler pool" is not to be confused with the "group" of labelers assigned to a task. A pool is managed at the project level and is used to determine labeler access to the analytics and the tasks. Any labeler from the pool can be assigned to one or more tasks in the project.
Labeler Pool A pool of labelers is created at the project level and is not to be confused with task assignments. The Labeling Manager can assign any labeler from the pool to a task, so that multiple labelers can review documents in parallel and complete the task quicker. A labeler pool can be assigned to any task in the project by the customer.
Validation filters and thresholds Extracted fields have a confidence score (0-100) representing the confidence that the DocAI extraction is accurate. Customers can configure a validation threshold for each field, so that only pages with fields below this threshold are enqueued for review; pages whose fields are all above the threshold are not enqueued. Customers can configure three types of validation filters:
  • Field-level filter - select the important fields that need to be reviewed and specify a confidence threshold for each field. If this threshold is set at 100% for any field, all pages containing this field are sent for review.
  • Document-level filter - select an overall document-level confidence threshold. If any field is below the threshold, the entire page is sent for review. If this threshold is set at 100%, all documents predicted are sent for review.
  • No filter - every document posted to the HITL endpoint is sent for review.
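As a rough sketch of how the three filter types differ, the following Python function decides whether a page would be sent for review. It is not the product's implementation: `needs_review`, `field_scores`, `field_thresholds`, and `document_threshold` are hypothetical names, and the strict "below" comparison, with a threshold of 100 treated as "always review", is an assumption drawn from the wording above.

```python
from typing import Dict, Optional


def needs_review(
    field_scores: Dict[str, float],
    field_thresholds: Optional[Dict[str, float]] = None,
    document_threshold: Optional[float] = None,
) -> bool:
    """Return True if a page with these field confidences (0-100) goes to review."""
    if field_thresholds is not None:
        # Field-level filter: only the selected fields are checked,
        # each against its own threshold.
        for name, threshold in field_thresholds.items():
            if name not in field_scores:
                continue
            # A threshold of 100 sends every page containing the field.
            if threshold >= 100 or field_scores[name] < threshold:
                return True
        return False
    if document_threshold is not None:
        # Document-level filter: one threshold applied to every field.
        if document_threshold >= 100:
            return True
        return any(score < document_threshold for score in field_scores.values())
    # No filter: everything is sent for review.
    return True
```

For example, with `field_scores = {"total": 92.0, "invoice_id": 60.0}`, a field-level filter of `{"total": 80}` skips review, while a document-level threshold of 70 triggers it because `invoice_id` falls below 70.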
Labeling Manager Analytics The Labeling Manager gets analytics for each Task and each Labeler, including Enqueued, Answered, Skipped, Completed, Average Handling Time per document, and total Answer Time. Analytics are accessed in the Analytics tab of the Labeling Manager Console.
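To make the per-labeler metrics concrete, here is a minimal Python sketch that derives them from raw review records. It assumes, without confirmation from the product, that Average Handling Time is the total Answer Time divided by the number of answered documents; the function name `answer_time_summary` and the `(labeler, seconds)` record shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def answer_time_summary(reviews: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarize (labeler, answer_time_in_seconds) records per labeler."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for labeler, seconds in reviews:
        totals[labeler] += seconds
        counts[labeler] += 1
    return {
        labeler: {
            "answered": counts[labeler],
            "total_answer_time": totals[labeler],
            # Assumed definition: mean answer time per answered document.
            "average_handling_time": totals[labeler] / counts[labeler],
        }
        for labeler in totals
    }
```

Calling it with `[("a", 30.0), ("a", 50.0), ("b", 20.0)]` gives labeler `a` two answered documents, a total of 80.0 seconds, and an average of 40.0 seconds.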