What's new

HITL for Form Parser

  • HITL now supports Form Parser. You can enable HITL on a Form Parser processor in the DocAI platform and configure the key names (as shown in the screenshots below) to filter for HITL review; labelers can then review and correct the key-value pairs extracted by Form Parser. After HITL review is completed, the output is saved as JSON files to the customer-specified Google Cloud Storage bucket.
  • UI Screenshots to configure HITL on Form Parsers HITL Form Parser
  • UI to configure key-level validation Key-level Validation
  • Labeler UI Form Parser Labeler UI
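The key-name filter above is configured in the DocAI console; as a rough illustration of what that filter does, here is a minimal sketch of selecting only the configured keys from Form Parser's extracted key-value pairs (all field and key names below are hypothetical):

```python
# Hypothetical sketch of the key-name filtering HITL applies to Form
# Parser output. The real filter is configured per-processor in the
# DocAI platform UI; this only illustrates the selection logic.

def filter_for_review(form_fields, review_keys):
    """Keep only the key-value pairs whose key is configured for review."""
    wanted = {k.strip().lower() for k in review_keys}
    return [f for f in form_fields if f["key"].strip().lower() in wanted]

extracted = [
    {"key": "Invoice Number", "value": "INV-001"},
    {"key": "Due Date", "value": "2021-08-01"},
    {"key": "Notes", "value": "n/a"},
]
# Only the two configured keys are routed to human review.
to_review = filter_for_review(extracted, ["Invoice Number", "Due Date"])
```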

Audit/QA Pipeline

  • HITL now supports a second QA or audit stage and reports the accuracy of review tasks (and of the labelers). A QA team or auditor can be assigned to a task as an "expert Labeler" and will receive a customer-configurable percentage (1%-100%) of the reviewed documents. The auditor can correct the reviewer's output; the system tracks these corrections and assigns an accuracy score (e.g. 90%) to each audited document. The aggregate accuracy score of a task or labeler is reported in the Task and Labeler Analytics dashboards, respectively. Here are detailed instructions on configuring an audit pipeline.
  • Designating an Auditor Designating an auditor

  • Reporting accuracy Reporting accuracy
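The exact scoring formula isn't spelled out here, but a plausible sketch of the per-document accuracy score is the fraction of reviewer fields the auditor left unchanged (the field names and scoring rule below are illustrative assumptions, not the documented algorithm):

```python
# Illustrative sketch of audit-stage accuracy scoring. Assumption: a
# document's score is the share of reviewed fields the auditor did not
# correct; the real scoring rule may differ.

def document_accuracy(reviewer_fields, auditor_fields):
    """Fraction of fields left unchanged by the auditor."""
    total = len(reviewer_fields)
    unchanged = sum(
        1 for key, value in reviewer_fields.items()
        if auditor_fields.get(key) == value
    )
    return unchanged / total if total else 1.0

reviewed = {"total": "100.00", "date": "2021-07-01", "vendor": "Acme"}
audited = {"total": "100.00", "date": "2021-07-02", "vendor": "Acme"}  # auditor fixed the date
score = document_accuracy(reviewed, audited)  # 2 of 3 fields unchanged
```

An aggregate task or labeler score would then be an average of these per-document scores.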

Lending AI Parsers (July 31)

  • HITL is now supported on some Lending AI parsers, including 1040, 1040 Schedule E, 1040 Schedule C, 1099 DIV, 1099 G, 1099 INT, 1099 MISC, Paystubs, Bank Statements, W2, W9, 1120, 1120S, 1065, SSA-1099, 1099 NEC, and 1099-R.

Standard vs Fast Track Queues (July 2)

  • We now support two priority queues (instead of one) for each processor, based on the urgency of each document.
  • Submission - After prediction, extracted documents can be evaluated for urgency and submitted to one of two queues (Standard vs Urgent/Fast-track). For example, invoices with urgent due dates can be submitted to the Fast-track queue. The logic that evaluates urgency can be supplied through a custom function.
  • Task Assignment - The labeling manager sees 2 different queues with different priorities, as shown in the screenshot below, and may assign the same group of labelers to both queues.
  • Task Prioritization - Labelers assigned to both queues will always process any pending documents in the Fast-track queue before processing the Standard queue (i.e. queue prioritization is handled automatically by the system).
  • API call - Set the priority field in the ReviewDocument request.
  • UI screenshot (of tasks in Labeling Manager UI) UI Screenshot
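Putting the steps above together, a minimal sketch of building the ReviewDocument REST request body follows, where `is_urgent` stands in for the customer's custom urgency function and the due-date cutoff is a made-up example (only the `priority` and `inlineDocument` field names come from the API):

```python
# Sketch: routing a document to the Standard or Fast-track queue via the
# priority field of a ReviewDocument request. The urgency check is a
# hypothetical stand-in for the customer's custom function.

def is_urgent(invoice):
    """Hypothetical customer logic: a near-term due date means urgent."""
    due = invoice.get("dueDate")
    return due is not None and due <= "2021-07-05"

def build_review_request(document_json, urgent):
    return {
        "inlineDocument": document_json,
        # URGENT routes to the Fast-track queue; DEFAULT to Standard.
        "priority": "URGENT" if urgent else "DEFAULT",
    }

invoice = {"dueDate": "2021-07-04"}
req = build_review_request({"text": "ACME invoice ..."}, is_urgent(invoice))
```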

Validation Filters for HITL End-point (June 24th)

  • The validation filters (configured in the processor) that filter fields by confidence score to determine which documents are queued for human review are now also applied to documents submitted to the HITL end-point.
  • When calling the ReviewDocument API, set the enable_schema_validation field to true. Note that if this is set and validation decides the document doesn't need human review, a CANCELLED error is returned.
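As a sketch, the REST request body and the CANCELLED case might be handled as follows (the JSON field uses the standard lowerCamelCase spelling of enable_schema_validation; the helper function and its return values are hypothetical):

```python
# Sketch: requesting schema validation on the HITL end-point, and
# treating a CANCELLED status as "validation decided no review needed".
# The interpret helper is an illustrative assumption, not part of the API.

request_body = {
    "inlineDocument": {"text": "..."},  # the processed document
    "enableSchemaValidation": True,
}

def interpret_review_status(status_code):
    """CANCELLED here means the document did not trigger human review."""
    if status_code == "CANCELLED":
        return "no_review_needed"
    return "error"
```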

Cancel API

  • You can cancel a document enqueued for HITL processing by invoking the Cancel API with a given operation ID. (An operation ID is returned for each document submitted to HITL.)

         `POST https://[us|eu]-documentai.googleapis.com/{api_version}/{name=projects/*/operations/*}:cancel`
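A small sketch of assembling that cancel URL from the operation name returned by the ReviewDocument call (the project and operation IDs below are placeholders):

```python
# Sketch: building the cancel endpoint for a HITL operation. The
# operation name follows the projects/*/operations/* pattern returned
# when the document was submitted; values here are placeholders.

def cancel_url(region, api_version, operation_name):
    """region is 'us' or 'eu', matching the processor's location."""
    return (
        f"https://{region}-documentai.googleapis.com/"
        f"{api_version}/{operation_name}:cancel"
    )

url = cancel_url("us", "v1beta3", "projects/my-project/operations/123456")
```

The request is then issued as a POST with the caller's usual authentication.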

Invoice Type (Classification Review)

  • The Labeler Workbench supports reviewing Invoice Type classification. Invoice Type Classification

Time-in-queue (HITL Latency SLO) Report

  • A report shows how many documents have been enqueued for more than 18 hours and more than 24 hours. This is useful for users who need to manage an SLO expectation on HITL latency. Time-in-Queue Report
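The bucketing behind this report can be sketched as follows (the timestamps are illustrative; only the >18h and >24h thresholds come from the report itself):

```python
from datetime import datetime, timedelta

# Sketch of the >18h / >24h bucketing in the Time-in-Queue report.
# Enqueue timestamps below are made-up examples.

def time_in_queue_buckets(enqueue_times, now):
    over_18h = sum(1 for t in enqueue_times if now - t > timedelta(hours=18))
    over_24h = sum(1 for t in enqueue_times if now - t > timedelta(hours=24))
    return {"over_18h": over_18h, "over_24h": over_24h}

now = datetime(2021, 7, 2, 12, 0)
queued = [now - timedelta(hours=h) for h in (2, 19, 30)]
counts = time_in_queue_buckets(queued, now)
```

Note the buckets overlap: a document queued for 30 hours counts toward both.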

Known URL for Labeler Workbench

  • Labelers assigned to a single pool can now access the workbench at a known URL: https://datacompute.corp.google.com/w/. This is useful if you lose the email with the URL sent by the system or the Labeling Manager. This URL doesn't work for labelers assigned to multiple pools.

Sticky Zoom Setting

  • The plug-in now remembers a labeler's zoom setting (full-width vs full-page) across subsequent documents in the queue, so they don't need to re-zoom for every document.