Parsing invoices

With the Invoice Parser plugin for Cloud Data Fusion, which is powered by Document AI, you can convert invoices into structured data and store that data in BigQuery.

Before you begin

To parse invoices, you need a Cloud Data Fusion instance running version 6.4.1 or later. For more information, see Upgrading Cloud Data Fusion instances.
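
If you check the version requirement in a script, compare the numeric components rather than the raw strings: as text, "6.10.0" sorts before "6.4.1". The following is a minimal sketch that assumes you already have the instance's version string (for example, copied from the Instances page). The function names are hypothetical.

```python
import re

MIN_VERSION = (6, 4, 1)  # minimum Cloud Data Fusion version for the Invoice Parser plugin


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn a version string such as '6.4.1' into a comparable (major, minor, patch) tuple."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    if not parts:
        raise ValueError(f"no version numbers found in {version!r}")
    # Pad missing components with zeros so '6.5' compares as (6, 5, 0).
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def supports_invoice_parser(version: str) -> bool:
    """Return True if the instance version meets the 6.4.1 minimum."""
    return parse_version(version) >= MIN_VERSION


print(supports_invoice_parser("6.4.1"))   # True
print(supports_invoice_parser("6.3.0"))   # False
print(supports_invoice_parser("6.10.0"))  # True
```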

Create a processor

  1. In the Google Cloud Console, go to the Document AI Processors page.

  2. Create a processor, selecting Invoice Parser as the processor type.

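
If you prefer to script processor creation, the sketch below uses the Document AI Python client library (the google-cloud-documentai package) to create a processor of type INVOICE_PROCESSOR, and then extracts the processor ID from the returned resource name. It assumes the package is installed and application default credentials are configured. The project ID, location, and display name are placeholders, and the helper function names are hypothetical. The client import is deferred into the function so that the ID-parsing helper runs without the package.

```python
import re

# Document AI processor resource names have the form
# projects/{project}/locations/{location}/processors/{processor_id}.
_PROCESSOR_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/processors/(?P<processor_id>[^/]+)$"
)


def create_invoice_processor(project_id: str, location: str, display_name: str) -> str:
    """Create an Invoice Parser processor and return its full resource name.

    Requires google-cloud-documentai and credentials; location is e.g. "us" or "eu".
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    processor = client.create_processor(
        parent=client.common_location_path(project_id, location),
        processor=documentai.Processor(display_name=display_name, type_="INVOICE_PROCESSOR"),
    )
    return processor.name


def processor_id_from_name(name: str) -> str:
    """Extract the processor ID (the last path segment) from a processor resource name."""
    match = _PROCESSOR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Document AI processor name: {name!r}")
    return match.group("processor_id")


print(processor_id_from_name("projects/my-project/locations/us/processors/abc123"))  # abc123
```

The processor ID is the value you enter later when you customize the pipeline.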

Configure the invoice parser plugin

  1. In the Google Cloud Console, go to the Cloud Data Fusion Instances page.

  2. Make sure the instance you want to use runs version 6.4.1 or later. If it runs an earlier version, upgrade it.

  3. Click View instance. The Cloud Data Fusion UI opens.

  4. Click Hub.

  5. Click GCP, and then deploy the GCP Plugins.

  6. Click DocAI, and then deploy the Doc AI Plugins.

  7. Click Invoice Parser Quickstart, and then click Create.

  8. Customize your pipeline by entering the Invoice Parser processor ID, Cloud Storage bucket path, and BigQuery table details.

  9. Deploy and run the pipeline.

    Figure: Example pipeline with the Invoice Parser plugin
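
Before entering the pipeline values, you might run a rough format check on them. The sketch below rests on assumptions that the steps above do not confirm: that the processor field takes the bare processor ID (the last segment of the processor's resource name), that the Cloud Storage path is written as gs://bucket/prefix, and that the BigQuery table is written as project.dataset.table. The plugin's actual field names and formats may differ, and InvoicePipelineInputs and check_inputs are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class InvoicePipelineInputs:
    """Hypothetical container for the values entered when customizing the pipeline."""
    processor_id: str    # assumed: bare processor ID, e.g. "abc123"
    gcs_path: str        # assumed form: gs://bucket[/prefix]
    bigquery_table: str  # assumed form: project.dataset.table


# Rough bucket check: lowercase letters, digits, dots, dashes, underscores,
# starting and ending with a letter or digit, optionally followed by a prefix.
_GCS_PATH = re.compile(r"gs://[a-z0-9][a-z0-9._-]*[a-z0-9](/.*)?")


def check_inputs(inputs: InvoicePipelineInputs) -> list[str]:
    """Return a list of likely problems; an empty list means the values look well-formed."""
    problems = []
    if not inputs.processor_id or "/" in inputs.processor_id:
        problems.append("processor ID should be the bare ID, not a full resource name")
    if not _GCS_PATH.fullmatch(inputs.gcs_path):
        problems.append("Cloud Storage path should look like gs://bucket/prefix")
    # rsplit keeps domain-scoped project IDs (which contain dots) in one piece.
    parts = inputs.bigquery_table.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        problems.append("BigQuery table should look like project.dataset.table")
    return problems


print(check_inputs(InvoicePipelineInputs("abc123", "gs://my-invoices/2021/", "my-project.invoices.parsed")))  # []
```

This only checks the shape of each value; it does not confirm that the processor, bucket, or table exists.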

Parsed invoices are stored in the output table in BigQuery. Invoice metadata, including the parsing status, the Cloud Storage path, and the upload timestamp of each raw invoice, is stored in the metadata table. You can join records in the output and metadata tables on the invoice_uuid key.
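
As an illustration of the join, the sketch below builds a BigQuery SQL query that matches each parsed invoice with its metadata row. Only the invoice_uuid column is documented here, so the query selects all columns instead of naming the other ones; the table IDs are placeholders for the tables you configured, and invoice_join_query is a hypothetical helper.

```python
def invoice_join_query(output_table: str, metadata_table: str) -> str:
    """Build a BigQuery SQL query joining parsed invoices to their metadata on invoice_uuid.

    Table names are fully qualified IDs such as 'my-project.invoices.output'
    (placeholders; substitute the tables you configured in the pipeline).
    """
    return (
        # m.* EXCEPT (invoice_uuid) avoids returning the join key twice.
        "SELECT o.*, m.* EXCEPT (invoice_uuid)\n"
        f"FROM `{output_table}` AS o\n"
        f"JOIN `{metadata_table}` AS m\n"
        "  ON o.invoice_uuid = m.invoice_uuid"
    )


sql = invoice_join_query("my-project.invoices.output", "my-project.invoices.metadata")
print(sql)
```

To run the query, you could pass the string to the google-cloud-bigquery client library, for example `bigquery.Client().query(sql).result()`. An inner join returns only invoices that appear in both tables; to also keep metadata rows that have no matching parsed output, put the metadata table on the left of a LEFT JOIN.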