Process documents

This quickstart shows you how to process documents (invoices) from a source bucket and store the processed output (a JSON file) in a target bucket by using the batch processing capability of the Document AI API.

Before you begin

Before you run this quickstart, make sure that you or your administrators have completed the following prerequisites:

  • Make sure that the Document AI API is enabled in your Google Cloud project.

    Go to API library

  • In the Document AI Workbench, create a processor with type INVOICE_PROCESSOR. For more information, see Creating and managing processors.

  • In Cloud Storage, create a source bucket to store the invoices for processing and place the invoices in this bucket. For more information, see Create buckets.

  • In Cloud Storage, create a target bucket to store the processed files.

Create a program to process documents

  1. In the SAP system, create an executable program in your custom namespace (for example, Z or Y) by using transaction SE38.

    1. In the SAP GUI, enter transaction code SE38.

    2. In the Program field, enter a name for your program, for example, ZDEMO_DOCUMENT_AI.

    3. Click Create.

    4. Specify the program attributes:

      1. In the Title field, enter a title for your program, for example, Process invoices.

      2. In the Type field, choose Executable Program.

      3. Click Save.

    5. Save the program as a Local Object.

    6. In the ABAP Editor, add the following code:

      *  Copyright 2023 Google LLC                                         *
      *                                                                    *
      *  Licensed under the Apache License, Version 2.0 (the "License");   *
      *  you may not use this file except in compliance with the License.  *
      *  You may obtain a copy of the License at                           *
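      *      https://www.apache.org/licenses/LICENSE-2.0                   *
      *  Unless required by applicable law or agreed to in writing,        *
      *  software distributed under the License is distributed on an       *
      *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,      *
      *  either express or implied.                                        *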
      *  See the License for the specific language governing permissions   *
      *  and limitations under the License.                                *
      REPORT zr_qs_process_documents.
      * data declarations
      DATA:
        lv_p_projects_id   TYPE string,
        lv_p_locations_id  TYPE string,
        lv_p_processors_id TYPE string,
        ls_input           TYPE /goog/cl_documentai_v1=>ty_017.
      TRY.
      * open http connection
          DATA(lo_client) = NEW /goog/cl_documentai_v1( iv_key_name = 'DEMO_DOC_PROCESSING' ).
      * populate relevant parameters
          lv_p_projects_id   = 'PROJECT_ID'.
          lv_p_locations_id  = 'LOCATION_ID'.
          lv_p_processors_id = 'PROCESSOR_ID'.
          ls_input-input_documents-gcs_prefix-gcs_uri_prefix = 'SOURCE_BUCKET_URI'.
          ls_input-document_output_config-gcs_output_config-gcs_uri = 'TARGET_BUCKET_URI'.
      * call api method
          CALL METHOD lo_client->batch_process_processors
            EXPORTING
              iv_p_projects_id   = lv_p_projects_id
              iv_p_locations_id  = lv_p_locations_id
              iv_p_processors_id = lv_p_processors_id
              is_input           = ls_input
            IMPORTING
              es_output          = DATA(ls_output)
              ev_ret_code        = DATA(lv_ret_code)
              ev_err_text        = DATA(lv_err_text)
              es_err_resp        = DATA(ls_err_resp).
      * report the outcome of the call
          IF lo_client->is_success( lv_ret_code ).
            MESSAGE 'Success' TYPE 'S'.
          ELSE.
            MESSAGE lv_err_text TYPE 'E'.
          ENDIF.
      * close http connection
          lo_client->close( ).
        CATCH /goog/cx_sdk INTO DATA(lo_exception).
          MESSAGE lo_exception->get_text( ) TYPE 'E'.
      ENDTRY.

      Replace the following:

      • DEMO_DOC_PROCESSING: the client key name.
      • PROJECT_ID: the ID of the Google Cloud project.
      • LOCATION_ID: the processor's location.
      • PROCESSOR_ID: the ID of the processor.
      • SOURCE_BUCKET_URI: the URI of the Cloud Storage bucket folder where source documents are kept for processing.
      • TARGET_BUCKET_URI: the URI of the Cloud Storage bucket where the processed document (JSON file) is stored.
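
      For illustration, the placeholder assignments might look like the following once they are filled in. Every value here is hypothetical: the project ID, processor ID, and bucket names are made up. The only firm points are that Cloud Storage URIs use the gs:// scheme and that the location must be the one where the processor was created, for example us or eu.

      ```abap
      DATA:
        lv_p_projects_id   TYPE string,
        lv_p_locations_id  TYPE string,
        lv_p_processors_id TYPE string,
        ls_input           TYPE /goog/cl_documentai_v1=>ty_017.

      lv_p_projects_id   = 'my-invoice-project'.   " hypothetical project ID
      lv_p_locations_id  = 'us'.                   " location of the processor
      lv_p_processors_id = '1a2b3c4d5e6f7a8b'.     " hypothetical processor ID

      " Hypothetical folder in the source bucket that holds the invoices
      ls_input-input_documents-gcs_prefix-gcs_uri_prefix = 'gs://my-invoices-source/incoming/'.

      " Hypothetical folder in the target bucket that receives the JSON output
      ls_input-document_output_config-gcs_output_config-gcs_uri = 'gs://my-invoices-target/processed/'.
      ```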

  2. Run your program in transaction SE38.

  3. To validate the results, follow these steps:

    1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    2. Open the target bucket. The processed document is stored as a JSON file.

What's next