Cloud Document AI API Connector Overview

The Workflows connector defines the built-in functions that can be used to access other Google Cloud products within a workflow.

This page provides an overview of the individual connector. There is no need to import or load connector libraries in a workflow; connectors work out of the box when used in a call step.

Cloud Document AI API

A service that parses structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, translation, and AutoML. To learn more, see the Cloud Document AI API documentation.

Cloud Document AI connector sample

YAML

# This workflow demonstrates how to use the process and batchProcess
# APIs in the Cloud Document AI connector.
# Expected successful output: the batch process response.

- process_document:
    call: googleapis.documentai.v1.projects.locations.processors.process
    args:
      name: "projects/placeholder/locations/us/processors/placeholder"
      location: "us"
      body:
        rawDocument:
          # Procedure to create some test raw content:
          # 1. Create a .docx file containing some arbitrary text, for example "hello world".
          # 2. Export it as a PDF file from Microsoft Word.
          # 3. Base64-encode the PDF file, for example with an online converter
          #    (https://pdfmall.com/pdf-to-raw) or with `base64 -w 0 file.pdf` on Linux.
          # 4. Paste the resulting base64 string as the value of content.
          content: ""
          mimeType: "application/pdf"
    result: process_resp
- batch_process:
    call: googleapis.documentai.v1.projects.locations.processors.batchProcess
    args:
      name: "projects/cloudworkflows-test-dev/locations/us/processors/583f73e6003945cc"
      location: "us"
      body:
        inputDocuments:
          gcsDocuments:
            documents:
              - gcsUri: "gs://connector-demo/documents/helloworld1.pdf"
                mimeType: "application/pdf"
              - gcsUri: "gs://connector-demo/documents/helloworld2.pdf"
                mimeType: "application/pdf"
        documentOutputConfig:
          gcsOutputConfig:
            gcsUri: "gs://connector-demo/documents/"
    result: batch_process_resp
- return:
    return: ${batch_process_resp}

JSON

[
  {
    "process_document": {
      "call": "googleapis.documentai.v1.projects.locations.processors.process",
      "args": {
        "name": "projects/placeholder/locations/us/processors/placeholder",
        "location": "us",
        "body": {
          "rawDocument": {
            "content": "",
            "mimeType": "application/pdf"
          }
        }
      },
      "result": "process_resp"
    }
  },
  {
    "batch_process": {
      "call": "googleapis.documentai.v1.projects.locations.processors.batchProcess",
      "args": {
        "name": "projects/cloudworkflows-test-dev/locations/us/processors/583f73e6003945cc",
        "location": "us",
        "body": {
          "inputDocuments": {
            "gcsDocuments": {
              "documents": [
                {
                  "gcsUri": "gs://connector-demo/documents/helloworld1.pdf",
                  "mimeType": "application/pdf"
                },
                {
                  "gcsUri": "gs://connector-demo/documents/helloworld2.pdf",
                  "mimeType": "application/pdf"
                }
              ]
            }
          },
          "documentOutputConfig": {
            "gcsOutputConfig": {
              "gcsUri": "gs://connector-demo/documents/"
            }
          }
        }
      },
      "result": "batch_process_resp"
    }
  },
  {
    "return": {
      "return": "${batch_process_resp}"
    }
  }
]
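The sample above hardcodes an empty content string, so the process step has no document to work on. The following is a minimal sketch, not an official sample, that instead reads the base64-encoded PDF from a runtime argument (assumed here to be named content) and returns the extracted text from the document.text field of the process response. PROJECT_ID and PROCESSOR_ID are placeholders, and the location argument simply mirrors the one used in the sample.

```yaml
# Sketch: pass the base64-encoded PDF as a runtime argument instead of
# hardcoding it. PROJECT_ID and PROCESSOR_ID are placeholders.
main:
  params: [input]
  steps:
    - process_document:
        call: googleapis.documentai.v1.projects.locations.processors.process
        args:
          name: "projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID"
          location: "us"
          body:
            rawDocument:
              # Base64-encoded PDF bytes supplied when the execution starts,
              # for example {"content": "<base64-encoded PDF>"}.
              content: ${input.content}
              mimeType: "application/pdf"
        result: process_resp
    - return_text:
        # The process response wraps a Document; its text field holds the
        # text extracted from the file.
        return: ${process_resp.document.text}
```

Keeping the document out of the workflow source also keeps large base64 strings out of the deployed definition.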

Module: googleapis.documentai.v1.projects.locations

Functions
fetchProcessorTypes Fetches processor types. Unlike processorTypes.list, this method is not paginated.
get Gets information about a location.
list Lists information about the supported locations for this service.
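As a sketch of how a locations-level function might be called, the steps below fetch every processor type available in the us location. The parent format follows the REST resource path; PROJECT_ID is a placeholder, and the location argument is assumed to work as it does in the sample above. map.get is used so the step returns null rather than failing if the response has no processorTypes field.

```yaml
# Sketch: list every processor type available in the "us" location.
# PROJECT_ID is a placeholder.
- fetch_types:
    call: googleapis.documentai.v1.projects.locations.fetchProcessorTypes
    args:
      parent: "projects/PROJECT_ID/locations/us"
      location: "us"
    result: types_resp
- return_types:
    # The types are returned in the processorTypes list of the response.
    return: ${map.get(types_resp, "processorTypes")}
```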

Module: googleapis.documentai.v1.projects.locations.operations

Functions
cancel Starts asynchronous cancellation on a long-running operation. The server makes a best effort to cancel the operation, but success is not guaranteed. If the server doesn't support this method, it returns google.rpc.Code.UNIMPLEMENTED. Clients can use Operations.GetOperation or other methods to check whether the cancellation succeeded or whether the operation completed despite cancellation. On successful cancellation, the operation is not deleted; instead, it becomes an operation with an Operation.error value with a google.rpc.Status.code of 1, corresponding to Code.CANCELLED.
get Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.
list Lists operations that match the specified filter in the request. If the server doesn't support this method, it returns UNIMPLEMENTED. NOTE: The name binding allows API services to override the binding to use different resource name schemes, such as users/*/operations. To override the binding, API services can add a binding such as "/v1/{name=users/*}/operations" to their service configuration. For backwards compatibility, the default name includes the operations collection ID; however, overriding users must ensure that the name binding is the parent resource, without the operations collection ID.
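By default, a connector call to a long-running method such as batchProcess waits for the operation to finish. The sketch below assumes you instead want to start the job, return immediately using the Workflows connector parameter skip_polling, and then poll with operations.get yourself. PROJECT_ID, PROCESSOR_ID, and the gs:// paths are placeholders, the 30-second interval is an arbitrary choice, and passing location to operations.get is an assumption carried over from the sample.

```yaml
# Sketch: start a batch job without waiting, then poll the operation.
# PROJECT_ID, PROCESSOR_ID, and the gs:// paths are placeholders.
- start_batch:
    call: googleapis.documentai.v1.projects.locations.processors.batchProcess
    args:
      name: "projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID"
      location: "us"
      body:
        inputDocuments:
          gcsDocuments:
            documents:
              - gcsUri: "gs://BUCKET/input/doc1.pdf"
                mimeType: "application/pdf"
        documentOutputConfig:
          gcsOutputConfig:
            gcsUri: "gs://BUCKET/output/"
      connector_params:
        # Return the Operation right away instead of waiting for it.
        skip_polling: true
    result: operation
- wait:
    call: sys.sleep
    args:
      seconds: 30
- check_operation:
    call: googleapis.documentai.v1.projects.locations.operations.get
    args:
      name: ${operation.name}
      location: "us"
    result: operation
- done_yet:
    switch:
      # done is absent or false while the operation is still running.
      - condition: ${map.get(operation, "done") != true}
        next: wait
- return_operation:
    return: ${operation}
```

Polling manually is mainly useful when the workflow has other work to do while the batch runs; otherwise the default blocking behavior is simpler.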

Module: googleapis.documentai.v1.projects.locations.processorTypes

Functions
get Gets the details of a processor type.
list Lists the processor types that exist.
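A minimal sketch of a single processorTypes.list call follows. The parent, pageSize, and location arguments are assumptions modeled on the REST method and the sample above, and PROJECT_ID is a placeholder. If the response contains a nextPageToken, it can be passed as pageToken in a follow-up call to read the next page.

```yaml
# Sketch: read one page of processor types. PROJECT_ID is a placeholder.
- list_types:
    call: googleapis.documentai.v1.projects.locations.processorTypes.list
    args:
      parent: "projects/PROJECT_ID/locations/us"
      location: "us"
      pageSize: 50
    result: page
- return_page:
    # map.get returns null instead of failing when a field is absent.
    return:
      types: ${map.get(page, "processorTypes")}
      nextPageToken: ${map.get(page, "nextPageToken")}
```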

Module: googleapis.documentai.v1.projects.locations.processors

Functions
batchProcess Long-running operation (LRO) that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.
create Creates a processor of the processor type that the user chose. After creation, the processor is in the ENABLED state by default.
delete Deletes the processor, unloads all deployed model artifacts if the processor was enabled, and then deletes all artifacts associated with it.
disable Disables a processor.
enable Enables a processor.
get Gets the details of a processor.
list Lists all processors that belong to this project.
process Processes a single document.
setDefaultProcessorVersion Sets the default (active) version of a processor, which is used by ProcessDocument and BatchProcessDocuments.
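The sketch below shows one way to create a processor and then read it back with get. It assumes the OCR_PROCESSOR type is available in your project and location; PROJECT_ID and the display name are placeholders. create returns the new Processor directly, so its name can be passed straight to get.

```yaml
# Sketch: create an OCR processor, then read back its state.
# PROJECT_ID and the display name are placeholders.
- create_processor:
    call: googleapis.documentai.v1.projects.locations.processors.create
    args:
      parent: "projects/PROJECT_ID/locations/us"
      location: "us"
      body:
        type: "OCR_PROCESSOR"
        displayName: "workflows-ocr-demo"
    result: processor
- get_processor:
    call: googleapis.documentai.v1.projects.locations.processors.get
    args:
      # name is the full resource name of the processor that was just created.
      name: ${processor.name}
      location: "us"
    result: details
- return_state:
    # A newly created processor is expected to be ENABLED.
    return: ${details.state}
```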

Module: googleapis.documentai.v1.projects.locations.processors.humanReviewConfig

Functions
reviewDocument Sends a document for human review. The input document should already have been processed by the specified processor.
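As a sketch, the step below sends the output of an earlier process step (assumed to be stored in process_resp, as in the sample above) to human review for the same processor. The humanReviewConfig argument name is an assumption based on the REST path parameter, human review must already be configured for the processor, and PROJECT_ID and PROCESSOR_ID are placeholders.

```yaml
# Sketch: send a processed document to human review.
# Assumes process_resp holds the result of a process call on the same
# processor. PROJECT_ID and PROCESSOR_ID are placeholders.
- review:
    call: googleapis.documentai.v1.projects.locations.processors.humanReviewConfig.reviewDocument
    args:
      humanReviewConfig: "projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID/humanReviewConfig"
      location: "us"
      body:
        # The Document returned by process is passed back inline.
        inlineDocument: ${process_resp.document}
    result: review_resp
- return_review:
    return: ${review_resp}
```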

Module: googleapis.documentai.v1.projects.locations.processors.processorVersions

Functions
batchProcess Long-running operation (LRO) that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.
delete Deletes the processor version and all artifacts under it.
deploy Deploys the processor version.
get Gets the details of a processor version.
list Lists all versions of a processor.
process Processes a single document.
undeploy Undeploys the processor version.
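The processorVersions functions let a workflow target one specific version rather than the processor's default. The sketch below deploys a version and then processes a document with it; PROJECT_ID, PROCESSOR_ID, VERSION_ID, and BASE64_PDF_CONTENT are placeholders, and passing an empty body to deploy is an assumption.

```yaml
# Sketch: deploy one processor version, then process a document with it.
# PROJECT_ID, PROCESSOR_ID, VERSION_ID, and BASE64_PDF_CONTENT are placeholders.
- init:
    assign:
      - version_name: "projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID/processorVersions/VERSION_ID"
      - base64_pdf: "BASE64_PDF_CONTENT"
- deploy_version:
    # deploy is a long-running operation; by default the connector waits for it.
    call: googleapis.documentai.v1.projects.locations.processors.processorVersions.deploy
    args:
      name: ${version_name}
      location: "us"
      body: {}
- process_with_version:
    call: googleapis.documentai.v1.projects.locations.processors.processorVersions.process
    args:
      name: ${version_name}
      location: "us"
      body:
        rawDocument:
          content: ${base64_pdf}
          mimeType: "application/pdf"
    result: version_resp
- return_text:
    return: ${version_resp.document.text}
```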

Module: googleapis.documentai.v1.projects.operations

Functions
get Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

Module: googleapis.documentai.v1beta2.projects.documents

Functions
batchProcess Long-running operation (LRO) that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.
process Processes a single document.

Module: googleapis.documentai.v1beta2.projects.locations.documents

Functions
batchProcess Long-running operation (LRO) that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.
process Processes a single document.

Module: googleapis.documentai.v1beta2.projects.locations.operations

Functions
get Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

Module: googleapis.documentai.v1beta2.projects.operations

Functions
get Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

Module: googleapis.documentai.v1beta3.projects.locations

Functions
fetchProcessorTypes Fetches processor types. Unlike processorTypes.list, this method is not paginated.
get Gets information about a location.
list Lists information about the supported locations for this service.

Module: googleapis.documentai.v1beta3.projects.locations.operations

Functions
cancel Starts asynchronous cancellation on a long-running operation. The server makes a best effort to cancel the operation, but success is not guaranteed. If the server doesn't support this method, it returns google.rpc.Code.UNIMPLEMENTED. Clients can use Operations.GetOperation or other methods to check whether the cancellation succeeded or whether the operation completed despite cancellation. On successful cancellation, the operation is not deleted; instead, it becomes an operation with an Operation.error value with a google.rpc.Status.code of 1, corresponding to Code.CANCELLED.
get Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.
list Lists operations that match the specified filter in the request. If the server doesn't support this method, it returns UNIMPLEMENTED. NOTE: The name binding allows API services to override the binding to use different resource name schemes, such as users/*/operations. To override the binding, API services can add a binding such as "/v1/{name=users/*}/operations" to their service configuration. For backwards compatibility, the default name includes the operations collection ID; however, overriding users must ensure that the name binding is the parent resource, without the operations collection ID.

Module: googleapis.documentai.v1beta3.projects.locations.processorTypes

Functions
get Gets the details of a processor type.
list Lists the processor types that exist.

Module: googleapis.documentai.v1beta3.projects.locations.processors

Functions
batchProcess Long-running operation (LRO) that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.
create Creates a processor of the processor type that the user chose. After creation, the processor is in the ENABLED state by default.
delete Deletes the processor, unloads all deployed model artifacts if the processor was enabled, and then deletes all artifacts associated with it.
disable Disables a processor.
enable Enables a processor.
get Gets the details of a processor.
list Lists all processors that belong to this project.
process Processes a single document.
setDefaultProcessorVersion Sets the default (active) version of a processor, which is used by ProcessDocument and BatchProcessDocuments.

Module: googleapis.documentai.v1beta3.projects.locations.processors.humanReviewConfig

Functions
reviewDocument Sends a document for human review. The input document should already have been processed by the specified processor.

Module: googleapis.documentai.v1beta3.projects.locations.processors.processorVersions

Functions
batchProcess Long-running operation (LRO) that batch processes many documents. The output is written to Cloud Storage as JSON in the Document format.
delete Deletes the processor version and all artifacts under it.
deploy Deploys the processor version.
evaluateProcessorVersion Evaluates a ProcessorVersion against annotated documents, producing an Evaluation.
get Gets the details of a processor version.
list Lists all versions of a processor.
process Processes a single document.
train Trains a new processor version. Operation metadata is returned as cloud_documentai_core.TrainProcessorVersionMetadata.
undeploy Undeploys the processor version.

Module: googleapis.documentai.v1beta3.projects.locations.processors.processorVersions.evaluations

Functions
get Retrieves a specific evaluation.
list Retrieves a set of evaluations for a given processor version.
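A minimal sketch of listing the evaluations recorded for a v1beta3 processor version follows. The parent is the full processor version name; PROJECT_ID, PROCESSOR_ID, and VERSION_ID are placeholders, and the location argument is assumed to work as in the v1 sample. map.get returns null if the version has no evaluations yet.

```yaml
# Sketch: list the evaluations recorded for one processor version.
# PROJECT_ID, PROCESSOR_ID, and VERSION_ID are placeholders.
- list_evaluations:
    call: googleapis.documentai.v1beta3.projects.locations.processors.processorVersions.evaluations.list
    args:
      parent: "projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID/processorVersions/VERSION_ID"
      location: "us"
    result: eval_resp
- return_evaluations:
    return: ${map.get(eval_resp, "evaluations")}
```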