管理长时间运行的操作 (LRO)

批量处理方法调用会返回长时间运行的操作,因为它们的完成时间长于 API 响应所需的时间。这样,在处理多个文档时,发起调用的线程就不会保持打开状态。Document AI API 会在您每次通过 API 或客户端库调用 projects.locations.processors.batchProcess 时创建一个 LRO。LRO 会跟踪处理作业的状态。

您可以使用 Document AI API 提供的操作方法来检查 LRO 的状态。您还可以对 LRO 执行 listpollcancel 操作。调用异步方法的客户端库会在内部轮询,从而启用回调。(对于 Python,await 处于启用状态。)它们还具有超时参数。在 .batchProcess 返回的主要 LRO 中,系统会为每个文档创建一个 LRO(因为批量页面数量限制远高于同步 process 调用,并且处理可能需要很长时间)。主 LRO 结束后,系统会提供每个文档 LRO 的详细状态。

LRO 可在 Google Cloud 项目和位置级层进行管理。向该 API 发出请求时,请在其中提供 Google Cloud 项目以及 LRO 的运行位置。

在 LR 完成之后,LRO 的记录会保留大约 30 天,也就是说,在这段时间之后,您将无法再查看或列出 LRO。

获取长时间运行的操作的详细信息

以下示例展示了如何获取 LRO 的详细信息。

REST

如需获取 LRO 的状态并查看其详情,请调用 projects.locations.operations.get 方法。

假设您在调用 projects.locations.processors.batchProcess 后收到以下响应:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID"
}

响应中的 name 值表明 Document AI API 创建了一个名为 projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID 的 LRO。

您还可以通过列出长时间运行的操作来检索 LRO 名称。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_ID:您的 Google Cloud 项目 ID。
  • LOCATION:LRO 的运行位置,例如:
    • us - 美国
    • eu - 欧盟
  • OPERATION_ID:您的操作的 ID。此 ID 是操作名称的最后一个元素。例如:
    • 操作名称:projects/PROJECT_ID/locations/LOCATION/operations/bc4e1d412863e626
    • 操作 ID:bc4e1d412863e626

HTTP 方法和网址:

GET https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessMetadata",
    "state": "SUCCEEDED",
    "stateMessage": "Processed 1 document(s) successfully",
    "createTime": "TIMESTAMP",
    "updateTime": "TIMESTAMP",
    "individualProcessStatuses": [
      {
        "inputGcsSource": "INPUT_BUCKET_FOLDER/DOCUMENT1.ext",
        "status": {},
        "outputGcsDestination": "OUTPUT_BUCKET_FOLDER/OPERATION_ID/0",
        "humanReviewStatus": {
          "state": "ERROR",
          "stateMessage": "Sharded document protos are not supported for human review."
        }
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessResponse"
  }
}

Go

如需了解详情,请参阅 Document AI Go API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


package main

import (
	"context"

	documentai "cloud.google.com/go/documentai/apiv1"
	longrunningpb "cloud.google.com/go/longrunning/autogen/longrunningpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := documentai.NewDocumentProcessorClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &longrunningpb.GetOperationRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/longrunning/autogen/longrunningpb#GetOperationRequest.
	}
	resp, err := c.GetOperation(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Python

如需了解详情,请参阅 Document AI Python API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import GetOperationRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# operation_name = "YOUR_OPERATION_NAME"  # Format is "projects/{project_id}/locations/{location}/operations/{operation_id}"


def get_operation_sample(location: str, operation_name: str) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    request = GetOperationRequest(name=operation_name)

    # Make GetOperation request
    operation = client.get_operation(request=request)
    # Print the Operation Information
    print(operation)

列出长时间运行的操作

以下示例展示了如何列出某个 Google Cloud 项目和位置的 LRO。

REST

如需列出某个 Google Cloud 项目和位置的 LRO,请调用 projects.locations.operations.list 方法。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_ID:您的 Google Cloud 项目 ID。
  • LOCATION:一个或多个 LRO 的运行位置,例如:
    • us - 美国
    • eu - 欧盟
  • FILTER:(必需)用于查询要返回的 LRO 的查询。选项:
    • TYPE:(必需)要列出的 LRO 类型。选项:
      • BATCH_PROCESS_DOCUMENTS
      • CREATE_PROCESSOR_VERSION
      • DELETE_PROCESSOR
      • ENABLE_PROCESSOR
      • DISABLE_PROCESSOR
      • UPDATE_HUMAN_REVIEW_CONFIG
      • HUMAN_REVIEW_EVENT
      • CREATE_LABELER_POOL
      • UPDATE_LABELER_POOL
      • DELETE_LABELER_POOL
      • DEPLOY_PROCESSOR_VERSION
      • UNDEPLOY_PROCESSOR_VERSION
      • DELETE_PROCESSOR_VERSION
      • SET_DEFAULT_PROCESSOR_VERSION
      • EVALUATE_PROCESSOR_VERSION
      • EXPORT_PROCESSOR_VERSION
      • UPDATE_DATASET
      • IMPORT_DOCUMENTS
      • ANALYZE_HITL_DATA
      • BATCH_MOVE_DOCUMENTS
      • RESYNC_DATASET
      • BATCH_DELETE_DOCUMENTS
      • DELETE_DATA_LABELING_JOB
      • EXPORT_DOCUMENTS

HTTP 方法和网址:

GET https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations?filter=TYPE=TYPE

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations?filter=TYPE=TYPE"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations?filter=TYPE=TYPE" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "operations": [
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessMetadata",
        "state": "SUCCEEDED",
        "stateMessage": "Processed 1 document(s) successfully",
        "createTime": "TIMESTAMP",
        "updateTime": "TIMESTAMP",
        "individualProcessStatuses": [
          {
            "inputGcsSource": "INPUT_BUCKET_FOLDER/DOCUMENT1.ext",
            "status": {},
            "outputGcsDestination": "OUTPUT_BUCKET_FOLDER/OPERATION_ID/0",
            "humanReviewStatus": {
              "state": "ERROR",
              "stateMessage": "Sharded document protos are not supported for human review."
            }
          }
        ]
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.cloud.documentai.v1.BatchProcessResponse"
      }
    },
    ...
  ]
}

Go

如需了解详情,请参阅 Document AI Go API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


package main

import (
	"context"

	documentai "cloud.google.com/go/documentai/apiv1"
	longrunningpb "cloud.google.com/go/longrunning/autogen/longrunningpb"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := documentai.NewDocumentProcessorClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &longrunningpb.ListOperationsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/longrunning/autogen/longrunningpb#ListOperationsRequest.
	}
	it := c.ListOperations(ctx, req)
	for {
		resp, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			// TODO: Handle error.
		}
		// TODO: Use resp.
		_ = resp

		// If you need to access the underlying RPC response,
		// you can do so by casting the `Response` as below.
		// Otherwise, remove this line. Only populated after
		// first call to Next(). Not safe for concurrent access.
		_ = it.Response.(*longrunningpb.ListOperationsResponse)
	}
}

Python

如需了解详情,请参阅 Document AI Python API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import ListOperationsRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"

# Create filter in https://google.aip.dev/160 syntax
# For full options, refer to:
# https://cloud.google.com/document-ai/docs/long-running-operations#listing_long-running_operations
# operations_filter = 'YOUR_FILTER'

# Example:
# operations_filter = "TYPE=BATCH_PROCESS_DOCUMENTS AND STATE=RUNNING"


def list_operations_sample(
    project_id: str, location: str, operations_filter: str
) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # Format: `projects/{project_id}/locations/{location}`
    name = client.common_location_path(project=project_id, location=location)
    request = ListOperationsRequest(
        name=f"{name}/operations",
        filter=operations_filter,
    )

    # Make ListOperations request
    operations = client.list_operations(request=request)

    # Print the Operation Information
    print(operations)

轮询长时间运行的操作

以下示例展示了如何轮询 LRO 的状态。

REST

如要轮询 LRO,请重复调用 projects.locations.operations.get 方法,直到操作完成。在每个轮询请求之间使用退避时间,例如 10 秒。

在使用下面的请求数据之前,请先进行以下替换:

  • PROJECT_ID:您的 Google Cloud 项目 ID
  • LOCATION:LRO 的运行位置
  • OPERATION_ID:LRO 的标识符

HTTP 方法和网址:

GET https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID
  

如需发送请求,请选择以下方式之一:

curl

执行以下命令,每隔 10 秒轮询一次 LRO 的状态:

while true; \
    do curl -X GET \
    -H "Authorization: Bearer "$(gcloud auth print-access-token) \
    "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"; \
    sleep 10; \
    done

您应该每 10 秒收到一次 JSON 响应。 操作运行期间,响应将包含 "state": "RUNNING"。操作完成后,响应中将包含 "state": "SUCCEEDED""done": true

PowerShell

执行以下命令,每隔 10 秒轮询一次 LRO 的状态:

$cred = gcloud auth print-access-token
$headers = @{ Authorization = "Bearer $cred" }

Do {
  Invoke-WebRequest `
    -Method Get `
    -Headers $headers `
    -Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content
sleep 10
}

while ($true)

您应该每 10 秒收到一次 JSON 响应。 操作运行期间,响应将包含 "state": "RUNNING"。操作完成后,响应中将包含 "state": "SUCCEEDED""done": true

Python

如需了解详情,请参阅 Document AI Python API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


from time import sleep

from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import GetOperationRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# operation_name = "YOUR_OPERATION_NAME"  # Format is "projects/{project_id}/locations/{location}/operations/{operation_id}"


def poll_operation_sample(location: str, operation_name: str) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    request = GetOperationRequest(name=operation_name)

    while True:
        # Make GetOperation request
        operation = client.get_operation(request=request)
        # Print the Operation Information
        print(operation)

        # Stop polling when Operation is no longer running
        if operation.done:
            break

        # Wait 10 seconds before polling again
        sleep(10)

取消长时间运行的操作

以下示例展示了如何在运行时取消 LRO。

REST

要取消 LRO,请调用 projects.locations.operations.cancel 方法。

在使用任何请求数据之前,请先进行以下替换:

  • PROJECT_ID:您的 Google Cloud 项目 ID。
  • LOCATION:LRO 的运行位置,例如:
    • us - 美国
    • eu - 欧盟
  • OPERATION_ID:您的操作的 ID。此 ID 是操作名称的最后一个元素。例如:
    • 操作名称:projects/PROJECT_ID/locations/LOCATION/operations/bc4e1d412863e626
    • 操作 ID:bc4e1d412863e626

HTTP 方法和网址:

POST https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID:cancel

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID:cancel"

PowerShell

执行以下命令:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://LOCATION-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID:cancel" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{}
如果您尝试取消已完成的操作,则应该会收到以下错误消息:
  "error": {
      "code": 400,
      "message": "Operation has completed and cannot be cancelled: 'PROJECT_ID/locations/LOCATION/operations/OPERATION_ID'.",
      "status": "FAILED_PRECONDITION"
  }

Go

如需了解详情,请参阅 Document AI Go API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


package main

import (
	"context"

	documentai "cloud.google.com/go/documentai/apiv1"
	longrunningpb "cloud.google.com/go/longrunning/autogen/longrunningpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := documentai.NewDocumentProcessorClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &longrunningpb.CancelOperationRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/longrunning/autogen/longrunningpb#CancelOperationRequest.
	}
	err = c.CancelOperation(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
}

Python

如需了解详情,请参阅 Document AI Python API 参考文档

如需向 Document AI 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore
from google.longrunning.operations_pb2 import CancelOperationRequest  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# operation_name = "YOUR_OPERATION_NAME"  # Format is "projects/{project_id}/locations/{location}/operations/{operation_id}"


def cancel_operation_sample(location: str, operation_name: str) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    request = CancelOperationRequest(name=operation_name)

    # Make CancelOperation request
    client.cancel_operation(request=request)
    print(f"Operation {operation_name} cancelled")