Detect text in files (PDF/TIFF)

The Vision API can detect and transcribe text from PDF and TIFF files stored in Cloud Storage.

Document text detection from PDF and TIFF files must be requested using the files:asyncBatchAnnotate function, which performs an offline (asynchronous) request and provides its status using the operations resources.

Output from a PDF/TIFF request is written to a JSON file created in the specified Cloud Storage bucket.

Limitations

The Vision API accepts PDF/TIFF files of up to 2,000 pages. Larger files return an error.

Authentication

API keys are not supported for files:asyncBatchAnnotate requests. See Using a service account for instructions on authenticating with a service account.

The account used for authentication must have access to the Cloud Storage bucket that you specify for the output (roles/editor, roles/storage.objectCreator, or a role with greater permissions).

You can use an API key to query the status of the operation; see Using an API key for instructions.
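As a quick illustration of such a status query, here is a minimal Python sketch that passes the API key as the standard key query parameter to the operations endpoint; OPERATION_NAME and API_KEY are placeholders, and the exact operation path may differ for your project:

import json
import urllib.request

# Placeholders: substitute the operation name returned by files:asyncBatchAnnotate
# and an API key from your project.
OPERATION_NAME = "operations/1efec2285bd442df"
API_KEY = "YOUR_API_KEY"

# The API key is passed as the standard "key" query parameter.
url = f"https://vision.googleapis.com/v1/{OPERATION_NAME}?key={API_KEY}"
with urllib.request.urlopen(url) as resp:
    status = json.load(resp)

# Prints RUNNING while in progress and DONE once the results have been written.
print(status.get("metadata", {}).get("state"))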

Document text detection requests

Currently PDF/TIFF document detection is only available for files stored in Cloud Storage buckets. Response JSON files are likewise saved to a Cloud Storage bucket.

2010 US Census PDF page
gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf (Source: United States Census Bureau)

Before using any of the request data, make the following replacements:

  • CLOUD_STORAGE_BUCKET: A Cloud Storage bucket/directory to save the output files to, expressed in the following form:
    • gs://bucket/directory/
    The requesting user must have write permission to the bucket.
  • CLOUD_STORAGE_FILE_URI: The path to a valid file (PDF/TIFF) in a Cloud Storage bucket. You must at least have read privileges to the file. Example:
    • gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
  • FEATURE_TYPE: A valid feature type. For files:asyncBatchAnnotate requests, you can use the following feature types:
    • DOCUMENT_TEXT_DETECTION
    • TEXT_DETECTION
  • PROJECT_ID: Your Google Cloud project ID.

Field-specific considerations

  • inputConfig - Replaces the image field used in other Vision API requests. It contains two child fields:
    • gcsSource.uri - The Google Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request).
    • mimeType - One of the accepted file types: application/pdf or image/tiff.
  • outputConfig - Specifies output details. It contains two child fields:
    • gcsDestination.uri - A valid Google Cloud Storage URI. The bucket must be writeable by the user or service account making the request. The file name will be output-x-to-y, where x and y represent the PDF/TIFF page numbers included in that output file. If the file already exists, its contents will be overwritten.
    • batchSize - Specifies how many pages of output should be included in each output JSON file (see the naming sketch after this list).
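
To make the interaction between batchSize and the output-x-to-y naming concrete, here is a small hypothetical helper (not part of the Vision API or its client libraries) that computes the file-name suffixes you would expect under the configured gcsDestination prefix:

def expected_output_names(page_count, batch_size):
    """Illustrative only: compute the output-x-to-y.json suffixes expected
    for a file with page_count pages annotated with the given batch size."""
    names = []
    for start in range(1, page_count + 1, batch_size):
        end = min(start + batch_size - 1, page_count)
        names.append(f"output-{start}-to-{end}.json")
    return names

# For a 5-page PDF annotated with batchSize=2:
print(expected_output_names(5, 2))
# ['output-1-to-2.json', 'output-3-to-4.json', 'output-5-to-5.json']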

HTTP method and URL:

POST https://vision.googleapis.com/v1/files:asyncBatchAnnotate

Request JSON body:

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "CLOUD_STORAGE_FILE_URI"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "CLOUD_STORAGE_BUCKET"
        },
        "batchSize": 1
      }
    }
  ]
}

To send your request, choose one of these options:

Save the request body in a file named request.json, and execute the following curl command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://vision.googleapis.com/v1/files:asyncBatchAnnotate"

Save the request body in a file named request.json, and execute the following PowerShell command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://vision.googleapis.com/v1/files:asyncBatchAnnotate" | Select-Object -Expand Content

Response:

If the asyncBatchAnnotate request is successful, the response contains a single name field:

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

This name represents a long-running operation with an associated ID (1efec2285bd442df, for example), which can be queried using the v1.operations API.

To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL:

GET https://vision.googleapis.com/v1/operations/operation-id

For example:

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/projects/project-id/locations/location-id/operations/1efec2285bd442df

While the operation is in progress:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

Once the operation completes, the state shows as DONE and your results are written to the Google Cloud Storage file you specified:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}

The JSON in your output file is similar to that of an image [document text detection request](/vision/docs/ocr), with the addition of a context field that shows the location of the specified PDF or TIFF and the page number within the file:

output-1-to-1.json
        
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
        },
        "mimeType": "application/pdf"
      },
      "responses": [
        {
          "fullTextAnnotation": {
            "pages": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en",
                      "confidence": 0.94
                    }
                  ]
                },
                "width": 612,
                "height": 792,
                "blocks": [
                  {
                    "boundingBox": {
                      "normalizedVertices": [
                        {
                          "x": 0.12908497,
                          "y": 0.10479798
                        },
                        ...
                        {
                          "x": 0.12908497,
                          "y": 0.1199495
                        }
                      ]
                    },
                    "paragraphs": [
                      {
                      ...
                        },
                        "words": [
                          {
                            ...
                            },
                            "symbols": [
                              {
                              ...
                                "text": "C",
                                "confidence": 0.99
                              },
                              {
                                "property": {
                                  "detectedLanguages": [
                                    {
                                      "languageCode": "en"
                                    }
                                  ]
                                },
                                "text": "O",
                                "confidence": 0.99
                              },
                 ...
                 }
                ]
              }
            ],
            "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
            \nHow to Use This Census Report ..\nTable Finding Guide .\nUser
            Notes .......\nStatistical Tables.........\nAppendixes
            \nA Geographic Terms and Concepts .........\nB Definitions of
            Subject Characteristics.\nData Collection and Processing Procedures...
            \nQuestionnaire. ........\nE Maps .................\nF Operational
            Overview and accuracy of the Data.......\nG Residence Rule and
            Residence Situations for the \n2010 Census of the United States...
            \nH Acknowledgments .....\nE\n*Appendix may be found in the separate
            volume, CPH-1-A, Summary Population and\nHousing Characteristics,
            Selected Appendixes, on the Internet at
            <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
          },
          "context": {
            "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
            "pageNumber": 1
          }
        }
      ]
    }
        

Before trying this sample, follow the Go setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Go API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


// detectAsyncDocumentURI performs Optical Character Recognition (OCR) on a
// PDF file stored in GCS.
func detectAsyncDocumentURI(w io.Writer, gcsSourceURI, gcsDestinationURI string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	request := &visionpb.AsyncBatchAnnotateFilesRequest{
		Requests: []*visionpb.AsyncAnnotateFileRequest{
			{
				Features: []*visionpb.Feature{
					{
						Type: visionpb.Feature_DOCUMENT_TEXT_DETECTION,
					},
				},
				InputConfig: &visionpb.InputConfig{
					GcsSource: &visionpb.GcsSource{Uri: gcsSourceURI},
					// Supported MimeTypes are: "application/pdf" and "image/tiff".
					MimeType: "application/pdf",
				},
				OutputConfig: &visionpb.OutputConfig{
					GcsDestination: &visionpb.GcsDestination{Uri: gcsDestinationURI},
					// How many pages should be grouped into each json output file.
					BatchSize: 2,
				},
			},
		},
	}

	operation, err := client.AsyncBatchAnnotateFiles(ctx, request)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "Waiting for the operation to finish.")

	resp, err := operation.Wait(ctx)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "%v", resp)

	return nil
}

Before trying this sample, follow the Java setup instructions in the Vision API quickstart using client libraries. For more information, see the Vision API Java reference documentation.

/**
 * Performs document text OCR with PDF/TIFF as source files on Google Cloud Storage.
 *
 * @param gcsSourcePath The path to the remote file on Google Cloud Storage to detect document
 *     text on.
 * @param gcsDestinationPath The path to the remote file on Google Cloud Storage to store the
 *     results on.
 * @throws Exception on errors while closing the client.
 */
public static void detectDocumentsGcs(String gcsSourcePath, String gcsDestinationPath)
    throws Exception {

  // Initialize client that will be used to send requests. This client only needs to be created
  // once, and can be reused for multiple requests. After completing all of your requests, call
  // the "close" method on the client to safely clean up any remaining background resources.
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    List<AsyncAnnotateFileRequest> requests = new ArrayList<>();

    // Set the GCS source path for the remote file.
    GcsSource gcsSource = GcsSource.newBuilder().setUri(gcsSourcePath).build();

    // Create the configuration with the specified MIME (Multipurpose Internet Mail Extensions)
    // types
    InputConfig inputConfig =
        InputConfig.newBuilder()
            .setMimeType(
                "application/pdf") // Supported MimeTypes: "application/pdf", "image/tiff"
            .setGcsSource(gcsSource)
            .build();

    // Set the GCS destination path for where to save the results.
    GcsDestination gcsDestination =
        GcsDestination.newBuilder().setUri(gcsDestinationPath).build();

    // Create the configuration for the System.output with the batch size.
    // The batch size sets how many pages should be grouped into each json System.output file.
    OutputConfig outputConfig =
        OutputConfig.newBuilder().setBatchSize(2).setGcsDestination(gcsDestination).build();

    // Select the Feature required by the vision API
    Feature feature = Feature.newBuilder().setType(Feature.Type.DOCUMENT_TEXT_DETECTION).build();

    // Build the OCR request
    AsyncAnnotateFileRequest request =
        AsyncAnnotateFileRequest.newBuilder()
            .addFeatures(feature)
            .setInputConfig(inputConfig)
            .setOutputConfig(outputConfig)
            .build();

    requests.add(request);

    // Perform the OCR request
    OperationFuture<AsyncBatchAnnotateFilesResponse, OperationMetadata> response =
        client.asyncBatchAnnotateFilesAsync(requests);

    System.out.println("Waiting for the operation to finish.");

    // Wait for the request to finish. (The result is not used, since the API saves the result to
    // the specified location on GCS.)
    List<AsyncAnnotateFileResponse> result =
        response.get(180, TimeUnit.SECONDS).getResponsesList();

    // Once the request has completed and the System.output has been
    // written to GCS, we can list all the System.output files.
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Get the destination location from the gcsDestinationPath
    Pattern pattern = Pattern.compile("gs://([^/]+)/(.+)");
    Matcher matcher = pattern.matcher(gcsDestinationPath);

    if (matcher.find()) {
      String bucketName = matcher.group(1);
      String prefix = matcher.group(2);

      // Get the list of objects with the given prefix from the GCS bucket
      Bucket bucket = storage.get(bucketName);
      com.google.api.gax.paging.Page<Blob> pageList = bucket.list(BlobListOption.prefix(prefix));

      Blob firstOutputFile = null;

      // List objects with the given prefix.
      System.out.println("Output files:");
      for (Blob blob : pageList.iterateAll()) {
        System.out.println(blob.getName());

        // Process the first System.output file from GCS.
        // Since we specified batch size = 2, the first response contains
        // the first two pages of the input file.
        if (firstOutputFile == null) {
          firstOutputFile = blob;
        }
      }

      // Get the contents of the file and convert the JSON contents to an AnnotateFileResponse
      // object. If the Blob is small read all its content in one request
      // (Note: the file is a .json file)
      // Storage guide: https://cloud.google.com/storage/docs/downloading-objects
      String jsonContents = new String(firstOutputFile.getContent());
      Builder builder = AnnotateFileResponse.newBuilder();
      JsonFormat.parser().merge(jsonContents, builder);

      // Build the AnnotateFileResponse object
      AnnotateFileResponse annotateFileResponse = builder.build();

      // Parse through the object to get the actual response for the first page of the input file.
      AnnotateImageResponse annotateImageResponse = annotateFileResponse.getResponses(0);

      // Here we print the full text from the first page.
      // The response contains more information:
      // annotation/pages/blocks/paragraphs/words/symbols
      // including confidence score and bounding boxes
      System.out.format("%nText: %s%n", annotateImageResponse.getFullTextAnnotation().getText());
    } else {
      System.out.println("No MATCH");
    }
  }
}

Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision').v1;

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// Bucket where the file resides
// const bucketName = 'my-bucket';
// Path to PDF file within bucket
// const fileName = 'path/to/document.pdf';
// The folder to store the results
// const outputPrefix = 'results'

const gcsSourceUri = `gs://${bucketName}/${fileName}`;
const gcsDestinationUri = `gs://${bucketName}/${outputPrefix}/`;

const inputConfig = {
  // Supported mime_types are: 'application/pdf' and 'image/tiff'
  mimeType: 'application/pdf',
  gcsSource: {
    uri: gcsSourceUri,
  },
};
const outputConfig = {
  gcsDestination: {
    uri: gcsDestinationUri,
  },
};
const features = [{type: 'DOCUMENT_TEXT_DETECTION'}];
const request = {
  requests: [
    {
      inputConfig: inputConfig,
      features: features,
      outputConfig: outputConfig,
    },
  ],
};

const [operation] = await client.asyncBatchAnnotateFiles(request);
const [filesResponse] = await operation.promise();
const destinationUri =
  filesResponse.responses[0].outputConfig.gcsDestination.uri;
console.log('Json saved to: ' + destinationUri);

Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    import json
    import re
    from google.cloud import vision
    from google.cloud import storage

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()

    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size
    )

    async_request = vision.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config, output_config=output_config
    )

    operation = client.async_batch_annotate_files(requests=[async_request])

    print("Waiting for the operation to finish.")
    operation.result(timeout=420)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r"gs://([^/]+)/(.+)", gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name)

    # List objects with the given prefix, filtering out folders.
    blob_list = [
        blob
        for blob in list(bucket.list_blobs(prefix=prefix))
        if not blob.name.endswith("/")
    ]
    print("Output files:")
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_bytes().decode("utf-8")
    response = json.loads(json_string)

    # The actual response for the first page of the input file.
    first_page_response = response["responses"][0]
    annotation = first_page_response["fullTextAnnotation"]

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print("Full text:\n")
    print(annotation["text"])
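
For example, assuming a destination prefix in a bucket you can write to (the destination URI below is a placeholder), the function above could be invoked as follows:

# The source file is the public sample used throughout this page; the
# destination URI is a placeholder for a bucket you have write access to.
async_detect_document(
    "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
    "gs://my-bucket/ocr-results/",
)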

The gcloud command you use depends on the file type.

  • To perform PDF text detection, use the gcloud ml vision detect-text-pdf command, as shown in the following example:

    gcloud ml vision detect-text-pdf gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
    
  • To perform TIFF text detection, use the gcloud ml vision detect-text-tiff command, as shown in the following example:

    gcloud ml vision detect-text-tiff gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
    

C#: Please follow the C# setup instructions on the client libraries page and then visit the Vision reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Vision reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Vision reference documentation for Ruby.

Multi-regional support

You can now specify continent-level data storage and OCR processing. The following regions are currently supported:

  • us: USA country only
  • eu: The European Union

Locations

Cloud Vision offers you some control over where the resources for your project are stored and processed. In particular, you can configure Cloud Vision to store and process your data only in the European Union.

By default, Cloud Vision stores and processes resources in a Global location, which means that Cloud Vision doesn't guarantee that your resources will remain within a particular location or region. If you choose the European Union location, Google will store and process your data only in the European Union. You and your users can access the data from any location.

Setting the location using the API

The Vision API supports a global API endpoint (vision.googleapis.com) and also two region-based endpoints: a European Union endpoint (eu-vision.googleapis.com) and a United States endpoint (us-vision.googleapis.com). Use these endpoints for region-specific processing. For example, to store and process your data in the European Union only, use the URI eu-vision.googleapis.com in place of vision.googleapis.com for your REST API calls:

  • https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/images:annotate
  • https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/images:asyncBatchAnnotate
  • https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/files:annotate
  • https://eu-vision.googleapis.com/v1/projects/PROJECT_ID/locations/eu/files:asyncBatchAnnotate

To store and process your data in the United States only, use the US endpoint (us-vision.googleapis.com) with the preceding methods.

Setting the location using client libraries

The Vision API client libraries access the global API endpoint (vision.googleapis.com) by default. To store and process your data in the European Union only, you need to explicitly set the endpoint (eu-vision.googleapis.com). The code samples below show how to configure this setting.

Before using any of the request data, make the following replacements:

  • REGION_ID: One of the valid regional location identifiers:
    • us: USA country only
    • eu: The European Union
  • CLOUD_STORAGE_IMAGE_URI: The path to a valid image file in a Cloud Storage bucket. You must at least have read privileges to the file. Example:
    • gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
  • CLOUD_STORAGE_BUCKET: A Cloud Storage bucket/directory to save the output files to, expressed in the following form:
    • gs://bucket/directory/
    The requesting user must have write permission to the bucket.
  • FEATURE_TYPE: A valid feature type. For files:asyncBatchAnnotate requests, you can use the following feature types:
    • DOCUMENT_TEXT_DETECTION
    • TEXT_DETECTION
  • PROJECT_ID: Your Google Cloud project ID.

Field-specific considerations

  • inputConfig - Replaces the image field used in other Vision API requests. It contains two child fields:
    • gcsSource.uri - The Google Cloud Storage URI of the PDF or TIFF file (accessible to the user or service account making the request).
    • mimeType - One of the accepted file types: application/pdf or image/tiff.
  • outputConfig - Specifies output details. It contains two child fields:
    • gcsDestination.uri - A valid Google Cloud Storage URI. The bucket must be writeable by the user or service account making the request. The file name will be output-x-to-y, where x and y represent the PDF/TIFF page numbers included in that output file. If the file already exists, its contents will be overwritten.
    • batchSize - Specifies how many pages of output should be included in each output JSON file.

HTTP method and URL:

POST https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate

Request JSON body:

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "CLOUD_STORAGE_IMAGE_URI"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "FEATURE_TYPE"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "CLOUD_STORAGE_BUCKET"
        },
        "batchSize": 1
      }
    }
  ]
}

To send your request, choose one of these options:

Save the request body in a file named request.json, and execute the following curl command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate"

Save the request body in a file named request.json, and execute the following PowerShell command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION_ID-vision.googleapis.com/v1/projects/PROJECT_ID/locations/REGION_ID/files:asyncBatchAnnotate" | Select-Object -Expand Content

Response:

If the asyncBatchAnnotate request is successful, the response contains a single name field:

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

This name represents a long-running operation with an associated ID (1efec2285bd442df, for example), which can be queried using the v1.operations API.

To retrieve your Vision annotation response, send a GET request to the v1.operations endpoint, passing the operation ID in the URL:

GET https://vision.googleapis.com/v1/operations/operation-id

For example:

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/projects/project-id/locations/location-id/operations/1efec2285bd442df

While the operation is in progress:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

Once the operation completes, the state shows as DONE and your results are written to the Google Cloud Storage file you specified:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}

If you used the DOCUMENT_TEXT_DETECTION feature, the JSON in your output file is similar to an image's document text detection response; if you used the TEXT_DETECTION feature, it is similar to an image's text detection response. The output includes an additional context field that shows the location of the specified PDF or TIFF and the page number within the file:

output-1-to-1.json
        
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
        },
        "mimeType": "application/pdf"
      },
      "responses": [
        {
          "fullTextAnnotation": {
            "pages": [
              {
                "property": {
                  "detectedLanguages": [
                    {
                      "languageCode": "en",
                      "confidence": 0.94
                    }
                  ]
                },
                "width": 612,
                "height": 792,
                "blocks": [
                  {
                    "boundingBox": {
                      "normalizedVertices": [
                        {
                          "x": 0.12908497,
                          "y": 0.10479798
                        },
                        ...
                        {
                          "x": 0.12908497,
                          "y": 0.1199495
                        }
                      ]
                    },
                    "paragraphs": [
                      {
                      ...
                        },
                        "words": [
                          {
                            ...
                            },
                            "symbols": [
                              {
                              ...
                                "text": "C",
                                "confidence": 0.99
                              },
                              {
                                "property": {
                                  "detectedLanguages": [
                                    {
                                      "languageCode": "en"
                                    }
                                  ]
                                },
                                "text": "O",
                                "confidence": 0.99
                              },
                 ...
                 }
                ]
              }
            ],
            "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
            \nHow to Use This Census Report ..\nTable Finding Guide .\nUser
            Notes .......\nStatistical Tables.........\nAppendixes
            \nA Geographic Terms and Concepts .........\nB Definitions of
            Subject Characteristics.\nData Collection and Processing Procedures...
            \nQuestionnaire. ........\nE Maps .................\nF Operational
            Overview and accuracy of the Data.......\nG Residence Rule and
            Residence Situations for the \n2010 Census of the United States...
            \nH Acknowledgments .....\nE\n*Appendix may be found in the separate
            volume, CPH-1-A, Summary Population and\nHousing Characteristics,
            Selected Appendixes, on the Internet at
            <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
          },
          "context": {
            "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
            "pageNumber": 1
          }
        }
      ]
    }
        

Before trying this sample, follow the Go setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Go API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"

	vision "cloud.google.com/go/vision/apiv1"
	"google.golang.org/api/option"
)

// setEndpoint changes your endpoint.
func setEndpoint(endpoint string) error {
	// endpoint := "eu-vision.googleapis.com:443"

	ctx := context.Background()
	client, err := vision.NewImageAnnotatorClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("NewImageAnnotatorClient: %w", err)
	}
	defer client.Close()

	return nil
}

Before trying this sample, follow the Java setup instructions in the Vision API quickstart using client libraries. For more information, see the Vision API Java reference documentation.

ImageAnnotatorSettings settings =
    ImageAnnotatorSettings.newBuilder().setEndpoint("eu-vision.googleapis.com:443").build();

// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
ImageAnnotatorClient client = ImageAnnotatorClient.create(settings);

Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

async function setEndpoint() {
  // Specifies the location of the api endpoint
  const clientOptions = {apiEndpoint: 'eu-vision.googleapis.com'};

  // Creates a client
  const client = new vision.ImageAnnotatorClient(clientOptions);

  // Performs text detection on the image file
  const [result] = await client.textDetection('./resources/wakeupcat.jpg');
  const labels = result.textAnnotations;
  console.log('Text:');
  labels.forEach(label => console.log(label.description));
}
setEndpoint();

Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.

To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import vision

client_options = {"api_endpoint": "eu-vision.googleapis.com"}

client = vision.ImageAnnotatorClient(client_options=client_options)
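
As a usage sketch (not part of the original sample), any subsequent requests made through this client are sent to the EU endpoint; for example, a document text detection call on an image you have read access to (the URI below is a placeholder):

from google.cloud import vision

client_options = {"api_endpoint": "eu-vision.googleapis.com"}
client = vision.ImageAnnotatorClient(client_options=client_options)

# Placeholder URI: any image file you have read access to in Cloud Storage.
image = vision.Image(source=vision.ImageSource(image_uri="gs://YOUR_BUCKET/YOUR_IMAGE.jpg"))

# This request is processed by the EU endpoint configured above.
response = client.document_text_detection(image=image)
print(response.full_text_annotation.text)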

Try it for yourself

If you're new to Google Cloud, create an account to evaluate how Cloud Vision API performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Try Cloud Vision API free