ファイルのアノテーションをオフラインで一括処理する

モバイルデバイスアプリを作成する場合。Firebase 向け ML Kit をお試しください。これは、Cloud Vision サービスを使用するためのプラットフォーム固有の SDK（Android および iOS 用）のほか、デバイス上の ML Vision API、カスタム ML モデルを使用したデバイス上の推論を提供します。
Vision API では、すべての特徴に対してオンライン（同期）リクエストを送信し、少数の PDF / TIFF / GIF ファイルにバッチ処理でアノテーションを追加できます。この同期リクエストは、バッチファイルの検出とアノテーションを実行します。この機能の詳細については、オンラインでの小さなバッチによるファイルアノテーションをご覧ください。
PDF / TIFF ドキュメントテキスト検出の料金は、DOCUMENT_TEXT_DETECTION のレートで計算されます。詳しくは、料金ページをご覧ください。

Vision API は、Cloud Storage に保存されている PDF や TIFF ファイルから、あらゆる Vision API の機能を検出できます。

PDF と TIFF からの機能検出は、files:asyncBatchAnnotate 関数を使用してリクエストする必要があります。これにより、オフライン（非同期）リクエストが実行され、operations リソースでそのステータスを確認できるようになります。

PDF / TIFF リクエストからの出力は、指定した Cloud Storage バケットに作成された JSON ファイルに書き込まれます。

制限事項

Vision API は、2,000 ページまでの PDF/TIFF ファイルを受け入れます。これよりファイルが大きくなるとエラーが返されます。

認証

API キーは、files:asyncBatchAnnotate リクエストではサポートされていません。サービスアカウントによる認証の手順については、サービスアカウントの使用をご覧ください。

認証に使用するアカウントは、出力のために指定する Cloud Storage バケットへのアクセス権（roles/editor または roles/storage.objectCreator 以上の役割）が付与されている必要があります。

API キーを使用してオペレーションのステータスをクエリできます。手順については、API キーの使用をご覧ください。

機能検出リクエスト

現在のところ、PDF や TIFF ドキュメントの検出は Cloud Storage バケットに保存されているファイルに対してのみ実行できます。レスポンスの JSON ファイルも Cloud Storage バケットに保存されます。

コマンドライン

PDF / TIFF ドキュメントテキスト検出を行うには、POST リクエストを作成し、適切なリクエスト本文を指定します。

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
https://vision.googleapis.com/v1/files:asyncBatchAnnotate -d "{
  'requests':[
    {
      'inputConfig': {
        'gcsSource': {
          'uri': 'gs://your-source-bucket-name/folder/multi-page-file.pdf'
        },
        'mimeType': 'application/pdf'
      },
      'features': [
        {
          'type': 'DOCUMENT_TEXT_DETECTION'
        }
      ],
      'outputConfig': {
        'gcsDestination': {
          'uri': 'gs://your-bucket-name/folder/'
        },
        'batchSize': 1
      }
    }
  ]
}"

ここで

inputConfig は、他の Vision API リクエストで使用される image フィールドの代わりです。これには、次の 2 つの子フィールドが含まれます。
- gcsSource.uri - PDF または TIFF ファイルの Cloud Storage URI（リクエストするユーザーまたはサービスアカウントがアクセス可能な URI）
- mimeType - 使用可能なファイルタイプ（application/pdf または image/tiff）。
outputConfig は、出力の詳細を指定します。これには、次の 2 つの子フィールドが含まれます。
- gcsDestination.uri - 有効な Cloud Storage URI。バケットは、リクエストを行うユーザーまたはサービスアカウントによって書き込み可能である必要があります。ファイル名は output-x-to-y です。ここで、x と y は出力ファイルに含まれる PDF / TIFF のページ番号です。ファイルが存在する場合、その内容は上書きされます。
- batchSize - それぞれの JSON 出力ファイルに含める出力ページ数を指定します。

レスポンス:

asyncBatchAnnotate リクエストに成功すると、次のような name フィールドのみを含むレスポンスが返されます。

{
  "name": "projects/usable-auth-library/operations/1efec2285bd442df"
}

この name は関連 ID（例: 1efec2285bd442df）を持つ長時間実行オペレーションの名前です。この名前は、v1.operations API を使用してクエリできます。

Vision のアノテーションレスポンスを取得するには、v1.operations エンドポイントに GET リクエストを送信し、URL でオペレーション ID を渡します。

curl -X GET -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://vision.googleapis.com/v1/operations/1efec2285bd442df

オペレーションが進行中の場合:

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "RUNNING",
    "createTime": "2019-05-15T21:10:08.401917049Z",
    "updateTime": "2019-05-15T21:10:33.700763554Z"
  }
}

オペレーションが完了すると、state は DONE として表示され、指定した Cloud Storage ファイルに結果が書き込まれます。

{
  "name": "operations/1efec2285bd442df",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
    "state": "DONE",
    "createTime": "2019-05-15T20:56:30.622473785Z",
    "updateTime": "2019-05-15T20:56:41.666379749Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://your-bucket-name/folder/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}

出力ファイル内の JSON は、画像のドキュメントテキスト検出リクエストの JSON と似ていますが、指定された PDF または TIFF の場所とファイルのページ数を示す context フィールドが追加されています。

output-1-to-1.json

完全なレスポンス

{
  "inputConfig": {
    "gcsSource": {
      "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf"
    },
    "mimeType": "application/pdf"
  },
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "property": {
              "detectedLanguages": [
                {
                  "languageCode": "en",
                  "confidence": 0.94
                }
              ]
            },
            "width": 612,
            "height": 792,
            "blocks": [
              {
                "boundingBox": {
                  "normalizedVertices": [
                    {
                      "x": 0.12908497,
                      "y": 0.10479798
                    },
                    ...
                    {
                      "x": 0.12908497,
                      "y": 0.1199495
                    }
                  ]
                },
                "paragraphs": [
                  {
                  ...
                    },
                    "words": [
                      {
                        ...
                        },
                        "symbols": [
                          {
                          ...
                            "text": "C",
                            "confidence": 0.99
                          },
                          {
                            "property": {
                              "detectedLanguages": [
                                {
                                  "languageCode": "en"
                                }
                              ]
                            },
                            "text": "O",
                            "confidence": 0.99
                          },
             ...
             }
            ]
          }
        ],
        "text": "CONTENTS\n.\n1-1\nII-1\nIII-1\nList of Statistical Tables...
        \nHow to Use This Census Report ..\nTable Finding Guide .\nUser
        Notes .......\nStatistical Tables.........\nAppendixes
        \nA Geographic Terms and Concepts .........\nB Definitions of
        Subject Characteristics.\nData Collection and Processing Procedures...
        \nQuestionnaire. ........\nE Maps .................\nF Operational
        Overview and accuracy of the Data.......\nG Residence Rule and
        Residence Situations for the \n2010 Census of the United States...
        \nH Acknowledgments .....\nE\n*Appendix may be found in the separate
        volume, CPH-1-A, Summary Population and\nHousing Characteristics,
        Selected Appendixes, on the Internet at
        <www.census.gov\n/prod/cen2010/cph-1-a.pdf>.\nContents\n"
      },
      "context": {
        "uri": "gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf",
        "pageNumber": 1
      }
    }
  ]
}

Go

このサンプルを試す前に、Vision クイックスタート: クライアントライブラリの使用にある Go の設定手順を完了してください。詳細については、Vision Go API のリファレンスドキュメントをご覧ください。

Vision に対する認証を行うには、アプリケーションのデフォルト認証情報を設定します。詳細については、ローカル開発環境の認証を設定するをご覧ください。


// detectAsyncDocumentURI performs Optical Character Recognition (OCR) on a
// PDF file stored in GCS.
func detectAsyncDocumentURI(w io.Writer, gcsSourceURI, gcsDestinationURI string) error {
	ctx := context.Background()

	client, err := vision.NewImageAnnotatorClient(ctx)
	if err != nil {
		return err
	}

	request := &visionpb.AsyncBatchAnnotateFilesRequest{
		Requests: []*visionpb.AsyncAnnotateFileRequest{
			{
				Features: []*visionpb.Feature{
					{
						Type: visionpb.Feature_DOCUMENT_TEXT_DETECTION,
					},
				},
				InputConfig: &visionpb.InputConfig{
					GcsSource: &visionpb.GcsSource{Uri: gcsSourceURI},
					// Supported MimeTypes are: "application/pdf" and "image/tiff".
					MimeType: "application/pdf",
				},
				OutputConfig: &visionpb.OutputConfig{
					GcsDestination: &visionpb.GcsDestination{Uri: gcsDestinationURI},
					// How many pages should be grouped into each json output file.
					BatchSize: 2,
				},
			},
		},
	}

	operation, err := client.AsyncBatchAnnotateFiles(ctx, request)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "Waiting for the operation to finish.")

	resp, err := operation.Wait(ctx)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "%v", resp)

	return nil
}

Java

このサンプルを試す前に、Vision クイックスタート: クライアントライブラリの使用にある Java の設定を完了してください。詳細については、Vision Java API のリファレンスドキュメントをご覧ください。

/**
 * Performs document text OCR with PDF/TIFF as source files on Google Cloud Storage.
 *
 * @param gcsSourcePath The path to the remote file on Google Cloud Storage to detect document
 *     text on.
 * @param gcsDestinationPath The path to the remote file on Google Cloud Storage to store the
 *     results on.
 * @throws Exception on errors while closing the client.
 */
public static void detectDocumentsGcs(String gcsSourcePath, String gcsDestinationPath)
    throws Exception {

  // Initialize client that will be used to send requests. This client only needs to be created
  // once, and can be reused for multiple requests. After completing all of your requests, call
  // the "close" method on the client to safely clean up any remaining background resources.
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    List<AsyncAnnotateFileRequest> requests = new ArrayList<>();

    // Set the GCS source path for the remote file.
    GcsSource gcsSource = GcsSource.newBuilder().setUri(gcsSourcePath).build();

    // Create the configuration with the specified MIME (Multipurpose Internet Mail Extensions)
    // types
    InputConfig inputConfig =
        InputConfig.newBuilder()
            .setMimeType(
                "application/pdf") // Supported MimeTypes: "application/pdf", "image/tiff"
            .setGcsSource(gcsSource)
            .build();

    // Set the GCS destination path for where to save the results.
    GcsDestination gcsDestination =
        GcsDestination.newBuilder().setUri(gcsDestinationPath).build();

    // Create the configuration for the System.output with the batch size.
    // The batch size sets how many pages should be grouped into each json System.output file.
    OutputConfig outputConfig =
        OutputConfig.newBuilder().setBatchSize(2).setGcsDestination(gcsDestination).build();

    // Select the Feature required by the vision API
    Feature feature = Feature.newBuilder().setType(Feature.Type.DOCUMENT_TEXT_DETECTION).build();

    // Build the OCR request
    AsyncAnnotateFileRequest request =
        AsyncAnnotateFileRequest.newBuilder()
            .addFeatures(feature)
            .setInputConfig(inputConfig)
            .setOutputConfig(outputConfig)
            .build();

    requests.add(request);

    // Perform the OCR request
    OperationFuture<AsyncBatchAnnotateFilesResponse, OperationMetadata> response =
        client.asyncBatchAnnotateFilesAsync(requests);

    System.out.println("Waiting for the operation to finish.");

    // Wait for the request to finish. (The result is not used, since the API saves the result to
    // the specified location on GCS.)
    List<AsyncAnnotateFileResponse> result =
        response.get(180, TimeUnit.SECONDS).getResponsesList();

    // Once the request has completed and the System.output has been
    // written to GCS, we can list all the System.output files.
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Get the destination location from the gcsDestinationPath
    Pattern pattern = Pattern.compile("gs://([^/]+)/(.+)");
    Matcher matcher = pattern.matcher(gcsDestinationPath);

    if (matcher.find()) {
      String bucketName = matcher.group(1);
      String prefix = matcher.group(2);

      // Get the list of objects with the given prefix from the GCS bucket
      Bucket bucket = storage.get(bucketName);
      com.google.api.gax.paging.Page<Blob> pageList = bucket.list(BlobListOption.prefix(prefix));

      Blob firstOutputFile = null;

      // List objects with the given prefix.
      System.out.println("Output files:");
      for (Blob blob : pageList.iterateAll()) {
        System.out.println(blob.getName());

        // Process the first System.output file from GCS.
        // Since we specified batch size = 2, the first response contains
        // the first two pages of the input file.
        if (firstOutputFile == null) {
          firstOutputFile = blob;
        }
      }

      // Get the contents of the file and convert the JSON contents to an AnnotateFileResponse
      // object. If the Blob is small read all its content in one request
      // (Note: the file is a .json file)
      // Storage guide: https://cloud.google.com/storage/docs/downloading-objects
      String jsonContents = new String(firstOutputFile.getContent());
      Builder builder = AnnotateFileResponse.newBuilder();
      JsonFormat.parser().merge(jsonContents, builder);

      // Build the AnnotateFileResponse object
      AnnotateFileResponse annotateFileResponse = builder.build();

      // Parse through the object to get the actual response for the first page of the input file.
      AnnotateImageResponse annotateImageResponse = annotateFileResponse.getResponses(0);

      // Here we print the full text from the first page.
      // The response contains more information:
      // annotation/pages/blocks/paragraphs/words/symbols
      // including confidence score and bounding boxes
      System.out.format("%nText: %s%n", annotateImageResponse.getFullTextAnnotation().getText());
    } else {
      System.out.println("No MATCH");
    }
  }
}

Node.js

このサンプルを試す前に、Vision クイックスタート: クライアントライブラリの使用にある Node.js の設定を完了してください。詳細については、Vision Node.js API のリファレンスドキュメントをご覧ください。


// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision').v1;

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// Bucket where the file resides
// const bucketName = 'my-bucket';
// Path to PDF file within bucket
// const fileName = 'path/to/document.pdf';
// The folder to store the results
// const outputPrefix = 'results'

const gcsSourceUri = `gs://${bucketName}/${fileName}`;
const gcsDestinationUri = `gs://${bucketName}/${outputPrefix}/`;

const inputConfig = {
  // Supported mime_types are: 'application/pdf' and 'image/tiff'
  mimeType: 'application/pdf',
  gcsSource: {
    uri: gcsSourceUri,
  },
};
const outputConfig = {
  gcsDestination: {
    uri: gcsDestinationUri,
  },
};
const features = [{type: 'DOCUMENT_TEXT_DETECTION'}];
const request = {
  requests: [
    {
      inputConfig: inputConfig,
      features: features,
      outputConfig: outputConfig,
    },
  ],
};

const [operation] = await client.asyncBatchAnnotateFiles(request);
const [filesResponse] = await operation.promise();
const destinationUri =
  filesResponse.responses[0].outputConfig.gcsDestination.uri;
console.log('Json saved to: ' + destinationUri);

Python

このサンプルを試す前に、Vision クイックスタート: クライアントライブラリの使用にある Python の設定を完了してください。詳細については、Vision Python API のリファレンスドキュメントをご覧ください。

def async_detect_document(gcs_source_uri, gcs_destination_uri):
    """OCR with PDF/TIFF as source files on GCS"""
    import json
    import re
    from google.cloud import vision
    from google.cloud import storage

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 2

    client = vision.ImageAnnotatorClient()

    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size
    )

    async_request = vision.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config, output_config=output_config
    )

    operation = client.async_batch_annotate_files(requests=[async_request])

    print("Waiting for the operation to finish.")
    operation.result(timeout=420)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

    match = re.match(r"gs://([^/]+)/(.+)", gcs_destination_uri)
    bucket_name = match.group(1)
    prefix = match.group(2)

    bucket = storage_client.get_bucket(bucket_name)

    # List objects with the given prefix, filtering out folders.
    blob_list = [
        blob
        for blob in list(bucket.list_blobs(prefix=prefix))
        if not blob.name.endswith("/")
    ]
    print("Output files:")
    for blob in blob_list:
        print(blob.name)

    # Process the first output file from GCS.
    # Since we specified batch_size=2, the first response contains
    # the first two pages of the input file.
    output = blob_list[0]

    json_string = output.download_as_bytes().decode("utf-8")
    response = json.loads(json_string)

    # The actual response for the first page of the input file.
    first_page_response = response["responses"][0]
    annotation = first_page_response["fullTextAnnotation"]

    # Here we print the full text from the first page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print("Full text:\n")
    print(annotation["text"])

gcloud

使用する gcloud コマンドは、ファイル形式によって異なります。

PDF テキスト検出を行う場合は、次の例のように gcloud ml vision detect-text-pdf コマンドを実行します。
```
gcloud ml vision detect-text-pdf gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
```
TIFF テキスト検出を行うには、次の例のように gcloud ml vision detect-text-tiff コマンドを実行します。
```
gcloud ml vision detect-text-tiff gs://my_bucket/input_file  gs://my_bucket/out_put_prefix
```

その他の言語

C#: クライアントライブラリページの C# の設定手順を行ってから、.NET 用の Vision リファレンスドキュメントをご覧ください。

PHP: クライアントライブラリページの PHP の設定手順を行ってから、PHP 用の Vision リファレンスドキュメントをご覧ください。

Ruby: クライアントライブラリページの Ruby の設定手順を行ってから、Ruby 用の Vision リファレンスドキュメントをご覧ください。