进行批量预测

创建(训练)模型后,您可以使用 batchPredict 方法为一批图片创建异步预测请求。batchPredict 方法基于模型预测的图片的主对象将标签应用于图片。

自正式版发布之日起,自定义模型的最长使用期限为 18 个月。该时间过后,您必须创建并训练新的模型,才能继续为内容添加注释。

批量预测

您可以使用 batchPredict 命令请求对图片进行注释(预测)。batchPredict 命令将存储在您的 Google Cloud Storage 存储分区的 CSV 文件作为输入,该文件包含要添加注释的图片的路径。每行都指定一个指向 Google Cloud Storage 中的一张图片的单独路径。

batch_prediction.csv

gs://my-cloud-storage-bucket/prediction_files/image1.jpg
gs://my-cloud-storage-bucket/prediction_files/image2.jpg
gs://my-cloud-storage-bucket/prediction_files/image3.jpg
gs://my-cloud-storage-bucket/prediction_files/image4.jpg
gs://my-cloud-storage-bucket/prediction_files/image5.jpg
gs://my-cloud-storage-bucket/prediction_files/image6.png

批量预测任务可能需要一些时间才能完成,具体取决于您在 CSV 文件中指定的图片数量。即使是预测少量图片,批量预测也需要至少 30 分钟才能完成。

REST 和命令行

在使用下面的任何请求数据之前,请先进行以下替换:

  • project-id:您的 GCP 项目 ID。
  • location-id:有效的位置标识符。目前,您必须使用以下值:
    • us-central1
  • model-id:您的模型的 ID(从创建模型时返回的响应中获取)。此 ID 是模型名称的最后一个元素。 例如:
    • 模型名称:projects/project-id/locations/location-id/models/IOD4412217016962778756
    • 模型 ID:IOD4412217016962778756
  • input-storage-path:存储在 Google Cloud Storage 中的 CSV 文件的路径。 发出请求的用户必须至少具有相应存储分区的读取权限。
  • output-storage-bucket:用于保存输出文件的 Google Cloud Storage 存储分区/目录,采用以下格式表示:gs://bucket/directory/。 发出请求的用户必须具有相应存储分区的写入权限。

特定于字段的注意事项

  • params.score_threshold - 一个介于 0.0 至 1.0 之间的值。系统将仅返回得分大于或等于此值的结果。

HTTP 方法和网址:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models/model-id:batchPredict

请求 JSON 正文:

{
  "inputConfig": {
    "gcsSource": {
       "inputUris": [ "input-storage-path" ]
    }
  },
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "output-storage-bucket"
    }
  },
  "params": {
    "score_threshold": "0.0"
  }
}

如需发送请求,请选择以下方式之一:

curl

将请求正文保存在名为 request.json 的文件中,然后执行以下命令:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models/model-id:batchPredict

PowerShell

将请求正文保存在名为 request.json 的文件中,然后执行以下命令:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models/model-id:batchPredict" | Select-Object -Expand Content
响应

您将看到如下所示的输出:

{
  "name": "projects/project-id/locations/location-id/operations/ICN926615623331479552",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-06-19T21:28:35.302067Z",
    "updateTime": "2019-06-19T21:28:35.302067Z",
    "batchPredictDetails": {
      "inputConfig": {
        "gcsSource": {
          "inputUris": [
            "input-storage-path"
          ]
        }
      }
    }
  }
}

可以使用操作 ID(本例中为 ICN926615623331479552)来获取任务的状态。如需查看示例,请参阅处理长时间运行的操作

批量预测任务可能需要一些时间才能完成,具体取决于您在 CSV 文件中指定的图片数量。即使是预测少量图片,批量预测也需要至少 30 分钟才能完成。

操作完成后,state 会显示为 DONE,并且结果会写入您指定的 Google Cloud Storage 文件中:

{
  "name": "projects/project-id/locations/location-id/operations/ICN926615623331479552",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-06-19T21:28:35.302067Z",
    "updateTime": "2019-06-19T21:57:18.310033Z",
    "batchPredictDetails": {
      "inputConfig": {
        "gcsSource": {
          "inputUris": [
            "input-storage-path"
          ]
        }
      },
      "outputInfo": {
        "gcsOutputDirectory": "gs://storage-bucket-vcm/subdirectory/prediction-8370559933346329705-YYYY-MM-DDThh:mm:ss.sssZ"
      }
    }
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.BatchPredictResult"
  }
}

如需查看示例输出文件,请参阅下面的输出 JSONL 文件部分。

Java

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.BatchPredictInputConfig;
import com.google.cloud.automl.v1.BatchPredictOutputConfig;
import com.google.cloud.automl.v1.BatchPredictRequest;
import com.google.cloud.automl.v1.BatchPredictResult;
import com.google.cloud.automl.v1.GcsDestination;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.ModelName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.PredictionServiceClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class BatchPredict {

  static void batchPredict() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String modelId = "YOUR_MODEL_ID";
    String inputUri = "gs://YOUR_BUCKET_ID/path_to_your_input_csv_or_jsonl";
    String outputUri = "gs://YOUR_BUCKET_ID/path_to_save_results/";
    batchPredict(projectId, modelId, inputUri, outputUri);
  }

  static void batchPredict(String projectId, String modelId, String inputUri, String outputUri)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient client = PredictionServiceClient.create()) {
      // Get the full path of the model.
      ModelName name = ModelName.of(projectId, "us-central1", modelId);
      GcsSource gcsSource = GcsSource.newBuilder().addInputUris(inputUri).build();
      BatchPredictInputConfig inputConfig =
          BatchPredictInputConfig.newBuilder().setGcsSource(gcsSource).build();
      GcsDestination gcsDestination =
          GcsDestination.newBuilder().setOutputUriPrefix(outputUri).build();
      BatchPredictOutputConfig outputConfig =
          BatchPredictOutputConfig.newBuilder().setGcsDestination(gcsDestination).build();
      BatchPredictRequest request =
          BatchPredictRequest.newBuilder()
              .setName(name.toString())
              .setInputConfig(inputConfig)
              .setOutputConfig(outputConfig)
              .build();

      OperationFuture<BatchPredictResult, OperationMetadata> future =
          client.batchPredictAsync(request);

      System.out.println("Waiting for operation to complete...");
      BatchPredictResult response = future.get();
      System.out.println("Batch Prediction results saved to specified Cloud Storage bucket.");
    }
  }
}

Node.js

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const modelId = 'YOUR_MODEL_ID';
// const inputUri = 'gs://YOUR_BUCKET_ID/path_to_your_input_csv_or_jsonl';
// const outputUri = 'gs://YOUR_BUCKET_ID/path_to_save_results/';

// Imports the Google Cloud AutoML library
const {PredictionServiceClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new PredictionServiceClient();

async function batchPredict() {
  // Construct request
  const request = {
    name: client.modelPath(projectId, location, modelId),
    inputConfig: {
      gcsSource: {
        inputUris: [inputUri],
      },
    },
    outputConfig: {
      gcsDestination: {
        outputUriPrefix: outputUri,
      },
    },
  };

  const [operation] = await client.batchPredict(request);

  console.log('Waiting for operation to complete...');
  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(
    `Batch Prediction results saved to Cloud Storage bucket. ${response}`
  );
}

batchPredict();

Python

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# model_id = "YOUR_MODEL_ID"
# input_uri = "gs://YOUR_BUCKET_ID/path/to/your/input/csv_or_jsonl"
# output_uri = "gs://YOUR_BUCKET_ID/path/to/save/results/"

prediction_client = automl.PredictionServiceClient()

# Get the full path of the model.
model_full_id = f"projects/{project_id}/locations/us-central1/models/{model_id}"

gcs_source = automl.GcsSource(input_uris=[input_uri])

input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)
gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(gcs_destination=gcs_destination)

response = prediction_client.batch_predict(
    name=model_full_id, input_config=input_config, output_config=output_config
)

print("Waiting for operation to complete...")
print(
    f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}"
)

其他语言

C#: 请按照客户端库页面上的 C# 设置说明操作,然后访问 .NET 版 AutoML Vision 参考文档。

PHP: 请按照客户端库页面上的 PHP 设置说明操作,然后访问 PHP 版 AutoML Vision 参考文档。

Ruby 版: 请按照客户端库页面上的 Ruby 设置说明操作,然后访问 Ruby 版 AutoML Vision 参考文档。

输出 JSONL 文件

批量预测任务完成后,预测输出结果将存储在您在命令中指定的 Google Cloud Storage 存储分区中。

在您的输出存储分区中(如适用,在您指定的目录中),将创建 image_classification_1.jsonlimage_classification_2.jsonl、…、image_classification_N.jsonl 文件,其中 N 可以是 1,具体取决于成功预测的图片和注释的总数。

单个图片及其所有注释将仅列出一次,并且其注释绝不会被拆分到多个文件中。

在每个 JSONL 文件中,每行都将包含一个 proto 的 JSON 表示法,其中封装了图片的 "ID" : "<id_value>",并后跟零个或多个已填充分类详细信息的 AnnotationPayload proto(称为 annotations)。

JSONL 文件示例

image_image_classification_0.jsonl-单个.jsonl 文件包含 4 行内容,每行对应一个图片文件注释 JSON。

处理长时间运行的操作

REST 和命令行

在使用下面的任何请求数据之前,请先进行以下替换:

  • project-id:您的 GCP 项目 ID。
  • operation-id:您的操作的 ID。此 ID 是操作名称的最后一个元素。例如:
    • 操作名称:projects/project-id/locations/location-id/operations/IOD5281059901324392598
    • 操作 ID:IOD5281059901324392598

HTTP 方法和网址:

GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

PowerShell

执行以下命令:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id" | Select-Object -Expand Content
完成导入操作后,您应该会看到类似如下所示的输出:
{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T16:10:41.326614Z",
    "importDataDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}

完成创建模型操作后,您应会看到如下输出:

{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-07-22T18:35:06.881193Z",
    "updateTime": "2019-07-22T19:58:44.972235Z",
    "createModelDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Model",
    "name": "projects/project-id/locations/us-central1/models/model-id"
  }
}

Go

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// getOperationStatus gets an operation's status.
func getOperationStatus(w io.Writer, projectID string, location string, datasetID string, modelName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "ICN123456789..."
	// modelName := "model_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateModelRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Model: &automlpb.Model{
			DisplayName: modelName,
			DatasetId:   datasetID,
			ModelMetadata: &automlpb.Model_ImageClassificationModelMetadata{
				ImageClassificationModelMetadata: &automlpb.ImageClassificationModelMetadata{
					TrainBudgetMilliNodeHours: 1000, // 1000 milli-node hours are 1 hour
				},
			},
		},
	}

	op, err := client.CreateModel(ctx, req)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "Name: %v\n", op.Name())

	// Wait for the longrunning operation complete.
	resp, err := op.Wait(ctx)
	if err != nil && !op.Done() {
		fmt.Println("failed to fetch operation status", err)
		return err
	}
	if err != nil && op.Done() {
		fmt.Println("operation completed with error", err)
		return err
	}
	fmt.Fprintf(w, "Response: %v\n", resp)

	return nil
}

Java

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.longrunning.Operation;
import java.io.IOException;

class GetOperationStatus {

  static void getOperationStatus() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String operationFullId = "projects/[projectId]/locations/us-central1/operations/[operationId]";
    getOperationStatus(operationFullId);
  }

  // Get the status of an operation
  static void getOperationStatus(String operationFullId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the latest state of a long-running operation.
      Operation operation = client.getOperationsClient().getOperation(operationFullId);

      // Display operation details.
      System.out.println("Operation details:");
      System.out.format("\tName: %s\n", operation.getName());
      System.out.format("\tMetadata Type Url: %s\n", operation.getMetadata().getTypeUrl());
      System.out.format("\tDone: %s\n", operation.getDone());
      if (operation.hasResponse()) {
        System.out.format("\tResponse Type Url: %s\n", operation.getResponse().getTypeUrl());
      }
      if (operation.hasError()) {
        System.out.println("\tResponse:");
        System.out.format("\t\tError code: %s\n", operation.getError().getCode());
        System.out.format("\t\tError message: %s\n", operation.getError().getMessage());
      }
    }
  }
}

Node.js

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const operationId = 'YOUR_OPERATION_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function getOperationStatus() {
  // Construct request
  const request = {
    name: `projects/${projectId}/locations/${location}/operations/${operationId}`,
  };

  const [response] = await client.operationsClient.getOperation(request);

  console.log(`Name: ${response.name}`);
  console.log('Operation details:');
  console.log(`${response}`);
}

getOperationStatus();

Python

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# operation_full_id = \
#     "projects/[projectId]/locations/us-central1/operations/[operationId]"

client = automl.AutoMlClient()
# Get the latest state of a long-running operation.
response = client._transport.operations_client.get_operation(operation_full_id)

print("Name: {}".format(response.name))
print("Operation details:")
print(response)

其他语言

C#: 请按照客户端库页面上的 C# 设置说明操作,然后访问 .NET 版 AutoML Vision 参考文档。

PHP: 请按照客户端库页面上的 PHP 设置说明操作,然后访问 PHP 版 AutoML Vision 参考文档。

Ruby 版: 请按照客户端库页面上的 Ruby 设置说明操作,然后访问 Ruby 版 AutoML Vision 参考文档。