Get online predictions from custom-trained models

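Vertex AI online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service, and it returns your predictions in the response.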

Before you begin

Before you can request predictions, you must first do the following:

Configure model deployment

During model deployment, you make the following important decisions about how to run online prediction:

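Resource created    Setting specified when the resource is created
Endpoint            Location in which to run predictions
Model               Container to use (ModelContainerSpec)
DeployedModel       Machines to use for online prediction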

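You can't update the settings listed above after the model or endpoint is first created, and you can't override them in an online prediction request. If you need to change these settings, you must redeploy your model.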

Format your input for online prediction

If you use one of our prebuilt containers to serve predictions with TensorFlow, scikit-learn, or XGBoost, your prediction input instances must be formatted as JSON.

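If your model uses a custom container, your input must be formatted as JSON, and an additional parameters field is available to your container. Learn more about how to format prediction input for custom containers.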

This section shows how to format your prediction input instances as JSON and how to use base64 encoding for binary data.

Format instances as JSON strings

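The basic format for online prediction is a list of data instances. These can be either plain lists of values or members of a JSON object, depending on how you configured your inputs in your training application. TensorFlow models can accept more complex inputs, while most scikit-learn and XGBoost models expect a list of numbers as input.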

The following example shows an input tensor and an instance key for a TensorFlow model:

{"values": [1, 2, 3, 4], "key": 1}

The JSON string can be structured in complex ways, as long as it follows these rules:

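  • The top level of the instance data must be a JSON object: a dictionary of key/value pairs.

  • Individual values in an instance object can be strings, numbers, or lists. You cannot embed JSON objects.

  • Lists must contain only items of the same type (including other lists). You cannot mix string and numeric values.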
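The three rules above can be checked locally before a request is sent. The following is a minimal sketch of such a check, not part of any Vertex AI library: `validate_instance` and its helpers are hypothetical names. It assumes that the only embedded object you need to allow is the `{"b64": ...}` wrapper for binary data described in the next section, and it rejects booleans because the rules list only strings, numbers, and lists.

```python
from numbers import Real


def _kind(value):
    """Return a coarse JSON type name for a single value."""
    if isinstance(value, bool):  # bool is a subclass of int in Python; keep it separate
        return "bool"
    if isinstance(value, Real):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    # Assumption: the base64 wrapper {"b64": "..."} is the one permitted object.
    if isinstance(value, dict) and set(value) == {"b64"} and isinstance(value["b64"], str):
        return "b64"
    return "other"


def _validate_list(items):
    """A list must hold one kind of item; nested lists are checked recursively."""
    kinds = {_kind(item) for item in items}
    if len(kinds) > 1:
        raise ValueError(f"list mixes item types: {sorted(kinds)}")
    if kinds - {"number", "string", "list", "b64"}:
        raise ValueError(f"unsupported list item type: {sorted(kinds)}")
    for item in items:
        if isinstance(item, list):
            _validate_list(item)


def validate_instance(instance):
    """Raise ValueError if an instance breaks the formatting rules."""
    if not isinstance(instance, dict):
        raise ValueError("the top level of an instance must be a JSON object")
    for key, value in instance.items():
        kind = _kind(value)
        if kind == "list":
            _validate_list(value)
        elif kind not in ("number", "string", "b64"):
            raise ValueError(f"field {key!r} has unsupported type {kind}")
```

For example, `validate_instance({"values": [1, 2, 3, 4], "key": 1})` returns without error, while `{"values": [1, "a"]}` raises because the list mixes numbers and strings.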

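You pass input instances for online prediction as the message body of the projects.locations.endpoints.predict call. Learn more about the request body's formatting requirements.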

Make each instance an item in a JSON array, and provide the array as the instances field of a JSON object. For example:

{"instances": [
  {"values": [1, 2, 3, 4], "key": 1},
  {"values": [5, 6, 7, 8], "key": 2}
]}
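In Python, this request body can be produced from a list of instance dictionaries with the standard json module. The following is a minimal sketch; `build_request_body` is a hypothetical helper, and the only structure it relies on is the instances field shown above.

```python
import json


def build_request_body(instances):
    """Wrap a list of instances in the {"instances": [...]} envelope and serialize it."""
    if not isinstance(instances, list):
        raise TypeError("instances must be a list")
    return json.dumps({"instances": instances})


body = build_request_body([
    {"values": [1, 2, 3, 4], "key": 1},
    {"values": [5, 6, 7, 8], "key": 2},
])
print(body)
```

Serializing with a JSON library, instead of concatenating strings, guarantees that quotes and other special characters in your data are escaped correctly.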

Encode binary data for prediction input

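Binary data cannot be formatted as the UTF-8 encoded strings that JSON supports. If your input contains binary data, you must use base64 encoding to represent it. The following special formatting is required: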

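  • Your encoded string must be formatted as a JSON object with a single key named b64. In Python 3, base64 encoding outputs a byte sequence. You must convert this byte sequence to a string to make it JSON serializable: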

    {'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}
    
  • In your TensorFlow model code, you must give your binary input and output tensors aliases that end with "_bytes".
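The following Python sketch wraps raw bytes in the b64 object described above and checks that decoding recovers the original bytes. The helper names are hypothetical, `jpeg_data` is a short placeholder byte string rather than a real image, and the `image_bytes` key simply follows the example above and the "_bytes" alias convention.

```python
import base64
import json


def encode_binary(data: bytes) -> dict:
    """Wrap raw bytes as {"b64": "<base64 text>"} so they can be JSON-serialized."""
    # b64encode returns bytes; .decode("ascii") turns them into a str for JSON.
    return {"b64": base64.b64encode(data).decode("ascii")}


def decode_binary(obj: dict) -> bytes:
    """Inverse of encode_binary, useful for checking a request locally."""
    return base64.b64decode(obj["b64"])


jpeg_data = b"\xff\xd8\xff\xe0 placeholder bytes"  # placeholder, not a real JPEG
instance = {"image_bytes": encode_binary(jpeg_data)}
body = json.dumps({"instances": [instance]})
```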

Request and response examples

This section describes the format of the prediction request body and of the response body, with examples for TensorFlow, scikit-learn, and XGBoost.

Request body details

TensorFlow

The request body contains data with the following structure (JSON representation):

{
  "instances": [
    <value>|<simple/nested list>|<object>,
    ...
  ]
}

The instances[] object is required, and must contain the list of instances to get predictions for.

The structure of each element of the instances list is determined by your model's input definition. Instances can include named inputs (as objects) or can contain only unlabeled values.

Not all data includes named inputs. Some instances are simple JSON values (boolean, number, or string). However, instances are often lists of simple values, or complex nested lists.

Below are some examples of request bodies.

CSV data with each row encoded as a string value:

{"instances": ["1.0,true,\"x\"", "-2.0,false,\"y\""]}
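Escaping the quotes inside CSV rows by hand is error-prone. A minimal Python sketch that lets the standard json module produce the escaping, so that each quoted field inside a row stays part of a single string instance:

```python
import json

# Each CSV row is sent as one string instance; the inner quotes are part of the data.
rows = ['1.0,true,"x"', '-2.0,false,"y"']
body = json.dumps({"instances": rows})
print(body)  # {"instances": ["1.0,true,\"x\"", "-2.0,false,\"y\""]}
```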

Plain text:

{"instances": ["the quick brown fox", "the lazy dog"]}

Sentences encoded as lists of words (vectors of strings):

{
  "instances": [
    ["the","quick","brown"],
    ["the","lazy","dog"],
    ...
  ]
}

Floating point scalar values:

{"instances": [0.0, 1.1, 2.2]}

Vectors of integers:

{
  "instances": [
    [0, 1, 2],
    [3, 4, 5],
    ...
  ]
}

Tensors (in this case, two-dimensional tensors):

{
  "instances": [
    [
      [0, 1, 2],
      [3, 4, 5]
    ],
    ...
  ]
}

Images can be represented in different ways. In this encoding scheme, the first two dimensions represent the rows and columns of the image, and the third dimension contains lists (vectors) of the R, G, and B values for each pixel:

{
  "instances": [
    [
      [
        [138, 30, 66],
        [130, 20, 56],
        ...
      ],
      [
        [126, 38, 61],
        [122, 24, 57],
        ...
      ],
      ...
    ],
    ...
  ]
}
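If your pixel data is held as a flat sequence of R, G, B values in row-major order, it can be reshaped into the rows × columns × RGB nesting shown above. This is a minimal pure-Python sketch; the flat, row-major input layout is an assumption about how your image is stored, and `to_nested_rgb` is a hypothetical helper.

```python
def to_nested_rgb(flat, height, width):
    """Reshape flat row-major RGB values into [row][column][R, G, B] lists."""
    if len(flat) != height * width * 3:
        raise ValueError("expected height * width * 3 values")
    image = []
    for r in range(height):
        row = []
        for c in range(width):
            start = (r * width + c) * 3  # offset of this pixel's R value
            row.append(list(flat[start:start + 3]))
        image.append(row)
    return image


pixels = [138, 30, 66, 130, 20, 56, 126, 38, 61, 122, 24, 57]
instance = to_nested_rgb(pixels, height=2, width=2)
# instance == [[[138, 30, 66], [130, 20, 56]], [[126, 38, 61], [122, 24, 57]]]
```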

Data encoding

JSON strings must be encoded as UTF-8. To send binary data, you must base64-encode the data and mark it as binary. To mark a JSON string as binary, replace it with a JSON object with a single attribute named b64:

{"b64": "..."} 

The following example shows two serialized tf.Example instances, which require base64 encoding (fake data, for illustrative purposes only):

{"instances": [{"b64": "X5ad6u"}, {"b64": "IA9j4nx"}]}

The following example shows two JPEG image byte strings, requiring base64 encoding (fake data, for illustrative purposes only):

{"instances": [{"b64": "ASa8asdf"}, {"b64": "JLK7ljk3"}]}

Multiple input tensors

Some models have an underlying TensorFlow graph that accepts multiple input tensors. In this case, use the names of JSON name/value pairs to identify the input tensors.

For a graph with input tensor aliases "tag" (string) and "image" (base64-encoded string):

{
  "instances": [
    {
      "tag": "beach",
      "image": {"b64": "ASa8asdf"}
    },
    {
      "tag": "car",
      "image": {"b64": "JLK7ljk3"}
    }
  ]
}

For a graph with input tensor aliases "tag" (string) and "image" (3-dimensional array of 8-bit ints):

{
  "instances": [
    {
      "tag": "beach",
      "image": [
        [
          [138, 30, 66],
          [130, 20, 56],
          ...
        ],
        [
          [126, 38, 61],
          [122, 24, 57],
          ...
        ],
        ...
      ]
    },
    {
      "tag": "car",
      "image": [
        [
          [255, 0, 102],
          [255, 0, 97],
          ...
        ],
        [
          [254, 1, 101],
          [254, 2, 93],
          ...
        ],
        ...
      ]
    },
    ...
  ]
}

scikit-learn

The request body contains data with the following structure (JSON representation):

{
  "instances": [
    <simple list>,
    ...
  ]
}

The instances[] object is required, and must contain the list of instances to get predictions for. In the following example, each input instance is a list of floats:

{
  "instances": [
    [0.0, 1.1, 2.2],
    [3.3, 4.4, 5.5],
    ...
  ]
}

The dimension of input instances must match what your model expects. For example, if your model requires three features, then the length of each input instance must be 3.
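A mismatch in feature count is easy to catch before sending a request. This minimal sketch uses a hypothetical helper, `check_feature_count`; the expected count is something you supply from your training code (for a fitted scikit-learn estimator, the `n_features_in_` attribute is one possible source).

```python
def check_feature_count(instances, n_features):
    """Raise ValueError if any instance does not have exactly n_features values."""
    for i, instance in enumerate(instances):
        if len(instance) != n_features:
            raise ValueError(
                f"instance {i} has {len(instance)} features, expected {n_features}"
            )


check_feature_count([[0.0, 1.1, 2.2], [3.3, 4.4, 5.5]], n_features=3)
```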

XGBoost

The request body contains data with the following structure (JSON representation):

{
  "instances": [
    <simple list>,
    ...
  ]
}

The instances[] object is required, and must contain the list of instances to get predictions for. In the following example, each input instance is a list of floats:

{
  "instances": [
    [0.0, 1.1, 2.2],
    [3.3, 4.4, 5.5],
    ...
  ]
}

The dimension of input instances must match what your model expects. For example, if your model requires three features, then the length of each input instance must be 3.

Vertex AI does not support sparse representation of input instances for XGBoost.

The online prediction service interprets zeros and NaNs differently. If the value of a feature is zero, use 0.0 in the corresponding input. If the value of a feature is missing, use NaN in the corresponding input.

The following example represents a prediction request with a single input instance, where the value of the first feature is 0.0, the value of the second feature is 1.1, and the value of the third feature is missing:

{"instances": [[0.0, 1.1, NaN]]}
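A minimal Python sketch that builds this request from a row in which missing features are marked with None. Using None as the missing-value marker is an assumption about how your data is stored, and `to_xgboost_instance` is a hypothetical helper.

```python
import json
import math


def to_xgboost_instance(features):
    """Convert a row where None marks a missing feature into a list of floats.

    Zeros stay 0.0; missing values become NaN, which the service treats differently.
    """
    return [float("nan") if value is None else float(value) for value in features]


row = to_xgboost_instance([0, 1.1, None])
body = json.dumps({"instances": [row]})
print(body)  # {"instances": [[0.0, 1.1, NaN]]}
```

Python's json module writes NaN by default (allow_nan=True). NaN is not part of the JSON standard, so other serializers may handle it differently; for example, JavaScript's JSON.stringify converts NaN to null.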

Response body details

Responses are very similar to requests.

If the call is successful, the response body contains one prediction entry per instance in the request body, given in the same order:

{
  "predictions": [
    {
      object
    }
  ],
  "deployedModelId": string
}

If prediction fails for any instance, the response body contains no predictions. Instead, it contains a single error entry:

{
  "error": string
}

The predictions[] object contains the list of predictions, one for each instance in the request.

On error, the error string contains a message describing the problem. The error is returned instead of a prediction list if an error occurred while processing any instance.

Even though there is one prediction per instance, the format of a prediction is not directly related to the format of an instance. Predictions take whatever format is specified in the outputs collection defined in the model. The collection of predictions is returned in a JSON list. Each member of the list can be a simple value, a list, or a JSON object of any complexity. If your model has more than one output tensor, each prediction will be a JSON object containing a name/value pair for each output. The names identify the output aliases in the graph.
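Because predictions come back in the same order as the instances, a client can pair them by position. This is a minimal sketch that operates on a response body already parsed into a dict (for example with json.loads); `pair_predictions` is a hypothetical helper.

```python
def pair_predictions(instances, response):
    """Match each instance with its prediction, or raise if the service returned an error."""
    if "error" in response:
        # On error the body has no predictions at all, only a single message.
        raise RuntimeError(f"prediction failed: {response['error']}")
    predictions = response["predictions"]
    if len(predictions) != len(instances):
        raise ValueError("expected exactly one prediction per instance")
    return list(zip(instances, predictions))
```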

Response body examples

TensorFlow

The following examples show some possible responses:

  • A simple set of predictions for three input instances, where each prediction is an integer value:

    {
      "predictions": [5, 4, 3],
      "deployedModelId": "123456789012345678"
    }
    
  • A more complex set of predictions, each containing two named values that correspond to output tensors, named label and scores respectively. The value of label is the predicted category ("car" or "beach") and scores contains a list of probabilities for that instance across the possible categories.

    {
      "predictions": [
        {
          "label": "beach",
          "scores": [0.1, 0.9]
        },
        {
          "label": "car",
          "scores": [0.75, 0.25]
        }
      ],
      "deployedModelId": "123456789012345678"
    }
    
  • A response when there is an error processing an input instance:

    {"error": "Divide by zero"}
    

scikit-learn

The following examples show some possible responses:

  • A simple set of predictions for three input instances, where each prediction is an integer value:

    {
      "predictions": [5, 4, 3],
      "deployedModelId": "123456789012345678"
    }
    
  • A response when there is an error processing an input instance:

    {"error": "Divide by zero"}
    

XGBoost

The following examples show some possible responses:

  • A simple set of predictions for three input instances, where each prediction is an integer value:

    {
      "predictions": [5, 4, 3],
      "deployedModelId": "123456789012345678"
    }
    
  • A response when there is an error processing an input instance:

    {"error": "Divide by zero"}
    

Send an online prediction request

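You request online predictions by sending your input data instances as a JSON string in a predict request. For the format of the request and response bodies, see Prediction request details.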

Each prediction request must be no larger than 1.5 MB.
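If you have more instances than fit in one request, you can split them into several requests. This is a minimal greedy sketch with hypothetical helper names; it assumes "1.5 MB" should be read conservatively as 1,500,000 bytes of serialized UTF-8 JSON, and it re-serializes each candidate batch, which is simple but quadratic in the worst case.

```python
import json

# Assumption: treat "1.5 MB" conservatively as 1,500,000 bytes of UTF-8 JSON.
MAX_REQUEST_BYTES = 1_500_000


def request_size(instances):
    """Size in bytes of the serialized {"instances": [...]} body."""
    return len(json.dumps({"instances": instances}).encode("utf-8"))


def split_into_requests(instances, limit=MAX_REQUEST_BYTES):
    """Group instances, in order, into batches whose request bodies stay within limit."""
    batches, current = [], []
    for instance in instances:
        if request_size([instance]) > limit:
            raise ValueError("a single instance exceeds the request size limit")
        if current and request_size(current + [instance]) > limit:
            batches.append(current)  # close the batch before it would overflow
            current = []
        current.append(instance)
    if current:
        batches.append(current)
    return batches
```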

gcloud

The following example uses the gcloud ai endpoints predict command:

  1. Write the following JSON object to a file in your local environment. The filename doesn't matter, but for this example, name the file request.json.

    {
     "instances": INSTANCES
    }
    

    Replace the following:

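    • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on which inputs your trained ML model expects. For more information, see the Format your input for online prediction section of this document.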

  2. Run the following command:

    gcloud ai endpoints predict ENDPOINT_ID \
      --region=LOCATION \
      --json-request=request.json
    

    Replace the following:

    • ENDPOINT_ID: The ID of the endpoint.
    • LOCATION: The region where you are using Vertex AI.
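If you prefer to generate request.json from Python rather than writing it by hand, a minimal sketch with the standard library follows. `write_request_file` is a hypothetical helper, and `indent=2` only makes the file easier to read.

```python
import json
from pathlib import Path


def write_request_file(instances, path="request.json"):
    """Write {"instances": [...]} to path for use with --json-request."""
    path = Path(path)
    path.write_text(json.dumps({"instances": instances}, indent=2), encoding="utf-8")
    return path
```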

REST and command line

Before using any of the request data, make the following replacements:

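  • LOCATION: The region where you are using Vertex AI.
  • PROJECT: Your project ID or project number.
  • ENDPOINT_ID: The ID of the endpoint.
  • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on which inputs your trained ML model expects. For more information, see the Format your input for online prediction section of this document.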

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:predict

Request JSON body:

{
  "instances": INSTANCES
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and then run the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:predict"

PowerShell

Save the request body in a file named request.json, and then run the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content
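
If the request succeeds, you receive a JSON response similar to the following. In the response, expect the following replacements:

  • PREDICTIONS: A JSON array of predictions, one for each instance in the request body.
  • DEPLOYED_MODEL_ID: The ID of the DeployedModel that served these predictions.
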
{
  "predictions": PREDICTIONS,
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
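The same REST call can be assembled with Python's standard library. This minimal sketch only builds the request and does not send it; `build_vertex_request` is a hypothetical helper, and the URL and headers mirror the curl example above. The `method` argument lets the same helper target the :explain method described later.

```python
import json
import urllib.request


def build_vertex_request(location, project, endpoint_id, instances, access_token,
                         method="predict"):
    """Build (but do not send) a POST request to the :predict or :explain REST method."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{location}/endpoints/{endpoint_id}:{method}"
    )
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To send it, you could pass the request to urllib.request.urlopen and parse the reply with json.load. The access token can come from gcloud auth application-default print-access-token, as in the curl example.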

Java


import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictRequest;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictCustomTrainedModelSample {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String project = "YOUR_PROJECT_ID";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictCustomTrainedModel(project, endpointId, instance);
  }

  static void predictCustomTrainedModel(String project, String endpointId, String instance)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      PredictRequest predictRequest =
          PredictRequest.newBuilder()
              .setEndpoint(endpointName.toString())
              .addAllInstances(instanceList)
              .build();
      PredictResponse predictResponse = predictionServiceClient.predict(predictRequest);

      System.out.println("Predict Custom Trained model Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());
      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        System.out.format("\tPrediction: %s\n", prediction);
      }
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const filename = "YOUR_PREDICTION_FILE_NAME";
// const endpointId = "YOUR_ENDPOINT_ID";
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const util = require('util');
const {readFile} = require('fs');
const readFileAsync = util.promisify(readFile);

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictCustomTrainedModel() {
  // Configure the parent resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = {
    structValue: {
      fields: {},
    },
  };
  const instanceDict = await readFileAsync(filename, 'utf8');
  const instanceValue = JSON.parse(instanceDict);
  const instance = {
    structValue: {
      fields: {
        Age: {stringValue: instanceValue['Age']},
        Balance: {stringValue: instanceValue['Balance']},
        Campaign: {stringValue: instanceValue['Campaign']},
        Contact: {stringValue: instanceValue['Contact']},
        Day: {stringValue: instanceValue['Day']},
        Default: {stringValue: instanceValue['Default']},
        Deposit: {stringValue: instanceValue['Deposit']},
        Duration: {stringValue: instanceValue['Duration']},
        Housing: {stringValue: instanceValue['Housing']},
        Job: {stringValue: instanceValue['Job']},
        Loan: {stringValue: instanceValue['Loan']},
        MaritalStatus: {stringValue: instanceValue['MaritalStatus']},
        Month: {stringValue: instanceValue['Month']},
        PDays: {stringValue: instanceValue['PDays']},
        POutcome: {stringValue: instanceValue['POutcome']},
        Previous: {stringValue: instanceValue['Previous']},
      },
    },
  };

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict custom trained model response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const prediction of predictions) {
    console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
  }
}
predictCustomTrainedModel();

Python

This sample uses the Vertex AI SDK for Python. Before running the code sample, you must set up authentication.

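from google.cloud import aiplatform


def endpoint_predict_sample(
    project: str, location: str, instances: list, endpoint: str
):
    # Set the default project and region for subsequent SDK calls.
    aiplatform.init(project=project, location=location)

    # `endpoint` can be an endpoint ID or a full endpoint resource name.
    endpoint_resource = aiplatform.Endpoint(endpoint)

    prediction = endpoint_resource.predict(instances=instances)
    print(prediction)
    return prediction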

Send an online explanation request

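If you have configured your Model for Vertex Explainable AI, you can get online explanations. Online explanation requests have the same format as online prediction requests and return similar responses; the only difference is that online explanation responses include feature attributions in addition to predictions.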

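The following examples are almost identical to the examples in the previous section, except that they use slightly different commands: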

gcloud

The following example uses the gcloud ai endpoints explain command:

  1. Write the following JSON object to a file in your local environment. The filename doesn't matter, but for this example, name the file request.json.

    {
     "instances": INSTANCES
    }
    

    Replace the following:

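    • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on which inputs your trained ML model expects. For more information, see the Format your input for online prediction section of this document.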

  2. Run the following command:

    gcloud ai endpoints explain ENDPOINT_ID \
      --region=LOCATION \
      --json-request=request.json
    

    Replace the following:

    • ENDPOINT_ID: The ID of the endpoint.
    • LOCATION: The region where you are using Vertex AI.

    Optionally, if you want to send the explanation request to a specific DeployedModel on the Endpoint, you can specify the --deployed-model-id flag:

    gcloud beta ai endpoints explain ENDPOINT_ID \
      --region=LOCATION \
      --deployed-model-id=DEPLOYED_MODEL_ID \
      --json-request=request.json
    

    In addition to the placeholders described previously, replace the following:

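    • DEPLOYED_MODEL_ID (optional): The ID of the deployed model that you want to get explanations from. The ID is included in the response of the predict method. If you need explanations from a particular model and you have more than one model deployed to the same endpoint, you can use this ID to make sure that the explanations are returned for that model.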

REST and command line

Before using any of the request data, make the following replacements:

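  • LOCATION: The region where you are using Vertex AI.
  • PROJECT: Your project ID or project number.
  • ENDPOINT_ID: The ID of the endpoint.
  • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on which inputs your trained ML model expects. For more information, see the Format your input for online prediction section of this document.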

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain

Request JSON body:

{
  "instances": INSTANCES
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and then run the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain"

PowerShell

Save the request body in a file named request.json, and then run the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain" | Select-Object -Expand Content
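
If the request succeeds, you receive a JSON response similar to the following. In the response, expect the following replacements:

  • PREDICTIONS: A JSON array of predictions, one for each instance in the request body.
  • EXPLANATIONS: A JSON array of explanations, one for each prediction.
  • DEPLOYED_MODEL_ID: The ID of the DeployedModel that served these predictions.
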
{
  "predictions": PREDICTIONS,
  "explanations": EXPLANATIONS,
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
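Since there is one explanation per prediction, a client can pair them by position. This minimal sketch operates on a parsed explain response body; `pair_explanations` is a hypothetical helper, and it assumes that failures are reported with the same single error field as prediction requests.

```python
def pair_explanations(response):
    """Zip each prediction with its explanation from a parsed :explain response body."""
    if "error" in response:  # assumption: same error shape as :predict
        raise RuntimeError(response["error"])
    predictions = response["predictions"]
    explanations = response["explanations"]
    if len(predictions) != len(explanations):
        raise ValueError("expected one explanation per prediction")
    return list(zip(predictions, explanations))
```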

Python

from typing import Dict

from google.cloud import aiplatform


def explain_tabular_sample(
    project: str, location: str, endpoint_id: str, instance_dict: Dict
):
    # Set the default project and region for subsequent SDK calls.
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_id)

    response = endpoint.explain(instances=[instance_dict], parameters={})

    for explanation in response.explanations:
        print(" explanation")
        # Feature attributions.
        attributions = explanation.attributions
        for attribution in attributions:
            print("  attribution")
            print("   baseline_output_value:", attribution.baseline_output_value)
            print("   instance_output_value:", attribution.instance_output_value)
            print("   output_display_name:", attribution.output_display_name)
            print("   approximation_error:", attribution.approximation_error)
            print("   output_name:", attribution.output_name)
            # output_index is a repeated field; print each index separately.
            for index in attribution.output_index:
                print("   output_index:", index)

    for prediction in response.predictions:
        print(prediction)

What's next