Getting online predictions from custom-trained models

AI Platform (Unified) online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service and it returns your predictions in the response.

Before you begin

In order to request predictions, you must first:

Settings to update in model deployment

During model deployment, you make the following important decisions about how to run online prediction:

Resource created    Setting specified at resource creation
Endpoint            Location in which to run predictions
Model               Container to use (ModelContainerSpec)
DeployedModel       Machines to use for online prediction

You can't update the settings listed above after the initial creation of the model or endpoint, and you can't override them in the online prediction request. If you need to change these settings, you must redeploy your model.

Formatting your input for online prediction

If you're using one of our pre-built containers to serve predictions using TensorFlow, scikit-learn, or XGBoost, your prediction input instances need to be formatted as JSON.

If your model uses a custom container, your input must be formatted as JSON, and there is an additional parameters field that can be used for your container. Learn more about formatting prediction input with custom containers.

This section shows how to format your prediction input instances as JSON, and how to handle binary data with base64 encoding.

Formatting instances as JSON strings

The basic format for online prediction is a list of data instances. These can be either plain lists of values or members of a JSON object, depending on how you configured your inputs in your training application. TensorFlow models can accept more complex inputs, while most scikit-learn and XGBoost models expect a list of numbers as input.

This example shows an input tensor and an instance key to a TensorFlow model:

{"values": [1, 2, 3, 4], "key": 1}

The makeup of the JSON string can be complex as long as it follows these rules:

  • The top level of instance data must be a JSON object: a dictionary of key/value pairs.

  • Individual values in an instance object can be strings, numbers, or lists. You cannot embed JSON objects.

  • Lists must contain only items of the same type (including other lists). You may not mix string and numerical values.
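The three rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of any AI Platform library: the helper names `check_instance`, `_check_list`, and `_kind` are hypothetical. Accepting booleans as their own kind is a judgment call, and the sketch does not account for the b64 wrapper object used for binary data.

```python
from typing import Any


def _kind(value: Any) -> str:
    """Classify a value into one of the kinds this sketch accepts."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    # Embedded objects (dicts), None, and anything else are rejected here.
    raise ValueError(f"unsupported value type: {type(value).__name__}")


def _check_list(items: list) -> None:
    """Require every item in a list, and in nested lists, to share one kind."""
    kinds = {_kind(item) for item in items}
    if len(kinds) > 1:
        raise ValueError(f"list mixes kinds: {sorted(kinds)}")
    for item in items:
        if isinstance(item, list):
            _check_list(item)


def check_instance(instance: Any) -> None:
    """Raise ValueError if the instance breaks one of the three rules."""
    if not isinstance(instance, dict):
        raise ValueError("top level of an instance must be a JSON object")
    for value in instance.values():
        if _kind(value) == "list":
            _check_list(value)


check_instance({"values": [1, 2, 3, 4], "key": 1})  # passes silently
```

An instance such as `{"values": [1, "a"]}` raises `ValueError` because the list mixes numbers and strings.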

You pass input instances for online prediction as the message body for the projects.locations.endpoints.predict call. Learn more about the request body's formatting requirements.

Make each instance an item in a JSON array, and provide the array as the instances field of a JSON object. For example:

{"instances": [
  {"values": [1, 2, 3, 4], "key": 1},
  {"values": [5, 6, 7, 8], "key": 2}
]}
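As an illustration, the same request body can be produced from Python data structures with the standard `json` module. This is only a sketch; the variable names are chosen for this example.

```python
import json

# Each instance is one item in the list; the list becomes the "instances" field.
instances = [
    {"values": [1, 2, 3, 4], "key": 1},
    {"values": [5, 6, 7, 8], "key": 2},
]

request_body = json.dumps({"instances": instances})
print(request_body)
```

The resulting string can be saved to a file or sent directly as the message body.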

Encoding binary data for prediction input

Binary data cannot be formatted as the UTF-8 encoded strings that JSON supports. If you have binary data in your inputs, you must use base64 encoding to represent it. The following special formatting is required:

  • Your encoded string must be formatted as a JSON object with a single key named b64. In Python 3, base64 encoding outputs a byte sequence. You must convert this to a string to make it JSON serializable:

    {'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}
    
  • In your TensorFlow model code, you must name the aliases for your binary input and output tensors so that they end with '_bytes'.
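Putting the two requirements together, a complete request body for one binary input might be built as follows. This is a sketch under assumptions: `encode_binary` is a hypothetical helper, the bytes are a placeholder rather than a real JPEG, and `image_bytes` is assumed to be the alias of the model's binary input tensor.

```python
import base64
import json


def encode_binary(data: bytes) -> dict:
    """Wrap raw bytes in the single-key b64 object described above."""
    # b64encode returns bytes in Python 3, so decode to str for JSON.
    return {"b64": base64.b64encode(data).decode("ascii")}


# Placeholder bytes standing in for real JPEG data.
jpeg_data = b"\xff\xd8\xff\xe0 not a real image"

# The input alias ends with "_bytes", as required for binary tensors.
instance = {"image_bytes": encode_binary(jpeg_data)}
request_body = json.dumps({"instances": [instance]})
```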

Request and response examples

This section describes the format of the prediction request body and of the response body, with examples for TensorFlow, scikit-learn, and XGBoost.

Request body details

TensorFlow

The request body contains data with the following structure (JSON representation):

{
  "instances": [
    <value>|<simple/nested list>|<object>,
    ...
  ]
}

The instances[] object is required, and must contain the list of instances to get predictions for.

The structure of each element of the instances list is determined by your model's input definition. Instances can include named inputs (as objects) or can contain only unlabeled values.

Not all data includes named inputs. Some instances are simple JSON values (boolean, number, or string). However, instances are often lists of simple values, or complex nested lists.

Below are some examples of request bodies.

CSV data with each row encoded as a string value:

{"instances": ["1.0,true,\"x\"", "-2.0,false,\"y\""]}

Plain text:

{"instances": ["the quick brown fox", "the lazy dog"]}

Sentences encoded as lists of words (vectors of strings):

{
  "instances": [
    ["the","quick","brown"],
    ["the","lazy","dog"],
    ...
  ]
}

Floating point scalar values:

{"instances": [0.0, 1.1, 2.2]}

Vectors of integers:

{
  "instances": [
    [0, 1, 2],
    [3, 4, 5],
    ...
  ]
}

Tensors (in this case, two-dimensional tensors):

{
  "instances": [
    [
      [0, 1, 2],
      [3, 4, 5]
    ],
    ...
  ]
}

Images, which can be represented different ways. In this encoding scheme the first two dimensions represent the rows and columns of the image, and the third dimension contains lists (vectors) of the R, G, and B values for each pixel:

{
  "instances": [
    [
      [
        [138, 30, 66],
        [130, 20, 56],
        ...
      ],
      [
        [126, 38, 61],
        [122, 24, 57],
        ...
      ],
      ...
    ],
    ...
  ]
}

Data encoding

JSON strings must be encoded as UTF-8. To send binary data, you must base64-encode the data and mark it as binary. To mark a JSON string as binary, replace it with a JSON object with a single attribute named b64:

{"b64": "..."} 

The following example shows two serialized tf.Example instances, requiring base64 encoding (fake data, for illustrative purposes only):

{"instances": [{"b64": "X5ad6u"}, {"b64": "IA9j4nx"}]}

The following example shows two JPEG image byte strings, requiring base64 encoding (fake data, for illustrative purposes only):

{"instances": [{"b64": "ASa8asdf"}, {"b64": "JLK7ljk3"}]}

Multiple input tensors

Some models have an underlying TensorFlow graph that accepts multiple input tensors. In this case, use the names of JSON name/value pairs to identify the input tensors.

For a graph with input tensor aliases "tag" (string) and "image" (base64-encoded string):

{
  "instances": [
    {
      "tag": "beach",
      "image": {"b64": "ASa8asdf"}
    },
    {
      "tag": "car",
      "image": {"b64": "JLK7ljk3"}
    }
  ]
}

For a graph with input tensor aliases "tag" (string) and "image" (3-dimensional array of 8-bit ints):

{
  "instances": [
    {
      "tag": "beach",
      "image": [
        [
          [138, 30, 66],
          [130, 20, 56],
          ...
        ],
        [
          [126, 38, 61],
          [122, 24, 57],
          ...
        ],
        ...
      ]
    },
    {
      "tag": "car",
      "image": [
        [
          [255, 0, 102],
          [255, 0, 97],
          ...
        ],
        [
          [254, 1, 101],
          [254, 2, 93],
          ...
        ],
        ...
      ]
    },
    ...
  ]
}

scikit-learn

The request body contains data with the following structure (JSON representation):

{
  "instances": [
    <simple list>,
    ...
  ]
}

The instances[] object is required, and must contain the list of instances to get predictions for. In the following example, each input instance is a list of floats:

{
  "instances": [
    [0.0, 1.1, 2.2],
    [3.3, 4.4, 5.5],
    ...
  ]
}

The dimension of input instances must match what your model expects. For example, if your model requires three features, then the length of each input instance must be 3.
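A quick local check can catch instances with the wrong number of features before the request is sent. The helper below is a hypothetical sketch; the feature count of 3 matches the example in the text and would come from your own model.

```python
from typing import List, Sequence


def check_feature_count(
    instances: Sequence[Sequence[float]], n_features: int
) -> List[int]:
    """Return the indexes of instances whose length differs from n_features."""
    return [i for i, row in enumerate(instances) if len(row) != n_features]


instances = [[0.0, 1.1, 2.2], [3.3, 4.4, 5.5], [6.6, 7.7]]
print(check_feature_count(instances, n_features=3))  # [2]
```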

XGBoost

The request body contains data with the following structure (JSON representation):

{
  "instances": [
    <simple list>,
    ...
  ]
}

The instances[] object is required, and must contain the list of instances to get predictions for. In the following example, each input instance is a list of floats:

{
  "instances": [
    [0.0, 1.1, 2.2],
    [3.3, 4.4, 5.5],
    ...
  ]
}

The dimension of input instances must match what your model expects. For example, if your model requires three features, then the length of each input instance must be 3.

AI Platform does not support sparse representation of input instances for XGBoost.

The online prediction service interprets zeros and NaNs differently. If the value of a feature is zero, use 0.0 in the corresponding input. If the value of a feature is missing, use NaN in the corresponding input.

The following example represents a prediction request with a single input instance, where the value of the first feature is 0.0, the value of the second feature is 1.1, and the value of the third feature is missing:

{"instances": [[0.0, 1.1, NaN]]}
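In Python, the standard `json` module writes `float("nan")` as the bare token `NaN` by default, so a request like the one above can be generated as sketched below. Using `None` to mark a missing feature in the source data is an assumption made for this example. Note that `NaN` is not part of strict JSON, and some JSON tools reject it.

```python
import json
import math

# Assumption for this sketch: None marks a missing feature in the raw data.
row = [0.0, 1.1, None]

# Keep real zeros as 0.0 and replace only missing values with NaN.
instance = [math.nan if value is None else value for value in row]

body = json.dumps({"instances": [instance]})
print(body)  # {"instances": [[0.0, 1.1, NaN]]}
```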

Response body details

Responses are very similar to requests.

If the call is successful, the response body contains one prediction entry per instance in the request body, given in the same order:

{
  "predictions": [
    {
      object
    }
  ],
  "deployedModelId": string
}

If prediction fails for any instance, the response body contains no predictions. Instead, it contains a single error entry:

{
  "error": string
}

The predictions[] object contains the list of predictions, one for each instance in the request.

On error, the error string contains a message describing the problem. The error is returned instead of a prediction list if an error occurred while processing any instance.

Even though there is one prediction per instance, the format of a prediction is not directly related to the format of an instance. Predictions take whatever format is specified in the outputs collection defined in the model. The collection of predictions is returned in a JSON list. Each member of the list can be a simple value, a list, or a JSON object of any complexity. If your model has more than one output tensor, each prediction will be a JSON object containing a name/value pair for each output. The names identify the output aliases in the graph.
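Client code therefore needs to handle two shapes of response body: one with predictions and one with an error. The following sketch shows one way to do that; `extract_predictions` and `PredictionError` are hypothetical names, not part of a client library.

```python
import json
from typing import Any, List


class PredictionError(Exception):
    """Raised when the response body carries an error entry."""


def extract_predictions(response_text: str, n_instances: int) -> List[Any]:
    """Return the predictions list, or raise if the body reports an error."""
    body = json.loads(response_text)
    if "error" in body:
        # On error, the body contains no predictions at all.
        raise PredictionError(body["error"])
    predictions = body["predictions"]
    # There should be one prediction per instance, in request order.
    if len(predictions) != n_instances:
        raise ValueError(
            f"expected {n_instances} predictions, got {len(predictions)}"
        )
    return predictions


ok_body = '{"predictions": [5, 4, 3], "deployedModelId": "123456789012345678"}'
print(extract_predictions(ok_body, n_instances=3))  # [5, 4, 3]
```

Because each prediction can be a simple value, a list, or an object, the function returns the entries unchanged and leaves their interpretation to the caller.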

Response body examples

TensorFlow

The following examples show some possible responses:

  • A simple set of predictions for three input instances, where each prediction is an integer value:

    {
      "predictions": [5, 4, 3],
      "deployedModelId": "123456789012345678"
    }
    
  • A more complex set of predictions, each containing two named values that correspond to output tensors, named label and scores respectively. The value of label is the predicted category ("car" or "beach") and scores contains a list of probabilities for that instance across the possible categories.

    {
      "predictions": [
        {
          "label": "beach",
          "scores": [0.1, 0.9]
        },
        {
          "label": "car",
          "scores": [0.75, 0.25]
        }
      ],
      "deployedModelId": "123456789012345678"
    }
    
  • A response when there is an error processing an input instance:

    {"error": "Divide by zero"}
    

scikit-learn

The following examples show some possible responses:

  • A simple set of predictions for three input instances, where each prediction is an integer value:

    {
      "predictions": [5, 4, 3],
      "deployedModelId": "123456789012345678"
    }
    
  • A response when there is an error processing an input instance:

    {"error": "Divide by zero"}
    

XGBoost

The following examples show some possible responses:

  • A simple set of predictions for three input instances, where each prediction is an integer value:

    {
      "predictions": [5, 4, 3],
      "deployedModelId": "123456789012345678"
    }
    
  • A response when there is an error processing an input instance:

    {"error": "Divide by zero"}
    

Sending an online prediction request

Request an online prediction by sending your input data instances as a JSON string in a predict request. For formatting of the request and response body, see the details of the prediction request.

Each prediction request must be 1.5 MB or smaller.
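If you have more instances than fit in one request, you can split them into several requests. The sketch below is one possible approach and makes an assumption that is not confirmed by this document: it reads "1.5 MB" as 1,500,000 bytes of UTF-8 encoded request body. The names `body_size` and `batches_under_limit` are hypothetical.

```python
import json
from typing import Any, Iterator, List

# Assumption: "1.5 MB" is read as 1,500,000 bytes. The exact byte count the
# service enforces is not stated in this document.
MAX_REQUEST_BYTES = 1_500_000


def body_size(instances: List[Any]) -> int:
    """Size in bytes of the UTF-8 encoded request body for these instances."""
    return len(json.dumps({"instances": instances}).encode("utf-8"))


def batches_under_limit(
    instances: List[Any], limit: int = MAX_REQUEST_BYTES
) -> Iterator[List[Any]]:
    """Yield consecutive groups of instances whose request body fits the limit.

    A single instance that is too large on its own is still yielded alone,
    so callers should check body_size() on one-item batches.
    """
    batch: List[Any] = []
    for instance in instances:
        candidate = batch + [instance]
        if batch and body_size(candidate) > limit:
            yield batch
            batch = [instance]
        else:
            batch = candidate
    if batch:
        yield batch
```

The sketch re-serializes the candidate batch for every instance, which keeps the logic simple but is not optimized for very large inputs.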

gcloud

The following example uses the gcloud beta ai endpoints predict command:

  1. Write the following JSON object to a file in your local environment. The filename does not matter, but for this example name the file request.json.

    {
     "instances": INSTANCES
    }
    

    Replace the following:

    • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on what inputs your particular trained ML model expects. See the Formatting your input for online prediction section of this document.

  2. Run the following command:

    gcloud beta ai endpoints predict ENDPOINT_ID \
      --region=LOCATION \
      --json-request=request.json
    

    Replace the following:

    • ENDPOINT_ID: The ID for the endpoint.
    • LOCATION: The region where you are using AI Platform.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • LOCATION: The region where you are using AI Platform.
  • PROJECT: Your project ID or project number
  • ENDPOINT_ID: The ID for the endpoint.
  • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on what inputs your particular trained ML model expects. See the Formatting your input for online prediction section of this document.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:predict

Request JSON body:

{
  "instances": INSTANCES
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:predict

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

If successful, you receive a JSON response similar to the following. In the response, expect the following replacements:

  • PREDICTIONS: A JSON array of predictions, one for each instance that you included in the request body.
  • DEPLOYED_MODEL_ID: The ID of the DeployedModel that served these predictions.

{
  "predictions": PREDICTIONS,
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
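The same REST request can be assembled in Python with only the standard library. This is a sketch rather than an official sample: `predict_url` and `build_predict_request` are hypothetical helpers, and the access token is assumed to be obtained separately, for example from the gcloud command used in the curl example. The code builds the request object but does not send it.

```python
import json
import urllib.request
from typing import Any, List


def predict_url(project: str, location: str, endpoint_id: str) -> str:
    """Fill in the URL template shown above."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{location}/endpoints/{endpoint_id}:predict"
    )


def build_predict_request(
    project: str,
    location: str,
    endpoint_id: str,
    instances: List[Any],
    access_token: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the instances."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        predict_url(project, location, endpoint_id),
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the response body can then be parsed with `json.loads`.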

Java


import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictRequest;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictCustomTrainedModelSample {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String project = "YOUR_PROJECT_ID";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictCustomTrainedModel(project, endpointId, instance);
  }

  static void predictCustomTrainedModel(String project, String endpointId, String instance)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      PredictRequest predictRequest =
          PredictRequest.newBuilder()
              .setEndpoint(endpointName.toString())
              .addAllInstances(instanceList)
              .build();
      PredictResponse predictResponse = predictionServiceClient.predict(predictRequest);

      System.out.println("Predict Custom Trained model Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());
      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        System.out.format("\tPrediction: %s\n", prediction);
      }
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const filename = "YOUR_PREDICTION_FILE_NAME";
// const endpointId = "YOUR_ENDPOINT_ID";
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const util = require('util');
const {readFile} = require('fs');
const readFileAsync = util.promisify(readFile);

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictCustomTrainedModel() {
  // Configure the parent resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = {
    structValue: {
      fields: {},
    },
  };
  const instanceDict = await readFileAsync(filename, 'utf8');
  const instanceValue = JSON.parse(instanceDict);
  const instance = {
    structValue: {
      fields: {
        Age: {stringValue: instanceValue['Age']},
        Balance: {stringValue: instanceValue['Balance']},
        Campaign: {stringValue: instanceValue['Campaign']},
        Contact: {stringValue: instanceValue['Contact']},
        Day: {stringValue: instanceValue['Day']},
        Default: {stringValue: instanceValue['Default']},
        Deposit: {stringValue: instanceValue['Deposit']},
        Duration: {stringValue: instanceValue['Duration']},
        Housing: {stringValue: instanceValue['Housing']},
        Job: {stringValue: instanceValue['Job']},
        Loan: {stringValue: instanceValue['Loan']},
        MaritalStatus: {stringValue: instanceValue['MaritalStatus']},
        Month: {stringValue: instanceValue['Month']},
        PDays: {stringValue: instanceValue['PDays']},
        POutcome: {stringValue: instanceValue['POutcome']},
        Previous: {stringValue: instanceValue['Previous']},
      },
    },
  };

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict custom trained model response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const prediction of predictions) {
    console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
  }
}
predictCustomTrainedModel();

Python

This example uses the AI Platform (Unified) Client Library for Python. Before you run the following code sample, you must set up authentication.

from typing import Dict

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # The format of each instance should conform to the deployed model's prediction input schema.
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # The predictions are a google.protobuf.Value representation of the model's predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction))

Sending an online explanation request

If you have configured your Model for Explainable AI, then you can get online explanations. Online explanation requests have the same format as online prediction requests, and they return similar responses; the only difference is that online explanation responses include feature attributions as well as predictions.

The following examples are almost identical to the examples from the preceding section, except they use slightly different commands:

gcloud

The following example uses the gcloud beta ai endpoints explain command:

  1. Write the following JSON object to a file in your local environment. The filename does not matter, but for this example name the file request.json.

    {
     "instances": INSTANCES
    }
    

    Replace the following:

    • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on what inputs your particular trained ML model expects. See the Formatting your input for online prediction section of this document.

  2. Run the following command:

    gcloud beta ai endpoints explain ENDPOINT_ID \
      --region=LOCATION \
      --json-request=request.json
    

    Replace the following:

    • ENDPOINT_ID: The ID for the endpoint.
    • LOCATION: The region where you are using AI Platform.

    Optionally, if you want to send an explanation request to a specific DeployedModel on the Endpoint, you can specify the --deployed-model-id flag:

    gcloud beta ai endpoints explain ENDPOINT_ID \
      --region=LOCATION \
      --deployed-model-id=DEPLOYED_MODEL_ID \
      --json-request=request.json
    

    In addition to the placeholders described previously, replace the following:

    • DEPLOYED_MODEL_ID (optional): The ID of the deployed model for which you want to get explanations. The ID is included in the predict method's response. If you need to request explanations for a particular model and you have more than one model deployed to the same endpoint, you can use this ID to ensure that the explanations are returned for that particular model.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • LOCATION: The region where you are using AI Platform.
  • PROJECT: Your project ID or project number
  • ENDPOINT_ID: The ID for the endpoint.
  • INSTANCES: A JSON array of instances that you want to get predictions for. The format of each instance depends on what inputs your particular trained ML model expects. See the Formatting your input for online prediction section of this document.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain

Request JSON body:

{
  "instances": INSTANCES
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID:explain" | Select-Object -Expand Content

If successful, you receive a JSON response similar to the following. In the response, expect the following replacements:

  • PREDICTIONS: A JSON array of predictions, one for each instance that you included in the request body.
  • EXPLANATIONS: A JSON array of explanations, one for each prediction.
  • DEPLOYED_MODEL_ID: The ID of the DeployedModel that served these predictions.

{
  "predictions": PREDICTIONS,
  "explanations": EXPLANATIONS,
  "deployedModelId": "DEPLOYED_MODEL_ID"
}
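Because the response carries one explanation per prediction, the two arrays can be paired by position. The sketch below treats each explanation as an opaque JSON value, since its internal structure is not described in this section; `pair_explanations` is a hypothetical helper and the sample body uses placeholder data.

```python
import json
from typing import Any, List, Tuple


def pair_explanations(response_text: str) -> Tuple[str, List[Tuple[Any, Any]]]:
    """Return the deployed model ID and (prediction, explanation) pairs."""
    body = json.loads(response_text)
    predictions = body["predictions"]
    explanations = body["explanations"]
    if len(predictions) != len(explanations):
        raise ValueError("predictions and explanations differ in length")
    return body["deployedModelId"], list(zip(predictions, explanations))


# Placeholder response body; real explanations are richer than these strings.
sample = json.dumps(
    {
        "predictions": [5, 4],
        "explanations": ["explanation-0", "explanation-1"],
        "deployedModelId": "DEPLOYED_MODEL_ID",
    }
)
model_id, pairs = pair_explanations(sample)
```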

What's next