Getting Online Predictions

AI Platform online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service and it returns your predictions in the response.

Learn about online versus batch prediction or read an overview of prediction concepts.

Before you begin

To request predictions, you must first export your trained model and deploy it to AI Platform by creating a model resource and a model version resource, as described in the following sections.

Regions

AI Platform online prediction is currently available in the following regions:

  • us-central1
  • us-east1
  • us-east4
  • asia-northeast1
  • europe-west1

To fully understand the available regions for AI Platform training and prediction services, read the guide to regions.

Creating models and versions

You make the following important decisions about how to run online prediction when creating the model and version resources:

Resource created    Decision specified at resource creation
Model               Region in which to run predictions
Model               Whether to enable online prediction logging
Version             Runtime version to use
Version             Python version to use
Version             Machine type to use for online prediction

You can't update the settings listed above after the initial creation of the model or version. If you need to change these settings, create a new model or version resource with the new settings and redeploy your model.
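
For reference, here is a minimal sketch of where each of these settings is specified, using the Google APIs Client Library for Python. The project ID, model name, deployment URI, and version values are placeholders; the field names come from the v1 Model and Version resources.

import googleapiclient.discovery

service = googleapiclient.discovery.build('ml', 'v1')

# The region is fixed when the model resource is created.
service.projects().models().create(
    parent='projects/YOUR_PROJECT_ID',
    body={
        'name': 'your_model_name',
        'regions': ['us-central1'],
    },
).execute()

# The runtime version, Python version, and machine type are fixed when
# the version resource is created.
service.projects().models().versions().create(
    parent='projects/YOUR_PROJECT_ID/models/your_model_name',
    body={
        'name': 'v1',
        'deploymentUri': 'gs://your-bucket/model-dir',  # placeholder path
        'runtimeVersion': '1.15',
        'pythonVersion': '3.7',
        'machineType': 'mls1-c1-m2',
    },
).execute()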

Machine types available for online prediction

When you create a version, you can choose what type of virtual machine AI Platform Prediction uses for online prediction nodes. Learn more about machine types.

Requesting logs for online prediction requests

The AI Platform prediction service does not log information about requests by default, because logging incurs costs. Online prediction at a high rate of queries per second (QPS) can produce a substantial number of logs, which are subject to Stackdriver pricing or BigQuery pricing.

If you want to enable online prediction logging, you must configure it when you create a model resource or when you create a model version resource, depending on which type of logging you want to enable. There are three types of logging, which you can enable independently:

  • Access logging, which logs information like timestamp and latency for each request to Stackdriver Logging.

    You can enable access logging when you create a model resource.

  • Stream logging, which logs the stderr and stdout streams from your prediction nodes to Stackdriver Logging, and can be useful for debugging. This type of logging is in beta.

    You can enable stream logging when you create a model resource.

  • Request-response logging, which logs a sample of online prediction requests and responses to a BigQuery table. This type of logging is in beta.

    You can enable request-response logging when you create a model version resource.

gcloud

To enable access logging, include the --enable-logging flag when you create your model with the gcloud ai-platform models create command. For example:

gcloud ai-platform models create model_name \
  --regions us-central1 \
  --enable-logging

To enable stream logging (beta), use the gcloud beta component and include the --enable-console-logging flag. For example:

gcloud components install beta

gcloud beta ai-platform models create model_name \
  --regions us-central1 \
  --enable-console-logging

You cannot currently enable request-response logging (beta) by using the gcloud tool. You can only enable this type of logging when you send a projects.models.versions.create request to the REST API.

REST API

To enable access logging, set onlinePredictionLogging to true in the Model resource when creating your model with projects.models.create.

To enable stream logging (beta), set the onlinePredictionConsoleLogging field to true in the Model resource.
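
For example, here is a hedged sketch of the same call made through the Google APIs Client Library for Python; the project and model names are placeholders.

import googleapiclient.discovery

service = googleapiclient.discovery.build('ml', 'v1')

# Sketch: both logging settings are fields on the Model resource and are
# fixed at creation time. Placeholders throughout.
service.projects().models().create(
    parent='projects/YOUR_PROJECT_ID',
    body={
        'name': 'your_model_name',
        'regions': ['us-central1'],
        'onlinePredictionLogging': True,         # access logging
        'onlinePredictionConsoleLogging': True,  # stream logging (beta)
    },
).execute()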

Request-response logging

Unlike the other types of logging, you can't enable request-response logging when you create a model. Instead, you enable it when you create a version (projects.models.versions.create).

To enable request-response logging, populate the requestLoggingConfig field of the Version resource with the following entries (a sketch of the full request follows this list):

  • samplingPercentage: a number between 0 and 1 defining the fraction of requests to log. For example, set this value to 1 to log all requests, or to 0.1 to log 10% of requests.
  • bigqueryTableName: the fully qualified name (project_id.dataset_name.table_name) of the BigQuery table where you want to log requests and responses. The table must already exist with the following schema:

    Field name        Type         Mode
    model             STRING       REQUIRED
    model_version     STRING       REQUIRED
    time              TIMESTAMP    REQUIRED
    raw_data          STRING       REQUIRED
    raw_prediction    STRING       NULLABLE
    groundtruth       STRING       NULLABLE

    Learn how to create a BigQuery table.
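
For example, here is a sketch of a projects.models.versions.create call with request-response logging enabled, using the Google APIs Client Library for Python. All names are placeholders, and the BigQuery table must already exist with the schema above.

import googleapiclient.discovery

service = googleapiclient.discovery.build('ml', 'v1')

# Sketch: enable request-response logging (beta) at version creation.
# A samplingPercentage of 0.1 logs 10% of requests to the given table.
service.projects().models().versions().create(
    parent='projects/YOUR_PROJECT_ID/models/your_model_name',
    body={
        'name': 'v1',
        'deploymentUri': 'gs://your-bucket/model-dir',  # placeholder path
        'runtimeVersion': '1.15',
        'pythonVersion': '3.7',
        'requestLoggingConfig': {
            'samplingPercentage': 0.1,
            'bigqueryTableName': 'your_project.your_dataset.request_logs',
        },
    },
).execute()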

Inspect models with the What-If Tool

You can use the What-If Tool (WIT) within notebook environments to inspect AI Platform models through an interactive dashboard. The What-If Tool integrates with TensorBoard, Jupyter notebooks, Colab notebooks, and JupyterHub. It is also preinstalled on AI Platform Notebooks TensorFlow instances.

Learn how to use the What-If Tool with AI Platform.
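
As an illustration, here is a minimal sketch, assuming the witwidget package is installed in a Jupyter environment; the example data and the project, model, and version names are placeholders.

import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# Build a tiny stand-in dataset of tf.train.Example protos; in practice you
# would load examples from your own test data.
example = tf.train.Example()
example.features.feature['values'].float_list.value.extend([1.0, 2.0, 3.0, 4.0])
examples = [example]

# Point the dashboard at a deployed AI Platform model version (placeholders).
config_builder = WitConfigBuilder(examples).set_ai_platform_model(
    'YOUR_PROJECT_ID', 'your_model_name', 'v1')

# Renders the interactive What-If Tool dashboard in the notebook output cell.
WitWidget(config_builder, height=800)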

Formatting your input for online prediction

Formatting instances as JSON strings

The basic format for online prediction is a list of data instances. These can be either plain lists of values or members of a JSON object, depending on how you configured your inputs in your training application. TensorFlow models and custom prediction routines can accept more complex inputs, while most scikit-learn and XGBoost models expect a list of numbers as input.

This example shows an input tensor and an instance key to a TensorFlow model:

{"values": [1, 2, 3, 4], "key": 1}

The makeup of the JSON string can be complex as long as it follows these rules:

  • The top level of instance data must be a JSON object: a dictionary of key/value pairs.

  • Individual values in an instance object can be strings, numbers, or lists. You cannot embed JSON objects.

  • Lists must contain only items of the same type (including other lists). You may not mix string and numerical values.

You pass input instances for online prediction as the message body for the projects.predict call. Learn more about the request body's formatting requirements.

gcloud

  1. Ensure that your input file is a newline-delimited JSON file, with each instance as a JSON object, one instance per line.

    {"values": [1, 2, 3, 4], "key": 1}
    {"values": [5, 6, 7, 8], "key": 2}
    

REST API

  1. Make each instance an item in a list, and assign the list to a member named instances.

    {"instances": [
      {"values": [1, 2, 3, 4], "key": 1},
      {"values": [5, 6, 7, 8], "key": 2}
    ]}
    

Binary data in prediction input

Binary data cannot be formatted as the UTF-8 encoded strings that JSON supports. If you have binary data in your inputs, you must use base64 encoding to represent it. The following special formatting is required (a complete sketch follows this list):

  • Your encoded string must be formatted as a JSON object with a single key named b64. The following Python 2.7 example encodes a buffer of raw JPEG data using the base64 library to make an instance:

    {"image_bytes": {"b64": base64.b64encode(jpeg_data)}}
    

    In Python 3.5, base64 encoding outputs a byte sequence. You must convert this to a string to make it JSON serializable:

    {'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}
    
  • In your TensorFlow model code, you must name the aliases for your binary input and output tensors so that they end with '_bytes'.
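
Putting the pieces together, here is a minimal end-to-end sketch that reads a local JPEG file and writes a newline-delimited instance file suitable for gcloud ai-platform predict. The file name and the image_bytes alias are placeholders for your own values.

import base64
import json

# Read raw JPEG bytes; 'sample.jpg' is a placeholder for your own image.
with open('sample.jpg', 'rb') as f:
    jpeg_data = f.read()

# Base64-encode the binary data and wrap it in the required {"b64": ...}
# object. The 'image_bytes' alias must match an input tensor alias in your
# model that ends with '_bytes'.
instance = {
    'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()},
    'key': 1,
}

# Write one JSON object per line, as expected by --json-instances.
with open('instances.json', 'w') as f:
    f.write(json.dumps(instance) + '\n')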

Requesting predictions

Request an online prediction by sending your input data instances as a JSON string in a predict request. For formatting of the request and response body, see the details of the prediction request.

If you don't specify a model version, your prediction request uses the default version of the model.

gcloud

  1. Create environment variables to hold the parameters, including a version value if you decide to specify a particular model version:

    MODEL_NAME="[YOUR-MODEL-NAME]"
    INPUT_DATA_FILE="instances.json"
    VERSION_NAME="[YOUR-VERSION-NAME]"
    
  2. Use gcloud ai-platform predict to send instances to a deployed model. Note that --version is optional.

    gcloud ai-platform predict --model $MODEL_NAME  \
                       --version $VERSION_NAME \
                       --json-instances $INPUT_DATA_FILE
    
  3. The gcloud tool parses the response and prints the predictions to your terminal in a human-readable format. You can specify a different output format, such as JSON or CSV, by using the --format flag with your predict command. See available output formats.

Python

You can use the Google APIs Client Library for Python to call the AI Platform Training and Prediction API without manually constructing HTTP requests. Before you run the following code sample, you must set up authentication.

import googleapiclient.discovery


def predict_json(project, model, instances, version=None):
    """Send json data to a deployed model for prediction.

    Args:
        project (str): project where the AI Platform Model is deployed.
        model (str): model name.
        instances ([Mapping[str: Any]]): Keys should be the names of Tensors
            your deployed model expects as inputs. Values should be datatypes
            convertible to Tensors, or (potentially nested) lists of datatypes
            convertible to tensors.
        version: str, version of the model to target.
    Returns:
        Mapping[str: any]: dictionary of prediction results defined by the
            model.
    """
    # Create the AI Platform service object.
    # To authenticate set the environment variable
    # GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)

    if version is not None:
        name += '/versions/{}'.format(version)

    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])

    return response['predictions']
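
For example, here is a hypothetical call to the function above; the project, model, version, and instance values are placeholders for your own.

# Hypothetical usage; the instance keys must match the input tensor aliases
# your deployed model expects.
predictions = predict_json(
    project='YOUR_PROJECT_ID',
    model='your_model_name',
    instances=[
        {'values': [1, 2, 3, 4], 'key': 1},
        {'values': [5, 6, 7, 8], 'key': 2},
    ],
    version='v1',  # optional; omit to target the default version
)
print(predictions)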

Java

You can use the Google API Client Library for Java to call the AI Platform Training and Prediction API without manually constructing HTTP requests. Before you run the following code sample, you must set up authentication.

/*
 * Copyright 2017 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.FileContent;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpContent;
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestFactory;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.UriTemplate;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.discovery.Discovery;
import com.google.api.services.discovery.model.JsonSchema;
import com.google.api.services.discovery.model.RestDescription;
import com.google.api.services.discovery.model.RestMethod;
import java.io.File;

/*
 * Sample code for sending an online prediction request to AI Platform.
 */

public class OnlinePredictionSample {
  public static void main(String[] args) throws Exception {
    HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
    JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();
    Discovery discovery = new Discovery.Builder(httpTransport, jsonFactory, null).build();

    RestDescription api = discovery.apis().getRest("ml", "v1").execute();
    RestMethod method = api.getResources().get("projects").getMethods().get("predict");

    JsonSchema param = new JsonSchema();
    String projectId = "YOUR_PROJECT_ID";
    // You should have already deployed a model and a version.
    // For reference, see https://cloud.google.com/ml-engine/docs/deploying-models.
    String modelId = "YOUR_MODEL_ID";
    String versionId = "YOUR_VERSION_ID";
    param.set(
        "name", String.format("projects/%s/models/%s/versions/%s", projectId, modelId, versionId));

    GenericUrl url =
        new GenericUrl(UriTemplate.expand(api.getBaseUrl() + method.getPath(), param, true));
    System.out.println(url);

    String contentType = "application/json";
    File requestBodyFile = new File("input.txt");
    HttpContent content = new FileContent(contentType, requestBodyFile);
    System.out.println(content.getLength());

    GoogleCredential credential = GoogleCredential.getApplicationDefault();
    HttpRequestFactory requestFactory = httpTransport.createRequestFactory(credential);
    HttpRequest request = requestFactory.buildRequest(method.getHttpMethod(), url, content);

    String response = request.execute().parseAsString();
    System.out.println(response);
  }
}

Troubleshooting online prediction

Common errors in online prediction include the following:

  • Out of memory errors
  • Incorrectly formatted input data

If you encounter out of memory errors, try reducing your model size before deploying it to AI Platform for prediction.

See more details on troubleshooting online prediction.
