Getting Online Predictions

Cloud Machine Learning Engine online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service and it returns your predictions in the response.

Learn about online versus batch prediction or read an overview of prediction concepts.

Before you begin

In order to request predictions, you must first:

  • Ensure that the file size of your SavedModel is under the Cloud ML Engine default limit of 250 MB by optimizing your graph for prediction.

  • Verify that your input data is in the correct format for online prediction.


Cloud ML Engine online prediction is currently available in the following regions:

  • us-central1
  • europe-west1
  • us-east1
  • asia-northeast1

To fully understand the available regions for Cloud ML Engine training and prediction services, read the guide to regions.

Creating models and versions

You make the following important decisions about how to run online prediction when creating the model and version resources:

  • Model: region in which to run predictions
  • Model: whether to enable online prediction logging
  • Version: runtime version to use
  • Version: Python version to use
  • Version: machine type to use for online prediction

You can't update the settings listed above after the initial creation of the model or version. If you need to change these settings, create a new model or version resource with the new settings and redeploy your model.

Machine types available for online prediction

Online prediction currently supports single-core CPUs with 2 GB of RAM. If you are interested in joining alpha programs for other hardware, contact Cloud ML Engine feedback.

See information about pricing for these machine types.

Requesting logs for online prediction requests

The Cloud ML Engine prediction service does not provide logged information about requests by default, because the logs incur cost. Online prediction at a high rate of queries per second (QPS) can produce a substantial number of logs, which are subject to the Stackdriver pricing policy.

To opt in to online prediction logging, you can configure your model to generate logs when you create the model resource.

  • gcloud: include the --enable-logging flag when creating your model with the gcloud ml-engine models create command.

  • REST API: when creating your model with projects.models.create, set onlinePredictionLogging to True in the Model resource.

Formatting your input for online prediction

Formatting instances as JSON strings

The basic format for online prediction is a list of instance data tensors. These can be either plain lists of values or members of a JSON object, depending on how you configured your inputs in your training application.

This example shows an input tensor and an instance key:

{"values": [1, 2, 3, 4], "key": 1}

The makeup of the JSON string can be complex as long as it follows these rules:

  • The top level of instance data must be a JSON object—a dictionary of key/value pairs.

  • Individual values in an instance object can be strings, numbers, or lists. You cannot embed JSON objects.

  • Lists must contain only items of the same type (including other lists). You may not mix string and numerical values.

You pass input instances for online prediction as the message body for the projects.predict call.


  • gcloud: ensure that your input file is a text file with each instance as a JSON object, one instance per line.

    {"values": [1, 2, 3, 4], "key": 1}
    {"values": [5, 6, 7, 8], "key": 2}

  • REST API: make each instance an item in a JSON list, and name the list member instances.

    {"instances": [{"values": [1, 2, 3, 4], "key": 1}]}
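The two layouts are easy to convert between. The hypothetical helper below (standard library only, not part of the Cloud ML Engine client) turns the contents of a newline-delimited instance file into the request body expected by projects.predict:

```python
import json

def build_predict_body(ndjson_text):
    """Wrap newline-delimited JSON instances in the 'instances' list
    used as the projects.predict message body."""
    instances = [json.loads(line)
                 for line in ndjson_text.splitlines()
                 if line.strip()]
    return {"instances": instances}

file_contents = '{"values": [1, 2, 3, 4], "key": 1}\n{"values": [5, 6, 7, 8], "key": 2}\n'
body = build_predict_body(file_contents)
print(json.dumps(body))
```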

Binary data in prediction input

Binary data cannot be formatted as the UTF-8 encoded strings that JSON supports. If you have binary data in your inputs, you must use base64 encoding to represent it. The following special formatting is required:

  • Your encoded string must be formatted as a JSON object with a single key named b64. The following Python example encodes a buffer of raw JPEG data using the base64 library to make an instance:

    {"image_bytes":{"b64": base64.b64encode(jpeg_data)}}
  • In your TensorFlow model code, you must name the aliases for your binary input and output tensors so that they end with '_bytes'.
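In Python 3, base64.b64encode returns a bytes object, which is not JSON serializable, so the encoded value must be decoded to a string before building the request. A minimal sketch (the helper name is illustrative):

```python
import base64
import json

def encode_binary_input(raw_bytes):
    """Wrap raw binary data in the {"b64": ...} object required for
    binary prediction inputs. b64encode returns bytes, so the result
    is decoded to an ASCII string for JSON serialization."""
    return {"b64": base64.b64encode(raw_bytes).decode("ascii")}

# Example: wrap a (truncated, illustrative) JPEG byte buffer.
instance = {"image_bytes": encode_binary_input(b"\xff\xd8\xff")}
print(json.dumps(instance))
```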

Requesting predictions

Request an online prediction by sending your input data instances as a JSON string in a predict request. For formatting of the request and response body, see the details of the prediction request.

If you don't specify a model version, your prediction request uses the default version of the model.
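The version is reflected in the resource name the request targets. A small sketch of how that name is assembled (the helper is illustrative; the same pattern appears in the Python sample below):

```python
def model_resource_name(project, model, version=None):
    """Build the resource name targeted by a predict request.
    Omitting the version targets the model's default version."""
    name = 'projects/{}/models/{}'.format(project, model)
    if version is not None:
        name += '/versions/{}'.format(version)
    return name

print(model_resource_name('my-project', 'my-model'))        # default version
print(model_resource_name('my-project', 'my-model', 'v2'))  # explicit version
```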


  1. Create environment variables to hold the parameters, including a version value if you decide to specify a particular model version:

    MODEL_NAME="[YOUR-MODEL-NAME]"
    INPUT_DATA_FILE="instances.json"
    VERSION_NAME="[YOUR-VERSION-NAME]"

  2. Use gcloud ml-engine predict to send instances to a deployed model. Note that --version is optional.

    gcloud ml-engine predict --model $MODEL_NAME  \
                       --version $VERSION_NAME \
                       --json-instances $INPUT_DATA_FILE
  3. The gcloud tool parses the response and prints the predictions to your terminal in a human-readable format. You can specify a different output format, such as JSON or CSV, by using the --format flag with your predict command. See available output formats.


This sample assumes that you are familiar with the Google Cloud Client library for Python. If you aren't familiar with it, see Using the Python Client Library.

import googleapiclient.discovery

def predict_json(project, model, instances, version=None):
    """Send json data to a deployed model for prediction.

    Args:
        project (str): project where the Cloud ML Engine Model is deployed.
        model (str): model name.
        instances ([Mapping[str: Any]]): Keys should be the names of Tensors
            your deployed model expects as inputs. Values should be datatypes
            convertible to Tensors, or (potentially nested) lists of datatypes
            convertible to tensors.
        version: str, version of the model to target.
    Returns:
        Mapping[str: any]: dictionary of prediction results defined by the
            model.
    """
    # Create the ML Engine service object.
    # To authenticate set the environment variable
    # GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)

    if version is not None:
        name += '/versions/{}'.format(version)

    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])

    return response['predictions']


/*
 * Copyright 2017 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */


/**
 * Sample code for doing Cloud Machine Learning Engine online prediction in Java.
 */

public class OnlinePredictionSample {
  public static void main(String[] args) throws Exception {
    HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
    JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();
    Discovery discovery = new Discovery.Builder(httpTransport, jsonFactory, null).build();

    RestDescription api = discovery.apis().getRest("ml", "v1").execute();
    RestMethod method = api.getResources().get("projects").getMethods().get("predict");

    JsonSchema param = new JsonSchema();
    String projectId = "YOUR_PROJECT_ID";
    // You should have already deployed a model and a version.
    // For reference, see
    String modelId = "YOUR_MODEL_ID";
    String versionId = "YOUR_VERSION_ID";
    param.set(
        "name", String.format("projects/%s/models/%s/versions/%s", projectId, modelId, versionId));

    GenericUrl url =
        new GenericUrl(UriTemplate.expand(api.getBaseUrl() + method.getPath(), param, true));

    String contentType = "application/json";
    File requestBodyFile = new File("input.txt");
    HttpContent content = new FileContent(contentType, requestBodyFile);

    GoogleCredential credential = GoogleCredential.getApplicationDefault();
    HttpRequestFactory requestFactory = httpTransport.createRequestFactory(credential);
    HttpRequest request = requestFactory.buildRequest(method.getHttpMethod(), url, content);

    String response = request.execute().parseAsString();
    System.out.println(response);
  }
}

Troubleshooting online prediction

Common errors in online prediction include the following:

  • Out of memory errors
  • Input data is formatted incorrectly

Try reducing your model size before deploying it to Cloud ML Engine for prediction.

See more details on troubleshooting online prediction.
