Getting Online Predictions

Cloud Machine Learning Engine online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service, and it returns your predictions in the response. To learn more, see the other prediction concepts pages.

Before you begin

To request predictions, you must first:

  • Create a model resource and deploy a model version to Cloud ML Engine.
  • Format your input data as JSON instances matching the inputs your deployed model expects.
Cloud ML Engine online prediction is currently available in the following regions:

  • us-central1
  • us-east1
  • asia-northeast1
  • europe-west1

Requesting predictions

You request online predictions by sending your input data instances, formatted as a JSON string, to the projects.predict method. Your only decision is whether to use the default version of the model or to specify a particular model version.
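An input file for online prediction is newline-delimited JSON, one instance per line. The following is a minimal sketch; the input names "values" and "key" are hypothetical and must match the input tensor names your deployed model actually expects:

```shell
# Write a newline-delimited JSON input file (one instance per line).
# The feature names below are placeholders for illustration only.
cat > instances.json <<'EOF'
{"values": [1, 2, 3, 4], "key": 1}
{"values": [5, 6, 7, 8], "key": 2}
EOF
```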


  1. Create environment variables to hold the parameters, including a version value if you want to target a specific model version:

  2. Use gcloud ml-engine predict to send instances to a deployed model. Note that --version is optional.

    gcloud ml-engine predict --model $MODEL_NAME \
        --version $VERSION_NAME \
        --json-instances $INPUT_DATA_FILE
  3. The gcloud tool parses the response and prints the predictions to your terminal.
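The environment variables in step 1 might look like the following; the model name, version name, and file path here are placeholders, so substitute the values for the model you actually deployed:

```shell
# Placeholder values for illustration; replace with your own deployed
# model name, version name, and path to your JSON input file.
MODEL_NAME="my_model"
VERSION_NAME="v1"
INPUT_DATA_FILE="instances.json"
```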


This sample assumes that you are familiar with the Google APIs Client Library for Python. If you aren't familiar with it, see Using the Python Client Library.

import googleapiclient.discovery

def predict_json(project, model, instances, version=None):
    """Send json data to a deployed model for prediction.

    Args:
        project (str): project where the Cloud ML Engine Model is deployed.
        model (str): model name.
        instances ([Mapping[str: Any]]): Keys should be the names of Tensors
            your deployed model expects as inputs. Values should be datatypes
            convertible to Tensors, or (potentially nested) lists of datatypes
            convertible to tensors.
        version (str): version of the model to target.

    Returns:
        Mapping[str: any]: dictionary of prediction results defined by the
            model.
    """
    # Create the ML Engine service object.
    # To authenticate set the environment variable
    # GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_file>
    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)

    if version is not None:
        name += '/versions/{}'.format(version)

    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])

    return response['predictions']
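For illustration, here is a sketch of the instances argument the function above expects and the request body it builds. The input names "values" and "key" are hypothetical; your deployed model defines its own input tensor names:

```python
# Hypothetical input tensor names; keys must match the inputs your
# deployed model expects.
instances = [
    {"values": [1.0, 2.0, 3.0, 4.0], "key": 1},
    {"values": [5.0, 6.0, 7.0, 8.0], "key": 2},
]

# predict_json wraps this list as the request body sent to projects.predict.
body = {"instances": instances}
```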

Requesting logs for online prediction requests

The Cloud ML Engine prediction service doesn't provide logged information about requests by default. However, you can configure your model to generate logs when you create the model resource.
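For example, logging can be enabled with the --enable-logging flag of gcloud ml-engine models create; this is a sketch with a placeholder model name and region:

```shell
# Placeholder model name and region for illustration. Logging must be
# requested when the model resource is created; it cannot be toggled later.
gcloud ml-engine models create my_model \
    --regions us-central1 \
    --enable-logging
```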
