Getting Batch Predictions

When you don't need your predictions right away, or when you have a large number of instances to get predictions for, you can use the batch prediction service. This page describes how to start Cloud Machine Learning Engine batch prediction jobs. You can learn about the differences between online and batch prediction in the prediction overview.

Before you begin

In order to request predictions, you must first:

  • Create a model resource and version with Cloud ML Engine or put a TensorFlow SavedModel in a Google Cloud Storage location that your project can access.

  • Set up a Google Cloud Storage location that your project has access to for:

    • Input data files. This can be multiple locations, and your project must be authorized to read from each.

    • Output files. You can specify only one output path, and your project must be authorized to write data to it.

  • Verify that your input file is in the correct format for batch prediction.

Configuring a batch prediction job

To start your batch prediction job, you'll need to gather some configuration data. This is the same data that is contained in the PredictionInput object you use when calling the API directly:

Data format

The type of input format you use for your input files. All of your input files for a given job must use the same data format. Set to one of these values:

  • TEXT: Your input files are plain text with an instance on each line. This is the format described on the prediction concepts page.

  • TF_RECORD: Your input files use the TensorFlow TFRecords format.

  • TF_RECORD_GZIP: Your input files are GZIP-compressed TFRecords files.

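As a rough illustration of the plain-text format, the sketch below writes one instance per line, assuming each instance is encoded as a JSON object. The instance fields (key, values) are hypothetical; the actual shape of each instance depends on your model's inputs.

```python
import json

# Hypothetical instances; the real fields depend on your model's input signature.
instances = [
    {"key": 1, "values": [0.1, 0.2, 0.3]},
    {"key": 2, "values": [0.4, 0.5, 0.6]},
]


def to_text_format(instances):
    """Serialize instances as newline-delimited JSON, one instance per line."""
    return "\n".join(json.dumps(instance) for instance in instances) + "\n"


text_payload = to_text_format(instances)
print(text_payload, end="")
```

The resulting text could be saved to a file and uploaded to one of your input locations on Google Cloud Storage.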
Input paths

The URIs of your input data files, which must be in Google Cloud Storage locations. You can specify:

  • Paths to specific files: 'gs://path/to/my/input/file.json'.

  • Paths to directories with a single asterisk wildcard, to indicate all files in that directory: 'gs://path/to/my/input/*'.

  • Paths to partial filenames with a single asterisk wildcard at the end, to indicate all files that start with the provided sequence: 'gs://path/to/my/input/file*'.

You can combine multiple URIs. In Python, you put them in a list. If you use the gcloud command-line tool, or call the API directly, you can list multiple URIs separated by commas, with no spaces between them. This is the right format for the --input-paths flag:

 --input-paths gs://a/directory/of/files/*,gs://a/single/specific/file.json,gs://a/file/template/data*
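If you keep your input URIs in a Python list, a small helper can produce the comma-separated form shown above. This is a minimal sketch: format_input_paths is a hypothetical helper, and its checks reflect a conservative reading of the path rules on this page rather than validation performed by the service.

```python
def format_input_paths(paths):
    """Join Cloud Storage URIs with commas and no spaces between them."""
    cleaned = []
    for path in paths:
        path = path.strip()
        if not path.startswith("gs://"):
            raise ValueError(
                "Input path must be a Cloud Storage URI: {!r}".format(path))
        # Assumption: only a single asterisk at the very end is accepted.
        if "*" in path[:-1]:
            raise ValueError(
                "Wildcard is only allowed at the end: {!r}".format(path))
        cleaned.append(path)
    return ",".join(cleaned)


input_paths = [
    "gs://a/directory/of/files/*",
    "gs://a/single/specific/file.json",
    "gs://a/file/template/data*",
]
print(format_input_paths(input_paths))
```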
Output path

The path to the Google Cloud Storage location where you want the prediction service to save your results. Your project must have permissions to write to this location.

Model name and version name

The name of the model and, optionally, the version you want to get predictions from. If you don't specify a version, the model's default version is used. Alternatively, you can use the Google Cloud Storage path to an undeployed SavedModel, called the model URI.

Model URI

You can get predictions from a model that isn't deployed on Cloud ML Engine by specifying the URI of the SavedModel you want to use. The SavedModel must be stored on Google Cloud Storage.

To summarize, you have three options for specifying the model to use for batch prediction. You can use:

  • The model name by itself to use the model's default version.

  • The model and version names to use a specific model version.

  • The model URI to use a SavedModel that is on Google Cloud Storage, but not deployed to Cloud ML Engine.
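The three options can be sketched as a function that returns the one PredictionInput entry identifying the model. This is an illustration, not part of the client library: select_model_fields is a hypothetical helper, and the 'uri' key is the PredictionInput field for a model URI as I understand the API, so check it against the API reference.

```python
def select_model_fields(project_name, model_name=None, version_name=None,
                        model_uri=None):
    """Return the PredictionInput entry that identifies the model to use."""
    if model_uri:
        if model_name or version_name:
            raise ValueError(
                "Specify either a model URI or a model name, not both.")
        # Option 3: a SavedModel on Cloud Storage that is not deployed.
        return {"uri": model_uri}
    if not model_name:
        raise ValueError("A model name or a model URI is required.")
    model_id = "projects/{}/models/{}".format(project_name, model_name)
    if version_name:
        # Option 2: a specific version of a deployed model.
        return {"versionName": "{}/versions/{}".format(model_id, version_name)}
    # Option 1: the model's default version.
    return {"modelName": model_id}


print(select_model_fields("my_project", model_name="my_model"))
```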


Region

The Google Compute Engine region where you want to run your job. You'll get the best results if you run your prediction job and store your input and output data all in the same region, especially for very large datasets. Cloud ML Engine batch prediction is only available in a subset of regions:

  • us-central1
  • us-east1
  • europe-west1
  • asia-east1

Job name

A name for your job, which must:

  • Contain only mixed-case (case-sensitive) letters, digits, and underscores.

  • Start with a letter.

  • Contain no more than 128 characters.

  • Be unique among all training and batch prediction job names ever used in your project. This includes all jobs that you created in your project, regardless of their success or status.

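The format rules above can be checked locally before you submit a job. The following is a minimal sketch with a hypothetical helper, is_valid_job_name. It can't verify uniqueness, which only the service knows.

```python
import re

# Starts with a letter, followed by letters, digits, or underscores;
# 128 characters at most in total.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,127}")


def is_valid_job_name(name):
    """Check a job name against the format rules listed on this page."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_job_name("census_batch_predict_20240101_120000"))
print(is_valid_job_name("1st_job"))
print(is_valid_job_name("my-job"))
```

The first name passes; the second fails because it starts with a digit, and the third fails because it contains a hyphen.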
Maximum worker count (optional)

The maximum number of prediction nodes to use in the processing cluster for this job. This is your way to put an upper limit on the automatic scaling feature of batch prediction. If you don't specify a value, it defaults to 10. Regardless of the value you specify, scaling is limited by the prediction node quota.

Runtime version (optional)

The Cloud ML Engine version to use for the job. This option is included so that you can specify a runtime version to use with models that aren't deployed on Cloud ML Engine. You should always omit this value for deployed model versions, which signals the service to use the same version that was specified when the model version was deployed.

The following examples define variables to hold configuration data.


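It isn't necessary to create variables when using the gcloud command-line tool to start a job. However, doing so here makes the job submission command much easier to enter and read. The submission command later on this page expects the variables JOB_NAME, MODEL_NAME, INPUT_PATHS, OUTPUT_PATH, REGION, and DATA_FORMAT.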

now=$(date +"%Y%m%d_%H%M%S")


When you use the Google Cloud Client Library, you can use Python dictionaries to represent the Job and PredictionInput resources.

  1. Format your project name and your model or version name with the syntax used by the Cloud ML Engine REST APIs:

    • project_name -> 'projects/project_name'
    • model_name -> 'projects/project_name/models/model_name'
    • version_name -> 'projects/project_name/models/model_name/versions/version_name'
  2. Create a dictionary for the Job resource and populate it with two items:

    • A key named 'jobId' with the job name you want to use as its value.

    • A key named 'predictionInput' that contains another dictionary object housing all of the required members of PredictionInput, and any optional members that you want to use.

    The following example shows a function that takes the configuration information as input variables and returns the prediction request body. In addition to the basics, the example also generates a unique job identifier based on your project name, model name, and the current time.

    import time
    import re

    def make_batch_job_body(project_name, input_paths, output_path,
            model_name, region, data_format='TEXT',
            version_name=None, max_worker_count=None,
            runtime_version=None):
        project_id = 'projects/{}'.format(project_name)
        model_id = '{}/models/{}'.format(project_id, model_name)
        if version_name:
            version_id = '{}/versions/{}'.format(model_id, version_name)

        # Make a job ID of the format
        # "project_name_model_name_YYYYMMDD_HHMMSS".
        timestamp = time.strftime('%Y%m%d_%H%M%S', time.gmtime())

        # Make sure the project name is formatted correctly to work as the basis
        # of a valid job name.
        clean_project_name = re.sub(r'\W+', '_', project_name)

        job_id = '{}_{}_{}'.format(clean_project_name, model_name,
                                   timestamp)

        # Start building the request dictionary with required information.
        body = {'jobId': job_id,
                'predictionInput': {
                    'dataFormat': data_format,
                    'inputPaths': input_paths,
                    'outputPath': output_path,
                    'region': region}}

        # Use the version if present, the model (its default version) if not.
        if version_name:
            body['predictionInput']['versionName'] = version_id
        else:
            body['predictionInput']['modelName'] = model_id

        # Only include a maximum number of workers or a runtime version if specified.
        # Otherwise let the service use its defaults.
        if max_worker_count:
            body['predictionInput']['maxWorkerCount'] = max_worker_count
        if runtime_version:
            body['predictionInput']['runtimeVersion'] = runtime_version

        return body

Submitting a batch prediction job

Submitting your job is a simple call to projects.jobs.create or its command-line tool equivalent, gcloud ml-engine jobs submit prediction.


The following example uses the variables defined in the previous section to start batch prediction.

gcloud ml-engine jobs submit prediction $JOB_NAME \
    --model $MODEL_NAME \
    --input-paths $INPUT_PATHS \
    --output-path $OUTPUT_PATH \
    --region $REGION \
    --data-format $DATA_FORMAT


Starting a batch prediction job with the Google Cloud Platform Client SDK follows a similar pattern to other client SDK procedures:

  1. Prepare the request body to use for the call (this is shown in the previous section).

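  2. Form the request by calling ml.projects().jobs().create, passing your formatted project name as the parent and the request body.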

  3. Call execute on the request to get a response, making sure to check for HTTP errors.

  4. Use the response as a dictionary to get values from the Job resource.

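    from googleapiclient import discovery
    from googleapiclient import errors

    project_id = 'projects/{}'.format(project_name)

    # batch_predict_body is the dictionary returned by make_batch_job_body.
    ml = discovery.build('ml', 'v1')
    request = ml.projects().jobs().create(parent=project_id,
                                          body=batch_predict_body)

    try:
        response = request.execute()
        print('Job requested.')
        # The state returned will almost always be QUEUED.
        print('state : {}'.format(response['state']))
    except errors.HttpError as err:
        # Something went wrong, print out some information.
        print('There was an error getting the prediction results. ' +
              'Check the details:')
        print(err)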

Monitoring your batch prediction job

A batch prediction job can take a long time to finish. You can monitor your job's progress using Google Cloud Platform Console:

  1. Go to the Cloud ML Engine Jobs page in the Google Cloud Platform Console.

  2. Click on your job's name in the Job ID list. This opens the Job details page.

  3. The current status is shown with the job name at the top of the page.

  4. If you want more details, you can click View logs to see your job's entry in Stackdriver Logging.

There are other ways to track the progress of your batch prediction job. They follow the same patterns as monitoring training jobs. You'll find more information on the page describing how to monitor your training jobs. You may need to adjust the instructions there slightly to work with prediction jobs, but the mechanisms are the same.
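One such mechanism is to poll the Job resource until it reaches a final state. The sketch below is an assumption-laden illustration: wait_for_job and its parameters are hypothetical, the job is fetched through a callable you supply, and the set of terminal states reflects my understanding of the Job resource, so verify it against the API reference.

```python
import time

# Assumed terminal job states; verify against the API reference.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(fetch_job, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll fetch_job() until the job reaches a terminal state.

    fetch_job is any callable that returns the Job resource as a dictionary.
    Returns the last Job dictionary seen, which may still be non-terminal
    if max_polls is reached first.
    """
    polls = 0
    while True:
        job = fetch_job()
        polls += 1
        if job.get("state") in TERMINAL_STATES:
            return job
        if max_polls is not None and polls >= max_polls:
            return job
        sleep(poll_seconds)


# Simulated job that finishes on the third poll, with sleeping disabled.
states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
final_job = wait_for_job(lambda: {"state": next(states)}, sleep=lambda s: None)
print(final_job["state"])
```

With the Python client library, fetch_job could be a function that calls ml.projects().jobs().get with name set to 'projects/your_project/jobs/your_job_id' and then calls execute on the request.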

Getting prediction results

The service writes predictions to the Google Cloud Storage location you specify. Two types of output files might include results of interest:

  • Files named prediction.errors_stats-NNNNN-of-NNNNN contain information about any problems encountered during the job.

  • Files named prediction.results-NNNNN-of-NNNNN contain the predictions themselves, as defined by your model's output.

The filenames include index numbers (shown above as an 'N' for each digit) that capture how many files in total you should find. For example, a job that has six results files includes prediction.results-00000-of-00006 through prediction.results-00005-of-00006.
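The naming scheme can be reproduced in a few lines, which may help when you want to confirm that every results file is present. This is a sketch based only on the pattern described above; expected_result_files is a hypothetical helper.

```python
def expected_result_files(total, prefix="prediction.results"):
    """List the file names expected for a job with `total` output files."""
    return ["{}-{:05d}-of-{:05d}".format(prefix, index, total)
            for index in range(total)]


files = expected_result_files(6)
print(files[0])   # prediction.results-00000-of-00006
print(files[-1])  # prediction.results-00005-of-00006
```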

Prediction results are formatted as JSON objects in text files. You can open them with your choice of text editors. For a quick look on the command line you can use gsutil cat:

gsutil cat $OUTPUT_PATH/prediction.results-NNNNN-of-NNNNN|less

Remember that your prediction results are not typically output in the same order as your input instances, even if you use only a single input file. You can find the prediction for an instance by matching the instance keys.
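Matching by key might look like the following sketch. It assumes that each line of a results file holds one JSON object and that your model passes an instance key through to its output under a field named "key". Both the field names and the sample data are hypothetical.

```python
import json


def index_predictions_by_key(lines, key_field="key"):
    """Map each instance key to its prediction record."""
    predictions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        record = json.loads(line)
        predictions[record[key_field]] = record
    return predictions


# Hypothetical results content; real fields depend on your model's outputs.
sample_lines = [
    '{"key": 2, "prediction": 0.91}',
    '{"key": 1, "prediction": 0.13}',
]
by_key = index_predictions_by_key(sample_lines)
print(by_key[1]["prediction"])
```

Because the lookup is by key, the order in which the service wrote the results doesn't matter.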
