Getting batch predictions

When you don't need your predictions right away, or when you have a large number of instances to get predictions for, you can use the batch prediction service. This page describes how to start AI Platform Prediction batch prediction jobs. AI Platform Prediction only supports getting batch predictions from TensorFlow models.

Learn about online versus batch prediction or read an overview of prediction concepts.

Before you begin

In order to request predictions, you must first:

  • Create a model resource and a version resource, or put a TensorFlow SavedModel in a Cloud Storage location that your project can access.

    • If you choose to use a version resource for batch prediction, you must create the version with the mls1-c1-m2 machine type.
  • Set up a Cloud Storage location that your project has access to for:

    • Input data files. This can be multiple locations, and your project must be authorized to read from each.

    • Output files. You can only specify one output path and your project must be authorized to write data to it.

  • Verify that your input file is in the correct format for batch prediction.

Configuring a batch prediction job

To start your batch prediction job, you'll need to gather some configuration data. This is the same data that is contained in the PredictionInput object you use when calling the API directly:

Data format

The format of your input files. All of your input files for a given job must use the same data format. Set to one of these values:

JSON

Your input files are plain text with an instance on each line. This is the format described on the prediction concepts page.

TF_RECORD

Your input files use the TensorFlow TFRecord format.

TF_RECORD_GZIP

Your input files are GZIP-compressed TFRecord files.
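
As an illustration of the plain-text format with one instance per line, the following sketch serializes a list of instances as newline-delimited JSON. The helper name to_json_lines and the instance fields (key, values) are hypothetical; the real fields depend on your model's input signature.

```python
import json


def to_json_lines(instances):
    """Serialize each instance as one JSON object per line."""
    # json.dumps does not emit raw newlines, so every instance
    # stays on a single line of the output.
    return "".join(json.dumps(instance) + "\n" for instance in instances)


# Hypothetical instances, for illustration only.
example_instances = [
    {"key": 1, "values": [0.1, 0.2, 0.3]},
    {"key": 2, "values": [0.4, 0.5, 0.6]},
]
example_text = to_json_lines(example_instances)
```

You could then write example_text to a local file and copy that file to one of your Cloud Storage input locations.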

Input paths

The URIs of your input data files, which must be in Cloud Storage locations. You can specify:

  • Paths to specific files: 'gs://path/to/my/input/file.json'.

  • Paths to directories with a single asterisk wildcard, to indicate all files in that directory: 'gs://path/to/my/input/*'.

  • Paths to partial filenames with a single asterisk wildcard at the end, to indicate all files that start with the provided sequence: 'gs://path/to/my/input/file*'.

You can combine multiple URIs. In Python you make a list of them. If you use the Google Cloud CLI, or call the API directly, you can list multiple URIs, separated by commas, but with no space in between them. This is the right format for the --input-paths flag:

 --input-paths gs://a/directory/of/files/*,gs://a/single/specific/file.json,gs://a/file/template/data*
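
If you keep your input URIs in a Python list, a small helper can produce the comma-separated form with no spaces. This is a minimal sketch; the function name join_input_paths is hypothetical, and the gs:// check is only a basic sanity test.

```python
def join_input_paths(paths):
    """Join Cloud Storage URIs with commas and no spaces."""
    cleaned = [path.strip() for path in paths]
    for path in cleaned:
        # Input files must be in Cloud Storage locations.
        if not path.startswith("gs://"):
            raise ValueError("Not a Cloud Storage URI: {!r}".format(path))
    return ",".join(cleaned)


input_paths_flag = join_input_paths([
    "gs://a/directory/of/files/*",
    "gs://a/single/specific/file.json",
    "gs://a/file/template/data*",
])
```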

Output path

The path to the Cloud Storage location where you want the prediction service to save your results. Your project must have permissions to write to this location.

Model name and version name

The name of the model and, optionally, the version you want to get predictions from. If you don't specify a version, the model's default version is used. For batch prediction, the version must use the mls1-c1-m2 machine type.

If you provide a Model URI (see the following section), omit these fields.

Model URI

You can get predictions from a model that isn't deployed on AI Platform Prediction by specifying the URI of the SavedModel you want to use. The SavedModel must be stored on Cloud Storage.

To summarize, you have three options for specifying the model to use for batch prediction. You can use:

  • The model name by itself to use the model's default version.

  • The model and version names to use a specific model version.

  • The model URI to use a SavedModel that is on Cloud Storage, but not deployed to AI Platform Prediction.
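
The three options can be sketched as a helper that returns exactly one identifying field for the PredictionInput dictionary. The function name model_fields is hypothetical, and the field names modelName, versionName, and uri are assumed to match the REST resource; check them against the API reference before relying on them.

```python
def model_fields(project, model=None, version=None, uri=None):
    """Return the single PredictionInput field that identifies the model."""
    if uri:
        # A SavedModel on Cloud Storage: omit the model and version names.
        if model or version:
            raise ValueError("Give a model URI or model/version names, not both.")
        return {"uri": uri}
    if not model:
        raise ValueError("Give a model name or a model URI.")
    model_id = "projects/{}/models/{}".format(project, model)
    if version:
        # A specific version of a deployed model.
        return {"versionName": "{}/versions/{}".format(model_id, version)}
    # The model name alone selects the model's default version.
    return {"modelName": model_id}
```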


Region

The Google Compute Engine region where you want to run your job. For best performance, you should run your prediction job and store your input and output data in the same region, especially for very large datasets. AI Platform Prediction batch prediction is available in the following regions:

  -   us-central1
  -   us-east1
  -   europe-west1
  -   asia-east1

To fully understand the available regions for AI Platform Prediction services, including model training and online prediction, read the guide to regions.

Job name

A name for your job, which must:

  • Contain only mixed-case (case sensitive) letters, digits, and underscores.
  • Start with a letter.
  • Contain no more than 128 characters.
  • Be unique among all training and batch prediction job names ever used in your project. This includes all jobs that you created in your project, regardless of their success or status.
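
The first three rules can be checked locally before you submit a job. The following sketch uses a regular expression for that; the helper name is_valid_job_name is hypothetical, and uniqueness within your project cannot be verified this way.

```python
import re

# A letter first, then up to 127 more letters, digits, or underscores.
_JOB_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,127}")


def is_valid_job_name(name):
    """Check the character, first-letter, and length rules for a job name."""
    return _JOB_NAME_PATTERN.fullmatch(name) is not None
```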

Batch size (optional)

The number of records per batch. The service will buffer batch_size number of records in memory before invoking your model. Defaults to 64 if not specified.

Labels (optional)

You can add labels to your job to organize and sort jobs into categories when viewing or monitoring resources. For example, you could sort jobs by team (by adding labels like engineering or research) or by development phase (prod or test). To add labels to your prediction job, provide a list of KEY=VALUE pairs.
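
When you build the request in Python, it can be convenient to turn KEY=VALUE pairs into a dictionary. This is a minimal sketch; the helper name parse_labels is hypothetical, and placing the result under a labels key of the Job resource is an assumption to verify against the API reference.

```python
def parse_labels(pairs):
    """Turn a list of 'KEY=VALUE' strings into a dictionary."""
    labels = {}
    for pair in pairs:
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError("Expected KEY=VALUE, got {!r}".format(pair))
        labels[key] = value
    return labels


example_labels = parse_labels(["team=engineering", "phase=test"])
```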

Maximum worker count (optional)

The maximum number of prediction nodes to use in the processing cluster for this job. This is your way to put an upper limit on the automatic scaling feature of batch prediction. If you don't specify a value, it defaults to 10. Regardless of the value you specify, scaling is limited by the prediction node quota.

Runtime version (optional)

The AI Platform Prediction runtime version to use for the job. This option is included so that you can specify a runtime version to use with models that aren't deployed on AI Platform Prediction. You should always omit this value for deployed model versions, which signals the service to use the same version that was specified when the model version was deployed.

Signature name (optional)

If your saved model has multiple signatures, use this option to specify a custom TensorFlow signature name, which allows you to select an alternative input/output map defined in the TensorFlow SavedModel. See the TensorFlow documentation on SavedModel for a guide to using signatures, and the guide to specifying the outputs of a custom model. The default is DEFAULT_SERVING_SIGNATURE_DEF_KEY, which has the value serving_default.
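
The optional settings described above can be added to a PredictionInput dictionary only when you set them, so that the service applies its defaults otherwise. In this sketch the helper name add_optional_fields is hypothetical, and the field names batchSize and signatureName are assumed to follow the same camel-case convention as maxWorkerCount and runtimeVersion.

```python
def add_optional_fields(prediction_input, batch_size=None,
                        max_worker_count=None, runtime_version=None,
                        signature_name=None):
    """Return a copy of prediction_input with the optional fields that were set."""
    optional = {
        "batchSize": batch_size,
        "maxWorkerCount": max_worker_count,
        "runtimeVersion": runtime_version,
        "signatureName": signature_name,
    }
    result = dict(prediction_input)
    for field, value in optional.items():
        # Leave unset options out so the service uses its defaults.
        if value is not None:
            result[field] = value
    return result
```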

The following examples define variables to hold configuration data.


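gcloud

It isn't necessary to create variables when using the gcloud command-line tool to start a job. However, doing so here makes the job submission command much easier to enter and read. Set MODEL_NAME, INPUT_PATHS, OUTPUT_PATH, and REGION in the same way, using the values described above.

DATA_FORMAT="text" # JSON data format
now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="batch_predict_$now" # example prefix; any valid, unique job name works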


Python

When you use the Google API Client Library for Python, you can use Python dictionaries to represent the Job and PredictionInput resources.

  1. Format your project name and your model or version name with the syntax used by the AI Platform Prediction REST APIs:

    • project_name -> 'projects/project_name'
    • model_name -> 'projects/project_name/models/model_name'
    • version_name -> 'projects/project_name/models/model_name/versions/version_name'
  2. Create a dictionary for the Job resource and populate it with two items:

    • A key named 'jobId' with the job name you want to use as its value.

    • A key named 'predictionInput' that contains another dictionary object housing all of the required members of PredictionInput, and any optional members that you want to use.

    The following example shows a function that takes the configuration information as input variables and returns the prediction request body. In addition to the basics, the example also generates a unique job identifier based on your project name, model name, and the current time.

    import re
    import time

    def make_batch_job_body(project_name, input_paths, output_path,
            model_name, region, data_format='JSON',
            version_name=None, max_worker_count=None,
            runtime_version=None):
        # Format the resource names the way the REST API expects them.
        project_id = 'projects/{}'.format(project_name)
        model_id = '{}/models/{}'.format(project_id, model_name)
        if version_name:
            version_id = '{}/versions/{}'.format(model_id, version_name)

        # Make a job ID of the format
        # "project_name_model_name_YYYYMMDD_HHMMSS".
        timestamp = time.strftime('%Y%m%d_%H%M%S', time.gmtime())

        # Make sure the project name is formatted correctly to work as the basis
        # of a valid job name.
        clean_project_name = re.sub(r'\W+', '_', project_name)

        job_id = '{}_{}_{}'.format(clean_project_name, model_name,
                                   timestamp)

        # Start building the request dictionary with required information.
        body = {'jobId': job_id,
                'predictionInput': {
                    'dataFormat': data_format,
                    'inputPaths': input_paths,
                    'outputPath': output_path,
                    'region': region}}

        # Use the version if present, the model (its default version) if not.
        if version_name:
            body['predictionInput']['versionName'] = version_id
        else:
            body['predictionInput']['modelName'] = model_id

        # Only include a maximum number of workers or a runtime version if specified.
        # Otherwise let the service use its defaults.
        if max_worker_count:
            body['predictionInput']['maxWorkerCount'] = max_worker_count

        if runtime_version:
            body['predictionInput']['runtimeVersion'] = runtime_version

        return body

Submitting a batch prediction job

Submitting your job is a simple call to projects.jobs.create or its command-line tool equivalent, gcloud ai-platform jobs submit prediction.


gcloud

The following example uses the variables defined in the previous section to start batch prediction.

gcloud ai-platform jobs submit prediction $JOB_NAME \
    --model $MODEL_NAME \
    --input-paths $INPUT_PATHS \
    --output-path $OUTPUT_PATH \
    --region $REGION \
    --data-format $DATA_FORMAT


Python

Starting a batch prediction job with the Google API Client Library for Python follows a similar pattern to other client SDK procedures:

  1. Prepare the request body to use for the call (this is shown in the previous section).

  2. Form the request by calling ml.projects().jobs().create.

  3. Call execute on the request to get a response, making sure to check for HTTP errors.

  4. Use the response as a dictionary to get values from the Job resource.

You can use the Google API Client Library for Python to call the AI Platform Training and Prediction API without manually constructing HTTP requests. Before you run the following code sample, you must set up authentication.

    import googleapiclient.discovery as discovery
    from googleapiclient import errors

    project_id = 'projects/{}'.format(project_name)

    ml = discovery.build('ml', 'v1')
    # batch_predict_body is the dictionary returned by make_batch_job_body().
    request = ml.projects().jobs().create(parent=project_id,
                                          body=batch_predict_body)

    try:
        response = request.execute()

        print('Job requested.')

        # The state returned will almost always be QUEUED.
        print('state : {}'.format(response['state']))

    except errors.HttpError as err:
        # Something went wrong, print out some information.
        print('There was an error getting the prediction results. '
              'Check the details:')
        print(err)

Monitoring your batch prediction job

A batch prediction job can take a long time to finish. You can monitor your job's progress using Google Cloud console:

  1. Go to the AI Platform Prediction Jobs page in the Google Cloud console:

  2. Click on your job's name in the Job ID list. This opens the Job details page.

  3. The current status is shown with the job name at the top of the page.

  4. If you want more details, you can click View logs to see your job's entry in Cloud Logging.

There are other ways to track the progress of your batch prediction job. They follow the same patterns as monitoring training jobs. You'll find more information on the page describing how to monitor your training jobs. You may need to adjust the instructions there slightly to work with prediction jobs, but the mechanisms are the same.
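
One programmatic approach is to poll the job until it reaches a terminal state. The following is a minimal sketch, not the service's prescribed method: the helper name wait_for_job and its parameters are hypothetical, and the set of terminal state names is an assumption to check against the Job resource reference. The get_job argument is any callable that returns the Job resource as a dictionary.

```python
import time

# Assumed terminal states of a job.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_job, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Call get_job() repeatedly until the job reaches a terminal state."""
    polls = 0
    while True:
        job = get_job()
        if job.get("state") in TERMINAL_STATES:
            return job
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("Job did not finish after {} polls.".format(polls))
        sleep(poll_seconds)
```

With the Google API Client Library for Python, get_job could be a function that calls ml.projects().jobs().get(name=job_resource_name).execute(), where job_resource_name is a placeholder of the form 'projects/project_name/jobs/job_id'.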

Getting prediction results

The service writes predictions to the Cloud Storage location you specify. The job outputs two types of files that might include useful results:

  • Files named prediction.errors_stats-NNNNN-of-NNNNN contain information about any problems encountered during the job.

  • JSON Lines files named prediction.results-NNNNN-of-NNNNN contain the predictions themselves, as defined by your model's output.

The filenames include index numbers (shown above as an 'N' for each digit) that capture how many files in total you should find. For example, a job that has six results files includes prediction.results-00000-of-00006 through prediction.results-00005-of-00006.
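
The naming scheme can be reproduced in a few lines, which is handy for checking that every results file is present. This is a sketch based on the pattern described above; the helper name result_file_names is hypothetical.

```python
def result_file_names(total, prefix="prediction.results"):
    """List the expected file names for a job that wrote `total` files."""
    # Indexes are zero-based and zero-padded to five digits.
    return ["{}-{:05d}-of-{:05d}".format(prefix, index, total)
            for index in range(total)]


six_files = result_file_names(6)
```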

Every line of each prediction file is a JSON object representing a single prediction result. You can open the prediction files with your choice of text editor. For a quick look on the command line you can use gsutil cat:

gsutil cat $OUTPUT_PATH/prediction.results-NNNNN-of-NNNNN|less

Remember that your prediction results are not typically output in the same order as your input instances, even if you use only a single input file. You can find the prediction for an instance by matching the instance keys.
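
Matching by key can be sketched as building a lookup table from the lines of the results files. The helper name index_predictions is hypothetical, and the sketch assumes that each result carries the instance key in a field named key; the actual field depends on what your model passes through to its output.

```python
import json


def index_predictions(lines, key_field="key"):
    """Map each instance key to its prediction record."""
    by_key = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        record = json.loads(line)
        by_key[record[key_field]] = record
    return by_key


# Hypothetical results, deliberately out of input order.
sample_lines = [
    '{"key": 2, "predictions": [0.9]}',
    '{"key": 1, "predictions": [0.1]}',
]
predictions_by_key = index_predictions(sample_lines)
```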
