Making batch predictions

After you have created (trained) a model, you can make an asynchronous prediction request for a batch of images using the batchPredict method. The batchPredict method applies labels to your image based on the primary object of the image that your model predicts.

The maximum lifespan for a custom model is 18 months as of the GA release. You must create and train a new model to continue annotating content after that time.

Batch prediction

You can request annotations (predictions) for images by using the batchPredict command. The batchPredict command takes, as input, a CSV file stored in your Google Cloud Storage bucket that contains the paths to the images to annotate. Each line specifies a separate path to an image in Google Cloud Storage.

batch_prediction.csv:

gs://my-cloud-storage-bucket/prediction_files/image1.jpg
gs://my-cloud-storage-bucket/prediction_files/image2.jpg
gs://my-cloud-storage-bucket/prediction_files/image3.jpg
gs://my-cloud-storage-bucket/prediction_files/image4.jpg
gs://my-cloud-storage-bucket/prediction_files/image5.jpg
gs://my-cloud-storage-bucket/prediction_files/image6.png

Depending on the number of images that you specified in your CSV file, the batch predict task can take some time to complete. Even on a small number of images batch prediction will take at minimum 30 minutes to complete.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • location-id: A valid location identifier. Currently you must use the following value:
    • us-central1
  • model-id: the ID of your model, from the response when you created the model. The ID is the last element of the name of your model. For example:
    • model name: projects/project-id/locations/location-id/models/IOD4412217016962778756
    • model id: IOD4412217016962778756
  • input-storage-path: the path to a CSV file stored on Google Cloud Storage. The requesting user must have at least read permission to the bucket.
  • output-storage-bucket: a Google Cloud Storage bucket/directory to save output files to, expressed in the following form: gs://bucket/directory/. The requesting user must have write permission to the bucket.

Field-specific considerations:

  • params.score_threshold - A value between 0.0 and 1.0. Only results with scores greater or equal to this value will be returned.

HTTP method and URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models/model-id:batchPredict

Request JSON body:

{
  "inputConfig": {
    "gcsSource": {
       "inputUris": [ "input-storage-path" ]
    }
  },
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "output-storage-bucket"
    }
  },
  "params": {
    "score_threshold": "0.0"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models/model-id:batchPredict

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models/model-id:batchPredict" | Select-Object -Expand Content
Response:

You should see output similar to the following:

{
  "name": "projects/project-id/locations/location-id/operations/ICN926615623331479552",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-06-19T21:28:35.302067Z",
    "updateTime": "2019-06-19T21:28:35.302067Z",
    "batchPredictDetails": {
      "inputConfig": {
        "gcsSource": {
          "inputUris": [
            "input-storage-path"
          ]
        }
      }
    }
  }
}

You can use the operation ID (ICN926615623331479552, in this case) to get the status of the task. For an example, see Working with long-running operations.

Depending on the number of images that you specified in your CSV file, the batch predict task can take some time to complete. Even on a small number of images batch prediction will take at minimum 30 minutes to complete.

Once the operation has completed, the state shows as DONE and your results are written to the Google Cloud Storage file you specified:

{
  "name": "projects/project-id/locations/location-id/operations/ICN926615623331479552",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-06-19T21:28:35.302067Z",
    "updateTime": "2019-06-19T21:57:18.310033Z",
    "batchPredictDetails": {
      "inputConfig": {
        "gcsSource": {
          "inputUris": [
            "input-storage-path"
          ]
        }
      },
      "outputInfo": {
        "gcsOutputDirectory": "gs://storage-bucket-vcm/subdirectory/prediction-8370559933346329705-YYYY-MM-DDThh:mm:ss.sssZ"
      }
    }
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.BatchPredictResult"
  }
}

See the Output JSONL files section below for a sample output file.

Java

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.BatchPredictInputConfig;
import com.google.cloud.automl.v1.BatchPredictOutputConfig;
import com.google.cloud.automl.v1.BatchPredictRequest;
import com.google.cloud.automl.v1.BatchPredictResult;
import com.google.cloud.automl.v1.GcsDestination;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.ModelName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.PredictionServiceClient;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class BatchPredict {

  static void batchPredict() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String modelId = "YOUR_MODEL_ID";
    String inputUri = "gs://YOUR_BUCKET_ID/path_to_your_input_csv_or_jsonl";
    String outputUri = "gs://YOUR_BUCKET_ID/path_to_save_results/";
    batchPredict(projectId, modelId, inputUri, outputUri);
  }

  static void batchPredict(String projectId, String modelId, String inputUri, String outputUri)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient client = PredictionServiceClient.create()) {
      // Get the full path of the model.
      ModelName name = ModelName.of(projectId, "us-central1", modelId);
      GcsSource gcsSource = GcsSource.newBuilder().addInputUris(inputUri).build();
      BatchPredictInputConfig inputConfig =
          BatchPredictInputConfig.newBuilder().setGcsSource(gcsSource).build();
      GcsDestination gcsDestination =
          GcsDestination.newBuilder().setOutputUriPrefix(outputUri).build();
      BatchPredictOutputConfig outputConfig =
          BatchPredictOutputConfig.newBuilder().setGcsDestination(gcsDestination).build();
      BatchPredictRequest request =
          BatchPredictRequest.newBuilder()
              .setName(name.toString())
              .setInputConfig(inputConfig)
              .setOutputConfig(outputConfig)
              .build();

      OperationFuture<BatchPredictResult, OperationMetadata> future =
          client.batchPredictAsync(request);

      System.out.println("Waiting for operation to complete...");
      BatchPredictResult response = future.get();
      System.out.println("Batch Prediction results saved to specified Cloud Storage bucket.");
    }
  }
}

Node.js

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const modelId = 'YOUR_MODEL_ID';
// const inputUri = 'gs://YOUR_BUCKET_ID/path_to_your_input_csv_or_jsonl';
// const outputUri = 'gs://YOUR_BUCKET_ID/path_to_save_results/';

// Imports the Google Cloud AutoML library
const {PredictionServiceClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new PredictionServiceClient();

async function batchPredict() {
  // Construct request
  const request = {
    name: client.modelPath(projectId, location, modelId),
    inputConfig: {
      gcsSource: {
        inputUris: [inputUri],
      },
    },
    outputConfig: {
      gcsDestination: {
        outputUriPrefix: outputUri,
      },
    },
  };

  const [operation] = await client.batchPredict(request);

  console.log('Waiting for operation to complete...');
  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(
    `Batch Prediction results saved to Cloud Storage bucket. ${response}`
  );
}

batchPredict();

PHP

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

use Google\Cloud\AutoMl\V1\BatchPredictInputConfig;
use Google\Cloud\AutoMl\V1\BatchPredictOutputConfig;
use Google\Cloud\AutoMl\V1\GcsSource;
use Google\Cloud\AutoMl\V1\GcsDestination;
use Google\Cloud\AutoMl\V1\PredictionServiceClient;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $modelId = 'my_model_id_123';
// $inputUri = 'gs://cloud-samples-data/text.txt';
// $outputUri = 'gs://YOUR_BUCKET_ID/path_to_store_results/';

$client = new PredictionServiceClient();

try {
    // get full path of model
    $formattedName = $client->modelName(
        $projectId,
        $location,
        $modelId
    );

    // set the multiple GCS uri
    $gcsSource = (new GcsSource())
        ->setInputUris($inputUri);
    $inputConfig = (new BatchPredictInputConfig())
        ->setGcsSource($gcsSource);
    $gcsDestination = (new GcsDestination())
        ->setOutputUriPrefix($outputUri);
    $outputConfig = (new BatchPredictOutputConfig())
        ->setGcsDestination($gcsDestination);
    // params is additional domain-specific parameters
    // score_threshold is used to filter the result
    $params = ['score_threshold' => '0.8']; // value between 0.0 and 1.0

    $operationResponse = $client->batchPredict(
        $formattedName,
        $inputConfig,
        $outputConfig,
        $params
    );

    printf('Waiting for operation to complete...' . PHP_EOL);
    $operationResponse->pollUntilComplete();
    if ($operationResponse->operationSucceeded()) {
        $result = $operationResponse->getResult();
        printf('Batch Prediction results saved to Cloud Storage bucket. %s' . PHP_EOL, $result);
    } else {
        $error = $operationResponse->getError();
        print($error->getMessage());
    }
} finally {
    $client->close();
}

Python

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# model_id = "YOUR_MODEL_ID"
# input_uri = "gs://YOUR_BUCKET_ID/path/to/your/input/csv_or_jsonl"
# output_uri = "gs://YOUR_BUCKET_ID/path/to/save/results/"

prediction_client = automl.PredictionServiceClient()

# Get the full path of the model.
model_full_id = f"projects/{project_id}/locations/us-central1/models/{model_id}"

gcs_source = automl.GcsSource(input_uris=[input_uri])

input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)
gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(
    gcs_destination=gcs_destination
)

response = prediction_client.batch_predict(
    name=model_full_id,
    input_config=input_config,
    output_config=output_config
)

print("Waiting for operation to complete...")
print(
    f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}"
)

Ruby

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

require "google/cloud/automl"

project_id = "YOUR_PROJECT_ID"
model_id = "YOUR_MODEL_ID"
input_uri = "gs://YOUR_BUCKET_ID/path_to_your_input_file.csv"
output_uri = "gs://YOUR_BUCKET_ID/path_to_save_results/"

prediction_client = Google::Cloud::AutoML.prediction_service

# Get the full path of the model.
model_full_id = prediction_client.model_path project: project_id,
                                             location: "us-central1",
                                             model: model_id
input_config = {
  gcs_source: {
    input_uris: [input_uri]
  }
}
output_config = {
  gcs_destination: {
    output_uri_prefix: output_uri
  }
}
# [0.0-1.0] Only produce results higher than this value
params = { "score_threshold" => "0.8" }

operation = prediction_client.batch_predict name: model_full_id,
                                            input_config: input_config,
                                            output_config: output_config,
                                            params: params

puts "Waiting for operation to complete..."

# Wait until the long running operation is done
operation.wait_until_done!

puts "Batch Prediction results saved to Cloud Storage bucket."

Output JSONL files

When the batch predict task is complete, the output of the prediction is stored in the Google Cloud Storage bucket that you specified in your command.

In your output bucket (if applicable, in your specified directory) the files image_classification_1.jsonl, image_classification_2.jsonl,...,image_classification_N.jsonl will be created, where N may be 1, and depends on the total number of the successfully predicted images and annotations.

A single image will be listed only once with all its annotations, and its annotations will never be split across files.

Each JSONL file will contain, per line, a JSON representation of a proto that wraps image's "ID" : "<id_value>" followed by a list of zero or more AnnotationPayload protos (called annotations), which have classification detail populated.

Example JSONL file:

image_image_classification_0.jsonl - A single .jsonl file with 4 lines, each line corresponding to an image file annotation JSON.

Working with long-running operations

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • operation-id: the ID of your operation. The ID is the last element of the name of your operation. For example:
    • operation name: projects/project-id/locations/location-id/operations/IOD5281059901324392598
    • operation id: IOD5281059901324392598

HTTP method and URL:

GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id" | Select-Object -Expand Content
You should see output similar to the following for a completed import operation:
{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T16:10:41.326614Z",
    "importDataDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}

You should see output similar to the following for a completed create model operation:

{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-07-22T18:35:06.881193Z",
    "updateTime": "2019-07-22T19:58:44.972235Z",
    "createModelDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Model",
    "name": "projects/project-id/locations/us-central1/models/model-id"
  }
}

Go

Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"google.golang.org/genproto/googleapis/longrunning"
)

// getOperationStatus gets an operation's status.
func getOperationStatus(w io.Writer, projectID string, location string, operationID string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// operationID := "TRL123456789..."

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &longrunning.GetOperationRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/operations/%s", projectID, location, operationID),
	}

	op, err := client.LROClient.GetOperation(ctx, req)
	if err != nil {
		return fmt.Errorf("GetOperation: %v", err)
	}

	fmt.Fprintf(w, "Name: %v\n", op.GetName())
	fmt.Fprintf(w, "Operation details:\n")
	fmt.Fprintf(w, "%v", op)

	return nil
}

Java

Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.longrunning.Operation;
import java.io.IOException;

class GetOperationStatus {

  static void getOperationStatus() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String operationFullId = "projects/[projectId]/locations/us-central1/operations/[operationId]";
    getOperationStatus(operationFullId);
  }

  // Get the status of an operation
  static void getOperationStatus(String operationFullId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the latest state of a long-running operation.
      Operation operation = client.getOperationsClient().getOperation(operationFullId);

      // Display operation details.
      System.out.println("Operation details:");
      System.out.format("\tName: %s\n", operation.getName());
      System.out.format("\tMetadata Type Url: %s\n", operation.getMetadata().getTypeUrl());
      System.out.format("\tDone: %s\n", operation.getDone());
      if (operation.hasResponse()) {
        System.out.format("\tResponse Type Url: %s\n", operation.getResponse().getTypeUrl());
      }
      if (operation.hasError()) {
        System.out.println("\tResponse:");
        System.out.format("\t\tError code: %s\n", operation.getError().getCode());
        System.out.format("\t\tError message: %s\n", operation.getError().getMessage());
      }
    }
  }
}

Node.js

Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const operationId = 'YOUR_OPERATION_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function getOperationStatus() {
  // Construct request
  const request = {
    name: `projects/${projectId}/locations/${location}/operations/${operationId}`,
  };

  const [response] = await client.operationsClient.getOperation(request);

  console.log(`Name: ${response.name}`);
  console.log('Operation details:');
  console.log(`${response}`);
}

getOperationStatus();

PHP

Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.

use Google\ApiCore\LongRunning\OperationsClient;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $operationId = 'my_operation_id_123';

$client = new OperationsClient();

try {
    // full name of operation
    $formattedName = 'projects/' . $projectId . '/locations/us-central1/operations/' . $operationId;

    // get latest state of long running operation
    $operation = $client->getOperation($name);
    printf('Operation name: %s' . PHP_EOL, $operation->getName());
    print('Operation details: ');
    print($operation);
} finally {
    if (isset($client)) {
        $client->close();
    }
}

Python

Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# operation_full_id = \
#     "projects/[projectId]/locations/us-central1/operations/[operationId]"

client = automl.AutoMlClient()
# Get the latest state of a long-running operation.
response = client._transport.operations_client.get_operation(
    operation_full_id
)

print("Name: {}".format(response.name))
print("Operation details:")
print(response)