Using a custom container

To customize how AI Platform Prediction serves online predictions from your trained machine learning (ML) model, you can specify a custom container instead of a runtime version when you create a model version. When you use a custom container, AI Platform Prediction runs a Docker container of your choice on each prediction node instead of running the standard runtime version code to serve predictions from compatible model artifacts.

You might want to use a custom container for any of the following reasons:

  • to serve predictions from an ML model trained using a framework other than TensorFlow, scikit-learn, or XGBoost
  • to preprocess prediction requests or postprocess the predictions generated by your model
  • to run a prediction server written in a programming language of your choice
  • to install dependencies that you want to use to customize prediction

This guide describes how to create a model version that uses a custom container. It does not provide detailed instructions about designing and creating a Docker container image. To walk through an example of creating a container image and using it with AI Platform Prediction, read Getting started: Serving PyTorch predictions with a custom container.

To use a custom container, you must use a regional endpoint and a Compute Engine (N1) machine type for your model version.

Preparing a container image

To create a model version that uses a custom container, you must provide a Docker container image as the basis of that container. This container image must meet the requirements described in Custom container requirements.

If you plan to use an existing container image created by a third party that you trust, then you might be able to skip one or both of the following sections.

Create a container image

Design and build a Docker container image that meets the container image requirements.

To learn the basics of designing and building a Docker container image, read the Docker documentation's quickstart.
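The exact contents of the image depend on your prediction server, but the overall workflow is standard Docker: write a Dockerfile that starts an HTTP server, build the image, and run it locally to confirm that the server responds on the expected port. The following is a minimal sketch, where LOCAL_IMAGE is a placeholder name for your image and the server is assumed to listen on port 8080 (the default port that AI Platform Prediction uses, as described later in this guide):

# Build an image from the Dockerfile in the current directory.
docker build -t LOCAL_IMAGE .

# Run the container locally and confirm that the HTTP server starts
# and responds on port 8080.
docker run -d -p 8080:8080 LOCAL_IMAGE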

Push the container image to Artifact Registry

Push your container image to an Artifact Registry repository that meets the container image publishing requirements.

Learn how to push a container image to Artifact Registry.
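For example, assuming that you have already created a Docker repository in Artifact Registry and have permission to push to it, the commands might look like the following, where REGION, PROJECT_ID, REPOSITORY, and IMAGE are placeholders:

# Configure Docker to authenticate to Artifact Registry.
gcloud auth configure-docker REGION-docker.pkg.dev

# Tag the local image with its Artifact Registry path, then push it.
docker tag LOCAL_IMAGE REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE
docker push REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE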

Creating a model and model version

When you create a model, specify several configuration options to ensure that any model versions that you later create on it are compatible with your custom container.

Then, specify most of the container configuration when you create a model version.

Create a model

To create a model, follow the instructions for creating a model resource. You must create the model on a regional endpoint that matches the region of the Artifact Registry repository where your container image is stored. To learn more, read the container image publishing requirements.
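For example, the following command creates a model on a regional endpoint using the gcloud CLI. This is a minimal sketch; REGION must match the region of the Artifact Registry repository that contains your container image:

gcloud beta ai-platform models create MODEL \
  --region=REGION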

Create a model version

When you create a model version that uses a custom container, configure the following container-specific API fields in addition to the other fields that you specify for a model version:

  • Version.container (required)
  • Version.routes (optional)

The following sections describe how to configure these fields.

In addition, note that you configure certain other API fields differently when you use a custom container than when you use a runtime version.

Configure Version.container

You must specify a ContainerSpec message in the Version.container field. Within this message, you can specify the following subfields. If you use the gcloud beta ai-platform versions create command to create your model version, then you can use a command-line flag to specify each subfield.

image (required)

The Artifact Registry URI of your container image.

gcloud CLI flag: --image

command (optional)

An array of an executable and arguments to override the container's ENTRYPOINT. To learn more about how to format this field and how it interacts with the args field, read the API reference for ContainerSpec.

gcloud CLI flag: --command

args (optional)

An array of an executable and arguments to override the container's CMD. To learn more about how to format this field and how it interacts with the command field, read the API reference for ContainerSpec.

gcloud CLI flag: --args

ports (optional)

An array of ports; AI Platform Prediction sends liveness checks, health checks, and prediction requests to your container on the first port listed, or 8080 by default. Specifying additional ports has no effect.

gcloud CLI flag: --ports

env (optional)

An array of environment variables that the container's entrypoint command, as well as the command and args fields, can reference. To learn more about how other fields can reference these environment variables, read the API reference for ContainerSpec.

gcloud CLI flag: --env-vars

In addition to the variables that you set in the Version.container.env field, AI Platform Prediction sets several other variables based on your configuration. Learn more about using these environment variables in these fields and in the container's entrypoint command.
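For example, assuming the $(VARIABLE_NAME) reference syntax described in the ContainerSpec API reference, a model version might pass a variable from the env field to the command field as follows. The executable path and the MODEL_DIR variable are hypothetical, and the single quotes prevent your shell from interpreting $(MODEL_DIR) itself, so the literal string reaches the API:

gcloud beta ai-platform versions create VERSION \
  --region=REGION \
  --model=MODEL \
  --machine-type=n1-standard-4 \
  --image=IMAGE_URI \
  --env-vars=MODEL_DIR=/models \
  --command=/usr/bin/serve,--model-dir,'$(MODEL_DIR)'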

The following example shows how to specify these fields when you create a model version using the Google Cloud CLI:

gcloud beta ai-platform versions create VERSION \
  --region=REGION \
  --model=MODEL \
  --machine-type=n1-standard-4 \
  --image=IMAGE_URI \
  --command=executable,param1,param2 \
  --args=param3,param4 \
  --ports=8081 \
  --env-vars=VAR1='value 1',VAR2='value 2'

Replace the following:

  • VERSION: a name for your model version
  • REGION: the region of the regional endpoint where you created your model
  • MODEL: the name of your model
  • IMAGE_URI: the Artifact Registry URI of your container image

The --command, --args, --ports, and --env-vars values shown are examples; replace them with values that match your container.

Configure Version.routes

You may specify a RouteMap message in the Version.routes field. Within this message, you can specify the following subfields. If you use the gcloud beta ai-platform versions create command to create your model version, then you can use a command-line flag to specify each subfield.

health (optional)

The path on your container's HTTP server where you want AI Platform Prediction to send health checks.

If you don't specify this field, then it defaults to /v1/models/MODEL/versions/VERSION, where MODEL and VERSION are replaced by the names of your model and model version respectively.

gcloud CLI flag: --health-route

predict (optional)

The path on your container's HTTP server where you want AI Platform Prediction to forward prediction requests.

If you don't specify this field, then it defaults to /v1/models/MODEL/versions/VERSION:predict, where MODEL and VERSION are replaced by the names of your model and model version respectively.

gcloud CLI flag: --predict-route
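Whichever paths you choose, your container's HTTP server must respond on them. One way to verify this before you create the model version is to run the image locally and probe the paths yourself. This sketch assumes a local image named LOCAL_IMAGE that listens on port 8080 and serves the custom /health and /predict paths used in the example that follows:

# Start the container locally.
docker run -d -p 8080:8080 LOCAL_IMAGE

# The health route should return a success status.
curl http://localhost:8080/health

# The predict route should accept a request body in whatever format
# your server expects.
curl -X POST -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0]]}' \
  http://localhost:8080/predict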

The following example shows how to specify these fields when you create a model version using the gcloud CLI:

gcloud beta ai-platform versions create VERSION \
  --region=REGION \
  --model=MODEL \
  --machine-type=n1-standard-4 \
  --image=IMAGE_URI \
  --command=executable,param1,param2 \
  --args=param3,param4 \
  --ports=8081 \
  --env-vars=VAR1='value 1',VAR2='value 2' \
  --health-route=/health \
  --predict-route=/predict

Replace the following:

  • VERSION: a name for your model version
  • REGION: the region of the regional endpoint where you created your model
  • MODEL: the name of your model
  • IMAGE_URI: the Artifact Registry URI of your container image

Sending prediction requests

To send an online prediction request to your model version, follow the guide to online prediction: this process works the same regardless of whether you use a custom container.

However, when you use a custom container, the body of each prediction request does not need to meet the request body requirements for model versions that use a runtime version. That said, we recommend that you design your container to expect request bodies with the standard format if possible. Learn more about predict request and response requirements for custom containers.
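For example, the following curl command sends a request in the standard format to the regional endpoint. Replace REGION, PROJECT_ID, MODEL, and VERSION, and adjust the request body to whatever format your container expects:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0]]}' \
  https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/MODEL/versions/VERSION:predict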

What's next