To customize how AI Platform Prediction serves online predictions from your trained machine learning (ML) model, you can specify a custom container instead of a runtime version when you create a model version. When you use a custom container, AI Platform Prediction runs a Docker container of your choice on each prediction node instead of running the standard runtime version code to serve predictions from compatible model artifacts.
You might want to use a custom container for any of the following reasons:
- to serve predictions from an ML model trained using a framework other than TensorFlow, scikit-learn, or XGBoost
- to preprocess prediction requests or postprocess the predictions generated by your model
- to run a prediction server written in a programming language of your choice
- to install dependencies that you want to use to customize prediction
This guide describes how to create a model version that uses a custom container. It does not provide detailed instructions about designing and creating a Docker container image. To walk through an example of creating a container image and using it with AI Platform Prediction, read Getting started: Serving PyTorch predictions with a custom container.
To use a custom container, you must use a regional endpoint and a Compute Engine (N1) machine type for your model version.
Preparing a container image
To create a model version that uses a custom container, you must provide a Docker container image as the basis of that container. This container image must meet the requirements described in Custom container requirements.
If you plan to use an existing container image created by a third party that you trust, then you might be able to skip one or both of the following sections.
Create a container image
Design and build a Docker container image that meets the container image requirements.
To learn the basics of designing and building a Docker container image, read the Docker documentation's quickstart.
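For example, a typical workflow is to build the image locally and smoke-test it before publishing. This is a minimal sketch: the image name and health path are placeholders, and it assumes your server listens on port 8080, the default port that AI Platform Prediction uses:

# Build the image from the Dockerfile in the current directory.
docker build -t my-prediction-server .

# Run it locally, mapping the container's port 8080 to the host.
docker run -d -p 8080:8080 --name local-test my-prediction-server

# Confirm that the server responds on its health path (replace /health
# with whatever route your server actually exposes).
curl http://localhost:8080/health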
Push the container image to Artifact Registry
Push your container image to an Artifact Registry repository that meets the container image publishing requirements.
Learn how to push a container image to Artifact Registry.
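As a minimal sketch, pushing typically involves authenticating Docker to the repository's regional host, tagging the image with its Artifact Registry URI, and pushing it. REGION, PROJECT_ID, REPOSITORY, and the image name are placeholders:

# Let Docker authenticate to Artifact Registry in your region.
gcloud auth configure-docker REGION-docker.pkg.dev

# Tag the local image with its Artifact Registry URI, then push it.
docker tag my-prediction-server REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/my-prediction-server:latest
docker push REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/my-prediction-server:latest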
Creating a model and model version
When you create a model, you specify several configuration options to ensure that any model versions that you later create on the model are compatible with your custom container. Then, when you create a model version, you specify most of the container configuration.
Create a model
To create a model, follow the instructions for creating a model resource. You must create the model on a regional endpoint that matches the region of the Artifact Registry repository where your container image is stored. To learn more, read the container image publishing requirements.
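For example, the following command creates a model on a regional endpoint, where MODEL and REGION are placeholders for your model name and region:

gcloud ai-platform models create MODEL \
  --region=REGION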
Create a model version
When you create a model version that uses a custom container, configure the following container-specific API fields in addition to the other fields that you specify for a model version:
- Version.container
- Version.routes (optional)
The following sections describe how to configure these fields.
In addition, note the following container-specific differences in how you configure other API fields:
- Version.machineType: You must set this field to a Compute Engine (N1) machine type.
- Version.deploymentUri: This field becomes optional. Learn how your container can access artifacts specified by this field.
- Version.runtimeVersion, Version.framework, Version.pythonVersion, Version.predictionClass, and Version.packageUris: You must not specify these fields.
Configure Version.container
You must specify a ContainerSpec message in the Version.container field. Within this message, you can specify the following subfields. If you use the gcloud beta ai-platform versions create command to create your model version, then you can use a command-line flag to specify each subfield.
- image (required): The Artifact Registry URI of your container image. gcloud CLI flag: --image
- command (optional): An array of an executable and arguments to override the container's ENTRYPOINT. To learn more about how to format this field and how it interacts with the args field, read the API reference for ContainerSpec. gcloud CLI flag: --command
- args (optional): An array of an executable and arguments to override the container's CMD. To learn more about how to format this field and how it interacts with the command field, read the API reference for ContainerSpec. gcloud CLI flag: --args
- ports (optional): An array of ports. AI Platform Prediction sends liveness checks, health checks, and prediction requests to your container on the first port listed, or 8080 by default. Specifying additional ports has no effect. gcloud CLI flag: --ports
- env (optional): An array of environment variables that the container's entrypoint command, as well as the command and args fields, can reference. To learn more about how other fields can reference these environment variables, read the API reference for ContainerSpec. gcloud CLI flag: --env-vars
In addition to the variables that you set in the Version.container.env field, AI Platform Prediction sets several other variables based on your configuration. Learn more about using these environment variables in these fields and in the container's entrypoint command.
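For example, assuming the $(NAME) substitution syntax described in the ContainerSpec API reference, an entry in the args field can reference a variable that you set with --env-vars. The variable name and paths in this sketch are illustrative, and the single quotes keep your shell from expanding $(MODEL_DIR) before gcloud sees it:

gcloud beta ai-platform versions create VERSION \
  --region=REGION \
  --model=MODEL \
  --machine-type=n1-standard-4 \
  --image=IMAGE_URI \
  --env-vars=MODEL_DIR=/models/my-model \
  --args='--model-dir,$(MODEL_DIR)'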
The following example shows how to specify these fields when you create a model version using the Google Cloud CLI:
gcloud beta ai-platform versions create VERSION \
--region=REGION \
--model=MODEL \
--machine-type=n1-standard-4 \
--image=IMAGE_URI \
--command=executable,param1,param2 \
--args=param3,param4 \
--ports=8081 \
--env-vars=VAR1='value 1',VAR2='value 2'
Replace the following:
- VERSION: the name of your model version
- REGION: the region of the AI Platform Prediction endpoint where you have created your model
- MODEL: the name of your model
- IMAGE_URI: the URI of your container image in Artifact Registry, which must begin with REGION (as described in the container image publishing requirements)
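If you create the model version through the REST API rather than the gcloud CLI, the same configuration goes in the Version message. The following curl sketch shows an equivalent request; the regional host format follows the online prediction documentation, and the exact structure of the ports and env entries (such as the containerPort field name) is an assumption to verify against the API reference:

# Sketch: create the version by POSTing a Version body to the
# regional endpoint (field layout assumed from the ContainerSpec docs).
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/MODEL/versions" \
  -d '{
    "name": "VERSION",
    "machineType": "n1-standard-4",
    "container": {
      "image": "IMAGE_URI",
      "command": ["executable", "param1", "param2"],
      "args": ["param3", "param4"],
      "ports": [{"containerPort": 8081}],
      "env": [
        {"name": "VAR1", "value": "value 1"},
        {"name": "VAR2", "value": "value 2"}
      ]
    }
  }'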
Configure Version.routes
You may specify a RouteMap message in the Version.routes field. Within this message, you can specify the following subfields. If you use the gcloud beta ai-platform versions create command to create your model version, then you can use a command-line flag to specify each subfield.
- health (optional): The path on your container's HTTP server where you want AI Platform Prediction to send health checks. If you don't specify this field, then it defaults to /v1/models/MODEL/versions/VERSION, where MODEL and VERSION are replaced by the names of your model and model version, respectively. gcloud CLI flag: --health-route
- predict (optional): The path on your container's HTTP server where you want AI Platform Prediction to forward prediction requests. If you don't specify this field, then it defaults to /v1/models/MODEL/versions/VERSION:predict, where MODEL and VERSION are replaced by the names of your model and model version, respectively. gcloud CLI flag: --predict-route
The following example shows how to specify these fields when you create a model version using the gcloud CLI:
gcloud beta ai-platform versions create VERSION \
--region=REGION \
--model=MODEL \
--machine-type=n1-standard-4 \
--image=IMAGE_URI \
--command=executable,param1,param2 \
--args=param3,param4 \
--ports=8081 \
--env-vars=VAR1='value 1',VAR2='value 2' \
--health-route=/health \
--predict-route=/predict
Replace the following:
- VERSION: the name of your model version
- REGION: the region of the AI Platform Prediction endpoint where you have created your model
- MODEL: the name of your model
- IMAGE_URI: the URI of your container image in Artifact Registry, which must begin with REGION (as described in the container image publishing requirements)
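Before you deploy, you can verify locally that your container actually serves the routes you pass in Version.routes. A quick sketch using the port and routes from the example above; the request body is illustrative:

# Run the image locally on the configured port and probe both routes.
docker run -d -p 8081:8081 --name route-test IMAGE_URI
curl http://localhost:8081/health
curl -X POST http://localhost:8081/predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0]]}'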
Sending prediction requests
To send an online prediction request to your model version, follow the guide to online prediction: this process works the same regardless of whether you use a custom container.
However, when you use a custom container, the body of each prediction request does not need to meet the request body requirements for model versions that use a runtime version. That said, we recommend that you design your container to expect request bodies with the standard format if possible. Learn more about predict request and response requirements for custom containers.
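For reference, an online prediction request to a model version on a regional endpoint looks like the following sketch. The body shown uses the standard instances format, but your container can accept a different body if you designed it to; REGION, PROJECT_ID, MODEL, and VERSION are placeholders:

# Send a prediction request to the regional endpoint.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0], [3.0, 4.0]]}' \
  "https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/MODEL/versions/VERSION:predict"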
What's next
To walk through an example of creating a container image and using it with AI Platform Prediction, read Getting started: Serving PyTorch predictions with a custom container.
To learn about everything to consider when you design a custom container to use with AI Platform Prediction, read Custom container requirements.
To learn how to alter a container's permissions to access other Google Cloud services, read Using a custom service account.