Custom container requirements

To use a custom container to serve predictions, you must provide AI Platform Prediction with a Docker container image that runs an HTTP server. This document describes the requirements that a container image must meet to be compatible with AI Platform Prediction. The document also describes how AI Platform Prediction interacts with your custom container once it starts running. In other words, this document describes what you need to consider when designing a container image to use with AI Platform Prediction.

To walk through using a custom container image to serve predictions, read Using a custom container.

Container image requirements

When your Docker container image runs as a container, the container must run an HTTP server. Specifically, the container must listen and respond to liveness checks, health checks, and prediction requests. The following subsections describe these requirements in detail.

You can implement the HTTP server in any way, using any programming language, as long as it meets the requirements in this section. For example, you can write a custom HTTP server using a web framework like Flask or use machine learning (ML) serving software that runs an HTTP server, like TensorFlow Serving, TorchServe, or KFServing Server.

Running the HTTP server

You can run an HTTP server by using an ENTRYPOINT instruction, a CMD instruction, or both in the Dockerfile that you use to build your container image. Read about the interaction between CMD and ENTRYPOINT.

Alternatively, you can specify the containerSpec.command and containerSpec.args fields when you create your model version in order to override your container image's ENTRYPOINT and CMD respectively. Specifying one of these fields lets you use a container image that would otherwise not meet the requirements due to an incompatible (or nonexistent) ENTRYPOINT or CMD.

However you determine which command your container runs when it starts, ensure that this entrypoint command runs indefinitely. For example, don't run a command that starts an HTTP server in the background and then exits; if you do, then the container will exit immediately after it starts running.

Your HTTP server must listen for requests on 0.0.0.0, on a port of your choice. When you create a model version, specify this port in the containerSpec.ports field. To learn how the container can access this value, read the section of this document about the AIP_HTTP_PORT environment variable.
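For example, the following is a minimal sketch of an entrypoint script, assuming a Flask-based server; the file name server.py is illustrative, and the routes are omitted here (the health and prediction routes are discussed in later sections):

    # server.py: a minimal sketch, not a complete serving implementation.
    import os

    from flask import Flask

    app = Flask(__name__)

    if __name__ == "__main__":
        # AI Platform Prediction sets AIP_HTTP_PORT to the first entry of
        # containerSpec.ports (8080 by default).
        port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
        # Listen on 0.0.0.0 so that probes and prediction traffic can reach
        # the server. app.run() blocks, so the entrypoint command keeps
        # running indefinitely instead of exiting.
        app.run(host="0.0.0.0", port=port)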

Liveness checks

AI Platform Prediction performs a liveness check when your container starts to ensure that your server is running. During the version creation process, AI Platform Prediction uses a TCP liveness probe to attempt to establish a TCP connection to your container on the configured port. The probe makes up to 4 attempts to establish a connection, waiting 10 seconds after each failure. If the probe still hasn't established a connection after these attempts, AI Platform Prediction restarts your container.

Your HTTP server doesn't need to perform any special behavior to handle these checks. As long as it is listening for requests on the configured port, the liveness probe is able to make a connection.

Health checks

AI Platform Prediction intermittently performs health checks on your HTTP server while it is running to ensure that it is ready to handle prediction requests. The service uses a health probe to send HTTP GET requests to a configurable health check path on your server. Specify this path in the routes.health field when you create a model version. To learn how the container can access this value, read the section of this document about the AIP_HEALTH_ROUTE environment variable.

Configure the HTTP server to respond to each health check request as follows:

  • If the server is ready to handle prediction requests, respond to the health check request with status code 200 OK. The contents of the response body do not matter; AI Platform Prediction ignores them.

    This response signifies that the server is healthy.

  • If the server is not ready to handle prediction requests, do not respond to the health check request, or respond with any status code except for 200 OK. For example, respond with status code 503 Service Unavailable.

    This response signifies that the server is unhealthy.

If the health probe ever receives an unhealthy response from your server, then it sends up to 3 additional health checks at 10-second intervals. During this period, AI Platform Prediction still considers your server healthy. If the probe receives a healthy response to any of these checks, then the probe immediately returns to its intermittent schedule of health checks. However, if the probe receives 4 consecutive unhealthy responses, then AI Platform Prediction stops routing prediction traffic to the container. (If the model version is scaled to use multiple prediction nodes, then AI Platform Prediction routes prediction requests to other, healthy containers.)

AI Platform Prediction does not restart the container; instead the health probe continues sending intermittent health check requests to the unhealthy server. If it receives a healthy response, then it marks that container as healthy and starts to route prediction traffic to it again.

Practical guidance

In some cases, it is sufficient for the HTTP server in your container to always respond with status code 200 OK to health checks. If your container loads resources before starting the server, then the container is unhealthy during the startup period and during any periods when the HTTP server fails. At all other times, it responds as healthy.

For a more sophisticated configuration, you might want to purposefully design the HTTP server to respond to health checks with an unhealthy status at certain times. For example, you might want to block prediction traffic to a node for a period so that the container can perform maintenance.
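For example, a health endpoint along these lines could report an unhealthy status while the container loads a model or performs maintenance. This is a minimal sketch, assuming Flask; the module-level ready flag is a hypothetical mechanism, not something AI Platform Prediction requires:

    import os

    from flask import Flask

    app = Flask(__name__)

    # Hypothetical readiness flag: set it to False while loading a model or
    # performing maintenance, and back to True when ready to serve again.
    ready = True

    # AIP_HEALTH_ROUTE contains the path you set in routes.health.
    @app.route(os.environ.get("AIP_HEALTH_ROUTE", "/health"))
    def health():
        if ready:
            # 200 OK marks the server healthy; the response body is ignored.
            return "ok", 200
        # Any other status code, such as 503, marks the server unhealthy.
        return "unavailable", 503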

Prediction requests

When a client sends a projects.predict request to the AI Platform Training and Prediction API, AI Platform Prediction forwards this request as an HTTP POST request to a configurable prediction path on your server. Specify this path in the routes.predict field when you create a model version. To learn how the container can access this value, read the section of this document about the AIP_PREDICT_ROUTE environment variable.

AI Platform Prediction does not validate prediction requests and responses; it passes each prediction request unchanged to the HTTP server in your container, and it passes the server's responses back to the client.

Each prediction request and response must be 1.5 MB or smaller. However, you are not required to follow the other requirements for request bodies and response bodies; those requirements apply only to model versions that don't use a custom container. When you use a custom container, your request and response bodies can take any form.

However, we still recommend that you design your HTTP server to conform to the request and response requirements described in the previous links. If you don't, there's no guarantee that other AI Platform Prediction features—like logging, monitoring, and AI Explanations—will work properly.
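For example, a prediction handler that follows the recommended body format might look like the following minimal sketch, assuming Flask; predict_one is a hypothetical placeholder for your own inference code:

    import os

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def predict_one(instance):
        # Hypothetical placeholder: replace with real inference logic.
        return instance

    # AIP_PREDICT_ROUTE contains the path you set in routes.predict.
    @app.route(os.environ.get("AIP_PREDICT_ROUTE", "/predict"), methods=["POST"])
    def predict():
        # Recommended (but not required) format: {"instances": [...]} in,
        # {"predictions": [...]} out.
        body = request.get_json(force=True)
        instances = body.get("instances", [])
        return jsonify({"predictions": [predict_one(i) for i in instances]})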

Container image publishing requirements

You must push your container image to Artifact Registry in order to use it with AI Platform Prediction. Learn how to push a container image to Artifact Registry.

In particular, you must push the container image to a repository that meets the following location and permissions requirements.

Location

The repository must use a region that matches the regional endpoint where you plan to create a model version. For example, if you plan to create a model version on the us-central1-ml.googleapis.com endpoint, then the full name of your container image must start with us-central1-docker.pkg.dev/.

Do not use a multi-regional repository for your container image.

Permissions

AI Platform Prediction must have permission to pull the container image when you create a model version. Specifically, the AI Platform service agent must have the permissions of the Artifact Registry Reader role (roles/artifactregistry.reader) for the container image's repository.

If you have pushed your container image to the same Google Cloud project where you are using AI Platform Prediction, then you do not have to configure any permissions. The default permissions granted to the service agent are sufficient.

On the other hand, if you have pushed your container image to a different Google Cloud project from the one where you are using AI Platform Prediction, then you must grant the Artifact Registry Reader role for the Artifact Registry repository to the AI Platform service agent.

Accessing model artifacts

When you create a model version without a custom container, you must specify the URI of a Cloud Storage directory containing model artifacts in the deploymentUri field. When you create a model version with a custom container, providing model artifacts in Cloud Storage is optional.

If the container image includes the model artifacts that you need to serve predictions, then there is no need to load files from Cloud Storage. However, if you do provide model artifacts by specifying the deploymentUri field, then the container must load these artifacts when it starts running. When AI Platform Prediction starts your container, it sets the AIP_STORAGE_URI environment variable to a Cloud Storage URI that begins with gs://. Your container's entrypoint command can download the directory specified by this URI in order to access the model artifacts.

Note that the value of the AIP_STORAGE_URI environment variable is not identical to the Cloud Storage URI that you specify in the deploymentUri field when you create the model version. Rather, AIP_STORAGE_URI points to a copy of your model artifact directory in a different Cloud Storage bucket, which AI Platform Prediction manages. AI Platform Prediction populates this directory when you create a model version. You cannot update the contents of the directory. If you want to use new model artifacts, then you must create a new model version.

The service account that your container uses by default has permission to read from this URI. On the other hand, if you specify a custom service account when you create a model version, AI Platform Prediction automatically grants your specified service account the Storage Object Viewer (roles/storage.objectViewer) role for the URI's Cloud Storage bucket.

Use any library that supports Application Default Credentials (ADC) to load the model artifacts; you don't need to explicitly configure authentication.
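For example, the following minimal sketch uses the google-cloud-storage client library, which picks up ADC automatically, to copy the artifact directory into the container; the destination path /tmp/model is an arbitrary choice:

    import os

    from google.cloud import storage  # authenticates with ADC automatically

    def download_model_artifacts(destination_dir="/tmp/model"):
        """Copies every object under AIP_STORAGE_URI into destination_dir."""
        uri = os.environ.get("AIP_STORAGE_URI", "")
        if not uri:
            return  # No deploymentUri was set, so there is nothing to download.
        bucket_name, _, prefix = uri[len("gs://"):].partition("/")
        client = storage.Client()
        for blob in client.list_blobs(bucket_name, prefix=prefix):
            relative_path = blob.name[len(prefix):].lstrip("/")
            if not relative_path or relative_path.endswith("/"):
                continue  # Skip directory placeholder objects, if any.
            local_path = os.path.join(destination_dir, relative_path)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            blob.download_to_filename(local_path)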

Since the container supports ADC for the AI Platform service agent—or a custom service account, if you have specified one—it can also access any other Google Cloud services that its service account has permissions for.

Environment variables available in the container

When running, the container's entrypoint command can reference environment variables that you have configured manually, as well as environment variables set automatically by AI Platform Prediction. This section describes each way that you can set environment variables, and it provides details about the variables set automatically by AI Platform Prediction.

Variables set in the container image

To set environment variables in the container image when you build it, use Docker's ENV instruction. Do not set any environment variables that begin with the prefix AIP_.

The container's entrypoint command can use these environment variables, but you cannot reference them in any of your model version's API fields.
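For example, if your Dockerfile contains a hypothetical instruction like ENV MODEL_FILENAME=model.joblib, your entrypoint command can read that value at runtime:

    import os

    # MODEL_FILENAME is a hypothetical variable set with ENV in the Dockerfile.
    # The entrypoint command can read it, but the model version's API fields
    # cannot reference it.
    model_filename = os.environ.get("MODEL_FILENAME", "model.joblib")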

Variables set by AI Platform Prediction

When AI Platform Prediction starts running the container, it sets the following environment variables in the container environment. Each variable begins with the prefix AIP_. Do not manually set any environment variables that use this prefix.

The container's entrypoint command can access these variables. To learn which AI Platform Training and Prediction API fields can also reference these variables, read the API reference for ContainerSpec.

AIP_ACCELERATOR_TYPE
  • Default value: Unset
  • How to configure the value: When you create a model version, set the acceleratorConfig.type field.
  • Details: If applicable, this variable specifies the type of accelerator used by the virtual machine (VM) instance that the container is running on.

AIP_FRAMEWORK
  • Default value: CUSTOM_CONTAINER
  • How to configure the value: Not configurable

AIP_HEALTH_ROUTE
  • Default value: /v1/models/MODEL/versions/VERSION, where MODEL is the value of the AIP_MODEL_NAME variable and VERSION is the value of the AIP_VERSION_NAME variable
  • How to configure the value: When you create a model version, set the routes.health field.
  • Details: This variable specifies the HTTP path on the container that AI Platform Prediction sends health checks to.

AIP_HTTP_PORT
  • Default value: 8080
  • How to configure the value: When you create a model version, set the containerSpec.ports field. The first entry in this field becomes the value of AIP_HTTP_PORT.
  • Details: AI Platform Prediction sends liveness checks, health checks, and prediction requests to this port on the container. Your container's HTTP server must listen for requests on this port.

AIP_MACHINE_TYPE
  • Default value: No default; must be configured
  • How to configure the value: When you create a model version, set the machineType field.
  • Details: This variable specifies the type of VM that the container is running on.

AIP_MODE
  • Default value: PREDICTION
  • How to configure the value: Not configurable
  • Details: This variable signifies that the container is running on AI Platform Prediction to serve online predictions. You can use this environment variable to add custom logic to your container, so that it can run in multiple computing environments but only use certain code paths when running on AI Platform Prediction.

AIP_MODE_VERSION
  • Default value: 1.0.0
  • How to configure the value: Not configurable
  • Details: This variable signifies the version of the custom container requirements (this document) that AI Platform Prediction expects the container to meet. This document updates according to semantic versioning.

AIP_MODEL_NAME
  • Default value: No default; must be configured
  • How to configure the value: When you create a model (the parent of the model version that uses the container), specify the name field.
  • Details: The value does not include the projects/PROJECT_ID/models/ prefix that the AI Platform Training and Prediction API produces in output.

AIP_PREDICT_ROUTE
  • Default value: /v1/models/MODEL/versions/VERSION:predict, where MODEL is the value of the AIP_MODEL_NAME variable and VERSION is the value of the AIP_VERSION_NAME variable
  • How to configure the value: When you create a model version, set the routes.predict field.
  • Details: This variable specifies the HTTP path on the container that AI Platform Prediction forwards prediction requests to.

AIP_PROJECT_NUMBER
  • Default value: The project number of the Google Cloud project where you are using AI Platform Prediction
  • How to configure the value: Not configurable

AIP_STORAGE_URI
  • Default value: If you don't set the deploymentUri field when you create a model version, an empty string; if you do set the deploymentUri field, a Cloud Storage URI (starting with gs://) specifying a directory in a bucket managed by AI Platform Prediction
  • How to configure the value: Not configurable
  • Details: This variable specifies the directory that contains a copy of your model artifacts, if applicable.

AIP_VERSION_NAME
  • Default value: No default; must be configured
  • How to configure the value: When you create a model version, set the name field.
  • Details: The value does not include the projects/PROJECT_ID/models/MODEL/versions/ prefix that the AI Platform Training and Prediction API produces in output.
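For example, because AIP_MODE is always set to PREDICTION on AI Platform Prediction, an entrypoint can use it to branch between serving on the platform and running in some other environment. A minimal sketch, assuming Python:

    import os

    if os.environ.get("AIP_MODE") == "PREDICTION":
        # Running on AI Platform Prediction: the other AIP_ variables are set.
        machine_type = os.environ.get("AIP_MACHINE_TYPE")
        accelerator_type = os.environ.get("AIP_ACCELERATOR_TYPE")  # may be unset
        print(f"Serving on {machine_type} (accelerator: {accelerator_type})")
    else:
        # Running somewhere else, for example on a local workstation.
        print("Not running on AI Platform Prediction")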

Variables set in the Version resource

When you create a model version, you can set additional environment variables in the container.env field.

The container's entrypoint command can access these variables. To learn which AI Platform Training and Prediction API fields can also reference these variables, read the API reference for ContainerSpec.

What's next