To customize how Vertex AI serves online predictions from your custom-trained model, you can specify a custom container instead of a prebuilt container when you create a `Model` resource. When you use a custom container, Vertex AI runs a Docker container of your choice on each prediction node.
You might want to use a custom container for any of the following reasons:
- to serve predictions from an ML model trained using a framework that isn't available as a prebuilt container
- to preprocess prediction requests or postprocess the predictions generated by your model
- to run a prediction server written in a programming language of your choice
- to install dependencies that you want to use to customize prediction
This guide describes how to create a model that uses a custom container. It doesn't provide detailed instructions about designing and creating a Docker container image.
Prepare a container image
To create a `Model` that uses a custom container, you must provide a Docker container image as the basis of that container. This container image must meet the requirements described in Custom container requirements.
If you plan to use an existing container image created by a third party that you trust, you might be able to skip one or both of the following sections.
Create a container image
Design and build a Docker container image that meets the container image requirements.
To learn the basics of designing and building a Docker container image, read the Docker documentation's quickstart.
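As a concrete illustration, here is a minimal sketch of a prediction server in Python using Flask. It relies on the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables that Vertex AI sets for custom containers; the echo "inference" logic and the request shape are placeholders for your own framework's code, not a definitive implementation.

```python
# server.py - minimal prediction server sketch (assumes Flask is installed
# in the container image). Vertex AI sets AIP_HTTP_PORT, AIP_HEALTH_ROUTE,
# and AIP_PREDICT_ROUTE when it runs the container; the defaults below are
# for local testing only.
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")


@app.route(HEALTH_ROUTE, methods=["GET"])
def health():
    # Return 200 once the model is ready to serve traffic.
    return "", 200


@app.route(PREDICT_ROUTE, methods=["POST"])
def predict():
    # Vertex AI forwards the request body, which contains an "instances" list.
    instances = request.get_json()["instances"]
    # Placeholder: replace with your framework's inference call.
    predictions = [{"echo": instance} for instance in instances]
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))
```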
Push the container image to Artifact Registry
Push your container image to an Artifact Registry repository.
Learn how to push a container image to Artifact Registry.
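If you prefer to script this step from Python instead of the Docker CLI, the following hedged sketch uses the third-party `docker` package (docker-py). The project, repository, and tag are placeholders following the Artifact Registry naming convention `LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/IMAGE_NAME`, and it assumes Docker authentication for Artifact Registry is already configured (for example, with `gcloud auth configure-docker`).

```python
# build_and_push.py - sketch of building and pushing an image with docker-py.
# Assumes a local Docker daemon and preconfigured Artifact Registry auth
# (for example, `gcloud auth configure-docker us-central1-docker.pkg.dev`).
import docker

# Placeholder values: substitute your own project, repository, and image name.
IMAGE = "us-central1-docker.pkg.dev/PROJECT_ID/REPO_NAME/prediction-server:latest"

client = docker.from_env()

# Build the image from the Dockerfile in the current directory.
image, _ = client.images.build(path=".", tag=IMAGE)

# Push it to Artifact Registry; decode=True yields progress dicts.
for line in client.images.push(IMAGE, stream=True, decode=True):
    print(line)
```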
Create a Model
To create a `Model` that uses a custom container, do one of the following:
- Use the `gcloud ai models upload` command.
- Use the Vertex AI API, for example through one of the client libraries.

The following sections show how to configure the API fields related to custom containers when you create a `Model` in one of these ways.
Container-related API fields
When you create the `Model`, make sure to configure the `containerSpec` field with your custom container details, rather than with a prebuilt container.

You must specify a `ModelContainerSpec` message in the `Model.containerSpec` field. Within this message, you can specify the following subfields:
- `imageUri` (required): The Artifact Registry URI of your container image. If you are using the `gcloud ai models upload` command, you can use the `--container-image-uri` flag to specify this field.
- `command` (optional): An array of an executable and arguments to override the container's `ENTRYPOINT` instruction. To learn more about how to format this field and how it interacts with the `args` field, read the API reference for `ModelContainerSpec`. If you are using the `gcloud ai models upload` command, you can use the `--container-command` flag to specify this field.
- `args` (optional): An array of an executable and arguments to override the container's `CMD` instruction. To learn more about how to format this field and how it interacts with the `command` field, read the API reference for `ModelContainerSpec`. If you are using the `gcloud ai models upload` command, you can use the `--container-args` flag to specify this field.
- `ports` (optional): An array of ports. Vertex AI sends liveness checks, health checks, and prediction requests to your container on the first port listed, or `8080` by default. Specifying additional ports has no effect. If you are using the `gcloud ai models upload` command, you can use the `--container-ports` flag to specify this field.
- `env` (optional): An array of environment variables that the container's `ENTRYPOINT` instruction, as well as the `command` and `args` fields, can reference. To learn more about how other fields can reference these environment variables, read the API reference for `ModelContainerSpec`. If you are using the `gcloud ai models upload` command, you can use the `--container-env-vars` flag to specify this field.
- `healthRoute` (optional): The path on your container's HTTP server where you want Vertex AI to send health checks. If you don't specify this field, then when you deploy the `Model` as a `DeployedModel` to an `Endpoint` resource, it defaults to `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL`, where ENDPOINT is replaced by the last segment of the `Endpoint`'s `name` field (following `endpoints/`) and DEPLOYED_MODEL is replaced by the `DeployedModel`'s `id` field. If you are using the `gcloud ai models upload` command, you can use the `--container-health-route` flag to specify this field.
- `predictRoute` (optional): The path on your container's HTTP server where you want Vertex AI to forward prediction requests. If you don't specify this field, then when you deploy the `Model` as a `DeployedModel` to an `Endpoint` resource, it defaults to `/v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL:predict`, where ENDPOINT is replaced by the last segment of the `Endpoint`'s `name` field (following `endpoints/`) and DEPLOYED_MODEL is replaced by the `DeployedModel`'s `id` field. If you are using the `gcloud ai models upload` command, you can use the `--container-predict-route` flag to specify this field.
- `sharedMemorySizeMb` (optional): The amount of VM memory to reserve in a shared memory volume for the model, in megabytes. Shared memory is an inter-process communication (IPC) mechanism that allows multiple processes to access and manipulate a common block of memory. The amount of shared memory needed, if any, is an implementation detail of your container and model; consult your model server's documentation for guidelines. If you are using the `gcloud ai models upload` command, you can use the `--container-shared-memory-size-mb` flag to specify this field.
- `startupProbe` (optional): Specification for the probe that checks whether the container application has started. If you are using the `gcloud ai models upload` command, you can use the `--container-startup-probe-exec`, `--container-startup-probe-period-seconds`, and `--container-startup-probe-timeout-seconds` flags to specify this field.
- `healthProbe` (optional): Specification for the probe that checks whether the container is ready to accept traffic. If you are using the `gcloud ai models upload` command, you can use the `--container-health-probe-exec`, `--container-health-probe-period-seconds`, and `--container-health-probe-timeout-seconds` flags to specify this field.
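To make the field list concrete, here is a hedged sketch of a `containerSpec` value assembled as a Python dict, as you might pass it in the JSON body of a `projects.locations.models.upload` request. The image URI, routes, paths, and environment variable are placeholders; the `$(MODEL_DIR)` reference illustrates how `command` and `args` can reference environment variables, as described in the `ModelContainerSpec` reference.

```python
# A containerSpec message expressed as a Python dict, e.g. for the JSON body
# of a projects.locations.models.upload request. All values are placeholders.
container_spec = {
    "imageUri": "us-central1-docker.pkg.dev/PROJECT_ID/REPO_NAME/prediction-server:latest",
    # Overrides the image's ENTRYPOINT; $(MODEL_DIR) references the env entry below.
    "command": ["python", "server.py", "--model-dir", "$(MODEL_DIR)"],
    "args": [],
    # Only the first port is used for health checks and prediction requests.
    "ports": [{"containerPort": 8080}],
    "env": [{"name": "MODEL_DIR", "value": "/models/my_model"}],
    "healthRoute": "/health",
    "predictRoute": "/predict",
    "sharedMemorySizeMb": "256",  # int64 fields are serialized as strings in JSON
}
```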
In addition to the variables that you set in the `Model.containerSpec.env` field, Vertex AI sets several other variables based on your configuration. Learn more about using these environment variables in these fields and in the container's `ENTRYPOINT` instruction.
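For example, a server process inside the container can read the Vertex AI-set variables directly. This short sketch assumes the documented `AIP_*` variables, including `AIP_STORAGE_URI`, which points to the copy of your model artifacts; the fallback values are for local testing only.

```python
import os

# Variables that Vertex AI sets for custom containers (see the environment
# variables reference). Defaults are placeholders for running locally.
http_port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
health_route = os.environ.get("AIP_HEALTH_ROUTE", "/health")
predict_route = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
storage_uri = os.environ.get("AIP_STORAGE_URI", "")  # where artifacts are staged

print(http_port, health_route, predict_route, storage_uri)
```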
Model import examples
The following examples show how to specify container-related API fields when you import a model.
gcloud
The following example uses the `gcloud ai models upload` command:
```
gcloud ai models upload \
  --region=LOCATION \
  --display-name=MODEL_NAME \
  --container-image-uri=IMAGE_URI \
  --container-command=COMMAND \
  --container-args=ARGS \
  --container-ports=PORTS \
  --container-env-vars=ENV \
  --container-health-route=HEALTH_ROUTE \
  --container-predict-route=PREDICT_ROUTE \
  --container-shared-memory-size-mb=SHARED_MEMORY_SIZE \
  --container-startup-probe-exec=STARTUP_PROBE_EXEC \
  --container-startup-probe-period-seconds=STARTUP_PROBE_PERIOD \
  --container-startup-probe-timeout-seconds=STARTUP_PROBE_TIMEOUT \
  --container-health-probe-exec=HEALTH_PROBE_EXEC \
  --container-health-probe-period-seconds=HEALTH_PROBE_PERIOD \
  --container-health-probe-timeout-seconds=HEALTH_PROBE_TIMEOUT \
  --artifact-uri=PATH_TO_MODEL_ARTIFACT_DIRECTORY
```
The `--container-image-uri` flag is required; all other flags that begin with `--container-` are optional. To learn about the values for these fields, see the preceding section of this guide.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
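As an illustration, here is a minimal sketch of a model upload using the Vertex AI SDK for Python. The project, location, image URI, routes, and artifact path are placeholders; the `serving_container_*` parameters correspond to the `containerSpec` subfields described earlier.

```python
from google.cloud import aiplatform

# Placeholder project and region.
aiplatform.init(project="PROJECT_ID", location="us-central1")

model = aiplatform.Model.upload(
    display_name="MODEL_NAME",
    # Maps to containerSpec.imageUri (required).
    serving_container_image_uri="us-central1-docker.pkg.dev/PROJECT_ID/REPO_NAME/prediction-server:latest",
    # Map to containerSpec.predictRoute and containerSpec.healthRoute.
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    # Maps to containerSpec.ports; only the first port is used.
    serving_container_ports=[8080],
    # Maps to containerSpec.env.
    serving_container_environment_variables={"MODEL_DIR": "/models/my_model"},
    # Cloud Storage directory containing your model artifacts, if any.
    artifact_uri="gs://BUCKET_NAME/PATH_TO_MODEL_ARTIFACT_DIRECTORY",
)
```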
For more context, read the Model import guide.
Send prediction requests
To send an online prediction request to your `Model`, follow the instructions at Get predictions from a custom trained model: this process works the same regardless of whether you use a custom container.
Read about predict request and response requirements for custom containers.
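For illustration, with the Vertex AI SDK for Python the request looks the same as it does for a prebuilt container. This hedged sketch assumes the `model` object from the upload example above; the machine type and instance payload are placeholders.

```python
# Deploy the uploaded model to an endpoint, then request a prediction.
endpoint = model.deploy(machine_type="n1-standard-2")

response = endpoint.predict(instances=[{"feature_1": 1.0, "feature_2": "a"}])
print(response.predictions)
```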
What's next
- To learn about everything to consider when you design a custom container to use with Vertex AI, read Custom container requirements.