This tutorial shows you how to use a custom container to deploy a PyTorch machine learning (ML) model that serves online predictions.
In this tutorial, you deploy a container running PyTorch's TorchServe tool in order to serve predictions from a digit recognition model provided by TorchServe that has been pre-trained on the MNIST dataset. You can then use AI Platform Prediction to classify images of digits.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the AI Platform Training & Prediction and Artifact Registry APIs.
- In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
Throughout this tutorial, we recommend that you use Cloud Shell to interact with Google Cloud. If you want to use a different Bash shell instead of Cloud Shell, then perform the following additional configuration:
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
- Follow the Artifact Registry documentation to Install Docker.
Building and pushing the container image
To use a custom container, you must specify a Docker container image that meets the custom container requirements. This section describes how to create the container image and push it to Artifact Registry.
Download model artifacts
Model artifacts are files created by ML training that you can use to serve predictions. They contain, at a minimum, the structure and weights of your trained ML model. The format of model artifacts depends on what ML framework you use for training.
For this tutorial, instead of training from scratch, download example model artifacts that are provided by TorchServe.
To clone the TorchServe repository and navigate to the directory with the model artifacts, run the following commands in your shell:
git clone https://github.com/pytorch/serve.git \
--branch=v0.3.0 \
--depth=1
cd serve/examples/image_classifier/mnist
This directory contains three important files to build into your container image:
- mnist.py: defines the structure of the trained neural network
- mnist_cnn.pt: contains a state_dict with feature weights and other outputs from training (see the sketch after this list)
- mnist_handler.py: extends how TorchServe handles prediction requests
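To make the relationship between these files concrete, the following minimal Python sketch (not part of the tutorial) loads the mnist_cnn.pt weights into the network defined in mnist.py. It assumes that the model class in mnist.py is named Net, as in the upstream PyTorch MNIST example, and that you run it from the serve/examples/image_classifier/mnist directory with PyTorch installed.

import torch

from mnist import Net  # assumed class name for the architecture defined in mnist.py

model = Net()
# The .pt file holds only a state_dict (learned weights), not the model code,
# so it must be loaded into an instance of the matching architecture.
model.load_state_dict(torch.load("mnist_cnn.pt", map_location="cpu"))
model.eval()  # switch to inference mode, as a serving process would
print(model)

This is why both mnist.py and mnist_cnn.pt are copied into the container image in the next section: the weights are not usable without the architecture that defines them.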
Create an Artifact Registry repository
Create an Artifact Registry repository to store the container image that you will create in the next section. Run the following command in your shell:
gcloud beta artifacts repositories create getting-started-pytorch \
--repository-format=docker \
--location=REGION
Replace REGION with the region where you want Artifact Registry to store your container image. Later, you must create an AI Platform Prediction model resource on a regional endpoint that matches this region, so choose a region where AI Platform Prediction has a regional endpoint; for example, us-central1.
After the operation completes, this command prints the following output:
Created repository [getting-started-pytorch].
Build the container image
TorchServe provides a Dockerfile for building a container image that runs TorchServe. However, instead of using this Dockerfile to install all TorchServe's dependencies, you can speed up the build process by deriving your container image from one of the TorchServe images that the TorchServe team has pushed to Docker Hub.
In the directory with the model artifacts, create a new Dockerfile by running the following command in your shell:
cat > Dockerfile <<END

FROM pytorch/torchserve:0.3.0-cpu

COPY mnist.py mnist_cnn.pt mnist_handler.py /home/model-server/

USER root
RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
USER model-server

RUN torch-model-archiver \
  --model-name=mnist \
  --version=1.0 \
  --model-file=/home/model-server/mnist.py \
  --serialized-file=/home/model-server/mnist_cnn.pt \
  --handler=/home/model-server/mnist_handler.py \
  --export-path=/home/model-server/model-store

CMD ["torchserve", \
     "--start", \
     "--ts-config=/home/model-server/config.properties", \
     "--models", \
     "mnist=mnist.mar"]

END
These Docker instructions do the following actions:
- The FROM instruction derives the current container image from an existing TorchServe image.
- The COPY instruction copies the model artifacts and the prediction handler from your local directory into the /home/model-server/ directory of the container image.
- The first RUN instruction edits the configuration file from the parent image to support AI Platform Prediction's preferred input format for predictions. Specifically, this instruction configures TorchServe to expect a JSON service envelope for prediction requests.
  Editing this configuration file requires permission that the model-server user (which was created in the parent image) does not have. The instructions tell Docker to run as the root user to edit the configuration file and then continue to use the model-server user for the following instructions.
- The second RUN instruction uses the Torch model archiver, which is already installed in the container image, to create a model archive from the files that you copied into the image. It saves this model archive in the /home/model-server/model-store/ directory with the filename mnist.mar.
  (If you want to alter the container image, for example to perform custom preprocessing or postprocessing in the request handler, you can use additional RUN instructions to install dependencies. A sketch of such a handler appears after this list.)
- The CMD instruction starts the TorchServe HTTP server. It references the configuration file from the parent image and enables serving for one model named mnist. This model loads the mnist.mar file created by the RUN instruction.
  This instruction overrides the parent image's CMD instruction. It's important to override the CMD instruction and not the ENTRYPOINT instruction, because the parent image's ENTRYPOINT script runs the command passed in CMD and also adds extra logic to prevent Docker from exiting.
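For illustration, here is a hypothetical variant of mnist_handler.py (not the file you downloaded) showing where such custom postprocessing would go. It assumes the handler extends TorchServe's ImageClassifier base handler, as the example handler does; treat it as a sketch rather than the tutorial's actual code.

# Hypothetical handler sketch; mnist_handler.py in the TorchServe example
# plays a similar role but is not identical to this code.
from ts.torch_handler.image_classifier import ImageClassifier


class MNISTDigitHandler(ImageClassifier):
    """Returns the predicted digit instead of per-class scores."""

    def postprocess(self, data):
        # `data` holds the raw model output for a batch of images; reduce
        # each row to the index of the highest-scoring class (the digit).
        return data.argmax(dim=1).tolist()

If a handler like this needed extra libraries, that is where the additional RUN instructions mentioned above would come in.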
To build the container image based on your new Dockerfile and tag it with a name compatible with your Artifact Registry repository, run the following command in your shell:
docker build \
  --tag=REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist \
  .
Replace the following:
- REGION: the region of your Artifact Registry repository, as specified in a previous section
- PROJECT_ID: the ID of your Google Cloud project
The command might run for several minutes.
Run the container locally (optional)
Before you push your container image to Artifact Registry in order to use it with AI Platform Prediction, you can run it as a container in your local environment to verify that the server works as expected:
To run the container image as a container locally, run the following command in your shell:
docker run -d -p 8080:8080 --name=local_mnist \
  REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist
Replace the following, as you did in the previous section:
- REGION: the region of your Artifact Registry repository, as specified in a previous section
- PROJECT_ID: the ID of your Google Cloud project
This command runs a container in detached mode, mapping port 8080 of the container to port 8080 of the local environment. (The parent image, from which you derived your container image, configures TorchServe to use port 8080.)

To send the container's server a health check, run the following command in your shell:
curl localhost:8080/ping
If successful, the server returns the following response:
{ "status": "Healthy" }
To send the container's server a prediction request, run the following commands in your shell:
cat > instances.json <<END
{
  "instances": [
    {
      "data": {
        "b64": "$(base64 --wrap=0 test_data/3.png)"
      }
    }
  ]
}
END

curl -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @instances.json \
  localhost:8080/predictions/mnist
This request uses one of the test images included with the TorchServe example. (A Python version of the same request appears after these steps.)
If successful, the server returns the following prediction:
{"predictions": [3]}
To stop the container, run the following command in your shell:
docker stop local_mnist
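If you would rather script this local test than use curl, the following minimal Python sketch sends the same prediction request while the container is still running. It is not part of the tutorial and assumes the requests library is installed (pip install requests).

import base64

import requests

# Encode the test image the same way the curl example does.
with open("test_data/3.png", "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

# Because the Dockerfile sets service_envelope=json, TorchServe expects the
# AI Platform Prediction request body: a list of instances with a data.b64 field.
request_body = {"instances": [{"data": {"b64": encoded_image}}]}

response = requests.post(
    "http://localhost:8080/predictions/mnist",
    json=request_body,
)
print(response.json())  # expected output: {'predictions': [3]}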
Push the container image to Artifact Registry
Configure Docker to access Artifact Registry. Then push your container image to your Artifact Registry repository.
To give your local Docker installation permission to push to Artifact Registry in your chosen region, run the following command in your shell:
gcloud auth configure-docker REGION-docker.pkg.dev
Replace REGION with the region where you created your repository in a previous section.
To push the container image that you just built to Artifact Registry, run the following command in your shell:
docker push REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist
Replace the following, as you did in the previous section:
- REGION: the region of your Artifact Registry repository, as specified in a previous section
- PROJECT_ID: the ID of your Google Cloud project
Deploying the container
This section walks through creating a model and a model version on AI Platform Prediction in order to serve predictions. The model version runs your container image as a container to serve those predictions.
This tutorial provides specific configuration options to use when you create your model and model version. If you want to learn about different configuration options, read Deploying models.
Create a model
To create a model resource, run the following command in your shell:
gcloud beta ai-platform models create getting_started_pytorch \
--region=REGION \
--enable-logging \
--enable-console-logging
Replace REGION with the same region where you created your Artifact Registry repository in a previous section.
Create a model version
To create a model version resource, run the following command in your shell:
gcloud beta ai-platform versions create v1 \
--region=REGION \
--model=getting_started_pytorch \
--machine-type=n1-standard-4 \
--image=REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist \
--ports=8080 \
--health-route=/ping \
--predict-route=/predictions/mnist
Replace the following:
- REGION: the region where you created your Artifact Registry repository and AI Platform Prediction model in previous sections
- PROJECT_ID: the ID of your Google Cloud project
The container-related flags in this command do the following:
- --image: the URI of your container image.
- --ports: the port that your container's HTTP server listens for requests on. The parent image, from which you derived your container image, configures TorchServe to use port 8080.
- --health-route: the path on which your container's HTTP server listens for health checks. TorchServe always listens for health checks on the /ping path.
- --predict-route: the path on which your container's HTTP server listens for prediction requests. TorchServe always listens for prediction requests on the /predictions/MODEL path, where MODEL is the name of the model that you specified when you started TorchServe. In this case, the name is mnist, which you set in this Docker instruction from a previous section:

CMD ["torchserve", \
     "--start", \
     "--ts-config=/home/model-server/config.properties", \
     "--models", \
     "mnist=mnist.mar"]
Getting a prediction
The TorchServe example files that you downloaded in a previous section include test images. The container's TorchServe configuration expects to receive prediction requests in JSON format, with the image as a base64-encoded string in the data.b64 field of each instance.
For example, to classify test_data/3.png, run the following commands in your shell:
cat > instances.json <<END
{
"instances": [
{
"data": {
"b64": "$(base64 --wrap=0 test_data/3.png)"
}
}
]
}
END
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @instances.json \
https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/getting_started_pytorch/versions/v1:predict
Replace the following:
- REGION: the region where you created your AI Platform Prediction model in a previous section
- PROJECT_ID: the ID of your Google Cloud project
If successful, the model version returns the following prediction:
{"predictions": [3]}
Cleaning up
To avoid incurring further AI Platform Prediction charges and Artifact Registry charges, delete the Google Cloud resources that you created during this tutorial:
To delete your model version, run the following command in your shell:
gcloud ai-platform versions delete v1 \
  --region=REGION \
  --model=getting_started_pytorch \
  --quiet
Replace REGION with the region where you created your model in a previous section.
To delete your model, run the following command in your shell:
gcloud ai-platform models delete getting_started_pytorch \
  --region=REGION \
  --quiet
Replace REGION with the region where you created your model in a previous section.
To delete your Artifact Registry repository and the container image in it, run the following command in your shell:
gcloud beta artifacts repositories delete getting-started-pytorch \
  --location=REGION \
  --quiet
Replace REGION with the region where you created your Artifact Registry repository in a previous section.
What's next
If you want to design your own container image—either from scratch, or by deriving from an existing third-party container image—read Custom container requirements.
Learn more about using a custom container for prediction, including compatibility with other AI Platform Prediction features and configuration options that you can specify for your container during deployment.