This tutorial shows you how to use a custom container to deploy a PyTorch machine learning (ML) model that serves online predictions.
In this tutorial, you deploy a container running PyTorch's TorchServe tool in order to serve predictions from a digit recognition model provided by TorchServe that has been pre-trained on the MNIST dataset. You can then use AI Platform Prediction to classify images of digits.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the AI Platform Training & Prediction and Artifact Registry APIs.
- In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
Throughout this tutorial, we recommend that you use Cloud Shell to interact with Google Cloud. If you want to use a different Bash shell instead of Cloud Shell, then perform the following additional configuration:
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
- Follow the Artifact Registry documentation to Install Docker.
Building and pushing the container image
To use a custom container, you must specify a Docker container image that meets the custom container requirements. This section describes how to create the container image and push it to Artifact Registry.
Download model artifacts
Model artifacts are files created by ML training that you can use to serve predictions. They contain, at a minimum, the structure and weights of your trained ML model. The format of model artifacts depends on what ML framework you use for training.
For this tutorial, instead of training from scratch, download example model artifacts that are provided by TorchServe.
To clone the TorchServe repository and navigate to the directory with the model artifacts, run the following commands in your shell:
git clone https://github.com/pytorch/serve.git \
--branch=v0.3.0 \
--depth=1
cd serve/examples/image_classifier/mnist
This directory contains three important files to build into your container image:
- mnist.py: defines the structure of the trained neural network
- mnist_cnn.pt: contains a state_dict with feature weights and other outputs from training (see the sketch after this list)
- mnist_handler.py: extends how TorchServe handles prediction requests
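To make the relationship between these files concrete, the following minimal Python sketch (not part of the tutorial) loads the mnist_cnn.pt weights into the network defined in mnist.py. It assumes that the model class in mnist.py is named Net, as in the upstream PyTorch MNIST example, and that you run it from the serve/examples/image_classifier/mnist directory with PyTorch installed.

import torch

from mnist import Net  # assumed class name for the architecture defined in mnist.py

model = Net()
# The .pt file holds only a state_dict (learned weights), not the model code,
# so it must be loaded into an instance of the matching architecture.
model.load_state_dict(torch.load("mnist_cnn.pt", map_location="cpu"))
model.eval()  # switch to inference mode, as a serving process would
print(model)

This is why both mnist.py and mnist_cnn.pt are copied into the container image in the next section: the weights are not usable without the architecture that defines them.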
Create an Artifact Registry repository
Create an Artifact Registry repository to store the container image that you will create in the next section. Run the following command in your shell:
gcloud beta artifacts repositories create getting-started-pytorch \
--repository-format=docker \
--location=REGION
Replace REGION with the region where you want Artifact Registry to store your container image. Later, you must create an AI Platform Prediction model resource on a regional endpoint that matches this region, so choose a region where AI Platform Prediction has a regional endpoint; for example, us-central1.
After the operation completes, this command prints the following output:
Created repository [getting-started-pytorch].
Build the container image
TorchServe provides a Dockerfile for building a container image that runs TorchServe. However, instead of using this Dockerfile to install all TorchServe's dependencies, you can speed up the build process by deriving your container image from one of the TorchServe images that the TorchServe team has pushed to Docker Hub.
In the directory with the model artifacts, create a new Dockerfile by running the following command in your shell:
cat > Dockerfile <<END

FROM pytorch/torchserve:0.3.0-cpu

COPY mnist.py mnist_cnn.pt mnist_handler.py /home/model-server/

USER root
RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
USER model-server

RUN torch-model-archiver \
  --model-name=mnist \
  --version=1.0 \
  --model-file=/home/model-server/mnist.py \
  --serialized-file=/home/model-server/mnist_cnn.pt \
  --handler=/home/model-server/mnist_handler.py \
  --export-path=/home/model-server/model-store

CMD ["torchserve", \
     "--start", \
     "--ts-config=/home/model-server/config.properties", \
     "--models", \
     "mnist=mnist.mar"]

END
These Docker instructions do the following actions:
- The FROM instruction derives the current container image from an existing TorchServe image.
- The COPY instruction copies the model artifacts and the prediction handler from your local directory into the /home/model-server/ directory of the container image.
- The first RUN instruction edits the configuration file from the parent image to support AI Platform Prediction's preferred input format for predictions. Specifically, this instruction configures TorchServe to expect a JSON service envelope for prediction requests.
  Editing this configuration file requires permission that the model-server user (which was created in the parent image) does not have. The instructions tell Docker to run as the root user to edit the configuration file and then continue to use the model-server user for the following instructions.
- The second RUN instruction uses the Torch model archiver, which is already installed in the container image, to create a model archive from the files that you copied into the image. It saves this model archive in the /home/model-server/model-store/ directory with the filename mnist.mar.
  (If you want to alter the container image, for example to perform custom preprocessing or postprocessing in the request handler, you can use additional RUN instructions to install dependencies. A sketch of such a handler appears after this list.)
- The CMD instruction starts the TorchServe HTTP server. It references the configuration file from the parent image and enables serving for one model named mnist. This model loads the mnist.mar file created by the RUN instruction.
  This instruction overrides the parent image's CMD instruction. It's important to override the CMD instruction and not the ENTRYPOINT instruction, because the parent image's ENTRYPOINT script runs the command passed in CMD and also adds extra logic to prevent Docker from exiting.
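For illustration, here is a hypothetical variant of mnist_handler.py (not the file you downloaded) showing where such custom postprocessing would go. It assumes the handler extends TorchServe's ImageClassifier base handler, as the example handler does; treat it as a sketch rather than the tutorial's actual code.

# Hypothetical handler sketch; mnist_handler.py in the TorchServe example
# plays a similar role but is not identical to this code.
from ts.torch_handler.image_classifier import ImageClassifier


class MNISTDigitHandler(ImageClassifier):
    """Returns the predicted digit instead of per-class scores."""

    def postprocess(self, data):
        # `data` holds the raw model output for a batch of images; reduce
        # each row to the index of the highest-scoring class (the digit).
        return data.argmax(dim=1).tolist()

If a handler like this needed extra libraries, that is where the additional RUN instructions mentioned above would come in.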
To build the container image based on your new Dockerfile and tag it with a name compatible with your Artifact Registry repository, run the following command in your shell:
docker build \
  --tag=REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist \
  .
Replace the following:
- REGION: the region of your Artifact Registry repository, as specified in a previous section
- PROJECT_ID: the ID of your Google Cloud project
The command might run for several minutes.
Run the container locally (optional)
Before you push your container image to Artifact Registry in order to use it with AI Platform Prediction, you can run it as a container in your local environment to verify that the server works as expected:
To run the container image as a container locally, run the following command in your shell:
docker run -d -p 8080:8080 --name=local_mnist \
  REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist
Replace the following, as you did in the previous section:
- REGION: the region of your Artifact Registry repository, as specified in a previous section
- PROJECT_ID: the ID of your Google Cloud project
This command runs a container in detached mode, mapping port 8080 of the container to port 8080 of the local environment. (The parent image, from which you derived your container image, configures TorchServe to use port 8080.)

To send the container's server a health check, run the following command in your shell:
curl localhost:8080/ping
If successful, the server returns the following response:
{ "status": "Healthy" }
To send the container's server a prediction request, run the following commands in your shell:
cat > instances.json <<END
{
  "instances": [
    {
      "data": {
        "b64": "$(base64 --wrap=0 test_data/3.png)"
      }
    }
  ]
}
END

curl -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @instances.json \
  localhost:8080/predictions/mnist
This request uses one of the test images included with the TorchServe example. (A Python version of the same request appears after these steps.)
If successful, the server returns the following prediction:
{"predictions": [3]}
To stop the container, run the following command in your shell:
docker stop local_mnist
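If you would rather script this local test than use curl, the following minimal Python sketch sends the same prediction request while the container is still running. It is not part of the tutorial and assumes the requests library is installed (pip install requests).

import base64

import requests

# Encode the test image the same way the curl example does.
with open("test_data/3.png", "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

# Because the Dockerfile sets service_envelope=json, TorchServe expects the
# AI Platform Prediction request body: a list of instances with a data.b64 field.
request_body = {"instances": [{"data": {"b64": encoded_image}}]}

response = requests.post(
    "http://localhost:8080/predictions/mnist",
    json=request_body,
)
print(response.json())  # expected output: {'predictions': [3]}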
Push the container image to Artifact Registry
Configure Docker to access Artifact Registry. Then push your container image to your Artifact Registry repository.
To give your local Docker installation permission to push to Artifact Registry in your chosen region, run the following command in your shell:
gcloud auth configure-docker REGION-docker.pkg.dev
Replace REGION with the region where you created your repository in a previous section.
To push the container image that you just built to Artifact Registry, run the following command in your shell:
docker push REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist
Replace the following, as you did in the previous section:
- REGION: the region of your Artifact Registry repository, as specified in a previous section
- PROJECT_ID: the ID of your Google Cloud project
Deploying the container
This section walks through creating a model and a model version on AI Platform Prediction in order to serve predictions. The model version runs your container image as a container to serve those predictions.
This tutorial provides specific configuration options to use when you create your model and model version. If you want to learn about different configuration options, read Deploying models.
Create a model
To create a model resource, run the following command in your shell:
gcloud beta ai-platform models create getting_started_pytorch \
--region=REGION \
--enable-logging \
--enable-console-logging
Replace REGION with the same region where you created your Artifact Registry repository in a previous section.
Create a model version
To create a model version resource, run the following command in your shell:
gcloud beta ai-platform versions create v1 \
--region=REGION \
--model=getting_started_pytorch \
--machine-type=n1-standard-4 \
--image=REGION-docker.pkg.dev/PROJECT_ID/getting-started-pytorch/serve-mnist \
--ports=8080 \
--health-route=/ping \
--predict-route=/predictions/mnist
Replace the following:
- REGION: the region where you created your Artifact Registry repository and AI Platform Prediction model in previous sections
- PROJECT_ID: the ID of your Google Cloud project
The container-related flags in this command do the following:
- --image: the URI of your container image.
- --ports: the port that your container's HTTP server listens for requests on. The parent image, from which you derived your container image, configures TorchServe to use port 8080.
- --health-route: the path on which your container's HTTP server listens for health checks. TorchServe always listens for health checks on the /ping path.
- --predict-route: the path on which your container's HTTP server listens for prediction requests. TorchServe always listens for prediction requests on the /predictions/MODEL path, where MODEL is the name of the model that you specified when you started TorchServe. In this case, the name is mnist, which you set in this Docker instruction from a previous section:

CMD ["torchserve", \
     "--start", \
     "--ts-config=/home/model-server/config.properties", \
     "--models", \
     "mnist=mnist.mar"]
Getting a prediction
The TorchServe example files that you downloaded in a previous section include test images. The container's TorchServe configuration expects to receive prediction requests in JSON format, with the image as a base64-encoded string in the data.b64 field of each instance.
For example, to classify test_data/3.png, run the following commands in your shell:
cat > instances.json <<END
{
"instances": [
{
"data": {
"b64": "$(base64 --wrap=0 test_data/3.png)"
}
}
]
}
END
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @instances.json \
https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/getting_started_pytorch/versions/v1:predict
Replace the following:
- REGION: the region where you created your AI Platform Prediction model in a previous section
- PROJECT_ID: the ID of your Google Cloud project
If successful, the model version returns the following prediction:
{"predictions": [3]}
Cleaning up
To avoid incurring further AI Platform Prediction charges and Artifact Registry charges, delete the Google Cloud resources that you created during this tutorial:
To delete your model version, run the following command in your shell:
gcloud ai-platform versions delete v1 \
  --region=REGION \
  --model=getting_started_pytorch \
  --quiet
Replace REGION with the region where you created your model in a previous section.
To delete your model, run the following command in your shell:
gcloud ai-platform models delete getting_started_pytorch \
  --region=REGION \
  --quiet
Replace REGION with the region where you created your model in a previous section.
To delete your Artifact Registry repository and the container image in it, run the following command in your shell:
gcloud beta artifacts repositories delete getting-started-pytorch \
  --location=REGION \
  --quiet
Replace REGION with the region where you created your Artifact Registry repository in a previous section.
What's next
If you want to design your own container image—either from scratch, or by deriving from an existing third-party container image—read Custom container requirements.
Learn more about using a custom container for prediction, including compatibility with other AI Platform Prediction features and configuration options that you can specify for your container during deployment.