Serving PyTorch models with prebuilt containers on Vertex AI
Erwin Huizenga
Developer Advocate
Machine learning (ML) practitioners using PyTorch tell us that it can be challenging to advance their ML project beyond experimentation. That's why over the last year, we've prioritized development work that makes it easier for PyTorch users to deploy models in the cloud using Vertex AI. Vertex AI is a fully managed machine learning platform with tools, workflows, and infrastructure designed to help ML practitioners accelerate and scale ML in production with the benefit of open-source tools.
We are excited to announce that Vertex AI now offers support for pre-built PyTorch serving containers, which makes it easier to bring your PyTorch models into production. You don't have to build a custom container to serve your PyTorch model. With pre-built containers, we've streamlined the ML lifecycle for PyTorch users. This post describes how to deploy your own PyTorch models on Vertex AI. For more details, you can also have a look at the documentation.
Deploy a PyTorch model in three steps
Step 1 - Package your PyTorch model
The first step is to package your trained PyTorch model, including any default or custom handlers, into an archive file using the Torch model archiver. The handlers help with the following (see the sketch after this list):
Pre-processing input data into the expected format
Customizing how the model is invoked
Post-processing output from the model
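For example, a custom handler typically subclasses TorchServe's BaseHandler and overrides its pre- and post-processing hooks. The sketch below is a minimal, illustrative example; the class name, the file name custom_handler.py, and the simple tensor logic are assumptions, not requirements of Vertex AI:

```python
# custom_handler.py -- a minimal, illustrative TorchServe handler.
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyClassifierHandler(BaseHandler):
    def preprocess(self, data):
        # Pre-process each request row into the tensor format the model expects.
        inputs = [row.get("data") or row.get("body") for row in data]
        return torch.as_tensor(inputs, dtype=torch.float32)

    def postprocess(self, inference_output):
        # Post-process raw model output into a JSON-serializable list,
        # one prediction per input instance.
        return inference_output.argmax(dim=1).tolist()
```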
After defining your handlers, you create the model archive file using the Torch model archiver. The pre-built PyTorch image requires the archived model file to be named model.mar, so you need to set the model name to model, as in the command below.
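A typical invocation looks like the following; model.pt and custom_handler.py are illustrative file names carried over from the sketch above:

```sh
# --model-name must be "model" so the archive is written as model.mar.
torch-model-archiver \
  --model-name model \
  --version 1.0 \
  --serialized-file model.pt \
  --handler custom_handler.py \
  --export-path model-output
```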
Step 2 - Upload the model to Vertex AI with the pre-built PyTorch serving container image
After you package the PyTorch model, you upload it to the Vertex AI Model Registry, where you can track and manage all of your models and quickly deploy them to a Vertex AI endpoint. You can use the Vertex AI SDK and the pre-built PyTorch serving image to upload the PyTorch model. The Vertex AI SDK provides an optimized experience for interacting with the Vertex AI APIs. Your code will look something like this:
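This is a minimal sketch, assuming an illustrative project, region, bucket, and pre-built image tag (check the documentation for the current list of pre-built PyTorch serving images):

```python
from google.cloud import aiplatform

# Illustrative project ID and region.
aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="pytorch-demo-model",
    serving_container_image_uri=(
        # Illustrative pre-built PyTorch serving image tag.
        "us-docker.pkg.dev/vertex-ai/prediction/pytorch-gpu.1-12:latest"
    ),
    # Cloud Storage folder containing the model.mar file.
    artifact_uri="gs://my-bucket/model-output",
)
```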
Step 3 - Create a Vertex AI endpoint and deploy the PyTorch model
The third and final step is to create a Vertex AI endpoint and deploy the PyTorch model to it. For this, you can use the Vertex AI SDK, or you can deploy through the Google Cloud Console. First, you need to create an endpoint.
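With the Vertex AI SDK, creating the endpoint is a single call; the display name here is illustrative:

```python
# Create an empty Vertex AI endpoint to deploy the model to.
endpoint = aiplatform.Endpoint.create(display_name="pytorch-endpoint")
```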
Next, deploy the model into the endpoint so it can serve online predictions with low latency.
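A minimal sketch of the deployment call; the machine type and replica counts are illustrative and should be sized for your workload:

```python
# Deploy the uploaded model to the endpoint for online predictions.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",  # illustrative machine type
    min_replica_count=1,
    max_replica_count=1,
)
```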
Once your model is deployed, you can integrate it with your business applications. You can test the endpoint with the Vertex AI SDK (for example, endpoint.predict(instances=test_instance)), from Cloud Shell, or through the Google Cloud Console.
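For example, a quick smoke test from the SDK might look like this; the shape of test_instance is an illustrative assumption and depends on what your handler expects:

```python
# Illustrative payload; the expected format is defined by your handler.
test_instance = [{"data": [0.1, 0.2, 0.3]}]
prediction = endpoint.predict(instances=test_instance)
print(prediction.predictions)
```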
What’s next?
To learn more about PyTorch on Vertex AI, take a look at the documentation, which explains Vertex AI's PyTorch integrations and provides resources that show you how to use PyTorch on Vertex AI. You'll see how easy it is to train, deploy, and orchestrate models in production using PyTorch and Vertex AI. You can also have a look at the notebook that shows how to deploy and host an image generation model on Vertex AI, or try this notebook that deploys a text classification model.