Google Distributed Cloud (GDC) air-gapped offers prebuilt containers to serve online predictions from models trained using the following machine learning (ML) frameworks:
- TensorFlow
- PyTorch
To use one of these prebuilt containers, you must save your model as one or more model artifacts that comply with the requirements of the prebuilt container. These requirements apply whether or not your model artifacts are created on Distributed Cloud.
Before you begin
Before exporting model artifacts, perform the following steps:
- Create and train a prediction model targeting one of the supported containers.
- If you don't have a project, set up a project for Vertex AI.
- Work with your Infrastructure Operator (IO) to create the prediction cluster. The IO creates the cluster for you, associates it with your project, and assigns the appropriate node pools within the cluster, considering the resources you need for online predictions.
- Create the Vertex AI Default Serving (vai-default-serving-sa) service account within your project. For information about service accounts, see Set up service accounts.
- Grant the Project Bucket Object Viewer (project-bucket-object-viewer) role to the Vertex AI Default Serving (vai-default-serving-sa) service account for the storage bucket you created. For information about granting bucket access to service accounts, see Grant bucket access.
- To get the permissions that you need to access Online Prediction, ask your Project IAM Admin to grant you the Vertex AI Prediction User (vertex-ai-prediction-user) role. For information about this role, see Prepare IAM permissions.
Framework-specific requirements for exporting to prebuilt containers
Depending on the ML framework you plan to use for prediction, you must export model artifacts in different formats. The following sections describe the acceptable model formats for each ML framework.
TensorFlow
If you use TensorFlow to train a model, export your model as a TensorFlow SavedModel directory.
There are several ways to export SavedModels
from TensorFlow training
code. The following list describes a few ways that work for various
TensorFlow APIs:
- If you use Keras for training, use tf.keras.Model.save to export a SavedModel (see the sketch after this list).
- If you use an Estimator for training, use tf.estimator.Estimator.export_saved_model to export a SavedModel.
- Otherwise, use tf.saved_model.save or tf.compat.v1.saved_model.SavedModelBuilder.
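For example, exporting from Keras can be as simple as the following minimal sketch, assuming TensorFlow 2.x with Keras 2 (tf.keras); the model architecture, compile settings, and export path are illustrative placeholders, not values from your project:

```python
# Minimal sketch: export a Keras model as a TensorFlow SavedModel.
import tensorflow as tf

# Hypothetical toy model; replace with your trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Saving to a directory (no .h5/.keras suffix) writes a SavedModel with the
# `serve` tag and a `serving_default` signature in TF 2.x with Keras 2.
model.save("exported_model/1")
# Note: with Keras 3 (TF 2.16+), use model.export("exported_model/1") instead.
```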
If you are not using Keras or an Estimator, then make sure to
use the serve
tag and serving_default
signature when you export your SavedModel
to ensure Vertex AI can use your model artifacts to serve
predictions. Keras and Estimator handle this task automatically.
Learn more about
specifying signatures during export.
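If you export with tf.saved_model.save directly, you can attach the signature yourself. The following is a minimal sketch under that assumption; the tf.Module, tensor shapes, and export path are placeholders for illustration:

```python
# Minimal sketch: export a non-Keras tf.Module with an explicit
# serving_default signature so the prediction container can find it.
import tensorflow as tf

class Doubler(tf.Module):
    # Fixing the input signature lets TensorFlow trace a concrete function
    # that can be exported as the serving signature.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, 1], dtype=tf.float32)])
    def __call__(self, x):
        return {"output": x * 2.0}

module = Doubler()

# Registering the function under the "serving_default" key writes a
# SavedModel with the `serve` tag and that signature.
tf.saved_model.save(
    module,
    "exported_model/1",  # hypothetical local export path
    signatures={"serving_default": module.__call__},
)
```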
To serve predictions using these artifacts, create a Model
with the
prebuilt container for prediction
matching the version of TensorFlow that you used for training.
PyTorch
If you use PyTorch to train a model, you must package the model artifacts, including either a default or custom handler, by creating an archive file using Torch model archiver. The prebuilt PyTorch images expect the archive to be named model.mar, so make sure you set the model name to model.
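As a rough sketch, the archive could be produced with torch-model-archiver along the following lines, shown here invoked from Python; the serialized weights file, handler choice, and output directory are assumptions for illustration, not required values:

```python
# Minimal sketch: package PyTorch artifacts into model.mar with
# torch-model-archiver (pip install torch-model-archiver).
import subprocess

subprocess.run(
    [
        "torch-model-archiver",
        "--model-name", "model",          # produces model.mar, as the prebuilt image expects
        "--version", "1.0",
        "--serialized-file", "model.pt",  # hypothetical TorchScript model file
        "--handler", "image_classifier",  # a built-in default handler; pass your custom handler .py instead if needed
        "--export-path", "export",        # hypothetical output directory for model.mar
    ],
    check=True,
)
```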
For information about optimizing the memory usage, latency, or throughput of a PyTorch model served with TorchServe, see the PyTorch performance guide.
Upload your model
You must upload your model to the storage bucket you created. For more information about uploading objects to storage buckets, see Upload and download storage objects in projects.
The path to the storage bucket of your model must have the following structure:
s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID
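As an illustration, the following sketch uploads one artifact file to that path with an S3-compatible client (boto3); the endpoint URL, credentials, bucket name, model ID, version ID, and file name are placeholder assumptions that you must replace with the values for your storage bucket:

```python
# Minimal sketch: upload an exported artifact to
# s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID using boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstorage.example.com",  # hypothetical storage endpoint
)

BUCKET_NAME = "my-model-bucket"  # hypothetical bucket name
MODEL_ID = "my-model"            # hypothetical model ID
MODEL_VERSION_ID = "1"           # hypothetical model version ID

# Upload one file; repeat for every file in the exported artifact directory.
s3.upload_file(
    Filename="exported_model/1/saved_model.pb",
    Bucket=BUCKET_NAME,
    Key=f"{MODEL_ID}/{MODEL_VERSION_ID}/saved_model.pb",
)
```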
For export details, see the framework-specific requirements for exporting to prebuilt containers.