Google Distributed Cloud (GDC) air-gapped offers prebuilt containers to serve online predictions from models trained using the following machine learning (ML) frameworks:
- TensorFlow
- PyTorch
To use one of these prebuilt containers, you must save your model as one or more model artifacts that comply with the requirements of the prebuilt container. These requirements apply whether or not your model artifacts are created on Distributed Cloud.
Before you begin
Before exporting model artifacts, perform the following steps:
- Create and train a prediction model targeting one of the supported containers.
- If you don't have a project, set up a project for Vertex AI.
- Work with your Infrastructure Operator (IO) to create the prediction cluster. The IO creates the cluster for you, associates it with your project, and assigns the appropriate node pools within the cluster, considering the resources you need for online predictions.
- Create the Vertex AI Default Serving (vai-default-serving-sa) service account within your project. For information about service accounts, see Set up service accounts.
- Grant the Project Bucket Object Viewer (project-bucket-object-viewer) role to the Vertex AI Default Serving (vai-default-serving-sa) service account for the storage bucket you created. For information about granting bucket access to service accounts, see Grant bucket access.
- To get the permissions that you need to access Online Prediction, ask your Project IAM Admin to grant you the Vertex AI Prediction User (vertex-ai-prediction-user) role. For information about this role, see Prepare IAM permissions.
Framework-specific requirements for exporting to prebuilt containers
Depending on the ML framework you plan to use for prediction, you must export model artifacts in different formats. The following sections describe the acceptable model formats for each ML framework.
TensorFlow
If you use TensorFlow to train a model, export your model as a TensorFlow SavedModel directory.
There are several ways to export SavedModels
from TensorFlow training
code. The following list describes a few ways that work for various
TensorFlow APIs:
- If you use Keras for training, use tf.keras.Model.save to export a SavedModel (see the sketch after this list).
- If you use an Estimator for training, use tf.estimator.Estimator.export_saved_model to export a SavedModel.
- Otherwise, use tf.saved_model.save or tf.compat.v1.saved_model.SavedModelBuilder.
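For example, exporting from Keras can be as simple as the following minimal sketch, assuming TensorFlow 2.x with Keras 2 (tf.keras); the model architecture, compile settings, and export path are illustrative placeholders, not values from your project:

```python
# Minimal sketch: export a Keras model as a TensorFlow SavedModel.
import tensorflow as tf

# Hypothetical toy model; replace with your trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Saving to a directory (no .h5/.keras suffix) writes a SavedModel with the
# `serve` tag and a `serving_default` signature in TF 2.x with Keras 2.
model.save("exported_model/1")
# Note: with Keras 3 (TF 2.16+), use model.export("exported_model/1") instead.
```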
If you are not using Keras or an Estimator, then make sure to
use the serve
tag and serving_default
signature when you export your SavedModel
to ensure Vertex AI can use your model artifacts to serve
predictions. Keras and Estimator handle this task automatically.
Learn more about
specifying signatures during export.
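If you export with tf.saved_model.save directly, you can attach the signature yourself. The following is a minimal sketch under that assumption; the tf.Module, tensor shapes, and export path are placeholders for illustration:

```python
# Minimal sketch: export a non-Keras tf.Module with an explicit
# serving_default signature so the prediction container can find it.
import tensorflow as tf

class Doubler(tf.Module):
    # Fixing the input signature lets TensorFlow trace a concrete function
    # that can be exported as the serving signature.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, 1], dtype=tf.float32)])
    def __call__(self, x):
        return {"output": x * 2.0}

module = Doubler()

# Registering the function under the "serving_default" key writes a
# SavedModel with the `serve` tag and that signature.
tf.saved_model.save(
    module,
    "exported_model/1",  # hypothetical local export path
    signatures={"serving_default": module.__call__},
)
```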
To serve predictions using these artifacts, create a Model
with the
prebuilt container for prediction
matching the version of TensorFlow that you used for training.
PyTorch
If you use PyTorch to train a model, you must package the model artifacts, including either a default or custom handler, by creating an archive file using Torch model archiver. The prebuilt PyTorch images expect the archive to be named model.mar, so make sure you set the model name to model.
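As a rough sketch, the archive could be produced with torch-model-archiver along the following lines, shown here invoked from Python; the serialized weights file, handler choice, and output directory are assumptions for illustration, not required values:

```python
# Minimal sketch: package PyTorch artifacts into model.mar with
# torch-model-archiver (pip install torch-model-archiver).
import subprocess

subprocess.run(
    [
        "torch-model-archiver",
        "--model-name", "model",          # produces model.mar, as the prebuilt image expects
        "--version", "1.0",
        "--serialized-file", "model.pt",  # hypothetical TorchScript model file
        "--handler", "image_classifier",  # a built-in default handler; pass your custom handler .py instead if needed
        "--export-path", "export",        # hypothetical output directory for model.mar
    ],
    check=True,
)
```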
For information about optimizing the memory usage, latency, or throughput of a PyTorch model served with TorchServe, see the PyTorch performance guide.
Upload your model
You must upload your model to the storage bucket you created. For more information about uploading objects to storage buckets, see Upload and download storage objects in projects.
The path to the storage bucket of your model must have the following structure:
s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID
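As an illustration, the following sketch uploads one artifact file to that path with an S3-compatible client (boto3); the endpoint URL, credentials, bucket name, model ID, version ID, and file name are placeholder assumptions that you must replace with the values for your storage bucket:

```python
# Minimal sketch: upload an exported artifact to
# s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID using boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstorage.example.com",  # hypothetical storage endpoint
)

BUCKET_NAME = "my-model-bucket"  # hypothetical bucket name
MODEL_ID = "my-model"            # hypothetical model ID
MODEL_VERSION_ID = "1"           # hypothetical model version ID

# Upload one file; repeat for every file in the exported artifact directory.
s3.upload_file(
    Filename="exported_model/1/saved_model.pb",
    Bucket=BUCKET_NAME,
    Key=f"{MODEL_ID}/{MODEL_VERSION_ID}/saved_model.pb",
)
```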
For export details, see the framework-specific requirements for exporting to prebuilt containers.