Exporting a SavedModel for prediction

To deploy your trained models to AI Platform Prediction and use them to serve predictions, you must first export them in the TensorFlow SavedModel format.

This page outlines some important things to consider as you create your SavedModel. For more detailed information about exporting to a SavedModel, read the TensorFlow guide to SavedModels and the TensorFlow guide to saving Keras models. For details on how to deploy your SavedModel to AI Platform Prediction to serve predictions, read the guide to deploying models. For general background information on the prediction process, see the prediction overview page.

Custom prediction routines

As an alternative to deploying a SavedModel, you can also create and deploy a custom prediction routine. A custom prediction routine can combine a SavedModel (or a trained model saved a different way) with other training artifacts and Python code you provide to customize how AI Platform Prediction handles prediction requests. For example, you can use this flexibility to preprocess prediction input before your model makes a prediction.

To learn more, read the guide to custom prediction routines.

Understanding training graphs and serving graphs

When you have trained your model and exported it as a SavedModel, there are some important steps to take before you are ready to get predictions.

There are some key differences between a training graph and a serving graph. Training graphs contain operations that are not appropriate for serving, such as:

  • file readers
  • input queues
  • dropout layers
  • loss functions
  • optimizers

Because the process of serving predictions has different needs than the process of training, it is a best practice to export a separate graph specifically for serving predictions.
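For example, a dropout layer should be active while training but act as a pass-through when the same model serves predictions. The following minimal sketch illustrates this, assuming TensorFlow 2.x; the toy model and input are placeholders:

    import numpy as np
    import tensorflow as tf

    # A toy model containing only a training-time feature: dropout.
    model = tf.keras.Sequential([tf.keras.layers.Dropout(rate=0.5)])
    x = np.ones((1, 4), dtype=np.float32)

    print(model(x, training=True).numpy())   # some values randomly zeroed (and rescaled)
    print(model(x, training=False).numpy())  # values pass through unchanged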

Understanding the SavedModel

A SavedModel is TensorFlow's recommended format for saving models, and it is the required format for deploying trained TensorFlow models on AI Platform Prediction. Exporting your trained model as a SavedModel saves your training graph with its assets, variables and metadata in a format that AI Platform Prediction can consume and restore for predictions.

After exporting a SavedModel, you have a SavedModel directory that contains the following:

  • your training graph(s), saved in SavedModel protocol buffers
  • external files, called assets
  • variables, which are saved as checkpoint files

When you deploy your SavedModel to AI Platform Prediction, you must include the entire SavedModel directory, not just the SavedModel protocol buffer file that contains your graph and its metadata. This file usually has an extension of either .pb or .pbtxt.

The SavedModel allows you to save multiple versions of a graph that share the same assets and variables (or checkpoints). For example, you may want to develop two versions of the same graph: one to run on CPUs, and another to run on GPUs.

Learn more about the structure of a SavedModel directory.
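As a concrete illustration, the following hedged sketch (assuming TensorFlow 2.x and a placeholder export path) saves a small tf.keras model and notes the directory layout you can expect:

    import tensorflow as tf

    # Build and save a tiny placeholder model; in TF 2.x, model.save writes a
    # SavedModel directory when given a plain directory path.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
    model.compile(optimizer='sgd', loss='mse')
    model.save('export/my_model')

    # Typical contents of export/my_model/:
    #   saved_model.pb   <- the SavedModel protocol buffer (graph and metadata)
    #   variables/       <- the model's variables, saved as checkpoint files
    #   assets/          <- external files such as vocabularies (may be empty)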

Exporting from various TensorFlow APIs

There are several ways to export a SavedModel from your TensorFlow training code, depending on which API you use. The sections later on this page cover two common cases: exporting with tf.keras.Model.save, and exporting from a trained Estimator with a serving input function.

Compatibility with AI Explanations

If you want to use AI Explanations with your model, learn about additional requirements for your SavedModel.

Check and adjust model size

Your SavedModel must be 500 MB or smaller if you want to deploy it to a model version that uses a legacy (MLS1) machine type. It can be up to 10 GB if you use a Compute Engine (N1) machine type. Learn more about machine types for online prediction.

This size limit includes all the assets and variables in your SavedModel directory, not just the SavedModel protocol buffer file itself (that is, saved_model.pb or saved_model.pbtxt).

To check your model size during development, export a SavedModel and check the file size of the directory.
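For example, a minimal sketch of such a check from Python; the export path is a placeholder:

    import os

    def saved_model_size_mb(export_dir):
        """Return the total size of all files under the SavedModel directory, in MB."""
        total_bytes = 0
        for root, _, files in os.walk(export_dir):
            for name in files:
                total_bytes += os.path.getsize(os.path.join(root, name))
        return total_bytes / (1024 ** 2)

    print('SavedModel size: %.1f MB' % saved_model_size_mb('export/my_model'))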

If your SavedModel exceeds the 500 MB limit, follow the guidance in the next two sections: build an optimal prediction graph and reduce the precision of your variables and data.

Following these steps can bring the SavedModel under the 500 MB limit and decrease prediction latency. The benefits include better performance and not having to request and wait for a quota increase.

If you still need additional quota, learn how to request a quota increase.

Build an optimal prediction graph

Training produces multiple checkpoints that are not used for serving predictions. Be sure to upload a directory free from those artifacts, containing only the model to be deployed.

For example, if you export summaries during the training process for visualization in TensorBoard, you will want to be sure they are not included in your SavedModel. These TensorBoard summaries are not necessary for a prediction graph.
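As a hedged sketch, you can scan the directory you plan to upload for common training-only artifacts; the path and filename patterns below are assumptions based on TensorFlow's usual naming:

    import os

    export_dir = 'export/my_model'  # placeholder path to the directory you plan to upload

    # Flag TensorBoard event files and raw training checkpoints, which the
    # serving graph does not need and which count against the size limit.
    for root, _, files in os.walk(export_dir):
        for name in files:
            if (name.startswith('events.out.tfevents')
                    or name.startswith('model.ckpt')
                    or name == 'checkpoint'):
                print('Training artifact found; remove before deploying:',
                      os.path.join(root, name))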

Reduce precision to decrease file size

Reducing the precision of variables and input data is a tradeoff: it can significantly reduce your model size, at some cost in prediction accuracy. High-precision data is stored less efficiently than low-precision data. Although low-precision data is a source of noise, a neural network may "disregard" this noise and still produce fairly accurate predictions.

If using the methods below results in too large a loss in prediction accuracy for your use case, try requesting a quota increase instead.

  • Shrink the file size by reducing the size of weights, which default to floating-point numbers that are difficult to store efficiently. These inefficiently stored weights are the largest contributor to the overall file size of the model.

  • Quantize your continuous data in order to reduce the size of your model by up to 75% without sacrificing a significant amount of accuracy.

  • Use less precise variables. For example, change the data type (dtype) from int64 to int32, as illustrated in the sketch after this list.

  • Reduce the size of other input features in the assets folder of your SavedModel directory. For example, use smaller vocabulary sizes for text data.

  • Read about techniques to optimize TensorFlow models for serving in more detail, and work through examples of applying these techniques. The linked techniques only apply if you are using TensorFlow 1.
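As a minimal illustration of the dtype suggestion above, the arrays below stand in for weights or input features; the sizes are arbitrary:

    import numpy as np

    weights_f64 = np.random.rand(1000, 1000)        # float64 by default
    weights_f32 = weights_f64.astype(np.float32)    # half the storage
    ids_int64 = np.arange(1000, dtype=np.int64)
    ids_int32 = ids_int64.astype(np.int32)          # half the storage

    print(weights_f64.nbytes, weights_f32.nbytes)   # 8000000 vs 4000000
    print(ids_int64.nbytes, ids_int32.nbytes)       # 8000 vs 4000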

Tools to inspect SavedModels and graphs

TensorFlow provides a command-line interface that you can use to sanity-check aspects of your SavedModel, such as input formatting and SignatureDefs. Learn more about the SavedModel CLI.
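In addition to the CLI, you can do a quick sanity check from Python. The following is a minimal sketch, assuming TensorFlow 2.x and a placeholder path:

    import tensorflow as tf

    # Load the exported SavedModel and list its serving signatures.
    loaded = tf.saved_model.load('export/my_model')
    print(list(loaded.signatures.keys()))            # e.g. ['serving_default']

    # Inspect the inputs and outputs of the default serving signature.
    infer = loaded.signatures['serving_default']
    print(infer.structured_input_signature)
    print(infer.structured_outputs)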

The Graph Transform Tool in TensorFlow can be used to optimize your model for deployment. Although the use of this tool is explained in the context of mobile deployment, it can also be used to optimize models for non-mobile deployment.

Creating serving input functions

If you export your SavedModel using tf.keras.Model.save, then you do not need to specify a serving input function.

Otherwise, define a serving input function when you export the SavedModel. You can do this at the following points in relation to the overall training process:

  • At the end of the training process.
  • As a separate process after training is completed.

The following examples show how to do this for a trained Estimator. For more information, see serving input functions for Estimators.

Create serving graph during training

This typically occurs at the end of the training process, but is still tied in with training.

  1. Define a serving input function. In your function, make sure the outermost dimension of your features is None. This corresponds to the batch size, and is demonstrated below when the values in your features dict are defined using tf.placeholder. The following example code comes from the Census sample:

    import tensorflow as tf

    # `featurizer.INPUT_COLUMNS` comes from the Census sample's featurizer
    # module, which defines the model's input feature columns.
    def json_serving_input_fn():
        """Build the serving inputs."""
        inputs = {}
        for feat in featurizer.INPUT_COLUMNS:
            # Leave the outermost (batch) dimension as None so any batch size works.
            inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

        return tf.estimator.export.ServingInputReceiver(inputs, inputs)

  2. Export a SavedModel from your Estimator using tf.estimator.Estimator.export_saved_model, passing the base export directory as the export_dir_base parameter and your serving input function as the serving_input_receiver_fn parameter. In the Census example, the type of Estimator used is tf.estimator.DNNLinearCombinedClassifier.
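A minimal hedged sketch of that call; `estimator` is assumed to be your trained Estimator, and the export path is a placeholder:

    # Export a SavedModel into a timestamped subdirectory of export_dir_base.
    export_path = estimator.export_saved_model(
        export_dir_base='output/exports/census',     # placeholder path
        serving_input_receiver_fn=json_serving_input_fn)
    print(export_path)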

Create serving graph separately from training

If you have already trained your model, you can get predictions without retraining. This process is very similar to creating a serving graph during training. The main difference is that you create the serving graph in a separate Python script that you run after training is over. The basic idea is to construct the Estimator with the same model_dir used in training, then to call tf.estimator.Estimator.export_saved_model as described in the previous section.

  1. Define a serving input function in your Python script, similarly to how you define it in training:

    import tensorflow as tf

    # `featurizer.INPUT_COLUMNS` comes from the Census sample's featurizer
    # module, which defines the model's input feature columns.
    def json_serving_input_fn():
        """Build the serving inputs."""
        inputs = {}
        for feat in featurizer.INPUT_COLUMNS:
            # Leave the outermost (batch) dimension as None so any batch size works.
            inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

        return tf.estimator.export.ServingInputReceiver(inputs, inputs)

  2. When creating your Estimator, make sure to set the model_dir parameter to be the same one used in training. This makes checkpoints from your previously saved model available to the Estimator.

  3. Finally, use your Estimator to call tf.estimator.Estimator.export_saved_model, passing the base export directory as the export_dir_base parameter and your serving input function as the serving_input_receiver_fn parameter. The sketch that follows this list puts these steps together.
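Here is a minimal hedged sketch of the standalone export script. The feature column, hidden units, and paths are placeholders (in the Census sample, the feature columns come from the featurizer module), and it assumes TF 1.x-style APIs (use the tf.compat.v1 equivalents in TensorFlow 2):

    import tensorflow as tf

    # Placeholder feature columns; use the same columns the model was trained with.
    feature_columns = [tf.feature_column.numeric_column('age')]

    def json_serving_input_fn():
        """Build the serving inputs (batch dimension left as None)."""
        inputs = {'age': tf.placeholder(shape=[None], dtype=tf.float32)}
        return tf.estimator.export.ServingInputReceiver(inputs, inputs)

    # Re-create the Estimator with the SAME model_dir used in training so it
    # picks up the existing checkpoints.
    estimator = tf.estimator.DNNLinearCombinedClassifier(
        model_dir='output/model_dir',                # placeholder path
        linear_feature_columns=feature_columns,
        dnn_feature_columns=feature_columns,
        dnn_hidden_units=[100, 70, 50, 25])

    # Export a SavedModel ready to deploy to AI Platform Prediction.
    export_path = estimator.export_saved_model(
        export_dir_base='output/exports/census',     # placeholder path
        serving_input_receiver_fn=json_serving_input_fn)
    print(export_path)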

TensorFlow tags and signatures

If you export a SavedModel from tf.keras or from a TensorFlow estimator, the exported graph is ready for serving by default.

In other cases, when building a TensorFlow prediction graph, you must specify the correct values for your graph's tags and signatures. TensorFlow provides constants for these tag and signature values, used for the following purposes:

  • To select a graph in your SavedModel for serving predictions
  • To indicate that you are building a prediction signature for your prediction graph

Signatures define the inputs and outputs for your graph. When you build a signature for your prediction graph, you must specify a valid signature constant as the method_name parameter in build_signature_def. For prediction, the best choice is usually PREDICT_METHOD_NAME.

You must use a tag to specify which graph in your SavedModel is used to serve predictions. In add_meta_graph_and_variables, add tag_constants.SERVING to your tags list.

See an example of how to build a prediction graph using the correct constants for tags and signatures.
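For reference, here is a minimal hedged sketch of these constants in use with TF 1.x-style APIs (available under tf.compat.v1 in TensorFlow 2); the graph, tensor names, and export path are placeholders:

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # A placeholder prediction graph: one input tensor, one output tensor.
        x = tf.placeholder(tf.float32, shape=[None, 1], name='input')
        y = tf.identity(x * 2.0, name='output')

        # Build the prediction signature with PREDICT_METHOD_NAME.
        signature = tf.saved_model.signature_def_utils.build_signature_def(
            inputs={'input': tf.saved_model.utils.build_tensor_info(x)},
            outputs={'output': tf.saved_model.utils.build_tensor_info(y)},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

        sig_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
        builder = tf.saved_model.builder.SavedModelBuilder('export/prediction_graph')

        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            # Tag the graph with SERVING so it is used to serve predictions.
            builder.add_meta_graph_and_variables(
                sess,
                tags=[tf.saved_model.tag_constants.SERVING],
                signature_def_map={sig_key: signature})
        builder.save()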

What's next