To serve predictions from AI Platform Prediction, you must export your trained machine learning model as one or more artifacts. This guide describes the different ways to export trained models for deployment on AI Platform Prediction.
The following methods of exporting your model apply whether you train on AI Platform Training or train elsewhere and only want to deploy to AI Platform Prediction to serve predictions.
Once you have exported your model, read the guide to deploying models to learn how to create model and version resources on AI Platform Prediction for serving predictions.
Custom code for prediction
If you export a scikit-learn pipeline or a custom prediction routine, you can include custom code to run at prediction time, beyond just the prediction routine that your machine learning framework provides. You can use this to preprocess prediction input, postprocess prediction results, or add custom logging.
Maximum model size
The total file size of the model artifacts that you deploy to AI Platform Prediction must be 500 MB or less if you use a legacy (MLS1) machine type. It must be 10 GB or less if you use a Compute Engine (N1) machine type. Learn more about machine types for online prediction.
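Before deploying, it can help to total the size of your artifact directory and compare it with the limit for your machine type. The following is a minimal sketch: the helper names `total_artifact_bytes` and `fits_limit` are hypothetical, and reading MB and GB as decimal units (10^6 and 10^9 bytes) is an assumption made for illustration.

```python
import os

# Assumed byte values for the documented limits (decimal units are an assumption).
LIMITS_BYTES = {
    'mls1': 500 * 10**6,  # legacy (MLS1) machine types: 500 MB
    'n1': 10 * 10**9,     # Compute Engine (N1) machine types: 10 GB
}


def total_artifact_bytes(model_dir):
    """Sum the sizes of all files under model_dir (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_limit(model_dir, machine_family):
    """Return True if the directory's total size is within the assumed limit."""
    return total_artifact_bytes(model_dir) <= LIMITS_BYTES[machine_family]
```

For example, `fits_limit('export_dir', 'mls1')` would report whether a local directory named `export_dir` (a placeholder) stays within the assumed 500 MB budget.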
Export a TensorFlow SavedModel
If you use TensorFlow to train a model, export your model as a TensorFlow SavedModel directory. To learn how to export a TensorFlow SavedModel that you can deploy to AI Platform Prediction, read the guide to export a SavedModel for prediction.
If you want to deploy your TensorFlow model as part of a custom prediction routine, you can export it as a SavedModel or as a different set of artifacts. Read the guide to custom prediction routines to learn more.
Export an XGBoost booster
If you use XGBoost to train a model, you may export the trained model in one of three ways:
- Use xgboost.Booster's save_model method to export a file named model.bst.
- Use sklearn.externals.joblib to export a file named model.joblib.
- Use Python's pickle module to export a file named model.pkl.
Your model artifact's filename must exactly match one of these options.
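Because the filename requirement is strict, a small check before deployment can catch mistakes such as Model.pkl or model.pickle. The following sketch uses a hypothetical helper, `check_artifact_name`, with a lookup table built from the filenames this guide lists for each framework.

```python
import os

# Filenames this guide lists for each framework.
ALLOWED_FILENAMES = {
    'xgboost': {'model.bst', 'model.joblib', 'model.pkl'},
    'scikit-learn': {'model.joblib', 'model.pkl'},
}


def check_artifact_name(path, framework):
    """Return True if the file's name exactly matches an allowed artifact name.

    Hypothetical helper for illustration; the comparison is case-sensitive.
    """
    return os.path.basename(path) in ALLOWED_FILENAMES[framework]


print(check_artifact_name('export/model.bst', 'xgboost'))       # True
print(check_artifact_name('export/model.bst', 'scikit-learn'))  # False
print(check_artifact_name('export/Model.pkl', 'xgboost'))       # False
```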
The following examples show how to train and export a model in each of the three ways:
xgboost.Booster
from sklearn import datasets
import xgboost as xgb
iris = datasets.load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)
bst = xgb.train({}, dtrain, 20)
bst.save_model('model.bst')
joblib
from sklearn import datasets
from sklearn.externals import joblib
import xgboost as xgb
iris = datasets.load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)
bst = xgb.train({}, dtrain, 20)
joblib.dump(bst, 'model.joblib')
pickle
import pickle
from sklearn import datasets
import xgboost as xgb
iris = datasets.load_iris()
dtrain = xgb.DMatrix(iris.data, label=iris.target)
bst = xgb.train({}, dtrain, 20)
with open('model.pkl', 'wb') as model_file:
    pickle.dump(bst, model_file)
Export a scikit-learn estimator
If you use scikit-learn to train a model, you may export it in one of two ways:
- Use sklearn.externals.joblib to export a file named model.joblib.
- Use Python's pickle module to export a file named model.pkl.
Your model artifact's filename must exactly match one of these options.
The following examples show how to train and export a model in each of the two ways:
joblib
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib
iris = datasets.load_iris()
classifier = RandomForestClassifier()
classifier.fit(iris.data, iris.target)
joblib.dump(classifier, 'model.joblib')
pickle
import pickle
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
iris = datasets.load_iris()
classifier = RandomForestClassifier()
classifier.fit(iris.data, iris.target)
with open('model.pkl', 'wb') as model_file:
    pickle.dump(classifier, model_file)
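One way to sanity-check an export before deploying is to load the artifact back and confirm that it reproduces the original model's predictions. The sketch below assumes scikit-learn and NumPy are installed locally. It serializes to an in-memory byte string with pickle.dumps instead of writing model.pkl, purely to keep the example self-contained, and the `n_estimators` and `random_state` values are arbitrary choices for reproducibility.

```python
import pickle

import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(iris.data, iris.target)

# Serialize and immediately deserialize, as a stand-in for writing and
# then reading model.pkl.
restored = pickle.loads(pickle.dumps(classifier))

# The restored estimator should predict exactly what the original predicts.
predictions_match = np.array_equal(
    classifier.predict(iris.data), restored.predict(iris.data))
print(predictions_match)
```

Pickled scikit-learn models are not guaranteed to load across library versions, so run a check like this with the same scikit-learn version you used for training.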
Export a scikit-learn pipeline
The scikit-learn Pipeline class can help you compose multiple estimators. For example, you can use transformers to preprocess data and pass the transformed data to a classifier. You can export a Pipeline in the same two ways that you can export other scikit-learn estimators:
- Use sklearn.externals.joblib to export a file named model.joblib.
- Use Python's pickle module to export a file named model.pkl.
Your model artifact's filename must exactly match one of these options.
The following examples show how to train and export a pipeline in each of the two ways:
joblib
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib
from sklearn.feature_selection import chi2
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
iris = datasets.load_iris()
pipeline = Pipeline([
    ('feature_selection', SelectKBest(chi2, k=2)),
    ('classification', RandomForestClassifier())
])
pipeline.fit(iris.data, iris.target)
joblib.dump(pipeline, 'model.joblib')
pickle
import pickle
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
iris = datasets.load_iris()
pipeline = Pipeline([
    ('feature_selection', SelectKBest(chi2, k=2)),
    ('classification', RandomForestClassifier())
])
pipeline.fit(iris.data, iris.target)
with open('model.pkl', 'wb') as model_file:
    pickle.dump(pipeline, model_file)
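Because the exported pipeline bundles its preprocessing steps, a caller sends raw feature vectors and the pipeline applies feature selection before classification. The following sketch, which assumes scikit-learn is installed locally, illustrates this by checking the shape of the data after the feature_selection step. The sample instance is an arbitrary iris-like row, and the `n_estimators` and `random_state` values are arbitrary.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

iris = datasets.load_iris()
pipeline = Pipeline([
    ('feature_selection', SelectKBest(chi2, k=2)),
    ('classification', RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(iris.data, iris.target)

# A raw instance carries all four iris features.
instance = [[5.1, 3.5, 1.4, 0.2]]

# The first step reduces the instance to the two selected features.
selected = pipeline.named_steps['feature_selection'].transform(instance)
print(selected.shape)

# predict() runs both steps, so callers never select features themselves.
prediction = pipeline.predict(instance)
print(prediction.shape)
```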
Export custom pipeline code
If you only use transformers from the sklearn package to build your pipeline, it's sufficient to export a single model.joblib or model.pkl artifact. Your AI Platform Prediction deployment can use these transformers at prediction time because scikit-learn is included in the AI Platform Prediction runtime image.
However, you may also use scikit-learn's FunctionTransformer or TransformerMixin class to incorporate custom transformations. If you do this, you need to export your custom code as a source distribution package so you can provide it to AI Platform Prediction.
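The custom code has to travel separately because pickle (and joblib, which builds on it) records classes and functions by module and name rather than copying their source. The sketch below demonstrates this with the standard library only. `demo_custom_module` and `Doubler` are made-up names standing in for your own module and transformer.

```python
import pickle
import sys
import types

# Create a throwaway module in memory to play the role of your custom module.
module = types.ModuleType('demo_custom_module')
exec(
    'class Doubler:\n'
    '    def transform(self, values):\n'
    '        return [2 * v for v in values]\n',
    module.__dict__,
)
sys.modules['demo_custom_module'] = module

payload = pickle.dumps(module.Doubler())

# The pickle stores the module and class names, not the class's code.
name_in_payload = b'demo_custom_module' in payload

# Simulate an environment where the custom module is not installed.
del sys.modules['demo_custom_module']
try:
    pickle.loads(payload)
    load_failed = False
except ImportError:
    load_failed = True

# Once the module is available again, the same bytes load correctly.
sys.modules['demo_custom_module'] = module
restored = pickle.loads(payload)
print(name_in_payload, load_failed, restored.transform([1, 2]))
```

The failed load in the middle mirrors what would happen at prediction time if the serving environment received only the pickled pipeline and not the package that defines its custom steps.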
The following example shows how to use custom code in a pipeline and export it for AI Platform Prediction. The example uses both FunctionTransformer and TransformerMixin. In general, FunctionTransformer may be more convenient for basic transformations, but TransformerMixin lets you define a more complex transformation that saves serialized state at training time for use during prediction.
First, write the following code to a file named my_module.py:
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.base import TransformerMixin
from sklearn.utils.validation import check_is_fitted

def add_sum(X):
    # Append each row's sum as an extra column.
    sums = X.sum(1).reshape((-1,1))
    transformed_X = np.append(X, sums, 1)
    return transformed_X

class MySimpleScaler(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Learn per-column statistics; they are serialized with the pipeline.
        self.means = np.mean(X, axis=0)
        self.stds = np.std(X, axis=0)
        if not self.stds.all():
            raise ValueError('At least one column has standard deviation of 0.')
        return self

    def transform(self, X):
        # Standardize each column with the statistics learned in fit().
        check_is_fitted(self, ('means', 'stds'))
        transformed_X = (X - self.means) / self.stds
        return transformed_X
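To make the two transformations concrete, the following sketch reproduces their arithmetic on a tiny array using only NumPy. It re-declares add_sum so that it runs on its own, and it applies the same standardization formula that MySimpleScaler.transform uses. The 2-by-2 input is an arbitrary example.

```python
import numpy as np


def add_sum(X):
    # Same logic as add_sum in my_module.py: append each row's sum as a column.
    sums = X.sum(1).reshape((-1, 1))
    return np.append(X, sums, 1)


X = np.array([[1.0, 2.0],
              [3.0, 6.0]])

# What MySimpleScaler.fit stores: per-column mean and standard deviation.
means = np.mean(X, axis=0)   # [2.0, 4.0]
stds = np.std(X, axis=0)     # [1.0, 2.0]

# What MySimpleScaler.transform computes.
scaled = (X - means) / stds  # [[-1.0, -1.0], [1.0, 1.0]]

# What FunctionTransformer(add_sum) then computes on the scaled data.
with_sum = add_sum(scaled)   # [[-1.0, -1.0, -2.0], [1.0, 1.0, 2.0]]
print(with_sum)
```

The means and standard deviations are the state that gets serialized with the pipeline, which is why the scaler is written as a class rather than a plain function.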
Then train and export a pipeline that uses these transformations. The following examples show the two ways to export the pipeline:
joblib
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
import my_module
iris = datasets.load_iris()
pipeline = Pipeline([
    ('scale_data', my_module.MySimpleScaler()),
    ('add_sum_column', FunctionTransformer(my_module.add_sum)),
    ('classification', RandomForestClassifier())
])
pipeline.fit(iris.data, iris.target)
joblib.dump(pipeline, 'model.joblib')
pickle
import pickle
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
import my_module
iris = datasets.load_iris()
pipeline = Pipeline([
    ('scale_data', my_module.MySimpleScaler()),
    ('add_sum_column', FunctionTransformer(my_module.add_sum)),
    ('classification', RandomForestClassifier())
])
pipeline.fit(iris.data, iris.target)
with open('model.pkl', 'wb') as model_file:
    pickle.dump(pipeline, model_file)
Finally, create a .tar.gz source distribution package containing my_module. To do this, first create the following setup.py file:
from setuptools import setup
setup(name='my_custom_code', version='0.1', scripts=['my_module.py'])
Then run python setup.py sdist --formats=gztar in your shell to create dist/my_custom_code-0.1.tar.gz.
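After building the package, you may want to confirm that my_module.py actually made it into the archive. The following sketch uses Python's standard tarfile module. `sdist_contains` is a hypothetical helper name, and the archive path is the one that the sdist command is expected to produce.

```python
import os
import tarfile


def sdist_contains(archive_path, filename):
    """Return True if any member of the .tar.gz archive has the given basename.

    Hypothetical helper for illustration.
    """
    with tarfile.open(archive_path, 'r:gz') as archive:
        return any(
            os.path.basename(member) == filename
            for member in archive.getnames()
        )


archive_path = 'dist/my_custom_code-0.1.tar.gz'
# Only inspect the archive if it has already been built.
if os.path.exists(archive_path):
    print(sdist_contains(archive_path, 'my_module.py'))
```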
Read the guide to deploying models to learn how to deploy this tar file along with your model.joblib or model.pkl file.
Note that my_module.py uses NumPy and scikit-learn as dependencies. Since both of these libraries are included in the AI Platform Prediction runtime image, there is no need to include them in the tar file.
For a more in-depth tutorial about using custom pipeline code, see Using custom code for scikit-learn pipelines.
Export a custom prediction routine
For maximum flexibility, create and export a custom prediction routine. Custom prediction routines let you provide AI Platform Prediction with Python code you want to run at prediction time, as well as any training artifacts you want to use during prediction.
Read the guide to custom prediction routines to learn how to use them.
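As a rough orientation, a custom prediction routine centers on a predictor class that loads your artifacts and answers prediction requests. The sketch below assumes that the predictor interface consists of a from_path class method and a predict(instances, **kwargs) method; confirm the exact contract in the guide to custom prediction routines. The class name `MyPredictor`, the file name `weights.pkl`, and the list-of-weights "model" are all hypothetical stand-ins.

```python
import os
import pickle


class MyPredictor(object):
    """Hypothetical predictor; the interface shape is an assumption to verify."""

    def __init__(self, weights):
        # Stand-in "model": one weight per input feature.
        self._weights = weights

    def predict(self, instances, **kwargs):
        """Return one weighted sum per instance as a plain Python list."""
        return [
            sum(w * x for w, x in zip(self._weights, instance))
            for instance in instances
        ]

    @classmethod
    def from_path(cls, model_dir):
        """Load artifacts from the model directory and build a predictor."""
        with open(os.path.join(model_dir, 'weights.pkl'), 'rb') as f:
            weights = pickle.load(f)
        return cls(weights)
```

In a real routine, the stand-in weights would be replaced by whatever training artifacts you exported, and predict would wrap your own preprocessing, model call, and postprocessing.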
What's next
- Learn how to deploy your exported model to AI Platform Prediction to serve predictions.
- Work through a tutorial about using custom code for scikit-learn pipelines.
- Learn about how to create a custom prediction routine.