Models used in production require continuous monitoring to ensure that they continue to perform as expected. After you deploy a model in production, the input data provided to the model for predictions often changes. When the prediction input data deviates from the data that the model was trained on, the performance of the model can deteriorate, even though the model itself hasn't changed.
Model Monitoring supports feature skew and drift detection for categorical and numerical features.
Training-serving skew and prediction drift
Training-serving skew occurs when the feature data distribution in production is different from the feature data distribution used to train the model. When production data deviates from training data, model performance can deteriorate. A model performs best against data that is similar to its training data.
If the original training data is available, you can enable skew detection to monitor your models for training-serving skew.
Prediction drift occurs when feature data distribution in production changes significantly over time. These changes also affect model performance.
If the original training data isn't available, you can enable drift detection to monitor the production inputs for changes over time.
Calculate training-serving skew and prediction drift
For a feature that is monitored for training-serving skew or prediction drift, Model Monitoring computes the statistical distribution of the latest feature values seen in production. This statistical distribution is then compared against another baseline distribution by computing a distance score to determine how similar the production feature values are to the baseline. When the distance score between two statistical distributions exceeds a certain threshold, Model Monitoring identifies that as skew or drift.
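The comparison described above can be sketched as a small helper that summarizes both windows of feature values, scores their distance, and flags an anomaly when the score exceeds the alert threshold. All names here are illustrative; this is not Model Monitoring's actual implementation:

```python
def check_for_skew_or_drift(production_values, baseline_values,
                            compute_distribution, distance_fn, threshold):
    """Sketch of the skew/drift check: summarize both windows of feature
    values as distributions, compute a distance score between them, and
    flag an anomaly when the score exceeds the alert threshold.

    compute_distribution and distance_fn are supplied by the caller
    (for example, bucketed histograms with Jensen-Shannon divergence
    for numerical features, or value frequencies with L-infinity
    distance for categorical features)."""
    production_dist = compute_distribution(production_values)
    baseline_dist = compute_distribution(baseline_values)
    score = distance_fn(production_dist, baseline_dist)
    return score, score > threshold
```

For skew detection, the baseline values come from the training data; for drift detection, they come from recent production traffic.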
Baselines for skew and drift detection
Model Monitoring uses different baselines for skew detection and drift detection:
- For skew detection, the baseline is the statistical distribution of the feature's values in the training data.
- For drift detection, the baseline is the statistical distribution of the feature's values seen in production in the recent past.
Statistical distribution for categorical and numerical features
For categorical features, the computed distribution is the number or percentage of instances of each possible value of the feature. For numerical features, Model Monitoring divides the range of possible feature values into equal intervals and computes the number or percentage of feature values that fall in each interval.
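As a rough illustration, the two kinds of distribution could be computed as follows. This is a minimal sketch, not Model Monitoring's actual implementation:

```python
from collections import Counter

def categorical_distribution(values):
    """Fraction of instances of each possible value of the feature."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def numerical_distribution(values, num_buckets=10):
    """Divide the range of observed values into equal intervals and
    compute the fraction of values that fall in each interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1  # avoid zero-width buckets
    buckets = [0] * num_buckets
    for v in values:
        # Clamp the top edge so the maximum value lands in the last bucket.
        index = min(int((v - lo) / width), num_buckets - 1)
        buckets[index] += 1
    total = len(values)
    return [count / total for count in buckets]
```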
Example distributions of a numerical feature:
Example distributions of a categorical feature:
Statistical distance for categorical and numerical features
To compare two statistical distributions, Model Monitoring uses the following statistical measures:
- For numerical features, Jensen-Shannon divergence is used to calculate the distance between two distributions.
- For categorical features, L-infinity distance is used to calculate the distance between two distributions.
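These two measures can be sketched as follows, assuming the distribution formats described above: a list of bucket probabilities for numerical features and a mapping from category to probability for categorical features. This is an illustrative sketch, not Model Monitoring's actual implementation:

```python
import math

def l_infinity_distance(p, q):
    """L-infinity distance between two categorical distributions:
    the largest absolute difference in any category's probability."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two bucketed numerical
    distributions given as probabilities over the same buckets."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]  # midpoint distribution
    return (kl(p, m) + kl(q, m)) / 2
```

With base-2 logarithms, Jensen-Shannon divergence ranges from 0 (identical distributions) to 1 (disjoint distributions), which makes distance thresholds easy to interpret.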
For more information, see TensorFlow Data Validation Anomalies Reference.
- Work with Model Monitoring by following the API docs.
- Work with Model Monitoring by following the Cloud SDK docs.
- Try the example notebook in Colab or view it on GitHub.
- Enable skew and drift detection for your models.