After a model is deployed in production, there are often changes in the input data provided to the model for predictions. When the prediction input data deviates from the data that the model was trained on, the performance of the model can deteriorate, even though the model itself has not changed. Therefore, models used in production require continuous monitoring to ensure that they continue to perform as expected.
Vertex Model Monitoring supports feature skew and drift detection for categorical and numerical features.
Training-serving skew and prediction drift
Training-serving skew occurs when the feature data distribution in production is different from the distribution of feature data that was used to train the model. A model performs best on data that is similar to what it was trained on. Therefore, when production data deviates from training data, it is a strong signal that the performance of the model can deteriorate.
If the original training data is available, you can enable skew detection to monitor your models for training-serving skew.
Prediction drift occurs when the feature data distribution in production changes significantly over time. These changes can also degrade model performance.
If the original training data is not available, you can enable drift detection to monitor the production inputs for changes over time.
Calculate training-serving skew and prediction drift
For a feature that is being monitored for training-serving skew or prediction drift, Model Monitoring computes the statistical distribution of the latest feature values seen in production. It then compares this distribution against a baseline distribution by computing a distance score, which measures how similar the production feature values are to the baseline. When the distance score exceeds a configurable threshold, Model Monitoring flags the feature as exhibiting skew or drift.
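The check described above can be sketched in a few lines. This is an illustrative assumption of the mechanism, not the service's actual implementation; the distance function (L-infinity here), the threshold value, and the example distributions are all placeholders.

```python
import numpy as np

def detect_anomaly(baseline_dist, production_dist, threshold=0.3):
    """Flag skew/drift when the distance between two distributions
    (expressed as aligned probability vectors) exceeds a threshold.
    L-infinity distance is used here as a stand-in distance score."""
    distance = float(np.max(np.abs(np.asarray(baseline_dist) - np.asarray(production_dist))))
    return distance, distance > threshold

# Example: production shifts probability mass toward the last category.
baseline = [0.5, 0.3, 0.2]
production = [0.1, 0.3, 0.6]
score, is_anomalous = detect_anomaly(baseline, production)  # score 0.4, flagged
```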
Baselines for skew and drift detection
There are different baselines used for skew detection and drift detection:
- For skew detection, the baseline is the statistical distribution of the feature's values in the training data.
- For drift detection, the baseline is the statistical distribution of the feature's values seen in production in the recent past.
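The two baseline choices can be illustrated as follows. The window split, histogram helper, and synthetic data are assumptions made for the example only; the service manages its own windows.

```python
import numpy as np

def distribution(values, bins):
    """Normalize a histogram into a probability distribution."""
    counts, _ = np.histogram(values, bins=bins)
    return counts / counts.sum()

bins = np.linspace(0.0, 1.0, 11)  # 10 equal-sized intervals
training = np.random.default_rng(0).uniform(0, 1, 10_000)
production_log = np.random.default_rng(1).uniform(0, 1, 5_000)

# Skew baseline: the distribution of the feature in the training data.
skew_baseline = distribution(training, bins)

# Drift baseline: production values from the recent past
# (here, the first half of the production log).
half = len(production_log) // 2
drift_baseline = distribution(production_log[:half], bins)

# Latest production distribution, compared against either baseline.
latest = distribution(production_log[half:], bins)
```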
Categorical and numerical features
For categorical features, the computed distribution is the number or percentage of instances of each possible value of the feature. For numerical features, Model Monitoring splits the range of possible feature values into equal-sized intervals and computes the number or percentage of feature values that fall in each interval.
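A minimal sketch of both computations, assuming NumPy; the feature values and the number of intervals are illustrative choices, not values the service prescribes.

```python
from collections import Counter
import numpy as np

def categorical_distribution(values):
    """Fraction of instances for each possible value of the feature."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def numerical_distribution(values, n_intervals=4):
    """Split the value range into equal-sized intervals and compute
    the fraction of values that fall in each interval."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=n_intervals)
    return counts / counts.sum(), edges

cat = categorical_distribution(["US", "US", "CA", "MX"])   # {"US": 0.5, ...}
num, edges = numerical_distribution([1, 2, 2, 3, 8, 9])
```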
To compare two statistical distributions, Model Monitoring uses the following statistical measures:
- Jensen-Shannon divergence is used to calculate the distance between two distributions of numerical features.
- L-infinity distance is used to calculate the distance between two distributions of categorical features.
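Both measures can be computed directly from two aligned probability vectors. This is a sketch of the textbook definitions (Jensen-Shannon divergence with base-2 logarithms, so it is bounded by 1); the example distributions are illustrative.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two numerical-feature histograms."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability entries.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def l_infinity(p, q):
    """L-infinity distance between two categorical-feature distributions."""
    return float(np.max(np.abs(np.asarray(p) - np.asarray(q))))

p = [0.4, 0.4, 0.2]
q = [0.3, 0.3, 0.4]
numerical_score = js_divergence(p, q)
categorical_score = l_infinity(p, q)  # 0.2
```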
- Work with Model Monitoring by following the API docs.
- Work with Model Monitoring by following the Cloud SDK docs.
- Try the example notebook in Colab or view it on GitHub.
- Enable skew and drift detection for your models.