This page provides an overview of Vertex AI Model Monitoring for tabular AutoML and tabular custom-trained models. To enable Vertex AI Model Monitoring, see Using Model Monitoring.
A model deployed in production performs best on prediction input data that is similar to the training data. When the input data deviates from the data used to train the model, the model's performance can deteriorate, even if the model itself hasn't changed.
To help you maintain a model's performance, Model Monitoring monitors the model's prediction input data for feature skew and drift:
Training-serving skew occurs when the feature data distribution in production deviates from the feature data distribution used to train the model. If the original training data is available, you can enable skew detection to monitor your models for training-serving skew.
Prediction drift occurs when feature data distribution in production changes significantly over time. If the original training data isn't available, you can enable drift detection to monitor the input data for changes over time.
You can enable both skew and drift detection.
Model Monitoring supports feature skew and drift detection for categorical and numerical features:
Categorical features take values from a limited set, typically grouped by qualitative properties. For example, categories such as product type, country, or customer type.
Numerical features can take any numeric value. For example, weight and height.
Once the skew or drift for a model's feature exceeds an alerting threshold that you set, Model Monitoring sends you an email alert. You can also view the distributions for each feature over time to evaluate whether you need to retrain your model.
Calculate training-serving skew and prediction drift
To detect training-serving skew and prediction drift, Model Monitoring uses TensorFlow Data Validation (TFDV) to calculate the distributions and distance scores according to the following process:
Calculate the baseline statistical distribution:
For skew detection, the baseline is the statistical distribution of the feature's values in the training data.
For drift detection, the baseline is the statistical distribution of the feature's values seen in production in the recent past.
The distributions for categorical and numerical features are calculated as follows:
For categorical features, the computed distribution is the number or percentage of instances of each possible value of the feature.
For numerical features, Model Monitoring divides the range of possible feature values into equal intervals and computes the number or percentage of feature values that fall in each interval.
The baseline is calculated when you create a Model Monitoring job, and is only recalculated if you update the training dataset for the job.
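As a rough illustration of the distribution step above, the following Python sketch computes a categorical distribution as per-value fractions and a numerical distribution over equal-width intervals. This is not TFDV's actual implementation, and the bucket count of 10 is an assumption chosen for illustration:

```python
# Illustrative sketch only -- Model Monitoring uses TFDV internally.
from collections import Counter

def categorical_distribution(values):
    """Fraction of instances per possible value of a categorical feature."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def numerical_distribution(values, num_buckets=10):
    """Fraction of values falling into each of num_buckets equal intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # guard against a zero-width range
    buckets = [0] * num_buckets
    for v in values:
        # Clamp the maximum value into the last bucket.
        index = min(int((v - lo) / width), num_buckets - 1)
        buckets[index] += 1
    return [count / len(values) for count in buckets]
```

For example, `categorical_distribution(["US", "US", "DE", "FR"])` yields 0.5 for `"US"` and 0.25 each for `"DE"` and `"FR"`.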
Calculate the statistical distribution of the latest feature values seen in production.
Compare the distribution of the latest feature values in production against the baseline distribution by calculating a distance score:
For categorical features, the distance score is calculated using the L-infinity distance.
For numerical features, the distance score is calculated using the Jensen-Shannon divergence.
When the distance score between two statistical distributions exceeds the threshold you specify, Model Monitoring identifies the anomaly as skew or drift.
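The two distance scores can be sketched in plain Python. This is an illustrative implementation rather than TFDV's; it assumes categorical distributions are dicts of value fractions and numerical distributions are equal-length bucket-probability lists, and it uses base-2 logarithms so the Jensen-Shannon divergence falls between 0 and 1. The `THRESHOLD` constant mirrors the 0.3 default described later on this page:

```python
# Illustrative distance scores -- Model Monitoring computes these via TFDV.
import math

def l_infinity_distance(baseline, latest):
    """Max absolute difference in category fractions (categorical features)."""
    keys = set(baseline) | set(latest)
    return max(abs(baseline.get(k, 0.0) - latest.get(k, 0.0)) for k in keys)

def jensen_shannon_divergence(p, q):
    """JS divergence between two bucketed distributions (numerical features)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

THRESHOLD = 0.3  # matches the default alerting threshold
baseline = {"US": 0.6, "DE": 0.4}
latest = {"US": 0.2, "DE": 0.5, "FR": 0.3}
score = l_infinity_distance(baseline, latest)
if score > THRESHOLD:
    print(f"skew/drift detected: distance {score:.2f} exceeds {THRESHOLD}")
```

Here the L-infinity distance is 0.4 (driven by the drop in `"US"` traffic), which exceeds the 0.3 threshold and would be flagged as an anomaly.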
For example, comparing the baseline and latest distributions of a categorical or numerical feature side by side shows whether skew or drift has occurred.
Considerations when using Model Monitoring
For cost efficiency, you can set a prediction request sampling rate to monitor a subset of the production inputs to a model.
You can set a frequency at which a deployed model's recently logged inputs are monitored for skew or drift. Monitoring frequency determines the timespan, or monitoring window size, of logged data that is analyzed in each monitoring run.
You can specify alerting thresholds for each feature you want to monitor. An alert is logged when the statistical distance between the input feature distribution and its corresponding baseline exceeds the specified threshold. By default, every categorical and numerical feature is monitored with a threshold value of 0.3.
An online prediction endpoint can host multiple models. When you enable skew or drift detection on an endpoint, the following configuration parameters are shared across all models hosted in that endpoint:
- Type of detection
- Monitoring frequency
- Fraction of input requests monitored
For the other configuration parameters, you can set different values for each model.
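The configuration parameters above can be set when creating a monitoring job with the `google-cloud-aiplatform` Python SDK. This is a hedged sketch rather than a complete recipe: `PROJECT`, `REGION`, `ENDPOINT_ID`, the BigQuery training-data URI, the email address, and the feature names are placeholders you must supply, and you should verify the class signatures against the SDK version you use. Running it requires a live endpoint, so it is shown as a configuration fragment only:

```python
# Configuration sketch (placeholders throughout); verify against your SDK version.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="PROJECT", location="REGION")

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="example-model-monitoring",
    endpoint=aiplatform.Endpoint("ENDPOINT_ID"),
    # Fraction of prediction requests sampled for monitoring.
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    # Monitoring frequency: analyze the logged inputs every hour.
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["you@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(
        skew_detection_config=model_monitoring.SkewDetectionConfig(
            data_source="bq://PROJECT.dataset.training_table",
            target_field="label",
            # Per-feature alerting thresholds (0.3 is the default).
            skew_thresholds={"country": 0.3, "age": 0.3},
        ),
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={"country": 0.3, "age": 0.3},
        ),
    ),
)
```

Note that the sampling rate, monitoring frequency, and detection type set here apply to every model on the endpoint, while the per-feature thresholds can differ per model.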
What's next
- Learn how schemas work with your tabular monitoring job.
- Enable skew and drift detection for your models.
- Try the example notebook in Colab or view it on GitHub.
- See the TensorFlow Data Validation Anomalies Reference.