Monitor feature skew and drift

Vertex Model Monitoring supports feature skew and drift detection for categorical and numerical input features.

This page describes how to enable skew or drift detection, create a Model Monitoring job, and analyze the resulting skew and drift data.

Enable skew or drift detection

The following sections describe how skew or drift detection works.

Parse input feature values from prediction request logs

To perform skew or drift detection, Model Monitoring requires the input feature values that the model in production receives. These values are parsed from the payload of the prediction requests made to a prediction endpoint. A fraction of the incoming requests is logged in a BigQuery table in your Google Cloud project, and the input feature values are parsed from the logged requests for analysis.

To enable skew or drift detection, you need to specify the following key parameters:

Type of monitoring

To enable skew detection for a model, you must provide the training data that was used to train the model. Model Monitoring uses this data to compute the training data baseline distributions. Specify the URI for your training data in Cloud Storage or BigQuery.

To enable drift detection, training data is not required.

User email

Model Monitoring requires you to provide an email address. Model Monitoring sends alert notifications to this email address.

Prediction request sampling rate

For cost efficiency, it is usually sufficient to monitor a subset of the production inputs to a model. This parameter controls the fraction of the incoming prediction requests that are logged and analyzed for monitoring purposes.

This parameter is optional. If you don't configure it, Model Monitoring logs all prediction requests.
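For example, the following setting for the --prediction-sampling-rate flag used in the Cloud SDK examples later on this page logs and analyzes roughly 20% of incoming prediction requests:

--prediction-sampling-rate=0.2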

Frequency of monitoring

Monitoring frequency determines how often a deployed model's inputs are monitored for skew or drift. At the specified frequency, a monitoring job runs and analyzes the recently logged inputs. Each monitoring job analyzes the inputs logged in the window between the current cutoff time minus the monitoring window size and the current cutoff time. The monitoring frequency therefore also determines the timespan, or monitoring window size, of logged data that is analyzed in each monitoring run. In the Google Cloud Console, you can see when each monitoring job ran and visualize the data analyzed in each job.

The minimum granularity is 1 hour. If you use the Cloud SDK to set up a Model Monitoring job, the default value is 24 hours.
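For example, assuming the --monitoring-frequency flag used in the Cloud SDK examples later on this page is expressed in hours, consistent with the 1-hour granularity noted above, the following setting runs a monitoring job every hour, and each job analyzes the requests logged during the preceding hour:

--monitoring-frequency=1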

Features to monitor and their alerting thresholds

You can specify which input features to monitor, along with the alerting threshold for each feature. The alerting threshold determines when to raise an alert: each threshold is compared against the statistical distance computed between the input feature distribution and its corresponding baseline, and an alert is raised when the distance exceeds the threshold. You can configure a separate threshold value for each monitored feature.

If you don't provide this list, every categorical and numerical feature is monitored by default, with the following threshold values:

  • Categorical feature: 0.3
  • Numerical feature: 0.3
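For example, to monitor only the hypothetical features age and gender with different thresholds, you could pass the following value to the --feature-thresholds flag of the Cloud SDK commands shown later on this page:

--feature-thresholds=age=0.2,gender=0.3

With this setting, an alert is raised for age when its statistical distance from the baseline exceeds 0.2, and for gender when it exceeds 0.3.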

Input schemas

To get feature values, Model Monitoring parses the request payload of the online prediction requests made to the model. To parse the data correctly, the monitoring job needs to know the schema of the input payload.

Automatic schema parsing

After skew or drift detection is enabled, Model Monitoring can usually determine the input schema automatically. To determine the schema, Model Monitoring analyzes the first 1,000 input requests.

Automatic schema parsing works best when the input requests are formatted as key-value pairs, where "key" is the name of the feature and "value" is the value of the feature. For example:

"key":"value"
{"TenYearCHD":"0", "glucose":"5.4", "heartRate":"1", "age":"30",
"prevalentStroke":"0", "gender":"f", "ethnicity":"latin american"}

If the inputs are not in "key":"value" format, Model Monitoring tries to identify the data type of each feature, and automatically assigns a default feature name for each input.

In these situations, you can explicitly specify your own input schema, so Model Monitoring can parse your model's inputs correctly.

Custom instance schemas for parsing input

The schema that you specify explicitly is called the analysis instance schema. The schema file specifies the format of the input payload, the name of each feature, and the type of each feature.

The schema must be written as a YAML file in the OpenAPI schema format. For example:

type: object
properties:
  BMI:
    type: number
  BPMeds:
    type: string
  TenYearCHD:
    type: string
  age:
    type: string
  cigsPerDay:
    type: array
    items:
      type: string
required:
- age
- BMI
- TenYearCHD
- cigsPerDay
- BPMeds

  • type indicates which of the following formats your prediction request uses:

    • object: key-value pairs
    • array: array-like
    • string: csv-string
  • properties indicates the type of each individual feature.

  • If the request is in array or csv-string format, specify the order in which the features appear in each request. Specify the order under the field named required (see the example schema after this list).
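For example, the following is a sketch of a schema for a hypothetical model that receives each request as an array of three values. It follows the field descriptions above; the feature names are placeholders, and the required list gives the order in which the values appear in each request:

type: array
properties:
  age:
    type: string
  BMI:
    type: number
  glucose:
    type: number
required:
- age
- BMI
- glucose

With this schema, a request such as ["54", 28.3, 77] is parsed as age, BMI, and glucose, in that order.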

How to represent missing values

If your prediction request is in array or csv-string format, represent any missing features as null values. For example, consider a prediction request with five features:

[feature_a, feature_b, feature_c, feature_d, feature_e]

If feature_c allows missing values, a sample request missing feature_c would be: [1, 2, , 4, 6]. The list length is still 5, with one null value in the middle.

Configuration parameters at the endpoint scope

An online prediction endpoint can host multiple models. When you enable skew or drift detection on an endpoint, the following configuration parameters are shared across all models hosted in that endpoint:

  • Type of detection
  • Monitoring frequency
  • Fraction of input requests monitored

For the other configuration parameters, you can set different values for each model.

You can view your configuration parameters in the Cloud Console.

Create a Model Monitoring job

To set up either skew detection or drift detection, you can create a model deployment monitoring job using the Model Monitoring API, the Cloud Console, or the Cloud SDK.

Create a job by using the Model Monitoring API

For information about the full end-to-end Model Monitoring API workflow, see the example notebook.
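The notebook covers the full workflow. As a rough orientation only, the following sketch shows what creating a drift detection job could look like with curl against the REST API. The field names follow the modelDeploymentMonitoringJobs resource referenced in the Model Monitoring jobs API section later on this page; all IDs, the email address, the feature name, and the sampling and interval values are placeholders, and you should verify the exact fields and API version against the REST reference:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/modelDeploymentMonitoringJobs \
    -d '{
      "displayName": "MONITORING_JOB_NAME",
      "endpoint": "projects/PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID",
      "loggingSamplingStrategy": {"randomSampleConfig": {"sampleRate": 0.8}},
      "modelDeploymentMonitoringScheduleConfig": {"monitorInterval": "3600s"},
      "modelMonitoringAlertConfig": {"emailAlertConfig": {"userEmails": ["EMAIL_ADDRESS"]}},
      "modelDeploymentMonitoringObjectiveConfigs": [{
        "deployedModelId": "DEPLOYED_MODEL_ID",
        "objectiveConfig": {
          "predictionDriftDetectionConfig": {
            "driftThresholds": {"FEATURE_1": {"value": 0.3}}
          }
        }
      }]
    }'

For skew detection, the objective config additionally requires the training dataset and skew thresholds; see the REST reference for the exact field names.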

Create a job by using the Cloud Console

To create a model deployment monitoring job by using the Cloud Console, create a new endpoint or edit an existing one. The following example shows how to create a standard endpoint. To enable model monitoring for an existing endpoint, edit its settings to enable model monitoring.

You can create a model deployment monitoring job by using a standard endpoint that hosts a model whose Type is either Imported Custom Training or Tabular AutoML.

  1. In the Google Cloud Console, go to the Vertex AI Endpoints page.

    Go to Endpoints

  2. Click Create Endpoint.

  3. In the New endpoint pane, name your endpoint and set a region.

  4. Click Continue.

  5. Add model settings. For the model name, you must select a custom or tabular model.

  6. Click Continue.

  7. Configure model monitoring settings to apply to all models deployed to the endpoint.

    Skew detection

    1. To enable model monitoring, in Model monitoring, select Enable model monitoring for this endpoint.
    2. Enter the Monitoring job display name.
    3. Specify other settings to apply to all models deployed to this new endpoint.
    4. Click Continue.
    5. To choose the skew objective, under Monitoring objective, select Training prediction skew detection.
    6. Under Training prediction skew detection, provide a training data source.
    7. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts. For information about how to format the thresholds, hold the pointer over the Help icon.

    Drift detection

    1. To enable model monitoring, in Model monitoring, select Enable model monitoring for this endpoint.
    2. Enter the Monitoring job display name.
    3. Specify other settings to apply to all models deployed to this new endpoint.
    4. Click Continue.
    5. To choose the drift objective, under Monitoring objective, select Prediction drift detection.
    6. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts. For information about how to format the thresholds, hold the pointer over the Help icon.

Create a job by using the Cloud SDK

To create a model deployment monitoring job by using the Cloud SDK, download and install the Cloud SDK.

Skew detection

If the training dataset is available, you can create a Model Monitoring job with skew detection for all the deployed models under the endpoint ENDPOINT_ID by running gcloud beta ai model-monitoring-jobs create:

gcloud beta ai model-monitoring-jobs create \
    --project=PROJECT_ID \
    --region=REGION \
    --display-name=MONITORING_JOB_NAME \
    --emails=EMAIL_ADDRESS_1,EMAIL_ADDRESS_2 \
    --endpoint=ENDPOINT_ID \
    --feature-thresholds=FEATURE_1=THRESHOLD_1,FEATURE_2=THRESHOLD_2 \
    --prediction-sampling-rate=SAMPLING_RATE \
    --monitoring-frequency=MONITORING_FREQUENCY \
    --target-field=TARGET_FIELD \
    --bigquery-uri=BIGQUERY_URI

The preceding command reads the training dataset from BigQuery. Specify the BigQuery URI in the following format:

"bq://PROJECT_ID.DATASET_NAME.TABLE_NAME"

You can also specify the training dataset from Cloud Storage in CSV or TFRecord format.

To use CSV, replace the --bigquery-uri flag with --data-format=csv --gcs-uris=gs://some_bucket/some_file.

To use TFRecord, replace the --bigquery-uri flag with --data-format=tf-record --gcs-uris=gs://some_bucket/some_file.

You can also use a managed dataset for tabular AutoML by replacing the --bigquery-uri flag with --dataset=DATASET_ID.
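For example, here is a sketch of the full skew detection command when the training data is a CSV file in Cloud Storage. Apart from the data-source flags, it is identical to the BigQuery example above; the bucket and file names are placeholders:

gcloud beta ai model-monitoring-jobs create \
    --project=PROJECT_ID \
    --region=REGION \
    --display-name=MONITORING_JOB_NAME \
    --emails=EMAIL_ADDRESS_1,EMAIL_ADDRESS_2 \
    --endpoint=ENDPOINT_ID \
    --feature-thresholds=FEATURE_1=THRESHOLD_1,FEATURE_2=THRESHOLD_2 \
    --prediction-sampling-rate=SAMPLING_RATE \
    --monitoring-frequency=MONITORING_FREQUENCY \
    --target-field=TARGET_FIELD \
    --data-format=csv \
    --gcs-uris=gs://some_bucket/some_file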

Drift detection

If the training dataset is not available, you can create a Model Monitoring job with drift detection for all the deployed models under the endpoint ENDPOINT_ID by running gcloud beta ai model-monitoring-jobs create:

gcloud beta ai model-monitoring-jobs create \
    --project=PROJECT_ID \
    --region=REGION \
--display-name=MONITORING_JOB_NAME \
    --emails=EMAIL_ADDRESS_1,EMAIL_ADDRESS_2 \
    --endpoint=ENDPOINT_ID \
    --feature-thresholds=FEATURE_1=THRESHOLD_1,FEATURE_2=THRESHOLD_2 \
    --prediction-sampling-rate=SAMPLING_RATE \
    --monitoring-frequency=MONITORING_FREQUENCY

Model Monitoring SDK commands

You can update, pause, resume, and delete a Model Monitoring job by using the Cloud SDK.

For example, to update the monitoring frequency of Model Monitoring job 123 in project example and region us-central1:

gcloud beta ai model-monitoring-jobs update 123 \
    --monitoring-frequency=1 --project=example --region=us-central1

To pause the job:

gcloud beta ai model-monitoring-jobs pause 123 --project=example \
    --region=us-central1

To resume the job:

gcloud beta ai model-monitoring-jobs resume 123 --project=example \
    --region=us-central1

To delete the job:

gcloud beta ai model-monitoring-jobs delete 123 --project=example \
    --region=us-central1

For more information, see model-monitoring-jobs in the Cloud SDK reference.

Model Monitoring jobs API

For more information about skew detection or drift detection, see projects.locations.modelDeploymentMonitoringJobs in the Vertex AI REST reference.

Email alerts

For the following events, Model Monitoring sends an email alert to each email address specified when the Model Monitoring job was created:

  • Each time an alerting threshold is crossed
  • Each time skew or drift detection is set up
  • Each time an existing Model Monitoring job configuration is updated

The email alerts contain pertinent information, including:

  • The time at which the monitoring job ran
  • The name of the feature that has skew or drift
  • The alerting threshold as well as the recorded statistical distance measure

Analyze skew and drift data

You can use the Cloud Console to visualize the distributions of each monitored feature and learn which changes led to skew or drift. You can view the feature value distributions as a histogram.

For each monitored feature, you can view the distributions of the 50 most recent monitoring jobs in the Cloud Console. For skew detection, the training data distribution is displayed right next to the input data distribution:

Figure: histograms showing an example input data distribution and the training data distribution for skew detection.

Visualizing data distribution as histograms lets you quickly focus on the changes that occurred in the data. Afterward, you might decide to adjust your feature generation pipeline or retrain the model.

View feature distribution histograms

  1. To navigate to the feature distribution histograms in the Cloud Console, go to the Endpoints page.

    Go to Endpoints

  2. On the Endpoints page, click the endpoint to analyze.

  3. On the detail page for the endpoint you selected, there is a list of all the models deployed on that endpoint. Click the name of a model to analyze.

  4. The detail page for the model lists the model's input features, along with pertinent information, such as the alert threshold for each feature and the number of prior alerts for the feature.

  5. To analyze a feature, click the name of a feature. A page shows the feature distribution histograms for that feature.

What's next