Monitor feature skew and drift

This page describes how to create, manage, and interpret the results of Model Monitoring jobs for models deployed to online prediction endpoints. Vertex AI Model Monitoring supports feature skew and drift detection for categorical and numerical input features.

When a model is deployed in production with Model Monitoring enabled, incoming prediction requests are logged in a BigQuery table in your Google Cloud project. The input feature values contained in the logged requests are then analyzed for skew or drift.

You can enable skew detection if you provide the original training dataset for your model; otherwise, you should enable drift detection. For more information, see Introduction to Vertex AI Model Monitoring.

Prerequisites

To use Model Monitoring, complete the following:

  1. Have a model available in Vertex AI that is either a tabular AutoML model or an imported custom training model.

    • If you are using an existing endpoint, make sure all the models deployed under the endpoint are tabular AutoML or imported custom training types.
  2. If you are enabling skew detection, upload your training data to Cloud Storage or BigQuery and obtain the URI for the data, for example as shown in the sketch after this list. For drift detection, training data is not required.
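
For example, you might stage a CSV training file in Cloud Storage with the gcloud CLI. This is only a sketch; the file and bucket names (training_data.csv, my-training-bucket) are placeholders, not values from your project:

# Copy the training data to Cloud Storage; the resulting gs:// URI is what you
# provide as the training data source when you configure skew detection.
gcloud storage cp training_data.csv gs://my-training-bucket/training_data.csv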

Create a Model Monitoring job

To set up either skew detection or drift detection, create a model deployment monitoring job:

Console

To create a model deployment monitoring job using the console, create an endpoint:

  1. In the Google Cloud console, go to the Vertex AI Endpoints page.

    Go to Endpoints

  2. Click Create Endpoint.

  3. In the New endpoint pane, name your endpoint and set a region.

  4. Click Continue.

  5. In the Model name field, select an imported custom training or tabular AutoML model.

  6. In the Version field, select a version for your model.

  7. Click Continue.

  8. In the Model monitoring pane, make sure Enable model monitoring for this endpoint is toggled on. Any monitoring settings you configure apply to all models deployed to the endpoint.

  9. Enter a Monitoring job display name.

  10. Click Continue. The Monitoring objective pane opens, with options for skew or drift detection:

    Skew detection

    1. Select Training-serving skew detection.
    2. Under Training data source, provide a training data source.
    3. Under Target column, enter the column name from the training data that the model is trained to predict. This field is excluded from the monitoring analysis.
    4. (Optional) Under Alert thresholds, specify thresholds at which to trigger alerts. For information about how to format the thresholds, hold the pointer over the Help icon.
    5. Click Create.

    Drift detection

    1. Select Prediction drift detection.
    2. (Optional) Under Alert thresholds, specify thresholds at which to trigger alerts. For information about how to format the thresholds, hold the pointer over the Help icon.
    3. Click Create.

gcloud

To create a model deployment monitoring job using the gcloud CLI, first deploy your model to an endpoint.

A monitoring job configuration applies to all deployed models under an endpoint.

Run the gcloud ai model-monitoring-jobs create command.

gcloud ai model-monitoring-jobs create \
  --project=PROJECT_ID \
  --region=REGION \
  --display-name=MONITORING_JOB_NAME \
  --emails=EMAIL_ADDRESS_1,EMAIL_ADDRESS_2 \
  --endpoint=ENDPOINT_ID \
  --feature-thresholds=FEATURE_1=THRESHOLD_1,FEATURE_2=THRESHOLD_2 \
  --prediction-sampling-rate=SAMPLING_RATE \
  --monitoring-frequency=MONITORING_FREQUENCY \
  --target-field=TARGET_FIELD \
  --bigquery-uri=BIGQUERY_URI

where:

  • PROJECT_ID is the ID of your Google Cloud project. For example, my-project.

  • REGION is the location for your monitoring job. For example, us-central1.

  • MONITORING_JOB_NAME is the name of your monitoring job. For example, my-job.

  • EMAIL_ADDRESS_1 and EMAIL_ADDRESS_2 are the email addresses, comma-separated, where you want to receive alerts from Model Monitoring. For example, example@example.com.

  • ENDPOINT_ID is the ID of the endpoint under which your model is deployed. For example, 1234567890987654321.

  • (optional) FEATURE_1=THRESHOLD_1 is the alerting threshold for each feature you want to monitor. For example, housing-latitude=0.4. An alert is logged when the statistical distance between the input feature distribution and its corresponding baseline exceeds the specified threshold. By default, every categorical and numerical feature is monitored, with threshold values of 0.3.

  • (optional) SAMPLING_RATE is the fraction of the incoming prediction requests you want to log. For example, 0.5. If not specified, Model Monitoring logs all prediction requests.

  • (optional) MONITORING_FREQUENCY is the frequency, in hours, at which you want the monitoring job to run on recently logged inputs. The minimum granularity is 1 hour. The default is 24 hours. For example, 2.

  • (required only for skew detection) TARGET_FIELD is the field that is being predicted by the model. This field is excluded from the monitoring analysis. For example, housing-price.

  • (required only for skew detection) BIGQUERY_URI is the link to the training dataset stored in BigQuery, using the following format:

    bq://PROJECT.DATASET.TABLE
    

    For example, bq://my-project.housing-data.san-francisco.

    You can replace the --bigquery-uri flag with one of the following alternatives, depending on where your training data is stored:

    • For a CSV file stored in a Cloud Storage bucket, use --data-format=csv --gcs-uris=gs://BUCKET_NAME/OBJECT_NAME.

    • For a TFRecord file stored in a Cloud Storage bucket, use --data-format=tf-record --gcs-uris=gs://BUCKET_NAME/OBJECT_NAME.

    • For a tabular AutoML managed dataset, use --dataset=DATASET_ID.
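
Putting the example values from the list above together, a skew detection job that uses a BigQuery training table might be created as follows. This is an illustrative sketch; substitute your own project, endpoint, feature, and dataset values:

# Create a skew detection monitoring job using the example values shown above.
gcloud ai model-monitoring-jobs create \
  --project=my-project \
  --region=us-central1 \
  --display-name=my-job \
  --emails=example@example.com \
  --endpoint=1234567890987654321 \
  --feature-thresholds=housing-latitude=0.4 \
  --prediction-sampling-rate=0.5 \
  --monitoring-frequency=2 \
  --target-field=housing-price \
  --bigquery-uri=bq://my-project.housing-data.san-francisco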

Python SDK

For information about the full end-to-end Model Monitoring API workflow, see the example notebook.

REST API

  1. If you haven't done so already, deploy your model to an endpoint.

  2. Retrieve the deployed model ID for your model by getting the endpoint information. Note the DEPLOYED_MODEL_ID, which is the deployedModels.id value in the response.

  3. Create a model monitoring job request. The instructions below show how to create a basic monitoring job for drift detection. To customize the JSON request, see the Monitoring job reference.

    Before using any of the request data, make the following replacements:

    • PROJECT_ID: the ID of your Google Cloud project. For example, my-project.
    • LOCATION: the location for your monitoring job. For example, us-central1.
    • MONITORING_JOB_NAME: the name of your monitoring job. For example, my-job.
    • PROJECT_NUMBER: the number of your Google Cloud project. For example, 1234567890.
    • ENDPOINT_ID: the ID of the endpoint to which your model is deployed. For example, 1234567890.
    • DEPLOYED_MODEL_ID: the ID of the deployed model.
    • FEATURE:VALUE: the alerting threshold for each feature you want to monitor. For example, "housing-latitude": {"value": 0.4}. An alert is logged when the statistical distance between the input feature distribution and its corresponding baseline exceeds the specified threshold. By default, every categorical and numerical feature is monitored, with threshold values of 0.3.
    • EMAIL_ADDRESS: the email address where you want to receive alerts from Model Monitoring. For example, example@example.com.

    Request JSON body:

    {
      "displayName": "MONITORING_JOB_NAME",
      "endpoint": "projects/PROJECT_NUMBER/locations/LOCATION/endpoints/ENDPOINT_ID",
      "modelDeploymentMonitoringObjectiveConfigs": {
        "deployedModelId": "DEPLOYED_MODEL_ID",
        "objectiveConfig": {
          "predictionDriftDetectionConfig": {
            "driftThresholds": {
              "FEATURE_1": {
                "value": VALUE_1
              },
              "FEATURE_2": {
                "value": VALUE_2
              }
            }
          }
        }
      },
      "loggingSamplingStrategy": {
        "randomSampleConfig": {
          "sampleRate": 0.5
        }
      },
      "modelDeploymentMonitoringScheduleConfig": {
        "monitorInterval": {
          "seconds": 3600
        }
      },
      "modelMonitoringAlertConfig": {
        "emailAlertConfig": {
          "userEmails": ["EMAIL_ADDRESS"]
        }
      }
    }
    

    To send your request, save the request body to a file and POST it to the modelDeploymentMonitoringJobs endpoint of the Vertex AI API.
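
    For example, using curl, the call might look like the following sketch; it assumes the request body is saved in a file named request.json:

    # Call the Vertex AI REST API to create the monitoring job.
    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json; charset=utf-8" \
      -d @request.json \
      "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/modelDeploymentMonitoringJobs"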

    You should receive a JSON response similar to the following:

    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION/modelDeploymentMonitoringJobs/MONITORING_JOB_NUMBER",
      ...
      "state": "JOB_STATE_PENDING",
      "scheduleState": "OFFLINE",
      ...
      "bigqueryTables": [
        {
          "logSource": "SERVING",
          "logType": "PREDICT",
          "bigqueryTablePath": "bq://PROJECT_ID.model_deployment_monitoring_8451189418714202112.serving_predict"
        }
      ],
      ...
    }
    

Once the monitoring job is created, Model Monitoring logs incoming prediction requests in a generated BigQuery table named PROJECT_ID.model_deployment_monitoring_ENDPOINT_ID.serving_predict. If request-response logging is enabled, Model Monitoring logs incoming requests in the same BigQuery table that is used for request-response logging.
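
To spot-check what is being logged, you can preview rows of that table with the bq command-line tool. This is a sketch; substitute your own project and endpoint IDs:

# Preview the first 10 logged prediction requests in the generated table.
bq head --max_rows=10 "PROJECT_ID:model_deployment_monitoring_ENDPOINT_ID.serving_predict"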

(Optional) Configure alerts for the Model Monitoring job

You can monitor and debug your Model Monitoring job through alerts. Model Monitoring automatically notifies you of job updates through email, but you can also set up alerts through Cloud Logging.

Email

For the following events, Model Monitoring sends an email notification to each email address specified when the Model Monitoring job was created:

  • Each time skew or drift detection is set up.
  • Each time an existing Model Monitoring job configuration is updated.
  • Each time a scheduled pipeline fails.

Cloud Logging

To enable pipeline logs, set the enableMonitoringPipelineLogs field in your modelDeploymentMonitoringJobs configuration to true. Debugging logs are written to Cloud Logging when the monitoring job is set up and at each monitoring interval.

The debugging logs are written to Cloud Logging with the log name: model_monitoring. For example:

logName="projects/model-monitoring-demo/logs/aiplatform.googleapis.com%2FFmodel_monitoring" resource.labels.model_deployment_monitoring_job=6680511704087920640

Here is an example of a job progress log entry:

{
"insertId": "e2032791-acb9-4d0f-ac73-89a38788ccf3@a1",
"jsonPayload": {
  "@type": "type.googleapis.com/google.cloud.aiplatform.logging.ModelMonitoringPipelineLogEntry",
  "statusCode": {
    "message": "Scheduled model monitoring pipeline finished successfully for job projects/677687165274/locations/us-central1/modelDeploymentMonitoringJobs/6680511704087920640"
  },
  "modelDeploymentMonitoringJob": "projects/677687165274/locations/us-central1/modelDeploymentMonitoringJobs/6680511704087920640"
},
"resource": {
  "type": "aiplatform.googleapis.com/ModelDeploymentMonitoringJob",
  "labels": {
    "model_deployment_monitoring_job": "6680511704087920640",
    "location": "us-central1",
    "resource_container": "projects/677687165274"
  }
},
"timestamp": "2022-02-04T15:33:54.778883Z",
"severity": "INFO",
"logName": "projects/model-monitoring-demo/logs/staging-aiplatform.sandbox.googleapis.com%2Fmodel_monitoring",
"receiveTimestamp": "2022-02-04T15:33:56.343298321Z"
}

Configure alerts for feature anomalies

Model Monitoring detects an anomaly when the threshold set for a feature is exceeded. Model Monitoring automatically notifies you of detected anomalies through email, but you can also set up alerts through Cloud Logging.

Email

At each monitoring interval, if the statistical distance for at least one feature exceeds its alerting threshold, Model Monitoring sends an email alert to each email address specified when the Model Monitoring job was created. The email message includes the following:

  • The time at which the monitoring job ran.
  • The name of the feature that has skew or drift.
  • The alerting threshold as well as the recorded statistical distance measure.

Cloud Logging

To enable Cloud Logging alerts, set the enableLogging field of your ModelMonitoringAlertConfig configuration to true.

At each monitoring interval, an anomaly log is written to Cloud Logging if the statistical distance for at least one feature exceeds the threshold for that feature. You can forward logs to any service that Cloud Logging supports, such as Pub/Sub.

Anomalies are written to Cloud Logging with the log name: model_monitoring_anomaly. For example:

logName="projects/model-monitoring-demo/logs/aiplatform.googleapis.com%2FFmodel_monitoring_anomaly" resource.labels.model_deployment_monitoring_job=6680511704087920640

Here is an example of an anomaly log entry:

{
"insertId": "b0e9c0e9-0979-4aff-a5d3-4c0912469f9a@a1",
"jsonPayload": {
  "anomalyObjective": "RAW_FEATURE_SKEW",
  "endTime": "2022-02-03T19:00:00Z",
  "featureAnomalies": [
    {
      "featureDisplayName": "age",
      "deviation": 0.9,
      "threshold": 0.7
    },
    {
      "featureDisplayName": "education",
      "deviation": 0.6,
      "threshold": 0.3
    }
  ],
  "totalAnomaliesCount": 2,
  "@type": "type.googleapis.com/google.cloud.aiplatform.logging.ModelMonitoringAnomaliesLogEntry",
  "startTime": "2022-02-03T18:00:00Z",
  "modelDeploymentMonitoringJob": "projects/677687165274/locations/us-central1/modelDeploymentMonitoringJobs/6680511704087920640",
  "deployedModelId": "1645828169292316672"
},
"resource": {
  "type": "aiplatform.googleapis.com/ModelDeploymentMonitoringJob",
  "labels": {
    "model_deployment_monitoring_job": "6680511704087920640",
    "location": "us-central1",
    "resource_container": "projects/677687165274"
  }
},
"timestamp": "2022-02-03T19:00:00Z",
"severity": "WARNING",
"logName": "projects/model-monitoring-demo/logs/staging-aiplatform.sandbox.googleapis.com%2Fmodel_monitoring_anomaly",
"receiveTimestamp": "2022-02-03T19:59:52.121398388Z"
}

Update a Model Monitoring job

You can view, update, pause, and delete a Model Monitoring job. You must pause a job before you can delete it.

Console

Pausing and deleting are not supported in the console; use the gcloud CLI instead.

To update parameters for a Model Monitoring job:

  1. In the console, go to the Vertex AI Endpoints page.

    Go to Endpoints

  2. Click the name of the endpoint you want to edit.

  3. Click Edit settings.

  4. In the Edit endpoint pane, select Model monitoring or Monitoring objectives.

  5. Update the parameters you want to change.

  6. Click Update.

To view metrics, alerts, and monitoring properties for a model:

  1. In the console, go to the Vertex AI Endpoints page.

    Go to Endpoints

  2. Click the name of the endpoint.

  3. In the Monitoring column for the model you want to view, click Enabled.

gcloud

Run the following command:

gcloud ai model-monitoring-jobs COMMAND MONITORING_JOB_ID \
  --PARAMETER=VALUE --project=PROJECT_ID --region=LOCATION

where:

  • COMMAND is the command you want to perform on the monitoring job. For example, update, pause, resume, or delete. For more information, see the gcloud CLI reference.

  • MONITORING_JOB_ID is the ID of your monitoring job. For example, 123456789. You can find the ID by retrieving the endpoint information or viewing Monitoring properties for a model in the console. The ID is included in the monitoring job resource name in the format projects/PROJECT_NUMBER/locations/LOCATION/modelDeploymentMonitoringJobs/MONITORING_JOB_ID.

  • (optional) PARAMETER=VALUE is the parameter you want to update. This flag is required only when using the update command. For example, monitoring-frequency=2.

  • PROJECT_ID is the ID for your Google Cloud project. For example, my-project.

  • LOCATION is the location for your monitoring job. For example, us-central1.
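
For example, because a job must be paused before it can be deleted, removing the example monitoring job might look like the following sketch:

# Pause the monitoring job, then delete it.
gcloud ai model-monitoring-jobs pause 123456789 --project=my-project --region=us-central1
gcloud ai model-monitoring-jobs delete 123456789 --project=my-project --region=us-central1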

Analyze skew and drift data

You can use the Google Cloud console to visualize the distributions of each monitored feature and learn which changes led to skew or drift over time. You can view the feature value distributions as histograms.

Console

  1. To navigate to the feature distribution histograms in the console, go to the Endpoints page.

    Go to Endpoints

  2. On the Endpoints page, click the endpoint you want to analyze.

  3. On the detail page for the endpoint you selected, there is a list of all the models deployed on that endpoint. Click the name of a model to analyze.

  4. The details page for the model lists the model's input features, along with pertinent information, such as the alert threshold for each feature and the number of prior alerts for the feature.

  5. To analyze a feature, click the name of the feature. A page shows the feature distribution histograms for that feature.

    For each monitored feature, you can view the distributions of the 50 most recent monitoring jobs in the console. For skew detection, the training data distribution is displayed right next to the input data distribution:

    Histograms showing an example input data distribution next to the training data distribution for skew detection.

    Visualizing data distribution as histograms lets you quickly focus on the changes that occurred in the data. Afterward, you might decide to adjust your feature generation pipeline or retrain the model.

What's next