To start a monitoring job, Model Monitoring needs to know the schema of your tabular dataset in order to correctly parse the input payload.
For AutoML models, Model Monitoring automatically parses the schema because Vertex AI already has access to your training data.
For custom-trained models:
- Model Monitoring can automatically parse schemas for models that use a standard key-value input format.
- For models that don't use a key-value input format, you may need to provide a schema when you create your monitoring job.
Schema generation varies based on whether you are enabling Model Monitoring for an online prediction endpoint or batch predictions.
Schema parsing for online prediction endpoints
For online prediction endpoints, you can let Model Monitoring parse the schema automatically or upload a schema when you create a monitoring job.
Automatic schema parsing
After you enable skew or drift detection for an online endpoint, Model Monitoring can usually automatically parse the input schema. For automatic schema parsing, Model Monitoring analyzes the first 1,000 input requests to determine the schema.
Automatic schema parsing works best when the input requests are formatted as "key":"value" pairs, where "key" is the name of the feature and "value" is the value of the feature. For example:

```json
{"TenYearCHD":"0", "glucose":"5.4", "heartRate":"1", "age":"30", "prevalentStroke":"0", "gender":"f", "ethnicity":"latin american"}
```

If the inputs are not in "key":"value" format, Model Monitoring tries to identify the data type of each feature and automatically assigns a default feature name to each input.
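As an illustration of the key-value format that automatic parsing expects, the following sketch builds one prediction instance as a Python dictionary and serializes it to JSON; the feature names and values are taken from the example request above:

```python
import json

# One prediction instance in "key":"value" format: keys are feature names,
# values are feature values (from the example request above).
instance = {
    "TenYearCHD": "0",
    "glucose": "5.4",
    "heartRate": "1",
    "age": "30",
    "prevalentStroke": "0",
    "gender": "f",
    "ethnicity": "latin american",
}

# The serialized payload is the kind of input Model Monitoring analyzes
# when it parses the schema from the first 1,000 requests.
payload = json.dumps(instance)
print(payload)
```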
Custom instance schemas
You can provide your own input schema when you create a Model Monitoring job to guarantee that Model Monitoring correctly parses your model's inputs.
This schema is called the analysis instance schema. The schema file specifies the format of the input payload, the name of each feature, and the type of each feature.
The schema must be written as a YAML file in the OpenAPI format. The following example is for a prediction request in the `object` format:

```yaml
type: object
properties:
  age:
    type: string
  BMI:
    type: number
  TenYearCHD:
    type: string
  cigsPerDay:
    type: array
    items:
      type: string
  BPMeds:
    type: string
required:
  - age
  - BMI
  - TenYearCHD
  - cigsPerDay
  - BPMeds
```
- `type` indicates whether your prediction request is in one of the following formats:
  - `object`: key-value pairs
  - `array`: array-like
  - `string`: csv-string
- `properties` indicates the type of each individual feature.
- If the request is in array or csv-string format, specify the order in which the features are listed in each request under the `required` field.
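To show how the `required` field fixes the feature order for array-format requests, here is a minimal sketch that assembles an array instance from named features; the feature values are hypothetical:

```python
# Feature order as declared under `required` in the schema above.
required = ["age", "BMI", "TenYearCHD", "cigsPerDay", "BPMeds"]

# Hypothetical feature values for one request.
features = {
    "BMI": 28.3,
    "age": "54",
    "BPMeds": "0",
    "TenYearCHD": "0",
    "cigsPerDay": ["20"],
}

# An array-format instance must list values in the schema's `required` order.
instance = [features[name] for name in required]
print(instance)  # ['54', 28.3, '0', ['20'], '0']
```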
If your prediction request is in array or csv-string format, represent any missing features as null values. For example, consider a prediction request with five features:

```
[feature_a, feature_b, feature_c, feature_d, feature_e]
```

If `feature_c` allows missing values, a sample request missing `feature_c` would be `{[1, 2, , 4, 6]}`. The list length is still 5, with one null value in the middle.
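The padding rule above can be sketched in Python, where a missing feature becomes `None` (serialized as JSON `null`) so the instance keeps its full length; the feature names and values are hypothetical:

```python
import json

# Fixed feature order for the array-format request.
feature_order = ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"]

# feature_c is missing from this request.
observed = {"feature_a": 1, "feature_b": 2, "feature_d": 4, "feature_e": 6}

# Missing features become None; the list length stays 5.
instance = [observed.get(name) for name in feature_order]
print(instance)            # [1, 2, None, 4, 6]
print(json.dumps(instance))  # [1, 2, null, 4, 6]
```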
Schema parsing for batch predictions
For batch predictions, you can let Model Monitoring parse the schema automatically or upload a schema when you create a monitoring job.
Automatic schema parsing
If you don't provide a schema during monitoring job creation, Model Monitoring infers the data types of your features and generates your schema based on your training data.
Model Monitoring also needs to know which feature is the target column, which is the feature being predicted. The target column is excluded from the schema and feature skew metric. You can specify the target column while creating a monitoring job.
Target column specification
If you don't specify the target column while creating a monitoring job, Model Monitoring labels the last feature name in your training data as the target column.
For example, Model Monitoring labels `column_c` in this CSV training data as the target column because `column_c` is at the end of the first row:

```
column_a, column_b, column_d, column_c
1,"a", 2, "c"
2,"b", 342, "d"
```
Similarly, Model Monitoring labels `column_c` in this JSONL file as the target column because `column_c` is at the end of the first row:

```
{"column_a": 1, "column_b": "a", "column_d": 2, "column_c": "c" }
{"column_a": 2, "column_b": "b", "column_c": "d", "column_d": 342}
```
In both examples, the final schema contains only `column_a`, `column_b`, and `column_d`.
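The default target-column rule above can be sketched as follows: take the last feature name in the first row of the training data, then exclude it from the schema. The sample data mirrors the examples above:

```python
import csv
import io
import json

# CSV case: the target column is the last name in the header row.
csv_data = 'column_a,column_b,column_d,column_c\n1,"a",2,"c"\n2,"b",342,"d"\n'
header = next(csv.reader(io.StringIO(csv_data)))
target = header[-1]
print(target)  # column_c

# JSONL case: the target column is the last key in the first line
# (json.loads preserves key insertion order).
jsonl_first_line = '{"column_a": 1, "column_b": "a", "column_d": 2, "column_c": "c"}'
keys = list(json.loads(jsonl_first_line))
print(keys[-1])  # column_c

# The remaining features form the schema.
schema_features = [k for k in header if k != target]
print(schema_features)  # ['column_a', 'column_b', 'column_d']
```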
Custom schemas
Your custom schema specifies the format of the input payload, the name of each feature, and the type of each feature.
The schema must be written as a YAML file with OpenAPI syntax. The following example is for a prediction request in the `object` format:

```yaml
type: object
properties:
  age:
    type: string
  BMI:
    type: number
  TenYearCHD:
    type: string
  cigsPerDay:
    type: array
    items:
      type: string
  BPMeds:
    type: string
required:
  - age
  - BMI
  - TenYearCHD
  - cigsPerDay
  - BPMeds
```
Model Monitoring calculates feature skew based on the JSON Lines batch prediction output. If your data type contains an array, the length of the array in the input data must equal the number of features specified in the YAML file. Otherwise, Model Monitoring excludes any prediction instance with an incorrect array length from the feature skew calculation.
For example, the arrays in the following data types contain two features:

Array:

```
{[[1, 2]]}
```

"Key"/"Value":

```
{"key": 0, "values": [[1, 2]]}
```

The corresponding schema must also specify two features:

```yaml
type: object
properties:
  name:
    type: string
  age:
    type: number
required:
  - name
  - age
```
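The exclusion rule can be sketched as a simple length check: instances whose array length doesn't match the number of features the schema declares are dropped before the skew calculation. The instance data below is hypothetical:

```python
# The schema above declares two features (name and age).
num_schema_features = 2

# Hypothetical batch prediction instances in "Key"/"Value" format.
instances = [
    {"key": 0, "values": [[1, 2]]},     # length 2: kept
    {"key": 1, "values": [[3, 4, 5]]},  # length 3: excluded
]

# Keep only instances whose array length matches the schema.
kept = [
    inst for inst in instances
    if len(inst["values"][0]) == num_schema_features
]
print([inst["key"] for inst in kept])  # [0]
```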
What's next
- Enable skew and drift detection for your models.
- Try the example notebook in Colab or view it on GitHub.