Model evaluation components take ground truth and batch prediction results as input and generate evaluation metrics. Ground truth data is "correctly labeled" data as determined by humans, and is usually provided as the target field in the test dataset you used to train the model. For more information, see Model evaluation in Vertex AI.
Vertex AI provides the following model evaluation components:
Model type support
The following table shows supported model types for each model evaluation component:
Model evaluation component | Supported model types |
---|---|
ModelEvaluationClassificationOp |
|
ModelEvaluationRegressionOp |
|
ModelEvaluationForecastingOp |
|
Remove the target field
For some model types, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset.
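As a rough illustration of what excluding the target column means for a CSV dataset, the following sketch drops one column using only the Python standard library. The function name `remove_target_column` and the `adopted` column are hypothetical, and this is not how any Vertex AI component is implemented; it only shows the shape of the data before and after.

```python
import csv
import io


def remove_target_column(csv_text: str, target_field: str) -> str:
    """Return csv_text with the target_field column removed (illustrative only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or target_field not in reader.fieldnames:
        raise ValueError(f"column {target_field!r} not found")
    kept = [name for name in reader.fieldnames if name != target_field]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical dataset where "adopted" is the ground truth column.
dataset = "age,species,adopted\n1,dog,true\n3,cat,false\n"
features_only = remove_target_column(dataset, "adopted")
print(features_only)
```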
Configure and format slices for your tabular dataset
A slice is a subset of tabular data with specified characteristics. Sliced metrics offer more granular evaluation metrics for AutoML tabular and custom training classification models.
For example, consider a pet adoption dataset where a model is trained to predict whether a pet will be adopted within a week. While it's useful to look at metrics for the whole dataset, you may be interested in metrics related to the pet's species and age. In other words, you're interested in the following subset of the dataset:
Feature | Values |
---|---|
age (in years) | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
species | "dog", "cat", "rabbit" |
This fine-grained analysis can be helpful in identifying where the model might be most error prone, or ensuring that the model performs well on critical subsets of the data.
Configure slices
To configure a slice, create a JSON file with the following configuration:
{ "FEATURE": { "KIND": { CONFIG } } }
Where:
FEATURE is a feature for which you want to create a slice. For example, age.
KIND is one of the following:
value
value creates a single slice on a single value of a feature. Specify CONFIG as a key-value pair where the key is float_value or string_value. For example, "float_value": 1.0.
The following example configuration creates a single slice containing data with the age of 1:
{ "age": { "value": { "float_value": 1.0 } } }
range
range creates a single slice that includes data within a specified range. Specify CONFIG as two key-value pairs where the keys are low and high.
The following example configuration creates a single slice containing data where the age is any value between 1 and 3:
{ "age": { "range": { "low": 1, "high": 3 } } }
all_values
all_values creates a separate slice for each possible label for the feature. Specify CONFIG as "value": true.
The following example configuration creates three slices, one each for "dog", "cat", and "rabbit":
{ "species": { "all_values": { "value": true } } }
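The three kinds of configuration above can also be generated programmatically. The following is a minimal sketch that builds the same JSON structures with small helper functions; the helper names (`value_slice`, `range_slice`, `all_values_slice`) are hypothetical and are not part of any Google Cloud library.

```python
import json


def value_slice(value):
    """Build a `value` config: strings use string_value, numbers use float_value."""
    if isinstance(value, str):
        return {"value": {"string_value": value}}
    return {"value": {"float_value": float(value)}}


def range_slice(low, high):
    """Build a `range` config with the keys low and high."""
    return {"range": {"low": low, "high": high}}


def all_values_slice():
    """Build an `all_values` config, which requests one slice per label."""
    return {"all_values": {"value": True}}


# One configuration per kind, mirroring the examples in the text.
age_is_one = {"age": value_slice(1)}
age_one_to_three = {"age": range_slice(1, 3)}
every_species = {"species": all_values_slice()}

for config in (age_is_one, age_one_to_three, every_species):
    print(json.dumps(config))
```

Writing the result of `json.dumps` to a file gives a JSON configuration in the format described in this section.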
You can also use multiple features to create one or more slices.
The following configuration creates a single slice containing data where the age is between 1 and 3 and the species is "dog":
{ "age": { "range": { "low": 1, "high": 3 } }, "species": { "value": { "string_value": "dog" } } }
The following configuration creates multiple slices, one for each unique species in the dataset, where the age is 1:
{ "species": { "all_values": { "value": true } }, "age": { "value": { "float_value": 1.0 } } }
The resulting slices contain the following data from the example dataset:
Slice 1: age:1 and species:"dog"
Slice 2: age:1 and species:"cat"
Slice 3: age:1 and species:"rabbit"
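To make the slicing semantics concrete, the following sketch applies a slice configuration to a few in-memory rows. It is an illustration only, not the evaluation component's implementation. It assumes that range bounds are inclusive and that multiple features are combined with a logical AND; the function names and sample rows are hypothetical.

```python
from itertools import product


def candidates(kind_config, rows, feature):
    """Return (label, predicate) pairs describing the slices for one feature."""
    if "value" in kind_config:
        wanted = next(iter(kind_config["value"].values()))
        return [(wanted, lambda v, w=wanted: v == w)]
    if "range" in kind_config:
        low, high = kind_config["range"]["low"], kind_config["range"]["high"]
        # Assumption: both bounds are inclusive.
        return [(f"{low}-{high}", lambda v, lo=low, hi=high: lo <= v <= hi)]
    if "all_values" in kind_config:
        distinct = sorted({row[feature] for row in rows})
        return [(d, lambda v, w=d: v == w) for d in distinct]
    raise ValueError(f"unknown slice kind for {feature!r}")


def build_slices(config, rows):
    """Map each slice description to the rows that fall inside it."""
    features = list(config)
    per_feature = [candidates(config[f], rows, f) for f in features]
    slices = {}
    for combo in product(*per_feature):
        name = ", ".join(f"{f}:{label}" for f, (label, _) in zip(features, combo))
        slices[name] = [
            row
            for row in rows
            if all(pred(row[f]) for f, (_, pred) in zip(features, combo))
        ]
    return slices


rows = [
    {"age": 1, "species": "dog"},
    {"age": 1, "species": "cat"},
    {"age": 2, "species": "dog"},
    {"age": 1, "species": "rabbit"},
]
config = {
    "species": {"all_values": {"value": True}},
    "age": {"value": {"float_value": 1.0}},
}
for name, members in build_slices(config, rows).items():
    print(name, len(members))
```

With these sample rows, the two-year-old dog falls outside every slice because each slice requires an age of 1.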
Format slices
To format slices for the ModelEvaluationClassificationOp component, do the following:
Create a slicing_spec. For example:
    from google.cloud.aiplatform_v1.types import ModelEvaluationSlice
    from google.protobuf import json_format

    # SliceSpec and SliceConfig are nested message classes, so reach them
    # through ModelEvaluationSlice instead of importing them as modules.
    SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
    SliceConfig = SliceSpec.SliceConfig

    slicing_spec = SliceSpec(configs={
        'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))
    })
Create a list to store the slice configurations. For example:
    slicing_specs = []
Format each slicing_spec as JSON or a dictionary. For example:
    # SliceSpec.pb() returns the underlying protobuf message that json_format expects.
    slicing_spec_json = json_format.MessageToJson(SliceSpec.pb(slicing_spec))
    slicing_spec_dict = json_format.MessageToDict(SliceSpec.pb(slicing_spec))
Combine each slicing_spec into a list. For example:
    slicing_specs.append(slicing_spec_json)
Pass slicing_specs as a parameter to the ModelEvaluationClassificationOp component. For example:
    ModelEvaluationClassificationOp(slicing_specs=slicing_specs)
Format fairness metrics configuration slices
To evaluate your model for fairness, use the following components:
DetectDataBiasOp: returns data bias metrics for the slices you pass into the component.
DetectModelBiasOp: returns model bias metrics for the slices you pass into the component.
To format your slices for the DetectDataBiasOp and DetectModelBiasOp components, do the following:
Create a BiasConfig. For example:
    from google.cloud.aiplatform_v1.types import ModelEvaluation, ModelEvaluationSlice
    from google.protobuf import json_format

    # BiasConfig, SliceSpec, and SliceConfig are nested message classes, not modules.
    BiasConfig = ModelEvaluation.BiasConfig
    SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
    SliceConfig = SliceSpec.SliceConfig

    bias_config = BiasConfig(bias_slices=SliceSpec(configs={
        'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))
    }))
Create a list to store the bias configurations. For example:
    bias_configs = []
Format each bias_config as JSON or a dictionary. For example:
    # BiasConfig.pb() returns the underlying protobuf message that json_format expects.
    bias_config_json = json_format.MessageToJson(BiasConfig.pb(bias_config))
    bias_config_dict = json_format.MessageToDict(BiasConfig.pb(bias_config))
Combine each bias_config into a list. For example:
    bias_configs.append(bias_config_json)
Pass bias_configs as a parameter to the DetectDataBiasOp or DetectModelBiasOp component. For example:
    DetectDataBiasOp(bias_configs=bias_configs)
    DetectModelBiasOp(bias_configs=bias_configs)
Required input parameters
The required input parameters for model evaluation components vary based on the type of model you are evaluating.
AutoML models
When training AutoML models, Vertex AI uses default schemas. Some input parameters for the model evaluation components are only required if you used a non-default schema to train your model.
You can view and download schema files from the following Cloud Storage location:
gs://google-cloud-aiplatform/schema/modelevaluation/
The following tabs show example batch prediction output files and required input parameters for each model type:
Tabular
Classification
The default schema includes the following fields:
scores: array
classes: array
The following is an example of a batch prediction output file with the target column of male:
{ "instance": { "male": "1", "age": "49", "heartRate": "67", "education": "medium_high", "totChol": "282", "diaBP": "79", "BPMeds": "0", "currentSmoker": "current_high", "cigsPerDay": "25", "prevalentStroke": "stroke_0", "prevalentHyp": "hyp_0", "TenYearCHD": "FALSE", "diabetes": "0", "sysBP": "134", "BMI": "26.87", "glucose": "78" }, "prediction": { "scores": [0.3666940927505493, 0.6333059072494507], "classes": ["0", "1"] } }
This table describes the required parameters for the ModelEvaluationClassificationOp component with a tabular model:
Evaluation component parameter | Required |
---|---|
target_field_name | Yes |
prediction_label_column | Not required for default schema. Use prediction.classes for non-default schemas. |
prediction_score_column | Not required for default schema. Use prediction.scores for non-default schemas. |
evaluation_class_labels | Not required for default schema. Use ["0","1"] for non-default schemas. |
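The dotted names in the table, such as prediction.classes and prediction.scores, correspond to nested keys in each output record. The following sketch, which assumes that a dotted name addresses nested keys in this way, resolves both columns for a trimmed copy of the example record and picks the highest-scoring class. The `top_class` function is hypothetical and only illustrates how the columns relate to each other.

```python
import json

# Trimmed copy of the example output record; most instance fields are omitted.
line = (
    '{"instance": {"male": "1", "age": "49"}, '
    '"prediction": {"scores": [0.3666940927505493, 0.6333059072494507], '
    '"classes": ["0", "1"]}}'
)


def top_class(record, label_column="prediction.classes",
              score_column="prediction.scores"):
    """Resolve dotted column paths and return the best (label, score) pair."""

    def lookup(path):
        node = record
        for key in path.split("."):
            node = node[key]
        return node

    labels, scores = lookup(label_column), lookup(score_column)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


label, score = top_class(json.loads(line))
print(label, score)
```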
Regression
The default schema includes the following fields:
value: float
lower_bound: float
upper_bound: float
The following is an example of a batch prediction output file with the target column of age:
{ "instance": { "BPMeds": "0", "diabetes": "0", "diaBP": "65", "cigsPerDay": "45", "prevalentHyp": "hyp_0", "age": "46", "male": "1", "sysBP": "112.5", "glucose": "78", "BMI": "27.48", "TenYearCHD": "FALSE", "totChol": "209", "education": "high", "prevalentStroke": "stroke_0", "heartRate": "75", "currentSmoker": "current_high" }, "prediction": { "value": 44.96103286743164, "lower_bound": 44.61349868774414, "upper_bound": 44.590206146240234 } }
This table describes the required parameters for the ModelEvaluationRegressionOp component with a tabular model:
Evaluation component parameter | Required |
---|---|
target_field_name | Yes |
prediction_label_column | Not required for regression models. |
prediction_score_column | Not required for default schema. Use prediction.value for non-default schemas. |
evaluation_class_labels | Not required for regression models. |
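To show how target_field_name and prediction.value relate in a regression output record, the following sketch computes the absolute error for a trimmed copy of the example record. It assumes the ground truth can be read from the instance object, as in the example above; the `absolute_error` function is hypothetical and does not reflect how the evaluation component computes its metrics.

```python
import json

# Trimmed copy of the example record: "age" is the target column.
line = (
    '{"instance": {"age": "46", "male": "1"}, '
    '"prediction": {"value": 44.96103286743164}}'
)


def absolute_error(record, target_field_name, score_column="prediction.value"):
    """Return |predicted - actual| for a single batch prediction record."""
    node = record
    for key in score_column.split("."):
        node = node[key]
    predicted = float(node)
    # Instance values are strings in the example output, so convert them.
    actual = float(record["instance"][target_field_name])
    return abs(predicted - actual)


error = absolute_error(json.loads(line), "age")
print(round(error, 4))
```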
Forecasting
For tabular forecasting models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset.
The default schema includes the following fields:
id: string
displayNames: string
type: enum
timeSegmentStart: string
timeSegmentEnd: string
confidence: float
The following is an example CSV file with sale_dollars as the target column:
date,store_name,city,zip_code,county,sale_dollars
2020-03-17,Thriftway,,,,774.08999999999992
2020-03-10,Thriftway,,,,1160.67
2020-03-03,Thriftway,,,,2247.24
2020-06-08,New Star / Fort Dodge,,,,753.98
2020-06-01,New Star / Fort Dodge,,,,967.73
2020-01-10,Casey's General Store #1280 / Fort Dodge,,,,1040.92
2020-10-30,KUM & GO #76 / ADAIR,Adair,50002,ADAIR,1387.02
The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:
date,store_name,city,zip_code,county
2020-03-17,Thriftway,nan,nan,nan
2020-03-10,Thriftway,nan,nan,nan
2020-03-03,Thriftway,nan,nan,nan
2020-06-08,New Star / Fort Dodge,nan,nan,nan
2020-06-01,New Star / Fort Dodge,nan,nan,nan
2020-01-10,Casey's General Store #1280 / Fort Dodge,nan,nan,nan
2020-10-30,KUM & GO #76 / ADAIR,Adair,50002.0,ADAIR
The following is an example of a batch prediction output file without the target column of sale_dollars:
{ "instance": { "content": "gs://kbn-us-central1-test/datasets/text/happy_11556.txt", "mimeType":"text/plain" }, "prediction": { "ids": ["7903916851837534208","3292230833410146304","986387824196452352","2139309328803299328","5598073842623840256","6750995347230687232","4559431178561519616"], "displayNames": ["affection","bonding","achievement","exercise","nature","enjoy_the_moment","leisure"], "confidences": [0.99483216,0.005162797,4.1117933E-6,3.9997E-7,2.4624453E-7,1.9969502E-7,1.16997434E-7] } }
This table describes the required parameters for the ModelEvaluationForecastingOp component with a tabular model:
Evaluation component parameter | Required |
---|---|
target_field_name | Yes |
prediction_label_column | Not required for forecasting models. |
prediction_score_column | Not required for default schema. Use "prediction.value" for non-default schemas. |
evaluation_class_labels | Not required for forecasting models. |
ground_truth_gcs_source | Yes: the Cloud Storage link to the original dataset with the target column. |
ground_truth_format | Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery". |
Text
Classification
For text classification models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.
The default schema includes the following fields:
ids: array of strings
displayNames: array of strings
confidences: array of floats
The following is an example dataset with label as the target column:
{ "content": "gs://kbn-us-central1-test/datasets/text/happy_0.txt", "mimeType": "text/plain", "label": "affection" }
The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:
{ "content": "gs://kbn-us-central1-test/datasets/text/happy_0.txt", "mimeType": "text/plain" }
The following is an example of a batch prediction output file without the target column of label:
{ "instance": { "content": "gs://kbn-us-central1-test/datasets/text/happy_11556.txt", "mimeType":"text/plain" }, "prediction": { "ids": ["7903916851837534208","3292230833410146304","986387824196452352","2139309328803299328","5598073842623840256","6750995347230687232","4559431178561519616"], "displayNames": ["affection","bonding","achievement","exercise","nature","enjoy_the_moment","leisure"], "confidences": [0.99483216,0.005162797,4.1117933E-6,3.9997E-7,2.4624453E-7,1.9969502E-7,1.16997434E-7] } }
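Because the output record no longer contains the label, an evaluation has to match each prediction back to the original dataset. The following sketch illustrates that idea by joining on the content field and computing accuracy. The join key, the `accuracy` function, and the gs://example-bucket paths are assumptions made for illustration; the logic the evaluation component actually uses is not described here.

```python
# Hypothetical original dataset, including the target column "label".
ground_truth = [
    {"content": "gs://example-bucket/a.txt", "mimeType": "text/plain", "label": "affection"},
    {"content": "gs://example-bucket/b.txt", "mimeType": "text/plain", "label": "bonding"},
]

# Hypothetical batch prediction output in the same shape as the example record.
predictions = [
    {
        "instance": {"content": "gs://example-bucket/a.txt", "mimeType": "text/plain"},
        "prediction": {"ids": ["1", "2"], "displayNames": ["affection", "bonding"],
                       "confidences": [0.9, 0.1]},
    },
    {
        "instance": {"content": "gs://example-bucket/b.txt", "mimeType": "text/plain"},
        "prediction": {"ids": ["1", "2"], "displayNames": ["affection", "bonding"],
                       "confidences": [0.7, 0.3]},
    },
]


def accuracy(predictions, ground_truth, target_field_name="label", key="content"):
    """Join predictions to ground truth on `key` and return the share of correct top labels."""
    truth_by_key = {row[key]: row[target_field_name] for row in ground_truth}
    correct = 0
    for record in predictions:
        pred = record["prediction"]
        confidences = pred["confidences"]
        best = max(range(len(confidences)), key=confidences.__getitem__)
        if pred["displayNames"][best] == truth_by_key[record["instance"][key]]:
            correct += 1
    return correct / len(predictions)


print(accuracy(predictions, ground_truth))
```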
This table describes the required parameters for the ModelEvaluationClassificationOp component with a text model:
Evaluation component parameter | Required |
---|---|
target_field_name | Yes |
prediction_label_column | Not required for default schema. Use "prediction.displayNames" for non-default schemas. |
prediction_score_column | Not required for default schema. Use "prediction.confidences" for non-default schemas. |
evaluation_class_labels | Not required for default schema, or when prediction_label_column is provided. |
ground_truth_gcs_source | Yes: the Cloud Storage link to the original dataset with the target column. |
ground_truth_format | Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery". |
Video
Classification
For video classification models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.
The default schema includes the following fields:
id: string
displayName: string
type: enum
timeSegmentStart: string
timeSegmentEnd: string
confidence: float
The following is an example dataset with ground_truth as the target column:
{ "content": "gs://automl-video-demo-data/hmdb51/Aerial_Cartwheel_Tutorial_By_Jujimufu_cartwheel_f_nm_np1_ri_med_0.avi", "ground_truth": "cartwheel", "timeSegmentStart": "0.0", "timeSegmentEnd": "inf" }
The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:
{ "content": "gs://automl-video-demo-data/hmdb51/Aerial_Cartwheel_Tutorial_By_Jujimufu_cartwheel_f_nm_np1_ri_med_0.avi", "timeSegmentStart": "0.0", "timeSegmentEnd": "inf" }
The following is an example of a batch prediction output file without the target column of ground_truth:
{ "instance": { "content": "gs://automl-video-demo-data/hmdb51/20_Marine_Corps_Pull_Ups___JimmyDShea_pullup_f_cm_np1_ba_bad_3.avi", "mimeType": "video/avi", "timeSegmentStart": "0.0", "timeSegmentEnd": "inf" }, "prediction": [ { "id":"5015420640832782336", "displayName":"pullup", "type":"segment-classification", "timeSegmentStart":"0s", "timeSegmentEnd":"2.600s", "confidence":0.98152995 }, { "id":"2709577631619088384", "displayName":"cartwheel", "type":"segment-classification", "timeSegmentStart":"0s", "timeSegmentEnd":"2.600s", "confidence":0.0047166348 }, { "id":"403734622405394432", "displayName":"golf", "type":"segment-classification", "timeSegmentStart":"0s", "timeSegmentEnd":"2.600s", "confidence":0.0046260506 }, { "id":"1556656127012241408", "displayName":"ride_horse", "type":"segment-classification", "timeSegmentStart":"0s", "timeSegmentEnd":"2.600s", "confidence":0.004590442 }, { "id":"7321263650046476288", "displayName":"kick_ball", "type":"segment-classification", "timeSegmentStart":"0s", "timeSegmentEnd":"2.600s", "confidence":0.004536863 } ] }
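Unlike the tabular and text outputs, the video output stores predictions as a list of objects, each with its own displayName and confidence. The following sketch, using a shortened copy of the example prediction list, shows one way to read the labels and scores out of that structure and find the most confident entry. The `top_prediction` function is hypothetical.

```python
# Shortened copy of the "prediction" list from the example record.
prediction = [
    {"id": "5015420640832782336", "displayName": "pullup", "confidence": 0.98152995},
    {"id": "2709577631619088384", "displayName": "cartwheel", "confidence": 0.0047166348},
    {"id": "403734622405394432", "displayName": "golf", "confidence": 0.0046260506},
]


def top_prediction(entries):
    """Return the entry with the highest confidence."""
    return max(entries, key=lambda entry: entry["confidence"])


# Parallel lists of labels and scores, one element per prediction entry.
labels = [entry["displayName"] for entry in prediction]
scores = [entry["confidence"] for entry in prediction]

best = top_prediction(prediction)
print(best["displayName"], best["confidence"])
```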
This table describes the required parameters for the ModelEvaluationClassificationOp component with a video model:
Evaluation component parameter | Required |
---|---|
target_field_name | Yes |
prediction_label_column | Not required for default schema. Use "prediction.displayName" for non-default schemas. |
prediction_score_column | Not required for default schema. Use "prediction.confidence" for non-default schemas. |
evaluation_class_labels | Not required when a default schema is used, or when prediction_label_column is provided. |
ground_truth_gcs_source | Yes: the Cloud Storage link to the original dataset with the target column. |
ground_truth_format | Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery". |
Custom-trained models
For custom-trained models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.
The following example shows a batch prediction output file and required input parameters for a custom-trained tabular classification model:
Tabular
Classification
The typical schema includes the following fields:
instance: array of input data in the same order as the model training dataset columns.
prediction: array of output scores from the model, one for each class.
The following is an example CSV dataset with species as the target column:
petal_length,petal_width,sepal_length,sepal_width,species
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
The TargetFieldDataRemoverOp component removes the target column before sending the file to the batch prediction component:
petal_length,petal_width,sepal_length,sepal_width
6.4,2.8,5.6,2.2
5.0,2.3,3.3,1.0
4.9,2.5,4.5,1.7
4.9,3.1,1.5,0.1
The following is an example of a batch prediction output file without the target column of species:
{ "instance": [5.6, 2.5, 3.9, 1.1], "prediction": [0.0005816521588712931, 0.9989032745361328, 0.0005150380893610418] }
{ "instance": [5.0, 3.6, 1.4, 0.2], "prediction": [0.999255359172821, 0.000527293945197016, 0.00021737271163146943] }
{ "instance": [6.0, 2.9, 4.5, 1.5], "prediction": [0.00025063654175028205, 0.9994204044342041, 0.00032893591560423374] }
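In this output, the prediction array carries only scores, so class labels have to be supplied separately and must line up with the array position by position. The following sketch maps each example record to a label and rejects a label list of the wrong length. The label order `["0", "1", "2"]` is an assumption about how the species classes are encoded, and `label_predictions` is a hypothetical helper, not part of the evaluation component.

```python
import json

# Assumed label order for the three species classes in the example dataset.
evaluation_class_labels = ["0", "1", "2"]

# The three example output records, one JSON object per line.
lines = [
    '{"instance": [5.6, 2.5, 3.9, 1.1], "prediction": [0.0005816521588712931, 0.9989032745361328, 0.0005150380893610418]}',
    '{"instance": [5.0, 3.6, 1.4, 0.2], "prediction": [0.999255359172821, 0.000527293945197016, 0.00021737271163146943]}',
    '{"instance": [6.0, 2.9, 4.5, 1.5], "prediction": [0.00025063654175028205, 0.9994204044342041, 0.00032893591560423374]}',
]


def label_predictions(jsonl_lines, class_labels):
    """Return the highest-scoring class label for each output record."""
    labeled = []
    for line in jsonl_lines:
        scores = json.loads(line)["prediction"]
        if len(scores) != len(class_labels):
            raise ValueError(
                f"expected {len(class_labels)} scores, found {len(scores)}"
            )
        best = max(range(len(scores)), key=scores.__getitem__)
        labeled.append(class_labels[best])
    return labeled


predicted_labels = label_predictions(lines, evaluation_class_labels)
print(predicted_labels)
```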
This table describes the required parameters for the ModelEvaluationClassificationOp component with a custom-trained tabular model:
Evaluation component parameter | Required |
---|---|
target_field_name | Yes |
prediction_label_column | Must be empty; this column doesn't exist in custom-trained models. |
prediction_score_column | Not required for default schema; is set automatically for JSONL, CSV, or BIGQUERY formats. |
evaluation_class_labels | Yes. Must be the same length as the output prediction array. |
ground_truth_gcs_source | Yes: the Cloud Storage link to the original dataset with the target column. |
ground_truth_format | Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery". |
Model evaluations without Vertex AI-generated batch predictions
You can use a model evaluation pipeline component with a batch prediction that you didn't generate in Vertex AI. However, note that the input for the model evaluation pipeline component must be a batch prediction directory containing files that match the following prefixes:
prediction.results-
explanation.results-
predictions_
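Before passing an externally generated directory to an evaluation component, it can help to check that it contains files with one of these prefixes. The following sketch filters a file listing by prefix. The file names in the listing and the `matching_files` function are hypothetical; how you obtain the listing (for example, from Cloud Storage) is left out.

```python
EXPECTED_PREFIXES = ("prediction.results-", "explanation.results-", "predictions_")


def matching_files(paths):
    """Return the paths whose file name starts with an expected prefix."""
    matches = []
    for path in paths:
        # Compare against the file name only, not the directory part.
        file_name = path.rsplit("/", 1)[-1]
        if file_name.startswith(EXPECTED_PREFIXES):
            matches.append(path)
    return matches


# Hypothetical directory listing.
listing = [
    "gs://example-bucket/output/prediction.results-00000-of-00001",
    "gs://example-bucket/output/prediction.errors_stats-00000-of-00001",
    "gs://example-bucket/output/notes.txt",
]
usable = matching_files(listing)
print(usable)
```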
API reference
For component reference, see the Google Cloud Pipeline Components SDK reference page for Model evaluation components.
For examples of how to use the Model Evaluation components, see the notebooks listed at Create an evaluation.
Version history and release notes
To learn more about the version history and changes to the Google Cloud Pipeline Components SDK, see the Google Cloud Pipeline Components SDK Release Notes.