Model evaluation components

Model evaluation components take ground truth and batch prediction results as input and generate evaluation metrics. Ground truth data is "correctly labeled" data as determined by humans, and is usually provided as the target field in the test dataset you used to train the model. For more information, see Model evaluation in Vertex AI.

Vertex AI provides the following model evaluation components:

• ModelEvaluationClassificationOp
• ModelEvaluationRegressionOp
• ModelEvaluationForecastingOp

Model type support

The following table shows supported model types for each model evaluation component:

Model evaluation component         Supported model types
ModelEvaluationClassificationOp    • AutoML tabular, image, text, video
                                   • Custom tabular
ModelEvaluationRegressionOp        • AutoML tabular
                                   • Custom tabular
ModelEvaluationForecastingOp       • AutoML tabular

    Remove the target field

    For some model types, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset.

    Configure and format slices for your tabular dataset

    A slice is a subset of tabular data with specified characteristics. Sliced metrics offer more granular evaluation metrics for AutoML tabular and custom training classification models.

    For example, consider a pet adoption dataset where a model is trained to predict whether a pet will be adopted within a week. While it's useful to look at metrics for the whole dataset, you may be interested in metrics related to the pet's species and age. In other words, you're interested in the following subset of the dataset:

    Feature           Values
    age (in years)    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
    species           "dog", "cat", "rabbit"

    This fine-grained analysis can be helpful in identifying where the model might be most error prone, or ensuring that the model performs well on critical subsets of the data.

    Configure slices

    To configure a slice, create a JSON file with the following configuration:

    {
      "FEATURE": {
        "KIND": {
          CONFIG
        }
      }
    }
    

    Where:

    • FEATURE is a feature for which you want to create a slice. For example, age.

    • KIND is one of the following:

      • value

        value creates a single slice on a single value of a feature. Specify CONFIG as a key-value pair where the key is float_value or string_value. For example, "float_value": 1.0.

        The following example configuration creates a single slice containing data with the age of 1:

        {
          "age": {
            "value": {
              "float_value": 1.0
            }
          }
        }
      • range

        range creates a single slice that includes data within a specified range. Specify CONFIG as two key-value pairs where the keys are low and high.

        The following example configuration creates a single slice containing data where the age is any value between 1 and 3:

        {
          "age": {
            "range": {
              "low": 1,
              "high": 3
            }
          }
        }
      • all_values

        all_values creates a separate slice for each possible label for the feature. Specify CONFIG as "value": true.

        The following example configuration creates three slices—one each for "dog", "cat", and "rabbit":

        {
          "species": {
            "all_values": {
              "value": true
            }
          }
        }

    You can also use multiple features to create one or more slices.

    The following configuration creates a single slice containing data where the age is between 1 and 3 and the species is "dog":

    {
      "age": {
        "range": {
          "low": 1,
          "high": 3
        }
      },
      "species": {
        "value": {
          "string_value": "dog"
        }
      }
    }
    

    The following configuration creates multiple slices—one for each unique species in the dataset, where the age is 1.

    {
      "species": {
        "all_values": {
          "value": true
        }
      },
      "age": {
        "value": {
          "float_value": 1.0
        }
      }
    }
    

    The resulting slices contain the following data from the example dataset:

    • Slice 1: age:1 and species:"dog"

    • Slice 2: age:1 and species:"cat"

    • Slice 3: age:1 and species:"rabbit"

    Format slices

    To format slices for the ModelEvaluationClassificationOp component, do the following:

    1. Create a slicing_spec. For example:

      # SliceSpec and SliceConfig are nested message classes, so import their parent type and alias them.
      from google.cloud.aiplatform_v1.types import ModelEvaluationSlice

      SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
      SliceConfig = ModelEvaluationSlice.Slice.SliceSpec.SliceConfig

      slicing_spec = SliceSpec(configs={'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))})
    2. Create a list to store the slice configurations. For example:

      slicing_specs = []
    3. Format each slicing_spec as JSON or a dictionary. For example:

      from google.protobuf import json_format

      # ._pb exposes the underlying protobuf message that json_format expects.
      slicing_spec_json = json_format.MessageToJson(slicing_spec._pb)

      slicing_spec_dict = json_format.MessageToDict(slicing_spec._pb)
    4. Combine each slicing_spec into a list. For example:

      slicing_specs.append(slicing_spec_json)
    5. Pass slicing_specs as a parameter to the ModelEvaluationClassificationOp component. For example:

      ModelEvaluationClassificationOp(slicing_specs=slicing_specs)
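
    If it helps to see the steps together, the following is a minimal sketch that builds the same "age between 1 and 3 and species is dog" slice shown in the Configure slices section and passes it to the component. The range keyword and the commented component import path are assumptions about the SDK surface; adjust them to your library versions.

      from google.cloud.aiplatform_v1.types import ModelEvaluationSlice
      from google.protobuf import json_format

      SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
      SliceConfig = ModelEvaluationSlice.Slice.SliceSpec.SliceConfig

      # Step 1: one slice combining a range config on age with a value config on species.
      # The range keyword mirrors the JSON "range" config and is an assumption here.
      slicing_spec = SliceSpec(configs={
          'age': SliceConfig(range=SliceSpec.Range(low=1, high=3)),
          'species': SliceConfig(value=SliceSpec.Value(string_value='dog')),
      })

      # Steps 2-4: serialize the spec and collect it in a list.
      slicing_specs = [json_format.MessageToJson(slicing_spec._pb)]

      # Step 5: pass the list to the evaluation component inside a pipeline definition.
      # The import path below is an assumption; adjust it to your google-cloud-pipeline-components version.
      # from google_cloud_pipeline_components.v1.model_evaluation import ModelEvaluationClassificationOp
      # ModelEvaluationClassificationOp(slicing_specs=slicing_specs)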

    Format fairness metrics configuration slices

    To evaluate your model for fairness, use the following components:

    • DetectDataBiasOp
    • DetectModelBiasOp

    To format your slices for the DetectDataBiasOp and DetectModelBiasOp components, do the following:

    1. Create a BiasConfig. For example:

      # BiasConfig, SliceSpec, and SliceConfig are nested message classes, so import their parent types.
      from google.cloud.aiplatform_v1.types import ModelEvaluation, ModelEvaluationSlice

      BiasConfig = ModelEvaluation.BiasConfig
      SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
      SliceConfig = ModelEvaluationSlice.Slice.SliceSpec.SliceConfig

      bias_config = BiasConfig(bias_slices=SliceSpec(configs={'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))}))
    2. Create a list to store the bias configurations. For example:

      bias_configs = []
    3. Format each bias_config as JSON or a dictionary. For example:

      from google.protobuf import json_format

      # ._pb exposes the underlying protobuf message that json_format expects.
      bias_config_json = json_format.MessageToJson(bias_config._pb)

      bias_config_dict = json_format.MessageToDict(bias_config._pb)
    4. Combine each bias_config into a list. For example:

      bias_configs.append(bias_config_json)
    5. Pass bias_configs as a parameter to the DetectDataBiasOp or DetectModelBiasOp component. For example:

      DetectDataBiasOp(bias_configs=bias_configs)
      
      DetectModelBiasOp(bias_configs=bias_configs)
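
    Similarly, the following sketch strings the bias-configuration steps together; only bias_configs is shown, and the components' other inputs are omitted.

      from google.cloud.aiplatform_v1.types import ModelEvaluation, ModelEvaluationSlice
      from google.protobuf import json_format

      BiasConfig = ModelEvaluation.BiasConfig
      SliceSpec = ModelEvaluationSlice.Slice.SliceSpec
      SliceConfig = ModelEvaluationSlice.Slice.SliceSpec.SliceConfig

      # Steps 1-4: build one BiasConfig, serialize it, and collect it in a list.
      bias_config = BiasConfig(
          bias_slices=SliceSpec(configs={'feature_a': SliceConfig(value=SliceSpec.Value(string_value='label_a'))})
      )
      bias_configs = [json_format.MessageToJson(bias_config._pb)]

      # Step 5: pass the list to either bias detection component inside a pipeline definition.
      # DetectDataBiasOp(bias_configs=bias_configs)
      # DetectModelBiasOp(bias_configs=bias_configs)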

    Required input parameters

    The required input parameters for model evaluation components vary based on the type of model you are evaluating.

    AutoML models

    When training AutoML models, Vertex AI uses default schemas. Some input parameters for the model evaluation components are only required if you used a non-default schema to train your model.

    You can view and download schema files from the following Cloud Storage location:
    gs://google-cloud-aiplatform/schema/modelevaluation/
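
    For example, the following sketch lists the schema files with the Cloud Storage client library; it assumes the google-cloud-storage package is installed and that your environment has credentials that can read the public bucket.

      from google.cloud import storage

      # List the model evaluation schema files in the public bucket shown above.
      client = storage.Client()
      for blob in client.list_blobs("google-cloud-aiplatform", prefix="schema/modelevaluation/"):
          print(blob.name)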

    The following tabs show example batch prediction output files and required input parameters for each model type:

    Tabular

    Classification

    The default schema includes the following fields:

    • scores: array
    • classes: array

    The following is an example of a batch prediction output file with the target column of male:

    {
      "instance": {
        "male": "1",
        "age": "49",
        "heartRate": "67",
        "education": "medium_high",
        "totChol": "282",
        "diaBP": "79",
        "BPMeds": "0",
        "currentSmoker": "current_high",
        "cigsPerDay": "25",
        "prevalentStroke": "stroke_0",
        "prevalentHyp": "hyp_0",
        "TenYearCHD": "FALSE",
        "diabetes": "0",
        "sysBP": "134",
        "BMI": "26.87",
        "glucose": "78"
        },
      "prediction": {
          "scores": [0.3666940927505493, 0.6333059072494507],
          "classes": ["0", "1"]
        }
    }
              

    This table describes the required parameters for the ModelEvaluationClassificationOp component with a tabular model:

    Evaluation component parameter    Required
    target_field_name                 Yes
    prediction_label_column           Not required for default schema. Use prediction.classes for non-default schemas.
    prediction_score_column           Not required for default schema. Use prediction.scores for non-default schemas.
    evaluation_class_labels           Not required for default schema. Use ["0","1"] for non-default schemas.

    Regression

    The default schema includes the following fields:

    • value: float
    • lower_bound: float
    • upper_bound: float

    The following is an example of a batch prediction output file with the target column of age:

    {
      "instance": {
        "BPMeds": "0",
        "diabetes": "0",
        "diaBP": "65",
        "cigsPerDay": "45",
        "prevalentHyp": "hyp_0",
        "age": "46",
        "male": "1",
        "sysBP": "112.5",
        "glucose": "78",
        "BMI": "27.48",
        "TenYearCHD": "FALSE",
        "totChol": "209",
        "education": "high",
        "prevalentStroke": "stroke_0",
        "heartRate": "75",
        "currentSmoker": "current_high"
      },
      "prediction": {
        "value": 44.96103286743164,
        "lower_bound": 44.61349868774414,
        "upper_bound": 44.590206146240234
      }
    }
              

    This table describes the required parameters for the ModelEvaluationRegressionOp component with a tabular model:

    Evaluation component parameter    Required
    target_field_name                 Yes
    prediction_label_column           Not required for regression models.
    prediction_score_column           Not required for default schema. Use prediction.value for non-default schemas.
    evaluation_class_labels           Not required for regression models.

    Forecasting

    For tabular forecasting models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset.

    The default schema includes the following fields:

    • id: string
    • displayNames: string
    • type: enum
    • timeSegmentStart: string
    • timeSegmentEnd: string
    • confidence: float

    The following is an example CSV file with sale_dollars as the target column:

    date,store_name,city,zip_code,county,sale_dollars
    2020-03-17,Thriftway,,,,774.08999999999992
    2020-03-10,Thriftway,,,,1160.67
    2020-03-03,Thriftway,,,,2247.24
    2020-06-08,New Star / Fort Dodge,,,,753.98
    2020-06-01,New Star / Fort Dodge,,,,967.73
    2020-01-10,Casey's General Store #1280 / Fort Dodge,,,,1040.92
    2020-10-30,KUM & GO #76 / ADAIR,Adair,50002,ADAIR,1387.02
              

    The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:

    date,store_name,city,zip_code,county
    2020-03-17,Thriftway,nan,nan,nan
    2020-03-10,Thriftway,nan,nan,nan
    2020-03-03,Thriftway,nan,nan,nan
    2020-06-08,New Star / Fort Dodge,nan,nan,nan
    2020-06-01,New Star / Fort Dodge,nan,nan,nan
    2020-01-10,Casey's General Store #1280 / Fort Dodge,nan,nan,nan
    2020-10-30,KUM & GO #76 / ADAIR,Adair,50002.0,ADAIR
              

    The following is an example of a batch prediction output file without the target column of sale_dollars:

    {
      "instance": {
        "content": "gs://kbn-us-central1-test/datasets/text/happy_11556.txt",
        "mimeType":"text/plain"
      },
      "prediction": {
        "ids":    ["7903916851837534208","3292230833410146304","986387824196452352","2139309328803299328","5598073842623840256","6750995347230687232","4559431178561519616"],
        "displayNames": ["affection","bonding","achievement","exercise","nature","enjoy_the_moment","leisure"],
        "confidences": [0.99483216,0.005162797,4.1117933E-6,3.9997E-7,2.4624453E-7,1.9969502E-7,1.16997434E-7]
      }
    }
              

    This table describes the required parameters for the ModelEvaluationForecastingOp component with a tabular model:

    Evaluation component parameter    Required
    target_field_name                 Yes
    prediction_label_column           Not required for forecasting models.
    prediction_score_column           Not required for default schema. Use "prediction.value" for non-default schemas.
    evaluation_class_labels           Not required for forecasting models.
    ground_truth_gcs_source           Yes: the Cloud Storage link to the original dataset with the target column.
    ground_truth_format               Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery".
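
    For example, a forecasting evaluation call inside your pipeline definition might look like the following sketch. The import path and the Cloud Storage path are assumptions, and the other required inputs are omitted.

      from google_cloud_pipeline_components.v1.model_evaluation import ModelEvaluationForecastingOp

      eval_task = ModelEvaluationForecastingOp(
          target_field_name="sale_dollars",
          ground_truth_gcs_source=["gs://your-bucket/path/to/original_dataset.csv"],  # hypothetical path
          ground_truth_format="csv",
          # ...remaining required inputs omitted
      )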

    Text

    Classification

    For text classification models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.

    The default schema includes the following fields:

    • ids: array of strings
    • displayNames: array of strings
    • confidences: array of floats

    The following is an example dataset with label as the target column:

    {
      "content": "gs://kbn-us-central1-test/datasets/text/happy_0.txt",
      "mimeType": "text/plain",
      "label": "affection"
    }
              

    The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:

    {
      "content": "gs://kbn-us-central1-test/datasets/text/happy_0.txt",
      "mimeType": "text/plain"
    }
              

    The following is an example of a batch prediction output file without the target column of label:

    {
      "instance": {
        "content": "gs://kbn-us-central1-test/datasets/text/happy_11556.txt",
        "mimeType":"text/plain"
      },
      "prediction": {
        "ids":    ["7903916851837534208","3292230833410146304","986387824196452352","2139309328803299328","5598073842623840256","6750995347230687232","4559431178561519616"],
        "displayNames": ["affection","bonding","achievement","exercise","nature","enjoy_the_moment","leisure"],
        "confidences": [0.99483216,0.005162797,4.1117933E-6,3.9997E-7,2.4624453E-7,1.9969502E-7,1.16997434E-7]
      }
    }
              

    This table describes the required parameters for the ModelEvaluationClassificationOp component with a text model:

    Evaluation component parameter    Required
    target_field_name                 Yes
    prediction_label_column           Not required for default schema. Use "prediction.displayNames" for non-default schemas.
    prediction_score_column           Not required for default schema. Use "prediction.confidences" for non-default schemas.
    evaluation_class_labels           Not required for default schema, or when prediction_label_column is provided.
    ground_truth_gcs_source           Yes: the Cloud Storage link to the original dataset with the target column.
    ground_truth_format               Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery".

    Video

    Classification

    For video classification models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.

    The default schema includes the following fields:

    • id: string
    • displayName: string
    • type: enum
    • timeSegmentStart: string
    • timeSegmentEnd: string
    • confidence: float

    The following is an example dataset with ground_truth as the target column:

    {
      "content": "gs://automl-video-demo-data/hmdb51/Aerial_Cartwheel_Tutorial_By_Jujimufu_cartwheel_f_nm_np1_ri_med_0.avi",
      "ground_truth": "cartwheel",
      "timeSegmentStart": "0.0",
      "timeSegmentEnd": "inf"
    }
              

    The TargetFieldDataRemoverOp component removes the target column before sending the file to the BatchPredictionJob component:

    {
      "content": "gs://automl-video-demo-data/hmdb51/Aerial_Cartwheel_Tutorial_By_Jujimufu_cartwheel_f_nm_np1_ri_med_0.avi",
      "timeSegmentStart": "0.0",
      "timeSegmentEnd": "inf"
    }
              

    The following is an example of a batch prediction output file without the target column of ground_truth:

    {
      "instance": {
        "content": "gs://automl-video-demo-data/hmdb51/20_Marine_Corps_Pull_Ups___JimmyDShea_pullup_f_cm_np1_ba_bad_3.avi",
        "mimeType": "video/avi",
        "timeSegmentStart": "0.0",
        "timeSegmentEnd": "inf"
      },
      "prediction": [
        {
          "id":"5015420640832782336",
          "displayName":"pullup",
          "type":"segment-classification",
          "timeSegmentStart":"0s",
          "timeSegmentEnd":"2.600s",
          "confidence":0.98152995
        },
        {
          "id":"2709577631619088384",
          "displayName":"cartwheel",
          "type":"segment-classification",
          "timeSegmentStart":"0s",
          "timeSegmentEnd":"2.600s",
          "confidence":0.0047166348
        },
        {
          "id":"403734622405394432",
          "displayName":"golf",
          "type":"segment-classification",
          "timeSegmentStart":"0s",
          "timeSegmentEnd":"2.600s",
          "confidence":0.0046260506
        },
        {
          "id":"1556656127012241408",
          "displayName":"ride_horse",
          "type":"segment-classification",
          "timeSegmentStart":"0s",
          "timeSegmentEnd":"2.600s",
          "confidence":0.004590442
        },
        {
          "id":"7321263650046476288",
          "displayName":"kick_ball",
          "type":"segment-classification",
          "timeSegmentStart":"0s",
          "timeSegmentEnd":"2.600s",
          "confidence":0.004536863
        }
      ]
    }
              

    This table describes the required parameters for the ModelEvaluationClassificationOp component with a video model:

    Evaluation component parameter    Required
    target_field_name                 Yes
    prediction_label_column           Not required for default schema. Use "prediction.displayName" for non-default schemas.
    prediction_score_column           Not required for default schema. Use "prediction.confidence" for non-default schemas.
    evaluation_class_labels           Not required when a default schema is used, or when prediction_label_column is provided.
    ground_truth_gcs_source           Yes: the Cloud Storage link to the original dataset with the target column.
    ground_truth_format               Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery".

    Custom-trained models

    For custom-trained models, the BatchPredictionJob component requires you to exclude the target column (ground truth) from your dataset. You can use the TargetFieldDataRemoverOp component for a more efficient workflow.

    The following example shows a batch prediction output file and required input parameters for a custom-trained tabular classification model:

    Tabular

    Classification

    The typical schema includes the following fields:

    • instance: array of input data, in the same order as the model training dataset columns.
    • prediction: array of prediction scores, one value for each class.

    The following is an example CSV dataset with species as the target column:

    petal_length,petal_width,sepal_length,sepal_width,species
    6.4,2.8,5.6,2.2,2
    5.0,2.3,3.3,1.0,1
    4.9,2.5,4.5,1.7,2
    4.9,3.1,1.5,0.1,0
              

    The TargetFieldDataRemoverOp component removes the target column before sending the file to the batch prediction component:

    petal_length,petal_width,sepal_length,sepal_width
    6.4,2.8,5.6,2.2
    5.0,2.3,3.3,1.0
    4.9,2.5,4.5,1.7
    4.9,3.1,1.5,0.1
              

    The following is an example of a batch prediction output file without the target column of species:

    {
      "instance": [5.6, 2.5, 3.9, 1.1],
      "prediction": [0.0005816521588712931, 0.9989032745361328, 0.0005150380893610418]
    }
    {
      "instance": [5.0, 3.6, 1.4, 0.2],
      "prediction": [0.999255359172821, 0.000527293945197016, 0.00021737271163146943]
    }
    {
      "instance": [6.0, 2.9, 4.5, 1.5],
      "prediction": [0.00025063654175028205, 0.9994204044342041, 0.00032893591560423374]
    }
              

    This table describes the required parameters for the ModelEvaluationClassificationOp component with a custom-trained tabular model:

    Evaluation component parameter    Required
    target_field_name                 Yes
    prediction_label_column           Must be empty; this column doesn't exist in custom-trained models.
    prediction_score_column           Not required for default schema; is set automatically for JSONL, CSV, or BIGQUERY formats.
    evaluation_class_labels           Yes. Must be the same length as the output prediction array.
    ground_truth_gcs_source           Yes: the Cloud Storage link to the original dataset with the target column.
    ground_truth_format               Yes: the format of the original dataset. For example, "jsonl", "csv", or "bigquery".
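
    For example, an evaluation call for the custom-trained classification model above might look like the following sketch. The import path and the Cloud Storage path are assumptions, and the other required inputs are omitted.

      from google_cloud_pipeline_components.v1.model_evaluation import ModelEvaluationClassificationOp

      eval_task = ModelEvaluationClassificationOp(
          target_field_name="species",
          evaluation_class_labels=["0", "1", "2"],  # one label per value in the prediction array
          ground_truth_gcs_source=["gs://your-bucket/path/to/original_dataset.csv"],  # hypothetical path
          ground_truth_format="csv",
          # prediction_label_column is left unset for custom-trained models
          # ...remaining required inputs omitted
      )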

    Model evaluations without Vertex AI-generated batch predictions

    You can use a model evaluation pipeline component with a batch prediction that you didn't generate in Vertex AI. However, note that the input for the model evaluation pipeline component must be a batch prediction directory containing files that match the following prefixes:

    • prediction.results-

    • explanation.results-

    • predictions_
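
    If your predictions were generated outside of Vertex AI, a quick check like the following hypothetical helper (not part of the SDK) can confirm that a Cloud Storage directory contains files with the expected prefixes before you run the evaluation. It assumes the google-cloud-storage package is installed.

      from google.cloud import storage

      EXPECTED_PREFIXES = ("prediction.results-", "explanation.results-", "predictions_")

      def has_prediction_files(bucket_name: str, directory: str) -> bool:
          """Returns True if any file under the directory matches an expected prefix."""
          client = storage.Client()
          blobs = client.list_blobs(bucket_name, prefix=directory.rstrip("/") + "/")
          return any(blob.name.rsplit("/", 1)[-1].startswith(EXPECTED_PREFIXES) for blob in blobs)

      # Example with a hypothetical bucket and directory:
      # print(has_prediction_files("your-bucket", "batch_prediction_output/job-123"))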

    API reference

    For API reference, see the Google Cloud Pipeline Components SDK reference documentation.

    Version history and release notes

    To learn more about the version history and changes to the Google Cloud Pipeline Components SDK, see the Google Cloud Pipeline Components SDK Release Notes.