[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# Introduction to model evaluation for fairness\n\n| **Preview**\n|\n|\n| This product or feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA products and features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nA machine learning workflow can include evaluating your model for fairness. An\nunfair model displays systemic bias that can cause harm, especially to\ntraditionally underrepresented groups. An unfair model may perform\nworse for certain subsets, or *slices*, of the dataset.\n\nYou can detect bias during the data collection or post-training evaluation\nprocess. Vertex AI provides the following model evaluation metrics to\nhelp you evaluate your model for bias:\n\n- [**Data bias metrics**](/vertex-ai/docs/evaluation/data-bias-metrics): Before you train and build your model,\n these metrics detect whether your raw data includes biases. For example, a\n smile-detection dataset may contain far fewer elderly people than younger\n ones. Several of these metrics are based on quantifying the distance between\n label distribution for different groups of data:\n\n - Difference in Population Size.\n\n - Difference in Positive Proportions in True Labels.\n\n- [**Model bias metrics**](/vertex-ai/docs/evaluation/model-bias-metrics): After you train your model, these metrics\n detect whether your model's predictions include biases. For example, a model\n may be more accurate for one subset of the data than the rest of the data:\n\n - Accuracy Difference.\n\n - Difference in Positive Proportions in Predicted Labels.\n\n - Recall Difference.\n\n - Specificity Difference.\n\n - Difference in Ratio of Error Types.\n\nTo learn how to include the model evaluation bias pipeline components in your\npipeline run, see [Model evaluation component](/vertex-ai/docs/pipelines/model-evaluation-component#fairness).\n\nExample dataset overview\n------------------------\n\nFor all examples related to fairness metrics, we use a hypothetical college\nadmission dataset with features such as an applicant's high school grades,\nstate, and gender identity. We want to measure whether the college is biased\ntowards California or Florida applicants.\n\nThe target labels, or all possible outcomes, are:\n\n- Accept the applicant with scholarship (`p`).\n\n- Accept the applicant without a scholarship (`q`)\n\n- Reject the applicant (`r`).\n\nWe can assume that admission experts provided these labels as the ground truth.\nNote that it's possible for even these expert labels to be biased, since they\nwere assigned by humans.\n\nTo create a binary classification example, we can group labels together to\ncreate two possible outcomes:\n\n- Positive outcome, notated as `1`. We can group `p` and `q` into the positive\n outcome of \"accepted `{p,q}`.\"\n\n- Negative outcome, notated as `0`. This can be a collection\n of every other outcome aside from the positive outcome. In our college\n application example, the negative outcome is \"rejected `{r}`.\"\n\nTo measure bias between California and Florida applicants, we separate out two\nslices from the rest of the dataset:\n\n- Slice 1 of the dataset for which the bias is being measured. In the\n college application example, we're measuring bias for applicants from\n California.\n\n- Slice 2 of the dataset against which the bias is being measured. Slice 2\n can include \"everything not in slice 1\" by default, but for the college\n application example, we are assigning slice 2 as Florida applicants.\n\nIn our example college application dataset, we have 200 applicants from\nCalifornia in slice 1, and 100 Florida applicants in slice 2. After training the\nmodel, we have the following confusion matrices:\n\nBy comparing metrics between the two confusion matrices, we can measure biases\nby answering questions such as \"does the model have better recall for one slice\nthan the other?\"\n\nWe also use the following shorthand to represent labeled ground truth data,\nwhere `i` represents the slice number (1 or 2):\n\n\u003cbr /\u003e\n\n\\\\( l\\^0_i = tn_i + fp_i \\\\)\nFor slice i, number of labeled negative outcomes = true negatives + false positives.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\\\\( l\\^1_i = fn_i + tp_i \\\\)\nFor slice `i`, number of labeled positive outcomes = false negatives + true positives.\n\n\u003cbr /\u003e\n\nNote the following about the college application dataset example:\n\n- Some fairness metrics can be generalized for multiple outcomes as well, but we\n use binary classification for simplicity.\n\n- The example focuses on the classification task, but some fairness metrics\n generalize to other problems like regression.\n\n- For this example, we assume that the training data and test data are the same.\n\nWhat's next\n-----------\n\n- Learn about the [data bias metrics](/vertex-ai/docs/evaluation/data-bias-metrics) supported by Vertex AI.\n\n- Learn about the [model bias metrics](/vertex-ai/docs/evaluation/model-bias-metrics) supported by Vertex AI.\n\n- Read the [model evaluation pipeline component reference](/vertex-ai/docs/pipelines/model-evaluation-component#fairness)."]]