[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# Data bias metrics for Vertex AI\n\n| **Preview**\n|\n|\n| This product or feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA products and features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nThis page describes evaluation metrics you can use to detect *data bias* ,\nwhich can appear in raw data and ground truth values even before you train the\nmodel. For the examples and notation on this page, we use a hypothetical college\napplication dataset that we describe in detail in [Introduction to model\nevaluation for fairness](/vertex-ai/docs/evaluation/intro-evaluation-fairness).\n\nFor descriptions of metrics that are generated from post-training data, see\n[Model bias metrics](/vertex-ai/docs/evaluation/model-bias-metrics).\n\nOverview\n--------\n\nIn our example college application dataset, we have 200 applicants from\nCalifornia in slice 1, and 100 Florida applicants in slice 2, labeled as\nfollows:\n\nYou can generally interpret the sign for most metrics as follows:\n\n- Positive value: indicates a potential bias favoring slice 1 over slice 2.\n\n- Zero value: indicates no bias in between slice 1 and slice 2.\n\n- Negative value: indicates a potential bias in favoring slice 2 over slice 1.\n\nWe make a note where this doesn't apply to a metric.\n\nDifference in Population Size\n-----------------------------\n\n*Difference in Population Size* measures whether there are more examples in slice 1\nversus slice 2, normalized by total population of the two slices: \n$$ \\\\frac{n_1-n_2}{n_1+n_2} $$\n\n(total population of slice 1 - total population of slice 2) /\n(sum of populations in slice 1 and 2)\n\n**In our example dataset**:\n\n(200 California applicants - 100 Florida applicants)/ 300 total applicants = 100/300 = 0.33.\n\nThe positive value of the Difference in Population Size indicates that there are\ndisproportionately more California applicants than Florida applicants. The\npositive value may or may not indicate bias by itself, but when a model is\ntrained on this data, the model might learn to perform better for California\napplicants.\n\nDifference in Positive Proportions in True Labels (DPPTL)\n---------------------------------------------------------\n\nThe *Difference in Positive Proportions in True Labels* measures whether a dataset\nhas disproportionately more positive ground truth labels for one slice over the\nother. This metric calculates the difference in Positive Proportions in True\nLabels between slice 1 and slice 2, where Positive Proportions in True Labels\nfor a slice is (Labeled positive outcomes / Total population size). This\nmetric is also known as *Label Imbalance*:\n**Note:** This metric is analogous to the [model bias metric](/vertex-ai/docs/evaluation/model-bias-metrics) of *Difference in Positive Proportions in Predicted Labels*, which focuses on predicted positive outcomes instead of labeled positive outcomes. 
\n$$ \\\\frac{l\\^1_1}{n_1} - \\\\frac{l\\^1_2}{n_2} $$\n\n(Labeled positive outcomes for slice 1/Total population size of slice 1) -\n(Labeled positive outcomes for slice 2/Total population size of slice 2)\n\n**In our example dataset**:\n\n(60 accepted California applicants/200 California applicants) - (20 accepted\nFlorida applicants/100 Florida applicants) = 60/200 - 20/100 = 0.1.\n\nThe positive value of the DPPTL indicates that the dataset has\ndisproportionately higher positive outcomes for California applicants compared\nto Florida applicants. The positive value may or may not indicate bias by\nitself, but when a model is trained on this data, the model might learn to\npredict disproportionately more positive outcomes for California applicants.\n\nWhat's next\n-----------\n\n- Learn about the [model bias metrics](/vertex-ai/docs/evaluation/model-bias-metrics) supported by Vertex AI.\n\n- Read the [model evaluation pipeline component reference](/vertex-ai/docs/pipelines/model-evaluation-component#fairness)."]]