The reconciliation (or clustering) confidence score measures how confident the clustering model is in the assignment of an entity to a cluster. You can use it to filter out predictions that the model is uncertain about and make decisions based on the remaining, confident outcomes.
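As a minimal sketch of this kind of filtering, the following Python splits cluster assignments into confident and uncertain groups. The `Assignment` record, its field names, and the 0.7 threshold are hypothetical choices made for illustration; they are not part of the reconciliation output format.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical record: one entity, its cluster, and its confidence."""
    entity_id: str
    cluster_id: str
    confidence: float  # expected to be in [0, 1]


def split_by_confidence(assignments, threshold=0.7):
    """Return (confident, uncertain) lists, split at an assumed threshold."""
    confident = [a for a in assignments if a.confidence >= threshold]
    uncertain = [a for a in assignments if a.confidence < threshold]
    return confident, uncertain


assignments = [
    Assignment("e1", "A", 0.9),
    Assignment("e2", "A", 0.4),
    Assignment("e3", "B", 0.7),
]
confident, uncertain = split_by_confidence(assignments)
print([a.entity_id for a in confident])  # entities kept for automated decisions
print([a.entity_id for a in uncertain])  # entities that may need review
```

The right threshold depends on your data and on how costly a wrong merge is, so treat 0.7 as a placeholder.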
How a confidence score is produced
Clustering produces hard assignments: each entity is assigned to exactly one cluster. The confidence score describes how confident the model is that an entity belongs to its assigned cluster, as a value in the range [0, 1].
1.0 = very certain the entity belongs to its assigned cluster
0.0 = very uncertain the entity belongs to its assigned cluster
There is a notion of similarity/distance between any pair of entities. Entity pairs within a cluster are more likely to have lower distances than pairs that span different clusters. The further away an entity is from other members of its cluster, the lower the confidence value.
Other clusters also influence the confidence score. If there are other clusters close to an entity, its confidence is diminished according to the distances from those clusters.
Cluster density, which is related to the distances between all entity pairs in the cluster, also affects the confidence value. For an entity at a fixed distance from the cluster, the confidence value is higher if the cluster density is low and lower if the cluster density is high.
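To make the idea of density concrete, here is a small sketch that uses the mean pairwise Euclidean distance between cluster members as a stand-in for cluster spread (a larger value means a lower density). This proxy, the function name `mean_pairwise_distance`, and the 2D points are assumptions for illustration; the page does not state how density is actually computed.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points (assumed proxy)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0  # a cluster with fewer than two members has no pairs
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Two hypothetical clusters with the same shape but different scales.
tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
loose = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]

print(mean_pairwise_distance(tight))  # smaller value: denser cluster
print(mean_pairwise_distance(loose))  # larger value: sparser cluster
```

Under this proxy, an entity at the same distance from both clusters would sit relatively closer to the typical member spacing of the sparse cluster, which matches the direction of the effect described above.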
For the reconciliation pipeline to scale to millions or billions of entities, the confidence score calculation uses randomized sampling methods to limit computational complexity. Because of this, confidence scores are approximate and are bucketed into 0.1-sized intervals. We recommend that you do not depend on exact confidence values when making review or human-in-the-loop decisions.
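One way to avoid depending on exact values is to compare scores at bucket granularity. The sketch below maps a score to a 0.1-wide bucket index and bases a review decision on the bucket. The bucketing scheme (floor to the lower bucket edge, with 1.0 placed in the top bucket), the function names, and the `min_bucket` cutoff are assumptions for illustration, not the pipeline's documented behavior.

```python
def confidence_bucket(score, width=0.1):
    """Map a score in [0, 1] to a bucket index (assumed scheme)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    n_buckets = round(1 / width)
    # The small epsilon guards against float artifacts such as 0.7 / 0.1
    # evaluating to slightly less than 7.
    index = int(score / width + 1e-9)
    return min(index, n_buckets - 1)  # keep 1.0 in the top bucket


def needs_review(score, min_bucket=7):
    """Flag an assignment for human review using the bucket, not the raw value."""
    return confidence_bucket(score) < min_bucket


for score in (0.0, 0.65, 0.7, 1.0):
    print(score, confidence_bucket(score), needs_review(score))
```

Comparing buckets rather than raw floats means two runs that report slightly different scores inside the same interval lead to the same decision.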
Diagram Key
Use the following descriptions to understand the diagrams.
Description
Diagram
Entity
A cluster of entities.
Entity cluster depicted by a circle. Cluster spread is represented by the size of the circle.
Multiple entity clusters. Color coded: an entity and its assigned cluster share the same color.
In some cases we focus on a single entity and its relation to other clusters. All other entities are hidden from view.
d_a: Distance from the entity to cluster A's centroid
d_b: Distance from the entity to cluster B's centroid
c: Cluster confidence score of the entity
Illustrated examples
The following diagrams serve as examples to help you visualize the high-level concepts behind how confidence scores are determined.
Situation
Diagram
The entity is assigned to cluster A. If A is the only cluster in the entire embedding space, then the confidence score will always be 1 regardless of the distance between the entity and the cluster.
A and B are clusters that have the same spread, and their centroids are equally distant from the entity.
Both clusters have the same influence on the entity, so the confidence score is 0.5.
Other clusters nearby exert their influence on the entity and dilute the confidence score.
If there are three clusters of identical spread, and the entity is equally distant from all three, then the confidence score is approximately 0.33.
A and B are clusters that have the same spread, but the entity is closer to A than it is to B.
A has a higher influence on the entity. Because the entity is also assigned to A, the confidence score will be larger than 0.5.
A and B are clusters that have the same spread, but the entity is closer to B than it is to A.
B has a higher influence on the entity than A does. Because the entity is assigned to A, the confidence score will be lower than 0.5.
A has a larger spread than B, but their centroids are equally distant from the entity.
A has a higher influence on the entity. Because the entity is also assigned to A, the confidence score will be larger than 0.5.
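The situations above can be reproduced with a toy model in which each cluster's influence on the entity decays with distance and grows with spread, and the confidence is the assigned cluster's share of the total influence. This is only an illustrative sketch: the formula `exp(-distance / spread)`, the function name `toy_confidence`, and all numbers are assumptions, not the actual reconciliation algorithm.

```python
import math


def toy_confidence(assigned, clusters):
    """Toy confidence: the assigned cluster's share of total influence.

    clusters maps a cluster name to (distance_to_centroid, spread).
    The influence formula is an assumption made for illustration only.
    """
    influence = {
        name: math.exp(-distance / spread)
        for name, (distance, spread) in clusters.items()
    }
    return influence[assigned] / sum(influence.values())


# Only one cluster: confidence is 1 regardless of distance.
print(toy_confidence("A", {"A": (5.0, 1.0)}))

# Two clusters, same spread, same distance: confidence is 0.5.
print(toy_confidence("A", {"A": (2.0, 1.0), "B": (2.0, 1.0)}))

# Three identical clusters at the same distance: about 0.33.
print(toy_confidence("A", {"A": (2.0, 1.0), "B": (2.0, 1.0), "C": (2.0, 1.0)}))

# Same spread, entity closer to A: above 0.5.
print(toy_confidence("A", {"A": (1.0, 1.0), "B": (3.0, 1.0)}))

# Same spread, entity closer to B: below 0.5.
print(toy_confidence("A", {"A": (3.0, 1.0), "B": (1.0, 1.0)}))

# Same distance, A has the larger spread: above 0.5.
print(toy_confidence("A", {"A": (2.0, 2.0), "B": (2.0, 1.0)}))
```

Any influence function that decreases with distance and increases with spread would show the same qualitative behavior; the exponential form is used here only because it stays well defined when the distance is zero.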
Last updated 2025-09-04 UTC.