Class PrivacyMetric (0.14.0)

Privacy metric to compute for reidentification risk analysis.

Numerical stats

K-anonymity

k-map

Classes

CategoricalStatsConfig

Compute categorical stats over an individual column, including the number of distinct values and the value count distribution.
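As a rough illustration of what "number of distinct values and value count distribution" means, the sketch below computes both for an in-memory column using only the standard library. The function name `categorical_stats` and the shape of its result are hypothetical; this is not how the service computes or returns its results.

```python
from collections import Counter


def categorical_stats(values):
    """Summarize a categorical column (hypothetical helper, not the DLP API)."""
    # How many times each distinct value occurs in the column.
    value_counts = Counter(values)
    # How many distinct values occur exactly n times (the count distribution).
    count_distribution = Counter(value_counts.values())
    return {
        "distinct_values": len(value_counts),
        "value_counts": dict(value_counts),
        "count_distribution": dict(count_distribution),
    }


stats = categorical_stats(["red", "blue", "red", "green", "red", "blue"])
print(stats["distinct_values"])
print(stats["count_distribution"])
```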

DeltaPresenceEstimationConfig

δ-presence metric, used to estimate how likely it is for an attacker to determine that a given individual appears in a de-identified dataset. As with the k-map metric, δ-presence cannot be computed exactly without knowing the attack dataset, so a statistical model is used instead.

ISO 3166-1 alpha-2 region code to use in the statistical modeling. Set this if no column is tagged with a region-specific InfoType (like US_ZIP_5) or a region code.
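To make the idea of δ-presence concrete, here is a minimal sketch that assumes the population counts are already known. In practice they are not, which is why the metric relies on a statistical model. The helper `delta_presence` and the sample data are hypothetical and only illustrate the ratio being estimated: rows in the dataset sharing a quasi-identifier tuple, divided by individuals in the wider population sharing it.

```python
from collections import Counter


def delta_presence(dataset_rows, population_counts):
    """Per-tuple ratio of dataset rows to population size (hypothetical helper).

    `population_counts` stands in for the statistical model of the attack
    dataset; it maps each quasi-identifier tuple to an assumed population count.
    """
    dataset_counts = Counter(dataset_rows)
    return {
        quasi_id: count / population_counts[quasi_id]
        for quasi_id, count in dataset_counts.items()
    }


# Quasi-identifier tuples are (ZIP code, age) in this toy example.
dataset = [("94043", 30), ("94043", 30), ("10001", 45)]
population = {("94043", 30): 8, ("10001", 45): 1}

deltas = delta_presence(dataset, population)
for quasi_id, delta in deltas.items():
    print(quasi_id, delta)
```

A value close to 1 means that almost everyone in the population with that combination of values is in the dataset, so an attacker can be nearly certain the individual is present.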

KAnonymityConfig

k-anonymity metric, used for analysis of reidentification risk.

Message indicating that multiple rows might be associated with a single individual. If the same entity_id is associated with multiple quasi-identifier tuples over distinct rows, the entire collection of tuples is treated as the composite quasi-identifier. This collection is a multiset: the order in which the tuples appear in the dataset is ignored, but their frequency is taken into account. Important note: a maximum of 1000 rows can be associated with a single entity ID. If more rows are associated with the same entity ID, some might be ignored.
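The multiset behavior described above can be sketched in plain Python. This is an illustration of the concept only, not the service's implementation; the helper `k_anonymity` and the column names are hypothetical, and the 1000-row limit is not modeled.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, entity_key, quasi_id_keys):
    """Return (k, class_sizes) using composite quasi-identifiers per entity."""
    # Collect, for each entity, a multiset of its quasi-identifier tuples.
    tuples_by_entity = defaultdict(Counter)
    for row in rows:
        quasi_id = tuple(row[key] for key in quasi_id_keys)
        tuples_by_entity[row[entity_key]][quasi_id] += 1

    # A frozenset of (tuple, frequency) pairs ignores row order but keeps counts.
    composite = {
        entity: frozenset(counts.items())
        for entity, counts in tuples_by_entity.items()
    }

    # Entities with the same composite quasi-identifier form one equivalence class.
    class_sizes = Counter(composite.values())
    return min(class_sizes.values()), class_sizes


rows = [
    {"user_id": "a", "zip": "94043", "age": 30},
    {"user_id": "a", "zip": "94043", "age": 31},
    {"user_id": "b", "zip": "94043", "age": 31},  # same tuples as "a", other order
    {"user_id": "b", "zip": "94043", "age": 30},
    {"user_id": "c", "zip": "94043", "age": 30},  # same tuple twice: differs from "a"
    {"user_id": "c", "zip": "94043", "age": 30},
]

k, class_sizes = k_anonymity(rows, "user_id", ["zip", "age"])
print(k)
```

Entities "a" and "b" share a class because order is ignored, while "c" falls into its own class because the frequency of its tuples differs.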

KMapEstimationConfig

Reidentifiability metric. This corresponds to a risk model similar to what is called "journalist risk" in the literature, except that the attack dataset is statistically modeled instead of being perfectly known. The model can be built from publicly available data (like the US Census), from a custom statistical model (specified as one or more BigQuery tables), or by extrapolating from the distribution of values in the input dataset.

ISO 3166-1 alpha-2 region code to use in the statistical modeling. Set this if no column is tagged with a region-specific InfoType (like US_ZIP_5) or a region code.
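The difference between k-map and k-anonymity can be shown with a small sketch. It assumes an estimated population count per quasi-identifier tuple is available; the helper `k_map_values` and the numbers are hypothetical stand-ins for the statistical model.

```python
from collections import Counter


def k_map_values(dataset_rows, estimated_population_counts):
    """Map each distinct tuple in the dataset to its estimated population size."""
    return {
        quasi_id: estimated_population_counts[quasi_id]
        for quasi_id in set(dataset_rows)
    }


# Quasi-identifier tuples are (ZIP code, age) in this toy example.
dataset = [("94043", 30), ("10001", 45)]
estimated_population = {("94043", 30): 120, ("10001", 45): 2}

k_map = k_map_values(dataset, estimated_population)
k_anon = Counter(dataset)  # counts within the dataset only

for quasi_id in k_map:
    print(quasi_id, "k-anonymity:", k_anon[quasi_id], "k-map:", k_map[quasi_id])
```

Both rows are unique within the dataset, yet the first is shared by an estimated 120 people in the population, so it is far harder to reidentify than the second.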

LDiversityConfig

l-diversity metric, used for analysis of reidentification risk.

Sensitive field for computing the l-value.
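As a minimal sketch of how a sensitive field enters the computation: rows are grouped by their quasi-identifier tuple, and the l-value of each group is the number of distinct sensitive values it contains. The helper `l_diversity` and the column names are hypothetical, and this is not the service's implementation.

```python
from collections import defaultdict


def l_diversity(rows, quasi_id_keys, sensitive_key):
    """Return (l, per_class) where per_class maps each tuple to its l-value."""
    sensitive_by_class = defaultdict(set)
    for row in rows:
        quasi_id = tuple(row[key] for key in quasi_id_keys)
        sensitive_by_class[quasi_id].add(row[sensitive_key])

    # Number of distinct sensitive values within each equivalence class.
    per_class = {
        quasi_id: len(values) for quasi_id, values in sensitive_by_class.items()
    }
    # The dataset's l-value is that of its least diverse class.
    return min(per_class.values()), per_class


rows = [
    {"zip": "94043", "age": 30, "diagnosis": "flu"},
    {"zip": "94043", "age": 30, "diagnosis": "asthma"},
    {"zip": "10001", "age": 45, "diagnosis": "flu"},
    {"zip": "10001", "age": 45, "diagnosis": "flu"},
]

l_value, per_class = l_diversity(rows, ["zip", "age"], "diagnosis")
print(l_value)
```

The second group has two rows (so it is 2-anonymous) but only one diagnosis, which means anyone known to be in that group has their diagnosis revealed.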

NumericalStatsConfig

Compute numerical stats over an individual column, including min, max, and quantiles.
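For intuition, the same kind of summary can be computed locally with the standard library. The helper `numerical_stats` is hypothetical, and the choice of quartiles with the "inclusive" method is an assumption made for this sketch; the service may use a different number of quantiles and a different interpolation rule.

```python
import statistics


def numerical_stats(values, buckets=4):
    """Return min, max, and quantile cut points for a numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        # `buckets` equal-probability groups give `buckets - 1` cut points.
        "quantiles": statistics.quantiles(values, n=buckets, method="inclusive"),
    }


summary = numerical_stats([5, 1, 4, 2, 3])
print(summary)
```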