Class PrivacyMetric (2.0.2)

PrivacyMetric(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Privacy metric to compute for reidentification risk analysis.

Attributes

Name    Description
numerical_stats_config .dlp.PrivacyMetric.NumericalStatsConfig
Numerical stats
categorical_stats_config .dlp.PrivacyMetric.CategoricalStatsConfig
Categorical stats
k_anonymity_config .dlp.PrivacyMetric.KAnonymityConfig
k-anonymity
l_diversity_config .dlp.PrivacyMetric.LDiversityConfig
l-diversity
k_map_estimation_config .dlp.PrivacyMetric.KMapEstimationConfig
k-map
delta_presence_estimation_config .dlp.PrivacyMetric.DeltaPresenceEstimationConfig
delta-presence
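
The following is a minimal sketch of a plain dictionary that could be passed as the mapping argument of PrivacyMetric. It assumes that the six attributes above are alternatives, so that a single metric selects exactly one of them; the helper selected_metric and the nested keys quasi_ids and name are illustrative assumptions that do not appear on this page.

```python
# Attribute names copied from the table above.
METRIC_FIELDS = {
    "numerical_stats_config",
    "categorical_stats_config",
    "k_anonymity_config",
    "l_diversity_config",
    "k_map_estimation_config",
    "delta_presence_estimation_config",
}


def selected_metric(mapping):
    """Return the one metric field set in `mapping`, or raise ValueError.

    Hypothetical helper: it only checks the shape of the dictionary and
    does not call the DLP client library.
    """
    unknown = [key for key in mapping if key not in METRIC_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    if len(mapping) != 1:
        raise ValueError("expected exactly one metric configuration")
    return next(iter(mapping))


# Assumed nested layout: a list of column references under "quasi_ids".
k_anonymity_metric = {
    "k_anonymity_config": {
        "quasi_ids": [{"name": "age"}, {"name": "zip_code"}],
    }
}

print(selected_metric(k_anonymity_metric))
```

With the client library installed, a dictionary of this shape would be supplied as PrivacyMetric(k_anonymity_metric), following the constructor signature shown at the top of this page.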

Classes

CategoricalStatsConfig

CategoricalStatsConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Compute categorical stats over an individual column, including the number of distinct values and the value count distribution.
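
To make the two quantities concrete, here is a small local sketch of what "number of distinct values" and "value count distribution" mean for one column. It is a conceptual illustration in plain Python, not the service's implementation; the function name categorical_stats and the sample column are assumptions.

```python
from collections import Counter


def categorical_stats(values):
    """Count distinct values and how often each value occurs in a column."""
    counts = Counter(values)
    return {
        "distinct_values": len(counts),
        "value_counts": dict(counts),
    }


# Hypothetical column of categorical values.
column = ["red", "blue", "red", "green", "red"]
stats = categorical_stats(column)
print(stats["distinct_values"])
print(stats["value_counts"])
```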

DeltaPresenceEstimationConfig

DeltaPresenceEstimationConfig(
    mapping=None, *, ignore_unknown_fields=False, **kwargs
)

δ-presence metric, used to estimate how likely an attacker is to determine that a given individual appears in a de-identified dataset. As with the k-map metric, δ-presence cannot be computed exactly without knowing the attack dataset, so a statistical model is used instead.
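
As a rough illustration of the idea, the sketch below computes a presence ratio for each combination of quasi-identifier values: the number of matching rows in the dataset divided by the number of matching people in the wider population. The population counts stand in for the statistical model mentioned above and are supplied by hand here; the function name, column names, and sample data are all assumptions, and this is not how the service estimates the metric.

```python
from collections import Counter


def delta_presence(dataset_rows, population_counts, quasi_ids):
    """Return, per quasi-identifier combination, dataset count / population count."""
    dataset_counts = Counter(
        tuple(row[column] for column in quasi_ids) for row in dataset_rows
    )
    return {
        key: count / population_counts[key]
        for key, count in dataset_counts.items()
    }


# Hypothetical de-identified rows and hand-written population estimates.
rows = [
    {"age": 30, "zip_code": "94000"},
    {"age": 30, "zip_code": "94000"},
    {"age": 41, "zip_code": "94100"},
]
population = {(30, "94000"): 8, (41, "94100"): 2}

ratios = delta_presence(rows, population, ["age", "zip_code"])
print(ratios)
```

A ratio close to 1 means that almost everyone in the population with those values is in the dataset, so an attacker who knows someone's values can infer their presence with high confidence.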

KAnonymityConfig

KAnonymityConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

k-anonymity metric, used for analysis of reidentification risk.
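
For readers unfamiliar with the metric, the following sketch shows the standard definition of k-anonymity on a small in-memory table: rows are grouped by their quasi-identifier values, and k is the size of the smallest group. It is a conceptual example only; the function name, column names, and rows are assumptions, and it does not reflect how the service computes its results.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return the size of the smallest group of rows sharing quasi-identifier values."""
    class_sizes = Counter(
        tuple(row[column] for column in quasi_ids) for row in rows
    )
    return min(class_sizes.values(), default=0)


# Hypothetical rows: two people share (30, "94000"), one person is unique.
rows = [
    {"age": 30, "zip_code": "94000", "diagnosis": "flu"},
    {"age": 30, "zip_code": "94000", "diagnosis": "cold"},
    {"age": 41, "zip_code": "94100", "diagnosis": "flu"},
]

print(k_anonymity(rows, ["age", "zip_code"]))
```

Here the unique row forms a group of size 1, so the table as a whole is only 1-anonymous for these quasi-identifiers.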

KMapEstimationConfig

KMapEstimationConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Reidentifiability metric. This corresponds to a risk model similar to what is called "journalist risk" in the literature, except that the attack dataset is statistically modeled instead of being perfectly known. The model can be built from publicly available data (such as the US Census), from a custom statistical model (specified as one or more BigQuery tables), or by extrapolating from the distribution of values in the input dataset.
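
The sketch below illustrates the underlying k-map idea with an explicit population table in place of a statistical model: for every quasi-identifier combination present in the dataset, look up how many people in the population share it, and take the minimum. The population counts, function name, and sample data are assumptions made for illustration; the service estimates these counts rather than reading them from a known table.

```python
def k_map(dataset_rows, population_counts, quasi_ids):
    """Return the smallest population count among combinations seen in the dataset."""
    combinations = {
        tuple(row[column] for column in quasi_ids) for row in dataset_rows
    }
    return min((population_counts[key] for key in combinations), default=0)


# Hypothetical dataset rows and hand-written population counts.
rows = [
    {"age": 30, "zip_code": "94000"},
    {"age": 41, "zip_code": "94100"},
]
population = {(30, "94000"): 8, (41, "94100"): 2, (55, "94200"): 1}

print(k_map(rows, population, ["age", "zip_code"]))
```

Note that the combination (55, "94200") does not lower the result, because no row in the dataset carries those values.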

LDiversityConfig

LDiversityConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

l-diversity metric, used for analysis of reidentification risk.
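
As a conceptual illustration, the sketch below applies the simplest (distinct) form of l-diversity: rows are grouped by quasi-identifier values, and l is the smallest number of distinct sensitive values found in any group. The function name, column names, and rows are assumptions, and this is not a description of the service's implementation.

```python
from collections import defaultdict


def l_diversity(rows, quasi_ids, sensitive_attribute):
    """Return the minimum number of distinct sensitive values per quasi-identifier group."""
    sensitive_by_class = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in quasi_ids)
        sensitive_by_class[key].add(row[sensitive_attribute])
    return min((len(values) for values in sensitive_by_class.values()), default=0)


# Hypothetical rows: the second group has two members but only one diagnosis.
rows = [
    {"age": 30, "zip_code": "94000", "diagnosis": "flu"},
    {"age": 30, "zip_code": "94000", "diagnosis": "cold"},
    {"age": 41, "zip_code": "94100", "diagnosis": "flu"},
    {"age": 41, "zip_code": "94100", "diagnosis": "flu"},
]

print(l_diversity(rows, ["age", "zip_code"], "diagnosis"))
```

Both groups here have two rows, so the table is 2-anonymous, yet anyone known to be in the second group is revealed to have the same diagnosis; this is the gap that l-diversity is meant to expose.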

NumericalStatsConfig

NumericalStatsConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Compute numerical stats over an individual column, including min, max, and quantiles.
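
The following local sketch shows the kind of summary described above, using Python's standard statistics module to compute quartiles. The function name numerical_stats, the choice of four quantile buckets, and the sample column are assumptions; the number and definition of quantiles returned by the service may differ.

```python
import statistics


def numerical_stats(values, buckets=4):
    """Return min, max, and the cut points dividing the values into `buckets` groups."""
    return {
        "min": min(values),
        "max": max(values),
        # statistics.quantiles returns buckets - 1 cut points.
        "quantiles": statistics.quantiles(values, n=buckets),
    }


# Hypothetical numeric column.
column = [2, 4, 4, 5, 7, 9, 10, 12]
stats = numerical_stats(column)
print(stats)
```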