Computing δ-presence for a dataset

Delta-presence (δ-presence) is a metric that quantifies the probability that an individual belongs to an analyzed dataset. As with k-map, you can estimate δ-presence values using Sensitive Data Protection, which uses a statistical model to estimate the attack dataset.

δ-presence contrasts with the other risk analysis methods, in which the attack dataset is explicitly known. Depending on the type of data, Sensitive Data Protection uses publicly available datasets (for example, from the US Census) or a custom statistical model (for example, one or more BigQuery tables that you specify), or it extrapolates from the distribution of values in your input dataset.
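
To make the idea concrete, the following sketch assumes the common definition of δ-presence for a single combination of quasi-identifier values: the number of records in the analyzed dataset with those values, divided by the number of people in the attack population who share them. The function name and the counts are made up for illustration. In practice, Sensitive Data Protection estimates the population count from a statistical model rather than reading it from a known table.

```python
def delta_presence(dataset_count: int, population_count: int) -> float:
    """Estimated probability that a person with a given combination of
    quasi-identifier values appears in the analyzed dataset.

    Illustrative definition only, not the service's estimation algorithm.
    """
    if population_count <= 0:
        raise ValueError("population_count must be positive")
    # Cap at 1.0: an estimated population can come out smaller than the
    # number of matching records, but a probability cannot exceed 1.
    return min(1.0, dataset_count / population_count)


# Hypothetical counts: 3 records in the dataset share a ZIP code and age,
# and an estimated 120 people in the population share the same values.
print(delta_presence(3, 120))  # 0.025
```

A value close to 1 means that almost everyone in the population with those values is in the dataset, so knowing the values nearly reveals membership.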

This topic demonstrates how to compute δ-presence values for a dataset using Sensitive Data Protection. For more information about δ-presence or risk analysis in general, see the risk analysis concept topic before you continue.

Before you begin

Before continuing, be sure you've done the following:

  1. Sign in to your Google Account.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.
  4. Enable Sensitive Data Protection.
  5. Select a BigQuery dataset to analyze. Sensitive Data Protection estimates the δ-presence metric by scanning a BigQuery table.
  6. Determine the types of datasets you want to use to model the attack dataset. For more information, see the reference page for the DeltaPresenceEstimationConfig object, as well as Risk analysis terms and techniques.

Compute δ-presence metrics

To compute a δ-presence estimate using Sensitive Data Protection, send a request to the following URL, where PROJECT_ID indicates your project identifier:

https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs

The request contains a RiskAnalysisJobConfig object, which is composed of the following:

  • A PrivacyMetric object. To request a δ-presence calculation, include a DeltaPresenceEstimationConfig object in it that contains the following:

    • quasiIds[]: Required. The fields (QuasiId objects) that are considered quasi-identifiers and are scanned to compute δ-presence. Each field carries a tag, and no two columns can have the same tag. A tag can be any of the following:

      • An infoType: This causes Sensitive Data Protection to use the relevant public dataset as a statistical model of population, including US ZIP codes, region codes, ages, and genders.
      • A custom infoType: A custom tag with which you indicate an auxiliary table (a StatisticalTable object) that contains statistical information about the possible values of this column.
      • The inferred tag: If no semantic tag is indicated, specify inferred. Sensitive Data Protection infers the statistical model from the distribution of values in the input data.
    • regionCode: An ISO 3166-1 alpha-2 region code for Sensitive Data Protection to use in statistical modeling. This value is required if no column is tagged with a region-specific infoType (for example, a US ZIP code) or a region code.

    • auxiliaryTables[]: Auxiliary tables (StatisticalTable objects) to use in the analysis. Each custom tag used to tag a quasi-identifier column (from quasiIds[]) must appear in exactly one column of one auxiliary table.

  • A BigQueryTable object. Specify the BigQuery table to scan by including all of the following:

    • projectId: The project ID of the project containing the table.
    • datasetId: The dataset ID of the table.
    • tableId: The name of the table.
  • A set of one or more Action objects, which represent actions to run, in the order given, at the completion of the job. Each Action object contains a single action; the available actions are listed in the API reference for the Action object.
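
The pieces described above can be assembled as in the following minimal sketch, which builds the request but does not send it. Several details are assumptions rather than something this topic states: the POST method, the riskJob and sourceTable field names (taken from the API's JSON naming), the column names zip_code and job_title, the US_ZIP_5 infoType, and the access_token parameter, which stands in for whatever credential you use. Actions are left out to keep the example short.

```python
import json
import urllib.request


def build_delta_presence_request(project_id, dataset_id, table_id, access_token):
    """Build (but do not send) a request that starts a δ-presence job."""
    body = {
        "riskJob": {  # assumed name of the RiskAnalysisJobConfig field
            "privacyMetric": {
                "deltaPresenceEstimationConfig": {
                    "quasiIds": [
                        # Tagged with an infoType: a public dataset is used
                        # as the statistical model. Column name is hypothetical.
                        {"field": {"name": "zip_code"},
                         "infoType": {"name": "US_ZIP_5"}},
                        # No semantic tag: the model is inferred from the
                        # distribution of values in the input data.
                        {"field": {"name": "job_title"}, "inferred": {}},
                    ],
                    "regionCode": "US",
                }
            },
            # The BigQuery table to scan.
            "sourceTable": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
        }
    }
    url = f"https://dlp.googleapis.com/v2/projects/{project_id}/dlpJobs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit the job. Keeping construction separate from sending makes the body easy to inspect before anything is billed.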

Viewing δ-presence job results

To retrieve the results of the δ-presence risk analysis job using the REST API, send the following GET request to the projects.dlpJobs resource. Replace PROJECT_ID with your project ID and JOB_ID with the identifier of the job you want to obtain results for. The job ID was returned when you started the job, and can also be retrieved by listing all jobs.

GET https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID

The request returns a JSON object containing an instance of the job. The results of the analysis are inside the "riskDetails" key, in an AnalyzeDataSourceRiskDetails object. For more information, see the API reference for the DlpJob resource.
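
As a rough illustration of reading that response, the sketch below pulls the δ-presence histogram out of an already-decoded job object. The riskDetails key comes from the description above; the nested names (deltaPresenceEstimationResult, deltaPresenceEstimationHistogram, minProbability, maxProbability, bucketSize) are assumptions based on the API's JSON naming and should be checked against the DlpJob reference. The sample job is invented for the example.

```python
def summarize_delta_presence(job: dict) -> list:
    """Return (min_probability, max_probability, bucket_size) tuples
    from a decoded DlpJob JSON object, or an empty list if none exist."""
    risk_details = job.get("riskDetails", {})
    result = risk_details.get("deltaPresenceEstimationResult", {})
    buckets = result.get("deltaPresenceEstimationHistogram", [])
    summary = []
    for bucket in buckets:
        summary.append((
            float(bucket.get("minProbability", 0.0)),
            float(bucket.get("maxProbability", 0.0)),
            # 64-bit integers are typically encoded as JSON strings.
            int(bucket.get("bucketSize", 0)),
        ))
    return summary


# Invented response fragment, for illustration only.
sample_job = {
    "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
    "riskDetails": {
        "deltaPresenceEstimationResult": {
            "deltaPresenceEstimationHistogram": [
                {"minProbability": 0.1, "maxProbability": 0.2, "bucketSize": "42"},
                {"minProbability": 0.9, "maxProbability": 1.0, "bucketSize": "3"},
            ]
        }
    },
}

for low, high, size in summarize_delta_presence(sample_job):
    print(f"{size} records with estimated δ between {low} and {high}")
```

Buckets whose probability range is close to 1 are the ones to examine first, because those records are the most likely to reveal that a person is in the dataset.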

What's next

  • Learn how to calculate the k-anonymity value for a dataset.
  • Learn how to calculate the l-diversity value for a dataset.
  • Learn how to calculate the k-map value for a dataset.