K-map is very similar to k-anonymity, except that it assumes that the attacker most likely doesn't know who is in the dataset. Use k-map if your dataset is relatively small, or if the level of effort involved in generalizing attributes would be too high.
Just like k-anonymity, k-map requires you to determine which columns of your database are quasi-identifiers. In doing this, you are stating what data an attacker will most likely use to re-identify subjects. In addition, computing a k-map value requires a re-identification dataset: a larger table with which to compare rows in the original dataset.
This topic demonstrates how to compute k-map values for a dataset using Sensitive Data Protection. For more information about k-map or risk analysis in general, see the risk analysis concept topic before continuing.
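To make the idea concrete, the following minimal Python sketch computes exact k-map values on toy data. The tables, column names, and the `k_map_values` helper are hypothetical and only illustrate the definition. Sensitive Data Protection itself estimates these values with a statistical model rather than reading an explicit re-identification table.

```python
from collections import Counter


def k_map_values(dataset, reid_dataset, quasi_ids):
    """For each row of `dataset`, count the rows of `reid_dataset`
    that share the same quasi-identifier values."""
    def key(row):
        return tuple(row[q] for q in quasi_ids)

    counts = Counter(key(row) for row in reid_dataset)
    return [counts[key(row)] for row in dataset]


# Hypothetical toy data: a small dataset and a larger re-identification table.
dataset = [
    {"zip": "85535", "age": 79},
    {"zip": "60629", "age": 42},
]
reid_dataset = [
    {"zip": "85535", "age": 79},
    {"zip": "60629", "age": 42},
    {"zip": "60629", "age": 42},
    {"zip": "60629", "age": 42},
    {"zip": "60629", "age": 43},
]

values = k_map_values(dataset, reid_dataset, ["zip", "age"])
k = min(values)  # the dataset satisfies k-map for this k
print(values, k)  # prints: [1, 3] 1
```

In this toy case the first row matches only one person in the larger table, so the dataset as a whole only satisfies k-map with k = 1.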
Before you begin
Before continuing, be sure you've done the following:
- Sign in to your Google Account.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.
- Enable Sensitive Data Protection.
- Select a BigQuery dataset to analyze. Sensitive Data Protection estimates the k-map metric by scanning a BigQuery table.
- Determine the types of datasets you want to use to model the attack dataset. For more information, see the reference page for the KMapEstimationConfig object, as well as Risk analysis terms and techniques.
Compute k-map estimates
You can estimate k-map values using Sensitive Data Protection, which uses a statistical model to estimate a re-identification dataset. This is in contrast to the other risk analysis methods, in which the attack dataset is explicitly known. Depending on the type of data, Sensitive Data Protection uses publicly available datasets (for example, from the US Census) or a custom statistical model (for example, one or more BigQuery tables that you specify), or it extrapolates from the distribution of values in your input dataset. For more information, see the reference page for the KMapEstimationConfig object.
To compute a k-map estimate using Sensitive Data Protection, first configure the risk job. Compose a POST request to the projects.dlpJobs resource, where PROJECT_ID indicates your project identifier:
https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs
The request contains a RiskAnalysisJobConfig object, which is composed of the following:
- A PrivacyMetric object. This is where you specify that you want to calculate k-map by specifying a KMapEstimationConfig object containing the following:
  - quasiIds[]: Required. Fields (TaggedField objects) considered to be quasi-identifiers to scan and use to compute k-map. No two columns can have the same tag. These can be any of the following:
    - An infoType: This causes Sensitive Data Protection to use the relevant public dataset as a statistical model of population, including US ZIP codes, region codes, ages, and genders.
    - A custom infoType: A custom tag wherein you indicate an auxiliary table (an AuxiliaryTable object) that contains statistical information about the possible values of this column.
    - The inferred tag: If no semantic tag is indicated, specify inferred. Sensitive Data Protection infers the statistical model from the distribution of values in the input data.
  - regionCode: An ISO 3166-1 alpha-2 region code for Sensitive Data Protection to use in statistical modeling. This value is required if no column is tagged with a region-specific infoType (for example, a US ZIP code) or a region code.
  - auxiliaryTables[]: Auxiliary tables (AuxiliaryTable objects) to use in the analysis. Each custom tag used to tag a quasi-identifier column (from quasiIds[]) must appear in exactly one column of one auxiliary table.
- A BigQueryTable object. Specify the BigQuery table to scan by including all of the following:
  - projectId: The project ID of the project containing the table.
  - datasetId: The dataset ID of the table.
  - tableId: The name of the table.
- A set of one or more Action objects, which represent actions to run, in the order given, at the completion of the job. Each Action object can contain one of the following actions:
  - SaveFindings object: Saves the results of the risk analysis scan to a BigQuery table.
  - PublishToPubSub object: Publishes a notification to a Pub/Sub topic.
  - PublishSummaryToCscc object: Saves a results summary to Security Command Center.
  - PublishFindingsToCloudDataCatalog object: Saves results to Data Catalog.
  - JobNotificationEmails object: Sends you an email with results.
  - PublishToStackdriver object: Saves results to Google Cloud Observability.
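As an illustration of how these pieces fit together, the following Python sketch assembles a JSON request body and checks the two tag rules stated above. The helper names (`tagged_field`, `build_k_map_request`) and every project, dataset, table, and column name are hypothetical placeholders. Leaving `inferred` columns out of the uniqueness check is an assumption of this sketch, not documented behavior.

```python
import json
from collections import Counter


def tagged_field(column, info_type=None, custom_tag=None):
    """Build one TaggedField for quasiIds[]; untagged columns use `inferred`."""
    if info_type and custom_tag:
        raise ValueError("give an infoType or a custom tag, not both")
    field = {"field": {"name": column}}
    if info_type:
        field["infoType"] = {"name": info_type}
    elif custom_tag:
        field["customTag"] = custom_tag
    else:
        field["inferred"] = {}
    return field


def build_k_map_request(source_table, quasi_ids, region_code=None,
                        auxiliary_tables=(), actions=()):
    """Assemble the JSON body for a k-map risk job and check the tag rules."""
    # Explicit tags (infoType names and custom tags) must be unique.
    # Assumption: `inferred` columns are left out of this check.
    tags = [f["infoType"]["name"] if "infoType" in f else f["customTag"]
            for f in quasi_ids if "inferred" not in f]
    if len(tags) != len(set(tags)):
        raise ValueError("no two columns can have the same tag")

    # Each custom tag must appear in exactly one column of one auxiliary table.
    aux_tags = Counter(q["customTag"]
                       for table in auxiliary_tables
                       for q in table["quasiIds"])
    for f in quasi_ids:
        if "customTag" in f and aux_tags[f["customTag"]] != 1:
            raise ValueError(f"custom tag {f['customTag']!r} must appear in "
                             "exactly one auxiliary table column")

    config = {"quasiIds": list(quasi_ids)}
    if region_code:
        config["regionCode"] = region_code
    if auxiliary_tables:
        config["auxiliaryTables"] = list(auxiliary_tables)
    return {"riskJob": {
        "privacyMetric": {"kMapEstimationConfig": config},
        "sourceTable": source_table,
        "actions": list(actions),
    }}


# All identifiers below are placeholders for illustration only.
request_body = build_k_map_request(
    source_table={"projectId": "my-project", "datasetId": "my_dataset",
                  "tableId": "patients"},
    quasi_ids=[
        tagged_field("age", info_type="AGE"),
        tagged_field("occupation", custom_tag="occupation"),
        tagged_field("favorite_color"),  # no semantic tag, so `inferred`
    ],
    region_code="US",
    auxiliary_tables=[{
        "table": {"projectId": "my-project", "datasetId": "my_dataset",
                  "tableId": "occupation_stats"},
        "quasiIds": [{"field": {"name": "occupation"},
                      "customTag": "occupation"}],
        "relativeFrequency": {"name": "frequency"},
    }],
    actions=[{"saveFindings": {"outputConfig": {"table": {
        "projectId": "my-project", "datasetId": "my_dataset",
        "tableId": "k_map_results"}}}}],
)
print(json.dumps(request_body, indent=2))
```

The printed JSON is what you would send as the body of the request to the projects.dlpJobs resource. Validating the tags locally is optional, but it surfaces configuration mistakes before a job is submitted.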
Code examples
The client libraries for Go, Java, Node.js, PHP, Python, and C# can be used to compute a k-map value with Sensitive Data Protection.
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
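As one illustration, here is a minimal Python sketch that starts a k-map job with the `google-cloud-dlp` client library and polls until it finishes. It is not the official sample. The helper names, the polling approach (instead of a Pub/Sub notification), and all project, dataset, table, and column names are assumptions made for this example. The client library is imported inside `run_k_map_job` so that the configuration helper can be used on its own.

```python
import time


def build_risk_job(table_project_id, dataset_id, table_id, quasi_ids,
                   region_code="US"):
    """Build a risk job config; `quasi_ids` maps column name -> infoType name."""
    return {
        "privacy_metric": {
            "k_map_estimation_config": {
                "quasi_ids": [
                    {"field": {"name": column}, "info_type": {"name": info_type}}
                    for column, info_type in quasi_ids.items()
                ],
                "region_code": region_code,
            }
        },
        "source_table": {
            "project_id": table_project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        },
    }


def run_k_map_job(project, risk_job, poll_seconds=10, timeout_seconds=600):
    """Start the job, poll until it ends, and return the k-map histogram."""
    # Requires the google-cloud-dlp package and Application Default Credentials.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    job = client.create_dlp_job(
        request={"parent": f"projects/{project}/locations/global",
                 "risk_job": risk_job}
    )

    states = dlp_v2.DlpJob.JobState
    deadline = time.monotonic() + timeout_seconds
    while job.state not in (states.DONE, states.FAILED, states.CANCELED):
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job.name} did not finish in time")
        time.sleep(poll_seconds)
        job = client.get_dlp_job(request={"name": job.name})

    if job.state != states.DONE:
        raise RuntimeError(f"job {job.name} ended in state {job.state.name}")
    return job.risk_details.k_map_estimation_result.k_map_estimation_histogram


# Placeholder names; replace them with your own project and table.
risk_job = build_risk_job("my-project", "my_dataset", "patients",
                          {"age": "AGE", "gender": "GENDER"})
# histogram = run_k_map_job("my-project", risk_job)
# for bucket in histogram:
#     print(bucket.min_anonymity, bucket.max_anonymity, bucket.bucket_size)
```

The call to `run_k_map_job` is commented out because it contacts the API and needs a real project and table.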
Viewing k-map job results
To retrieve the results of the k-map risk analysis job using the REST API, send the following GET request to the projects.dlpJobs resource. Replace PROJECT_ID with your project ID and JOB_ID with the identifier of the job you want to obtain results for. The job ID was returned when you started the job, and can also be retrieved by listing all jobs.
GET https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID
The request returns a JSON object containing an instance of the job. The results of the analysis are inside the "riskDetails" key, in an AnalyzeDataSourceRiskDetails object. For more information, see the API reference for the DlpJob resource.
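The following Python sketch shows one way to read the k-map histogram out of such a response. The `sample_job` object is a hypothetical, abridged response with made-up values, and the helper names are illustrative. It assumes the results sit under `kMapEstimationResult` and that 64-bit integers arrive as JSON strings, as is usual for this API's JSON encoding.

```python
def k_map_buckets(job):
    """Return (min_anonymity, max_anonymity, bucket_size) per histogram bucket."""
    result = job["riskDetails"]["kMapEstimationResult"]
    return [
        (int(b["minAnonymity"]), int(b["maxAnonymity"]), int(b["bucketSize"]))
        for b in result.get("kMapEstimationHistogram", [])
    ]


def records_below(job, k):
    """Count records whose bucket's estimated anonymity is entirely below `k`."""
    return sum(size for _, high, size in k_map_buckets(job) if high < k)


# Hypothetical, abridged response body with made-up values.
sample_job = {
    "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
    "type": "RISK_ANALYSIS_JOB",
    "state": "DONE",
    "riskDetails": {
        "kMapEstimationResult": {
            "kMapEstimationHistogram": [
                {
                    "minAnonymity": "1",
                    "maxAnonymity": "1",
                    "bucketSize": "3",
                    "bucketValues": [{
                        "quasiIdsValues": [{"stringValue": "85535"},
                                           {"integerValue": "79"}],
                        "estimatedAnonymity": "1",
                    }],
                },
                {"minAnonymity": "5", "maxAnonymity": "9", "bucketSize": "12"},
            ]
        }
    },
}

print(k_map_buckets(sample_job))     # prints: [(1, 1, 3), (5, 9, 12)]
print(records_below(sample_job, 5))  # prints: 3
```

A summary like `records_below` gives a quick read on how many records fall under a chosen anonymity threshold before you decide whether further de-identification is needed.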
What's next
- Learn how to calculate the k-anonymity value for a dataset.
- Learn how to calculate the l-diversity value for a dataset.
- Learn how to calculate the δ-presence value for a dataset.