This page contains code samples for Cloud Data Loss Prevention. To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.
Compute l-diversity
Compute l-diversity with Cloud DLP. L-diversity, which is an extension of k-anonymity, measures the diversity of sensitive values for each column in which they occur. A dataset has l-diversity if, for every set of rows with identical quasi-identifiers, there are at least l distinct values for each sensitive attribute.
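To make the definition concrete, the following is a minimal local sketch in plain Python, not a call to Cloud DLP. It groups rows by their quasi-identifier values and reports the smallest number of distinct sensitive values found in any group. The function name, column names, and sample rows are hypothetical.

```python
from collections import defaultdict


def l_diversity(rows, quasi_ids, sensitive):
    """Return the smallest count of distinct sensitive values found in any
    group of rows that share the same quasi-identifier values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        groups[key].add(row[sensitive])
    # An empty dataset has no groups; report 0 in that case.
    return min((len(values) for values in groups.values()), default=0)


# Hypothetical sample data for illustration only.
sample_rows = [
    {"age": 30, "zip": "94103", "diagnosis": "flu"},
    {"age": 30, "zip": "94103", "diagnosis": "asthma"},
    {"age": 45, "zip": "10001", "diagnosis": "flu"},
    {"age": 45, "zip": "10001", "diagnosis": "flu"},
]
l_value = l_diversity(sample_rows, ["age", "zip"], "diagnosis")
```

In this sample data the second group contains only one distinct diagnosis, so the dataset as a whole is only 1-diverse.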
Compute numerical statistics
You can determine minimum, maximum, and quantile values for an individual BigQuery column. To calculate these values, you configure a DlpJob, setting the NumericalStatsConfig privacy metric to the name of the column to scan. When you run the job, Cloud DLP computes statistics for the given column, returning its results in the NumericalStatsResult object.
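As a rough sketch of the configuration described above, the risk job can be expressed as a plain dictionary that mirrors the DLP API field names (privacy_metric, numerical_stats_config, source_table). The helper name and the project, dataset, table, and column values are hypothetical, and no request is sent.

```python
def numerical_stats_risk_job(project_id, dataset_id, table_id, column_name):
    """Build a risk-analysis job configuration (as a dict) that asks for
    numerical statistics on a single BigQuery column."""
    return {
        # The privacy metric names the column to scan.
        "privacy_metric": {
            "numerical_stats_config": {"field": {"name": column_name}}
        },
        # The BigQuery table that holds the column.
        "source_table": {
            "project_id": project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        },
    }


# Hypothetical identifiers for illustration only.
risk_job = numerical_stats_risk_job("my-project", "my_dataset", "patients", "age")
```

With the google-cloud-dlp Python client, a dictionary shaped like this would typically be supplied as the risk_job field of a create_dlp_job request; check the client reference for the exact call shape before relying on it.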
Compute k-anonymity
K-anonymity is a property of a dataset that indicates the re-identifiability of its records. A dataset is k-anonymous if the quasi-identifiers for each person in the dataset are identical to those of at least k - 1 other people in the dataset. This sample demonstrates how to use Cloud DLP to compute a k-anonymity value.
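The definition can be illustrated with a small local computation in plain Python. This is a sketch of the concept only, not the Cloud DLP risk analysis job; the function name, column names, and sample rows are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values. The dataset is k-anonymous for that k."""
    group_sizes = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    # An empty dataset has no groups; report 0 in that case.
    return min(group_sizes.values(), default=0)


# Hypothetical sample data for illustration only.
people = [
    {"age": 30, "zip": "94103"},
    {"age": 30, "zip": "94103"},
    {"age": 30, "zip": "94103"},
    {"age": 45, "zip": "10001"},
    {"age": 45, "zip": "10001"},
]
k_value = k_anonymity(people, ["age", "zip"])
```

Here the smallest group has two rows, so each person shares quasi-identifiers with at least one other person and the sample dataset is 2-anonymous.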
Create an inspection template
Use templates to create and persist configuration information for use with Cloud DLP. Templates are useful for decoupling configuration information—such as what you inspect for and how you de-identify it—from the implementation of your requests. Templates provide a way to re-use configuration and enable consistency across users and datasets. In addition, whenever you update a template, it's updated for any job trigger that uses it.
De-identify content through deterministic encryption
Use the Data Loss Prevention API to de-identify sensitive data in a string using deterministic encryption, which is a reversible cryptographic method. The encryption is performed with a wrapped key.
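A de-identification configuration for this case might look like the following sketch, written as a plain dictionary that mirrors the DLP API field names (crypto_deterministic_config, kms_wrapped, surrogate_info_type). The helper name, the Cloud KMS key name, the surrogate name, and the placeholder key material are all hypothetical, and it is an assumption here that the wrapped key arrives base64-encoded.

```python
import base64


def deterministic_deidentify_config(wrapped_key_b64, kms_key_name, surrogate_name):
    """Build a de-identify config (as a dict) that replaces findings with
    deterministically encrypted tokens, using a KMS-wrapped key."""
    # Assumption: the wrapped key is provided as base64 text and is
    # decoded to raw bytes before being placed in the configuration.
    wrapped_key = base64.b64decode(wrapped_key_b64)
    return {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_deterministic_config": {
                            "crypto_key": {
                                "kms_wrapped": {
                                    "wrapped_key": wrapped_key,
                                    "crypto_key_name": kms_key_name,
                                }
                            },
                            # The surrogate infoType labels each token so it
                            # can be located again for re-identification.
                            "surrogate_info_type": {"name": surrogate_name},
                        }
                    }
                }
            ]
        }
    }


# Placeholder values for illustration only; this is not a real key.
placeholder_key_b64 = base64.b64encode(b"not-a-real-wrapped-key").decode("ascii")
deidentify_config = deterministic_deidentify_config(
    placeholder_key_b64,
    "projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key",
    "PHONE_TOKEN",
)
```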
De-identify free text with FPE by using a surrogate
Use the Data Loss Prevention API to de-identify sensitive data in a string using format-preserving encryption (FPE). The encryption is performed with an unwrapped key.
De-identify table data with format-preserving encryption
Demonstrates encrypting sensitive data in a table while maintaining format.
De-identify table data: Suppress a row based on the content of a column
Suppress a row based on the content of a column. You can remove a row entirely based on the content that appears in any column. This example suppresses the record for "Charles Dickens," as this patient is over 89 years old.
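The suppression rule can be sketched as a record transformation with a single condition, written here as a plain dictionary that mirrors the DLP API field names. The helper names, the AGE column name, and the sample rows other than "Charles Dickens" are hypothetical. The second function only imitates the rule locally so that its effect is visible without calling the service.

```python
def row_suppression_config(field_name, threshold):
    """Build a de-identify config (as a dict) that suppresses every row
    whose value in field_name is greater than threshold."""
    condition = {
        "field": {"name": field_name},
        "operator": "GREATER_THAN",
        "value": {"integer_value": threshold},
    }
    return {
        "record_transformations": {
            "record_suppressions": [
                {"condition": {"expressions": {"conditions": {"conditions": [condition]}}}}
            ]
        }
    }


def suppress_rows_locally(rows, field_name, threshold):
    """Local imitation of the rule above: keep rows at or below threshold."""
    return [row for row in rows if row[field_name] <= threshold]


# Hypothetical table contents for illustration only.
patients = [
    {"PATIENT": "Charles Dickens", "AGE": 95},
    {"PATIENT": "Example Patient A", "AGE": 21},
    {"PATIENT": "Example Patient B", "AGE": 75},
]
suppression_config = row_suppression_config("AGE", 89)
remaining = suppress_rows_locally(patients, "AGE", 89)
```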
Inspect a string for sensitive data by using multiple rules
Illustrates applying both exclusion and hotword rules. This snippet's rule set includes both hotword rules and dictionary and regex exclusion rules. Notice that the four rules are specified in an array within the rules element.
Inspect a string for sensitive data, excluding a custom substring
Illustrates how to use an InspectConfig to instruct Cloud DLP to avoid matching on the name "Jimmy" in a scan that uses the specified custom regular expression detector.
Inspect a string for sensitive data, omitting custom matches
Omit scan matches from a PERSON_NAME detector scan that overlap with a custom detector.
Inspect a string for sensitive data, omitting overlapping matches on domain and email
Omit matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan.
Inspect a string for sensitive data, omitting overlapping matches on person and email
Omit matches on a PERSON_NAME detector if also matched by an EMAIL_ADDRESS detector.
Inspect a string for sensitive data, using exclusion dictionary
Omit a specific email address from an EMAIL_ADDRESS detector scan with an exclusion dictionary.
Inspect a string for sensitive data by using a custom hotword
Increase the likelihood of a PERSON_NAME match when the hotword "patient" appears nearby. This sample illustrates using InspectConfig to scan a medical database for patient names. Used on its own, Cloud DLP's built-in PERSON_NAME infoType detector matches all names of people, not just names of patients. To narrow the results, include a hotword rule that looks for the word "patient" within a certain character proximity of the first character of potential matches, and assign findings that match this pattern a likelihood of VERY_LIKELY. Setting the minimum likelihood to VERY_LIKELY within InspectConfig then ensures that only findings that match this configuration are returned.
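The hotword configuration described above could be sketched as the following dictionary, which mirrors the DLP API field names (rule_set, hotword_rule, proximity, likelihood_adjustment). The helper name is hypothetical, and the 50-character window is an assumed value, since the description only says "a certain character proximity". Likelihoods are written as enum names in string form.

```python
def patient_hotword_inspect_config(window_before=50):
    """Build an InspectConfig (as a dict) that raises PERSON_NAME findings
    to VERY_LIKELY when the word "patient" appears shortly before them."""
    hotword_rule = {
        "hotword_regex": {"pattern": "patient"},
        # Assumed value: how many characters before a finding to search.
        "proximity": {"window_before": window_before},
        "likelihood_adjustment": {"fixed_likelihood": "VERY_LIKELY"},
    }
    return {
        "info_types": [{"name": "PERSON_NAME"}],
        "rule_set": [
            {
                "info_types": [{"name": "PERSON_NAME"}],
                "rules": [{"hotword_rule": hotword_rule}],
            }
        ],
        # Only findings boosted by the hotword rule reach this threshold.
        "min_likelihood": "VERY_LIKELY",
    }


hotword_config = patient_hotword_inspect_config()
```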
Inspect a string with an exclusion dictionary substring
Omit scan matches that include the substring "TEST".
Inspect a string, excluding REGEX matches
Omit email addresses ending with a specific domain from an EMAIL_ADDRESS detector scan.
Inspect BigQuery for sensitive data with sampling
The following examples demonstrate using the Cloud Data Loss Prevention API to scan a 1000-row subset of a BigQuery table. The scan starts from a random row.
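The sampling settings for such a scan live in the storage configuration. The following sketch builds that part as a plain dictionary mirroring the DLP API field names (big_query_options, rows_limit, sample_method); the helper name and the table identifiers are hypothetical.

```python
def bigquery_sampling_storage_config(project_id, dataset_id, table_id, rows_limit=1000):
    """Build a StorageConfig (as a dict) that scans at most rows_limit rows
    of a BigQuery table, starting from a random row."""
    return {
        "big_query_options": {
            "table_reference": {
                "project_id": project_id,
                "dataset_id": dataset_id,
                "table_id": table_id,
            },
            # Cap the scan at a fixed number of rows.
            "rows_limit": rows_limit,
            # Start at a random row rather than at the top of the table.
            "sample_method": "RANDOM_START",
        }
    }


# Hypothetical identifiers for illustration only.
bq_storage_config = bigquery_sampling_storage_config("my-project", "my_dataset", "my_table")
```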
Inspect data with a custom regex
Regex example: Matching medical record numbers. The following sample uses a regular expression custom infoType detector that instructs Cloud DLP to match a medical record number (MRN) in the input text "Patient's MRN 444-5-22222," and then assigns each match a likelihood of POSSIBLE.
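To show what such a detector would match, the following sketch tests a candidate pattern locally with Python's re module and then places it in a custom infoType definition. The pattern itself, the C_MRN name, and the helper name are assumptions for illustration; Python's regex engine is also not identical to the one Cloud DLP uses, so treat the local check as an approximation.

```python
import re

# Assumed MRN pattern: three digits, one digit, five digits (1-9 only),
# separated by hyphens.
MRN_PATTERN = r"[1-9]{3}-[1-9]{1}-[1-9]{5}"


def mrn_custom_info_type(name="C_MRN"):
    """Build a custom regex infoType (as a dict) whose matches are
    assigned a likelihood of POSSIBLE."""
    return {
        "info_type": {"name": name},
        "regex": {"pattern": MRN_PATTERN},
        "likelihood": "POSSIBLE",
    }


# Local approximation of what the detector would find in the input text.
text = "Patient's MRN 444-5-22222"
local_matches = re.findall(MRN_PATTERN, text)
custom_info_type = mrn_custom_info_type()
```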
Inspect image for sensitive data with infoTypes
To inspect an image for sensitive data, you submit a base64-encoded image to the Cloud DLP API's content.inspect method. Unless you specify information types (infoTypes) to search for, Cloud DLP searches for the most common infoTypes.
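As an illustration of the base64 step, the following sketch builds a JSON request body for content.inspect from raw image bytes. The helper name, the placeholder bytes, and the chosen infoTypes are hypothetical, and the body is only constructed, not sent. Field names follow the REST (camelCase) form of the API.

```python
import base64
import json


def image_inspect_request_body(image_bytes, info_type_names=None):
    """Build a JSON body for content.inspect with a base64-encoded image.
    If info_type_names is empty or None, no infoTypes are listed, which
    leaves the choice of infoTypes to the service's default behavior."""
    body = {
        "item": {
            "byteItem": {
                "type": "IMAGE",
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        }
    }
    if info_type_names:
        body["inspectConfig"] = {
            "infoTypes": [{"name": name} for name in info_type_names]
        }
    return json.dumps(body)


# Placeholder bytes standing in for real image data.
placeholder_image = b"\x89PNG placeholder, not a real image"
request_json = image_inspect_request_body(
    placeholder_image, ["PHONE_NUMBER", "EMAIL_ADDRESS"]
)
```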
Inspect storage with sampling
The following examples demonstrate using the Cloud DLP API to scan a 90% subset of a Cloud Storage bucket for person names. The scan starts from a random location in the dataset and only includes text files under 200 bytes.
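The sampling limits described above map onto fields of the Cloud Storage options. The following sketch builds the storage and inspection configuration as plain dictionaries mirroring the DLP API field names; the helper name and the bucket URL are hypothetical.

```python
def storage_sampling_job_parts(bucket_url):
    """Build the StorageConfig and InspectConfig (as dicts) for a sampled
    scan of a Cloud Storage bucket that looks for person names."""
    storage_config = {
        "cloud_storage_options": {
            "file_set": {"url": bucket_url},
            # Read at most 200 bytes from each file.
            "bytes_limit_per_file": 200,
            # Only consider text files.
            "file_types": ["TEXT_FILE"],
            # Scan 90% of the matching files.
            "files_limit_percent": 90,
            # Begin at a random location in the dataset.
            "sample_method": "RANDOM_START",
        }
    }
    inspect_config = {"info_types": [{"name": "PERSON_NAME"}]}
    return storage_config, inspect_config


# Hypothetical bucket for illustration only.
gcs_storage_config, gcs_inspect_config = storage_sampling_job_parts("gs://my-bucket/*")
```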
Re-identify content encrypted by deterministic encryption
Re-identify content that was previously de-identified through deterministic encryption.
Re-identify free text with FPE using a surrogate
Use the Cloud Data Loss Prevention API to re-identify sensitive data in a string that was encrypted by format-preserving encryption (FPE) with a surrogate type. The encryption is performed with an unwrapped key.