This page contains code samples for Cloud Data Loss Prevention. To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.
Compute l-diversity
Compute l-diversity with Cloud DLP. L-diversity, which is an extension of k-anonymity, measures the diversity of sensitive values for each column in which they occur. A dataset has l-diversity if, for every set of rows with identical quasi-identifiers, there are at least l distinct values for each sensitive attribute.
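The definition above can be illustrated locally. This is a minimal sketch in plain Python, not the Cloud DLP API. It groups rows by their quasi-identifier values, counts the distinct sensitive values in each group, and returns the smallest count. The column names and rows are hypothetical, and the sketch handles a single sensitive attribute.

```python
from collections import defaultdict


def l_diversity(rows, quasi_ids, sensitive):
    """Return the largest l for which `rows` (non-empty) are l-diverse.

    Rows are grouped by their quasi-identifier values; l is the smallest
    number of distinct values of the sensitive attribute in any group.
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        groups[key].add(row[sensitive])
    return min(len(values) for values in groups.values())


# Hypothetical records: zip and age are quasi-identifiers,
# diagnosis is the sensitive attribute.
rows = [
    {"zip": "94043", "age": 30, "diagnosis": "flu"},
    {"zip": "94043", "age": 30, "diagnosis": "asthma"},
    {"zip": "10001", "age": 45, "diagnosis": "flu"},
    {"zip": "10001", "age": 45, "diagnosis": "flu"},
]

print(l_diversity(rows, ["zip", "age"], "diagnosis"))  # prints 1
```

The second group has only one diagnosis, so the whole dataset is only 1-diverse, even though the first group alone is 2-diverse.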
Compute numerical statistics
You can determine minimum, maximum, and quantile values for an individual BigQuery column. To calculate these values, you configure a DlpJob, setting the NumericalStatsConfig privacy metric to the name of the column to scan. When you run the job, Cloud DLP computes statistics for the given column, returning its results in the NumericalStatsResult object.
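Cloud DLP computes these statistics server-side. As a rough local illustration of what minimum, maximum, and quantile values mean for one column, the sketch below uses Python's statistics.quantiles. The choice of quartiles (n=4) is arbitrary and is not necessarily the number of quantiles Cloud DLP reports. The column values are hypothetical.

```python
import statistics


def numerical_stats(values, n=4):
    """Return (min, max, cut points) for a numeric column.

    statistics.quantiles splits the data into n intervals and returns
    the n - 1 cut points between them (Python 3.8+).
    """
    return min(values), max(values), statistics.quantiles(values, n=n)


# Hypothetical column values.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lo, hi, quartiles = numerical_stats(ages)
print(lo, hi, quartiles)  # prints 1 9 [2.5, 5.0, 7.5]
```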
Computing k-anonymity
K-anonymity is a property of a dataset that indicates the re-identifiability of its records. A dataset is k-anonymous if the quasi-identifiers for each person in the dataset are identical to those of at least k – 1 other people in the dataset. This sample demonstrates how to use Cloud DLP to compute a k-anonymity value.
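The definition translates directly into a group-size check. In this local sketch (plain Python, not the Cloud DLP API), every row must share its quasi-identifier tuple with at least k - 1 others, so k is the size of the smallest group. The columns and rows are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return the k for which `rows` (non-empty) are k-anonymous.

    Each quasi-identifier tuple must be shared by at least k rows
    (the row itself plus k - 1 others), so k is the smallest group size.
    """
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(counts.values())


# Hypothetical records with two quasi-identifiers.
rows = [
    {"zip": "94043", "age": 30},
    {"zip": "94043", "age": 30},
    {"zip": "94043", "age": 30},
    {"zip": "10001", "age": 45},
    {"zip": "10001", "age": 45},
]
print(k_anonymity(rows, ["zip", "age"]))  # prints 2
```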
Create an exception list for de-identification
Create an exception list for a regular custom dictionary detector.
Create an inspection template
Use templates to create and persist configuration information for use with Cloud DLP. Templates are useful for decoupling configuration information, such as what you inspect for and how you de-identify it, from the implementation of your requests. Templates let you reuse configuration and keep it consistent across users and datasets. In addition, when you update a template, the change applies to every job trigger that uses it.
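To show the decoupling described above, this sketch builds two request bodies as Python dicts in the REST JSON shape: one for projects.inspectTemplates.create, which stores the inspection settings, and one for content:inspect, which refers to the stored template by name instead of repeating the settings. The template ID, project ID, display name, infoType, and minimum likelihood are hypothetical choices. Sending the requests (authentication, HTTP client) is out of scope.

```python
import json


def build_create_template_body(template_id, info_types):
    """Body for projects.inspectTemplates.create (REST JSON shape)."""
    return {
        "templateId": template_id,
        "inspectTemplate": {
            "displayName": "Reusable inspection settings",  # hypothetical
            "inspectConfig": {
                "infoTypes": [{"name": n} for n in info_types],
                "minLikelihood": "POSSIBLE",  # hypothetical threshold
            },
        },
    }


def build_inspect_with_template_body(template_name, text):
    """Body for content:inspect that points at a stored template."""
    return {"inspectTemplateName": template_name, "item": {"value": text}}


create = build_create_template_body("my-template", ["EMAIL_ADDRESS"])
inspect_body = build_inspect_with_template_body(
    "projects/my-project/inspectTemplates/my-template", "Mail me at a@b.com"
)
print(json.dumps(inspect_body))
```

The inspect request carries no inspectConfig of its own, so editing the template changes its behavior without changing the calling code.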
De-identify content through deterministic encryption
Use the Data Loss Prevention API to de-identify sensitive data in a string using deterministic encryption, which is a reversible cryptographic method. The encryption is performed with a wrapped key.
De-identify data using table bucketing
Transform a column without inspection. To transform a column in which the content is already known, you can skip inspection and specify a transformation directly.
De-identify data: Redacting with matched input values
Uses the Data Loss Prevention API to de-identify sensitive data in a string by redacting matched input values.
De-identify free text with FPE by using a surrogate
Uses the Data Loss Prevention API to de-identify sensitive data in a string using format-preserving encryption (FPE). The encryption is performed with an unwrapped key.
De-identify sensitive data by replacing with infoType
Uses the Data Loss Prevention API to de-identify sensitive data in a string by replacing it with the infoType.
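A minimal local sketch of the effect of this transformation, not the Cloud DLP API. A deliberately simple regex stands in for the EMAIL_ADDRESS detector, and the bracketed replacement token is an assumption about the output format. Cloud DLP's real detectors are far more thorough.

```python
import re

# Hypothetical stand-in for an infoType detector: a simple email pattern,
# not Cloud DLP's actual EMAIL_ADDRESS logic.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def replace_with_info_type(text):
    """Replace every detected finding with its infoType name in brackets."""
    for info_type, pattern in DETECTORS.items():
        text = pattern.sub(f"[{info_type}]", text)
    return text


print(replace_with_info_type("My email is jane.doe@example.com"))
# prints My email is [EMAIL_ADDRESS]
```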
De-identify sensitive data with a simple word list
Matches against a custom simple word list to de-identify sensitive data.
De-identify sensitive data: Replacing matched input values
Uses the Data Loss Prevention API to de-identify sensitive data in a string by replacing matched input values with a value that you specify.
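This sketch builds a content:deidentify request body as a Python dict in the REST JSON shape. It uses an infoType transformation whose replaceConfig substitutes a fixed string for every finding. The input text, infoType, and replacement value are hypothetical.

```python
import json


def build_replace_body(text, info_types, new_value):
    """Body for content:deidentify that replaces findings with new_value."""
    return {
        "item": {"value": text},
        "inspectConfig": {"infoTypes": [{"name": n} for n in info_types]},
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [{
                    "primitiveTransformation": {
                        # Every finding becomes this literal string.
                        "replaceConfig": {"newValue": {"stringValue": new_value}}
                    }
                }]
            }
        },
    }


body = build_replace_body("Mail me at a@b.com", ["EMAIL_ADDRESS"], "[redacted]")
print(json.dumps(body, indent=2))
```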
De-identify table data using conditional logic and replace with infoTypes
Transform findings only when specific conditions are met on another field.
De-identify table data using masking and conditional logic
Transform a column based on the value of another column.
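A local sketch of conditional masking in plain Python, not the Cloud DLP API. The target column is masked character by character only in rows where a condition on another column holds. The column names, the age > 89 condition, and the rows are hypothetical.

```python
def mask_if(rows, target, condition, masking_char="*"):
    """Mask every character of `target` in rows where condition(row) is true."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        if condition(row):
            row[target] = masking_char * len(str(row[target]))
        out.append(row)
    return out


# Hypothetical table: mask the score only for people over 89.
rows = [
    {"name": "Ann", "age": 101, "score": 95},
    {"name": "Bob", "age": 22, "score": 7},
]
masked = mask_if(rows, "score", lambda r: r["age"] > 89)
print(masked)
# prints [{'name': 'Ann', 'age': 101, 'score': '**'}, {'name': 'Bob', 'age': 22, 'score': 7}]
```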
De-identify table data with format-preserving encryption
Demonstrates encrypting sensitive data in a table while maintaining format.
De-identify table data with infoTypes
Transform findings in table columns. You can transform findings that make up either part or all of a cell's content. In this example, all instances of PERSON_NAME are anonymized.
De-identify table data: Suppress a row based on the content of a column
Suppress a row based on the content of a column. You can remove a row entirely based on the content that appears in any column. This example suppresses the record for "Charles Dickens," as this patient is over 89 years old.
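Row suppression amounts to a filter on a column value. The local sketch below is plain Python, not the Cloud DLP API. It drops each row whose age exceeds 89. Apart from the name "Charles Dickens", which comes from the description above, the rows and ages are hypothetical.

```python
def suppress_rows(rows, condition):
    """Drop every row for which condition(row) is true."""
    return [row for row in rows if not condition(row)]


# Hypothetical patient table; only the age column drives suppression.
patients = [
    {"name": "Charles Dickens", "age": 95},
    {"name": "Jane Austen", "age": 21},
]
kept = suppress_rows(patients, lambda r: r["age"] > 89)
print([r["name"] for r in kept])  # prints ['Jane Austen']
```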
Delete an inspection template
Delete an inspection template from Cloud DLP.
Format-preserving encryption (FPE)
Demonstrates encrypting sensitive characters while maintaining format.
Inspect a local file
Demonstrates finding sensitive data in a local text or image file.
Inspect a string for sensitive data by using multiple rules
Illustrates applying both exclusion and hotword rules. This snippet's rule set includes both hotword rules and dictionary and regex exclusion rules. Notice that the four rules are specified in an array within the rules element.
Inspect a string for sensitive data, excluding a custom substring
Illustrates how to use an InspectConfig to instruct Cloud DLP to avoid matching on the name "Jimmy" in a scan that uses the specified custom regular expression detector.
Inspect a string for sensitive data, omitting custom matches
Omit scan matches from a PERSON_NAME detector scan that overlap with a custom detector.
Inspect a string for sensitive data, omitting overlapping matches on domain and email
Omit matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan.
Inspect a string for sensitive data, omitting overlapping matches on person and email
Omit matches on a PERSON_NAME detector if also matched by an EMAIL_ADDRESS detector.
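Cloud DLP expresses this rule declaratively, as an exclusion rule that lists EMAIL_ADDRESS under excludeInfoTypes. To make the overlap logic concrete, here is a local sketch in plain Python that works on character-offset findings. The Finding type and the offsets are hypothetical, and the sketch only approximates how Cloud DLP decides that two findings overlap.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    info_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def drop_overlapping(findings, target, excluder):
    """Drop findings of type `target` that overlap any finding of type `excluder`."""
    blockers = [f for f in findings if f.info_type == excluder]
    kept = []
    for f in findings:
        overlaps = any(f.start < b.end and b.start < f.end for b in blockers)
        if f.info_type == target and overlaps:
            continue  # e.g. a "name" that is really part of an email address
        kept.append(f)
    return kept


# Hypothetical findings for the text "Contact james@example.org":
# "james" spans 8-13 and the full address spans 8-25.
findings = [Finding("PERSON_NAME", 8, 13), Finding("EMAIL_ADDRESS", 8, 25)]
print(drop_overlapping(findings, "PERSON_NAME", "EMAIL_ADDRESS"))
# prints [Finding(info_type='EMAIL_ADDRESS', start=8, end=25)]
```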
Inspect a string for sensitive data, using exclusion dictionary
Omit a specific email address from an EMAIL_ADDRESS detector scan with an exclusion dictionary.
Inspect a string for sensitive data by using a custom hotword
Increase the likelihood of a PERSON_NAME match if there is the hotword "patient" nearby. Illustrates using the InspectConfig property for the purpose of scanning a medical database for patient names. You can use Cloud DLP's built-in PERSON_NAME infoType detector, but that causes Cloud DLP to match on all names of people, not just names of patients. To fix this, you can include a hotword rule that looks for the word "patient" within a certain character proximity from the first character of potential matches. You can then assign findings that match this pattern a likelihood of "very likely," since they correspond to your special criteria. Setting the minimum likelihood to VERY_LIKELY within InspectConfig ensures that only matches to this configuration are returned in findings.
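The configuration described above can be sketched as an inspectConfig dict in the REST JSON shape. A hotword rule on PERSON_NAME looks for "patient" in a window before each finding and fixes the likelihood of matching findings at VERY_LIKELY. The minimum likelihood then keeps only VERY_LIKELY findings. The 50-character window is a hypothetical value.

```python
import json


def build_patient_name_inspect_config(window_before=50):
    """inspectConfig that boosts PERSON_NAME findings near "patient"."""
    hotword_rule = {
        # Case-insensitive match on the word "patient" (RE2 syntax).
        "hotwordRegex": {"pattern": "(?i)(patient)"},
        # Look this many characters before the start of each finding.
        "proximity": {"windowBefore": window_before},
        "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
    }
    return {
        "infoTypes": [{"name": "PERSON_NAME"}],
        "ruleSet": [{
            "infoTypes": [{"name": "PERSON_NAME"}],
            "rules": [{"hotwordRule": hotword_rule}],
        }],
        # Return only VERY_LIKELY findings, the level the rule assigns.
        "minLikelihood": "VERY_LIKELY",
        "includeQuote": True,
    }


print(json.dumps(build_patient_name_inspect_config(), indent=2))
```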
Inspect a string with an exclusion dictionary substring
Omit scan matches that include the substring "TEST".
Inspect a string, excluding REGEX matches
Omit email addresses ending with a specific domain from an EMAIL_ADDRESS detector scan.
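This sketch has two parts: an inspectConfig dict in the REST JSON shape that attaches a regex exclusion rule to EMAIL_ADDRESS, and a local preview of which findings the rule would drop. The preview uses Python's re.fullmatch to mirror full-match semantics. Cloud DLP uses RE2, but this simple pattern behaves the same way in both. The excluded domain is hypothetical.

```python
import re

# Hypothetical domain to exclude; the dot is escaped so it matches literally.
EXCLUDED = r".+@example\.com"

inspect_config = {
    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
    "ruleSet": [{
        "infoTypes": [{"name": "EMAIL_ADDRESS"}],
        "rules": [{
            "exclusionRule": {
                "regex": {"pattern": EXCLUDED},
                # Exclude a finding only when the entire finding matches.
                "matchingType": "MATCHING_TYPE_FULL_MATCH",
            }
        }],
    }],
}


def would_be_excluded(finding_text):
    """Local preview of the full-match exclusion using Python's re."""
    return re.fullmatch(EXCLUDED, finding_text) is not None


print(would_be_excluded("jack@example.com"))  # prints True
print(would_be_excluded("jill@example.org"))  # prints False
```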
Inspect an image file for sensitive data
Uses Cloud DLP to inspect an image for sensitive data.
Inspect an image for sensitive data with listed infoTypes
If you want to inspect an image for only certain sensitive data types, specify their corresponding built-in infoTypes.
Inspect BigQuery for sensitive data with sampling
The following examples demonstrate using the Cloud Data Loss Prevention API to scan a 1000-row subset of a BigQuery table. The scan starts from a random row.
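A sketch of the sampling configuration as a projects.dlpJobs.create request body, built as a Python dict in the REST JSON shape. rowsLimit caps the scan at 1,000 rows, and sampleMethod RANDOM_START begins it at a random row. The table identifiers are placeholders, and PERSON_NAME is an assumed infoType, since the description does not say what the sample looks for.

```python
import json


def build_bigquery_sampling_job(project_id, dataset_id, table_id):
    """Body for projects.dlpJobs.create that samples a BigQuery table."""
    return {
        "inspectJob": {
            "storageConfig": {
                "bigQueryOptions": {
                    "tableReference": {
                        "projectId": project_id,
                        "datasetId": dataset_id,
                        "tableId": table_id,
                    },
                    "rowsLimit": 1000,               # scan at most 1,000 rows
                    "sampleMethod": "RANDOM_START",  # start at a random row
                }
            },
            # Assumed infoType for illustration.
            "inspectConfig": {"infoTypes": [{"name": "PERSON_NAME"}]},
        }
    }


print(json.dumps(build_bigquery_sampling_job("my-project", "my_dataset", "my_table"), indent=2))
```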
Inspect data for phone numbers
Demonstrates a simple scan request to the Cloud DLP API. Notice that the PHONE_NUMBER detector is specified in inspectConfig, which instructs Cloud DLP to scan the given string for a phone number.
Inspect data with a custom regex
Regex example: Matching medical record numbers. The following sample uses a regular expression custom infoType detector that instructs Cloud DLP to match a medical record number (MRN) in the input text "Patient's MRN 444-5-22222," and then assigns each match a likelihood of POSSIBLE.
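A sketch of the custom infoType as a dict in the REST JSON shape, plus a local check that the regex finds the MRN in the example text. The pattern (three digits, one digit, five digits, each 1 to 9) and the detector name C_MRN are illustrative assumptions. Cloud DLP uses RE2, but this pattern behaves the same way under Python's re.

```python
import re

# Assumed MRN format "NNN-N-NNNNN" using digits 1-9.
MRN_PATTERN = r"[1-9]{3}-[1-9]{1}-[1-9]{5}"

custom_info_type = {
    "infoType": {"name": "C_MRN"},  # hypothetical detector name
    "regex": {"pattern": MRN_PATTERN},
    "likelihood": "POSSIBLE",       # likelihood assigned to each match
}

inspect_config = {"customInfoTypes": [custom_info_type], "includeQuote": True}

# Local preview of what the regex matches in the example input.
print(re.findall(MRN_PATTERN, "Patient's MRN 444-5-22222"))  # prints ['444-5-22222']
```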
Inspect data with a hotword rule
This sample uses a custom regex with a hotword rule to increase the likelihood of a match.
Inspect image for sensitive data with infoTypes
To inspect an image for sensitive data, you submit a base64-encoded image to the Cloud DLP API's content.inspect method. Unless you specify information types (infoTypes) to search for, Cloud DLP searches for the most common infoTypes.
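A sketch of a content:inspect request body for an image, built as a Python dict in the REST JSON shape. The raw image bytes are base64-encoded into byteItem.data. No infoTypes are listed, so, as described above, Cloud DLP falls back to its common infoTypes. The IMAGE_PNG type and the sample bytes are assumptions.

```python
import base64
import json


def build_image_inspect_body(image_bytes, image_type="IMAGE_PNG"):
    """Body for content:inspect that carries an image as base64 text."""
    return {
        "item": {
            "byteItem": {
                "type": image_type,
                # The JSON API expects bytes fields as base64 strings.
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        },
        # No infoTypes listed: Cloud DLP uses its default set.
        "inspectConfig": {"includeQuote": True},
    }


# Hypothetical bytes; in practice, read them from an image file in "rb" mode.
body = build_image_inspect_body(b"\x89PNG")
print(json.dumps(body))
```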
Inspect storage with sampling
The following examples demonstrate using the Cloud DLP API to scan a 90% subset of a Cloud Storage bucket for person names. The scan starts from a random location in the dataset and only includes text files under 200 bytes.
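A sketch of the sampling settings as a projects.dlpJobs.create request body, built as a Python dict in the REST JSON shape. filesLimitPercent selects 90% of files, sampleMethod RANDOM_START picks the starting point, fileTypes restricts the scan to text files, and bytesLimitPerFile carries the 200-byte limit. Check the API reference for the exact semantics of that limit. The bucket name and wildcard URL are placeholders.

```python
import json


def build_storage_sampling_job(bucket):
    """Body for projects.dlpJobs.create that samples a Cloud Storage bucket."""
    return {
        "inspectJob": {
            "storageConfig": {
                "cloudStorageOptions": {
                    "fileSet": {"url": f"gs://{bucket}/*"},
                    "bytesLimitPerFile": 200,   # 200-byte per-file limit
                    "fileTypes": ["TEXT_FILE"],  # text files only
                    "filesLimitPercent": 90,     # scan 90% of the files
                    "sampleMethod": "RANDOM_START",
                }
            },
            "inspectConfig": {"infoTypes": [{"name": "PERSON_NAME"}]},
        }
    }


print(json.dumps(build_storage_sampling_job("my-bucket"), indent=2))
```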
Re-identify content encrypted by deterministic encryption
Re-identify content that was previously de-identified through deterministic encryption.
Re-identify free text with FPE using a surrogate
Uses the Cloud Data Loss Prevention API to re-identify sensitive data in a string that was encrypted by format-preserving encryption (FPE) with a surrogate type. The encryption is performed with an unwrapped key.
Redact data from an image with color-coded infoTypes
Redacting infoTypes from an image with color coding.
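A sketch of an image:redact request body, built as a Python dict in the REST JSON shape. Each infoType gets its own imageRedactionConfigs entry with a redactionColor whose red, green, and blue components are floats from 0.0 to 1.0. The colors, infoTypes, and image data are hypothetical.

```python
import json


def build_color_redaction_body(image_b64, colors):
    """Body for image:redact; colors maps infoType name -> (r, g, b) in [0, 1]."""
    return {
        "byteItem": {"type": "IMAGE_PNG", "data": image_b64},
        "imageRedactionConfigs": [
            {
                "infoType": {"name": name},
                "redactionColor": {"red": r, "green": g, "blue": b},
            }
            for name, (r, g, b) in colors.items()
        ],
        "inspectConfig": {"infoTypes": [{"name": n} for n in colors]},
    }


body = build_color_redaction_body(
    "AAAA",  # placeholder base64 data
    {"EMAIL_ADDRESS": (1.0, 0.0, 0.0), "PHONE_NUMBER": (0.0, 0.0, 1.0)},
)
print(json.dumps(body, indent=2))
```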
Redact only certain sensitive data from an image using infoTypes
Redact only certain sensitive data from an image.
Redact sensitive data from an image using default infoTypes
Redact the default infoTypes from an image.