All Cloud DLP code samples

This page contains code samples for Cloud Data Loss Prevention. To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.

Character masking

Demonstrates masking characters.
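
As a library-independent illustration of the idea (this is not the Cloud DLP API), character masking can be sketched in a few lines. The function name `mask_characters` and its parameters are hypothetical and chosen only for this example.

```python
def mask_characters(text: str, masking_char: str = "#",
                    number_to_mask: int = 0, chars_to_ignore: str = "") -> str:
    """Mask characters from the left; number_to_mask=0 means mask everything.

    Characters listed in chars_to_ignore are copied through unchanged and
    do not count toward number_to_mask (an assumption of this sketch).
    """
    limit = number_to_mask if number_to_mask > 0 else len(text)
    out = []
    masked = 0
    for ch in text:
        if masked < limit and ch not in chars_to_ignore:
            out.append(masking_char)
            masked += 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_characters("372-81-9012", chars_to_ignore="-"))
```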

Client library quickstart

Demonstrates inspecting a string with the Cloud DLP API.

Compute l-diversity

Compute l-diversity with Cloud DLP. L-diversity, which is an extension of k-anonymity, measures the diversity of sensitive values for each column in which they occur. A dataset has l-diversity if, for every set of rows with identical quasi-identifiers, there are at least l distinct values for each sensitive attribute.
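
The definition above can be checked locally on a small table. The following is a minimal sketch in plain Python, not the Cloud DLP implementation; the function name, the column names, and the sample rows are all made up for illustration, and it handles a single sensitive attribute.

```python
from collections import defaultdict


def l_diversity(rows, quasi_ids, sensitive):
    """Return the smallest count of distinct sensitive values in any group
    of rows that share the same quasi-identifier values."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        classes[key].add(row[sensitive])
    return min(len(values) for values in classes.values())


# Hypothetical data: two equivalence classes of two rows each.
rows = [
    {"age": "30-39", "zip": "941**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "941**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "100**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "100**", "diagnosis": "flu"},
]
print(l_diversity(rows, ["age", "zip"], "diagnosis"))
```

The second group contains only one distinct diagnosis, so the dataset as a whole has l = 1 even though the first group has two distinct values.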

Compute numerical statistics

You can determine minimum, maximum, and quantile values for an individual BigQuery column. To calculate these values, you configure a DlpJob, setting the NumericalStatsConfig privacy metric to the name of the column to scan. When you run the job, Cloud DLP computes statistics for the given column, returning its results in the NumericalStatsResult object.
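
To make the three statistics concrete, here is a small local sketch using only the Python standard library. It does not call Cloud DLP, and the quantile method used by `statistics.quantiles` is not necessarily the one the service uses; `numerical_stats` and the sample values are hypothetical.

```python
import statistics


def numerical_stats(values, n_quantiles=4):
    """Return min, max, and the cut points that split the data into
    n_quantiles groups (requires at least two values)."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "quantiles": statistics.quantiles(ordered, n=n_quantiles),
    }


stats = numerical_stats([12, 45, 7, 33, 21, 60, 18, 29])
print(stats)
```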

Computing k-anonymity

K-anonymity is a property of a dataset that indicates the re-identifiability of its records. A dataset is k-anonymous if the quasi-identifiers for each person in the dataset are identical to those of at least k - 1 other people in the dataset. This sample demonstrates how to use Cloud DLP to compute a k-anonymity value.
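
The definition can be illustrated without the API: group rows by their quasi-identifier values and take the size of the smallest group. This is a minimal sketch with hypothetical names and made-up data, not the Cloud DLP computation.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return k: the size of the smallest group of rows that share
    identical values for every quasi-identifier."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(counts.values())


# Hypothetical data: one group of two rows and one group of three rows.
people = [
    {"age": "30-39", "zip": "941**"},
    {"age": "30-39", "zip": "941**"},
    {"age": "40-49", "zip": "100**"},
    {"age": "40-49", "zip": "100**"},
    {"age": "40-49", "zip": "100**"},
]
print(k_anonymity(people, ["age", "zip"]))
```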

Computing k-map estimates

You can estimate k-map values using Cloud DLP, which uses a statistical model to estimate a re-identification dataset.

Create a job trigger

Creates a scheduled Cloud Data Loss Prevention API job trigger.

Create an exception list for de-identification

Create an exception list for a regular custom dictionary detector.

Create an inspection job

Creates an inspection job with the Cloud Data Loss Prevention API.

Create an inspection template

Use templates to create and persist configuration information for use with Cloud DLP. Templates are useful for decoupling configuration information—such as what you inspect for and how you de-identify it—from the implementation of your requests. Templates provide a way to re-use configuration and enable consistency across users and datasets. In addition, whenever you update a template, it's updated for any job trigger that uses it.

Date shifting of a CSV file

Demonstrates date shifting of a CSV file.
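
The general idea of date shifting can be sketched locally: every date in a row moves by the same number of days, so intervals between dates are preserved. The sketch below derives the shift from a keyed hash of a context column so that the same context always gets the same shift. This derivation, the function names, the bounds, and the sample CSV are assumptions for illustration and say nothing about how Cloud DLP computes its shifts.

```python
import csv
import hashlib
import hmac
import io
from datetime import date, timedelta


def shift_days(context: str, key: bytes, lower: int, upper: int) -> int:
    """Derive a stable shift in [lower, upper] days from a context value."""
    digest = hmac.new(key, context.encode(), hashlib.sha256).digest()
    span = upper - lower + 1
    return lower + int.from_bytes(digest[:8], "big") % span


def shift_csv_dates(csv_text, date_fields, context_field, key,
                    lower=-30, upper=30):
    """Shift every ISO-formatted date field in each row by a per-context delta."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        delta = timedelta(days=shift_days(row[context_field], key, lower, upper))
        for field in date_fields:
            row[field] = (date.fromisoformat(row[field]) + delta).isoformat()
        writer.writerow(row)
    return out.getvalue()


sample = "user,admitted,discharged\nu1,2020-01-10,2020-01-15\n"
print(shift_csv_dates(sample, ["admitted", "discharged"], "user", key=b"demo-key"))
```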

De-identify content through deterministic encryption

Use the Data Loss Prevention API to de-identify sensitive data in a string using deterministic encryption, which is a reversible cryptographic method. The encryption is performed with a wrapped key.

De-identify data using table bucketing

Transform a column without inspection. To transform a column in which the content is already known, you can skip inspection and specify a transformation directly.
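
A direct column transformation of this kind can be pictured as applying a bucketing function to every value in one known column, with no detection step. The following plain-Python sketch uses hypothetical names, bounds, and label formats; it does not reproduce the output format of Cloud DLP.

```python
def bucket_value(value, lower, upper, size):
    """Map a number to a fixed-size bucket label such as '30-40'."""
    if value < lower:
        return f"<{lower}"
    if value >= upper:
        return f">={upper}"
    start = lower + ((value - lower) // size) * size
    return f"{start}-{start + size}"


def bucket_column(rows, field, lower, upper, size):
    """Return copies of the rows with one column replaced by bucket labels."""
    return [{**row, field: bucket_value(row[field], lower, upper, size)}
            for row in rows]


# Hypothetical table with a numeric column whose meaning is already known.
scores = [{"name": "A", "score": 35}, {"name": "B", "score": 101}]
print(bucket_column(scores, "score", 0, 100, 10))
```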

De-identify data: Redacting with matched input values

Uses the Data Loss Prevention API to de-identify sensitive data in a string by redacting matched input values.

De-identify free text with FPE by using a surrogate

Uses the Data Loss Prevention API to de-identify sensitive data in a string using format-preserving encryption (FPE). The encryption is performed with an unwrapped key.

De-identify sensitive data by replacing with infoType

Uses the Data Loss Prevention API to de-identify sensitive data in a string by replacing it with the infoType.

De-identify sensitive data with a simple word list

Matches against a custom simple word list to de-identify sensitive data.

De-identify sensitive data: Replacing matched input values

Uses the Data Loss Prevention API to de-identify sensitive data in a string by replacing matched input values with a value that you specify.

De-identify table data using conditional logic and replace with infoTypes

Transform findings only when specific conditions are met on another field.

De-identify table data using masking and conditional logic

Transform a column based on the value of another column.

De-identify table data with format-preserving encryption

Demonstrates encrypting sensitive data in a table while maintaining format.

De-identify table data with infoTypes

Transform findings found in columns. You can transform findings that either make up part of a cell's content or all of it. In this example, all instances of PERSON_NAME are anonymized.

De-identify table data: Suppress a row based on the content of a column

Suppress a row based on the content of a column. You can remove a row entirely based on the content that appears in any column. This example suppresses the record for "Charles Dickens," as this patient is over 89 years old.
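
Conceptually, row suppression is a filter over the table. The sketch below shows the idea in plain Python; the helper name, the other patient names, and every age value are made up for illustration (the description above only states that the "Charles Dickens" record is over 89).

```python
def suppress_rows(rows, field, should_suppress):
    """Drop every row whose value in `field` satisfies the predicate."""
    return [row for row in rows if not should_suppress(row[field])]


# Hypothetical patient table; ages are invented for this example.
patients = [
    {"name": "Charles Dickens", "age": 95},
    {"name": "Jane Austen", "age": 21},
    {"name": "Mark Twain", "age": 75},
]
kept = suppress_rows(patients, "age", lambda age: age > 89)
print([row["name"] for row in kept])
```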

Delete an inspection template

Delete an inspection template from Cloud DLP.

Format-preserving encryption (FPE)

Demonstrates encrypting sensitive characters while maintaining format.

Get an inspection job

Gets a Cloud DLP inspection job.

Inspect a Cloud Storage file

Demonstrates finding sensitive data in a file that is located in Cloud Storage.

Inspect a local file

Demonstrates finding sensitive data in a local text or image file.

Inspect a string

Demonstrates finding sensitive data in a string.

Inspect a string for sensitive data by using multiple rules

Illustrates applying both exclusion and hotword rules. This snippet's rule set includes both hotword rules and dictionary and regex exclusion rules. Notice that the four rules are specified in an array within the rules element.

Inspect a string for sensitive data, excluding a custom substring

Illustrates how to use an InspectConfig to instruct Cloud DLP to avoid matching on the name "Jimmy" in a scan that uses the specified custom regular expression detector.

Inspect a string for sensitive data, omitting custom matches

Omit scan matches from a PERSON_NAME detector scan that overlap with a custom detector.

Inspect a string for sensitive data, omitting overlapping matches on domain and email

Omit matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan.

Inspect a string for sensitive data, omitting overlapping matches on person and email

Omit matches on a PERSON_NAME detector if also matched by an EMAIL_ADDRESS detector.

Inspect a string for sensitive data, using exclusion dictionary

Omit a specific email address from an EMAIL_ADDRESS detector scan with an exclusion dictionary.

Inspect a string for sensitive data by using a custom hotword

Increase the likelihood of a PERSON_NAME match if there is the hotword "patient" nearby. Illustrates using the InspectConfig property for the purpose of scanning a medical database for patient names. You can use Cloud DLP's built-in PERSON_NAME infoType detector, but that causes Cloud DLP to match on all names of people, not just names of patients. To fix this, you can include a hotword rule that looks for the word "patient" within a certain character proximity from the first character of potential matches. You can then assign findings that match this pattern a likelihood of "very likely," since they correspond to your special criteria. Setting the minimum likelihood to VERY_LIKELY within InspectConfig ensures that only matches to this configuration are returned in findings.
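
The hotword logic described above (boost findings near "patient", then keep only VERY_LIKELY results) can be mimicked locally to see how the two settings interact. This is a sketch in plain Python rather than an InspectConfig; the function names, the finding dictionaries, the 10-character window, and the sample sentence are all assumptions for illustration.

```python
import re

LIKELIHOODS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def apply_hotword(text, findings, hotword="patient", window_before=10):
    """Raise a finding to VERY_LIKELY when the hotword appears within
    `window_before` characters before the finding's first character."""
    pattern = re.compile(re.escape(hotword), re.IGNORECASE)
    boosted = []
    for finding in findings:
        start = finding["start"]
        window = text[max(0, start - window_before):start]
        likelihood = "VERY_LIKELY" if pattern.search(window) else finding["likelihood"]
        boosted.append({**finding, "likelihood": likelihood})
    return boosted


def filter_min_likelihood(findings, minimum):
    """Keep only findings at or above the minimum likelihood."""
    floor = LIKELIHOODS.index(minimum)
    return [f for f in findings if LIKELIHOODS.index(f["likelihood"]) >= floor]


text = "Patient: Jane Doe was referred by John Smith."
# Pretend a name detector already produced these two findings.
names = [
    {"quote": "Jane Doe", "start": text.index("Jane Doe"), "likelihood": "LIKELY"},
    {"quote": "John Smith", "start": text.index("John Smith"), "likelihood": "LIKELY"},
]
result = filter_min_likelihood(apply_hotword(text, names), "VERY_LIKELY")
print([f["quote"] for f in result])
```

Only "Jane Doe" survives: it is the only name with "Patient" inside its proximity window, so it is the only finding raised to VERY_LIKELY.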

Inspect a string with an exclusion dictionary substring

Omit scan matches that include the substring "TEST".

Inspect a string, excluding REGEX matches

Omit email addresses ending with a specific domain from an EMAIL_ADDRESS detector scan.
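
The effect of a regex exclusion rule can be previewed with a local filter: detect candidates first, then drop any that match the exclusion pattern. The simplified email pattern, the function name, and the sample addresses below are hypothetical and are not what the EMAIL_ADDRESS detector uses.

```python
import re

# Deliberately simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails_excluding(text, exclude_regex):
    """Return email-like matches that do not fully match exclude_regex."""
    exclude = re.compile(exclude_regex)
    return [m.group(0) for m in EMAIL.finditer(text)
            if not exclude.fullmatch(m.group(0))]


found = find_emails_excluding(
    "Contact alice@example.org or bob@example.com.",
    r".+@example\.com",
)
print(found)
```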

Inspect a table for sensitive content

Check a table of data for sensitive content.

Inspect an image file for sensitive data

Uses Cloud DLP to inspect an image for sensitive data.

Inspect an image for sensitive data with listed infoTypes

If you want to inspect an image for only certain sensitive data types, specify their corresponding built-in infoTypes.

Inspect BigQuery for sensitive data

Demonstrates finding sensitive data that is stored in BigQuery.

Inspect BigQuery for sensitive data with sampling

The following examples demonstrate using the Cloud Data Loss Prevention API to scan a 1000-row subset of a BigQuery table. The scan starts from a random row.
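
One way to picture "a 1000-row subset starting from a random row" is a contiguous window whose start offset is chosen at random. The sketch below assumes exactly that; how the service handles a start near the end of the table is not stated here, so the choice to keep the window inside the table is an assumption, as are the function and parameter names.

```python
import random


def sample_rows(rows, rows_limit=1000, rng=None):
    """Return up to rows_limit consecutive rows beginning at a random offset."""
    rng = rng or random.Random()
    if len(rows) <= rows_limit:
        return list(rows)
    start = rng.randrange(len(rows) - rows_limit + 1)
    return rows[start:start + rows_limit]


table = list(range(5000))  # stand-in for table rows
subset = sample_rows(table, 1000, random.Random(0))
print(len(subset), subset[0])
```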

Inspect data for phone numbers

Demonstrates a simple scan request to the Cloud DLP API. Notice that the PHONE_NUMBER detector is specified in inspectConfig, which instructs Cloud DLP to scan the given string for a phone number.
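
For orientation, a request body of the shape described above can be assembled as a plain dictionary. This sketch only builds and prints the JSON; it does not send anything to the service. The helper name and the sample sentence are hypothetical, and the field names follow the REST-style `inspectConfig` naming used in the description.

```python
import json


def build_inspect_body(text, info_type_names):
    """Build a JSON-serializable inspect request body for a string."""
    return {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_type_names],
        },
    }


body = build_inspect_body("My phone number is (206) 555-0123.", ["PHONE_NUMBER"])
print(json.dumps(body, indent=2))
```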

Inspect data with a custom regex

Regex example: Matching medical record numbers. The following sample uses a regular expression custom infoType detector that instructs Cloud DLP to match a medical record number (MRN) in the input text "Patient's MRN 444-5-22222," and then assigns each match a likelihood of POSSIBLE.
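
The matching behavior can be tried locally with Python's `re` module before configuring a detector. The pattern below is an assumed MRN format (three digits, one digit, five digits, each 1 to 9) chosen so that it matches the sample input; the infoType name `C_MRN` and the shape of the finding dictionaries are likewise hypothetical.

```python
import re

# Assumed MRN format for illustration: ddd-d-ddddd with digits 1-9.
MRN_PATTERN = re.compile(r"[1-9]{3}-[1-9]-[1-9]{5}")


def find_mrns(text):
    """Return one finding per regex match, each with likelihood POSSIBLE."""
    return [
        {"quote": m.group(0), "info_type": "C_MRN", "likelihood": "POSSIBLE"}
        for m in MRN_PATTERN.finditer(text)
    ]


mrn_findings = find_mrns("Patient's MRN 444-5-22222")
print(mrn_findings)
```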

Inspect data with a hotword rule

This sample uses a custom regex with a hotword rule to increase the likelihood of a match.

Inspect Datastore

Demonstrates finding sensitive data stored in Datastore.

Inspect image for sensitive data with infoTypes

To inspect an image for sensitive data, you submit a base64-encoded image to the Cloud DLP API's content.inspect method. Unless you specify information types (infoTypes) to search for, Cloud DLP searches for the most common infoTypes.
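
The base64 step can be sketched as follows. The code only builds a request body and never contacts the service; the helper name is hypothetical, and the `byteItem` field layout with type `IMAGE` is an assumption about the REST request shape that should be checked against the API reference.

```python
import base64
import json


def build_image_inspect_body(image_bytes, info_type_names=None):
    """Build an inspect request body carrying a base64-encoded image.

    When info_type_names is omitted, no inspectConfig is included, leaving
    the choice of infoTypes to the service defaults.
    """
    body = {
        "item": {
            "byteItem": {
                "type": "IMAGE",
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        }
    }
    if info_type_names:
        body["inspectConfig"] = {
            "infoTypes": [{"name": name} for name in info_type_names]
        }
    return body


fake_image = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a complete image
image_body = build_image_inspect_body(fake_image)
print(json.dumps(image_body))
```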

Inspect storage with sampling

The following examples demonstrate using the Cloud DLP API to scan a 90% subset of a Cloud Storage bucket for person names. The scan starts from a random location in the dataset and only includes text files under 200 bytes.

List information types for a category

Demonstrates listing information types for a category.

List jobs

List all Cloud DLP jobs for the current project.

List triggers

List all job triggers for the current project.

Perform risk analysis

Use the Data Loss Prevention API to compute risk metrics of a column of categorical data in a BigQuery table.

Re-identify content encrypted by deterministic encryption

Re-identify content that was previously de-identified through deterministic encryption.

Re-identify content encrypted by FPE

Demonstrates re-identifying de-identified content.

Re-identify free text with FPE using a surrogate

Uses the Cloud Data Loss Prevention API to re-identify sensitive data in a string that was encrypted by format-preserving encryption (FPE) with a surrogate type. The encryption is performed with an unwrapped key.

Re-identify table data with FPE

Re-identify table data with format-preserving encryption.

Re-identify text data with FPE

Re-identify text data with format-preserving encryption.

Redact all detected text in an image

Redact all detected text in an image.

Redact an image

Demonstrates redacting sensitive data from an image.

Redact data from an image with color-coded infoTypes

Redacting infoTypes from an image with color coding.

Redact only certain sensitive data from an image using infoTypes

Redact only certain sensitive data from an image.

Redact sensitive data from an image using default infoTypes

Redact the default infoTypes from this image.
