Sensitive Data Protection documentation
Sensitive Data Protection provides access to a powerful sensitive data inspection, classification, and de-identification platform.
Sensitive Data Protection includes:
- Over 200 built-in information type (or "infoType") detectors.
- The ability to define custom infoType detectors using dictionaries, regular expressions, and contextual elements.
- De-identification techniques including redaction, masking, format-preserving encryption, date-shifting, and more.
- The ability to detect sensitive data within streams of data, structured text, files in storage repositories such as Cloud Storage and BigQuery, and even within images.
- Analysis of structured data to help understand its risk of being re-identified, including computation of metrics like k-anonymity, l-diversity, and more.
- The ability to automatically discover unencrypted secrets and profile data across an organization, folder, or project to identify data assets where high-risk and sensitive data reside.
Documentation resources
Guides
- Quickstart: Using a JSON request
- Quickstart: Inspect sensitive text by using the command line
- Quickstart: Schedule a Sensitive Data Protection inspection scan
- De-identifying sensitive data
- Redacting sensitive data from images
- Inspecting storage and databases for sensitive data
- Inspecting text for sensitive data
- Creating inspection templates
- Creating and scheduling inspection jobs
Related resources
Inspect data with a custom regex
Regex example: Matching medical record numbers. The following sample uses a regular expression custom infoType detector that instructs Cloud DLP to match a medical record number (MRN) in the input text "Patient's MRN 444-5-22222," and then assigns each match a likelihood of POSSIBLE.
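As a rough local illustration of what such a detector matches, the sketch below applies a regular expression with Python's standard re module. It does not call Cloud DLP. The pattern, the C_MRN label, and the find_mrns helper are assumptions chosen for illustration, and the POSSIBLE likelihood is attached by hand.

```python
import re

# Assumed pattern for illustration: three digits, one digit, five digits (1-9 only).
MRN_PATTERN = re.compile(r"[1-9]{3}-[1-9]-[1-9]{5}")


def find_mrns(text):
    """Return one finding dict per regex match, each tagged POSSIBLE."""
    return [
        {
            "info_type": "C_MRN",  # hypothetical custom infoType name
            "quote": match.group(0),
            "start": match.start(),
            "end": match.end(),
            "likelihood": "POSSIBLE",
        }
        for match in MRN_PATTERN.finditer(text)
    ]


findings = find_mrns("Patient's MRN 444-5-22222")
print(findings)
```

A managed detector would also apply context and likelihood scoring; this sketch only shows the pattern-matching step.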
De-identify table data with infoTypes
Transform findings found in columns. You can transform findings that make up either part or all of a cell's content. In this example, all instances of PERSON_NAME are anonymized.
Inspect a string for sensitive data, omitting overlapping matches on domain and email
Omit matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan.
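The effect of this exclusion can be sketched locally. The snippet below is plain Python, not the Cloud DLP API: it assumes findings are already available as dictionaries with character offsets, and the drop_overlapping helper is hypothetical. It drops every DOMAIN_NAME finding whose span lies inside an EMAIL_ADDRESS finding.

```python
def drop_overlapping(findings, drop_type="DOMAIN_NAME", keep_type="EMAIL_ADDRESS"):
    """Remove drop_type findings whose span lies inside a keep_type finding."""
    keep_spans = [
        (f["start"], f["end"]) for f in findings if f["info_type"] == keep_type
    ]
    result = []
    for f in findings:
        inside = f["info_type"] == drop_type and any(
            start <= f["start"] and f["end"] <= end for start, end in keep_spans
        )
        if not inside:
            result.append(f)
    return result


text = "Mail alice@example.com or visit example.org"
# Hand-written findings for illustration; offsets index into `text`.
findings = [
    {"info_type": "EMAIL_ADDRESS", "start": 5, "end": 22},
    {"info_type": "DOMAIN_NAME", "start": 11, "end": 22},  # part of the email
    {"info_type": "DOMAIN_NAME", "start": 32, "end": 43},  # standalone domain
]
kept = drop_overlapping(findings)
print([text[f["start"]:f["end"]] for f in kept])
```

The standalone domain survives because it does not fall within any email address span.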
De-identify tabular data through bucketing
This sample replaces the values within each bucket with predefined replacement values.
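The idea behind bucketing can be shown with a minimal sketch in plain Python. It does not use the Cloud DLP API; the bucket boundaries, replacement labels, column name, and bucketize helper are all assumptions made for illustration.

```python
# Hypothetical buckets: (inclusive lower bound, exclusive upper bound, replacement).
BUCKETS = [
    (0, 25, "Low"),
    (25, 75, "Medium"),
    (75, 101, "High"),
]


def bucketize(value, buckets=BUCKETS):
    """Return the replacement for the bucket containing value."""
    for lower, upper, replacement in buckets:
        if lower <= value < upper:
            return replacement
    return value  # left unchanged when no bucket applies


rows = [
    {"patient": 101, "happiness_score": 95},
    {"patient": 102, "happiness_score": 10},
    {"patient": 103, "happiness_score": 50},
]
# Replace only the bucketed column; other columns are copied as-is.
deidentified = [
    {**row, "happiness_score": bucketize(row["happiness_score"])} for row in rows
]
print(deidentified)
```

Wider buckets hide more detail but reduce the analytical value of the column, so the boundaries are a design choice for each dataset.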
Inspect a string for sensitive data by using multiple rules
Illustrates applying both exclusion and hotword rules. This snippet's rule set includes both hotword rules and dictionary and regex exclusion rules. Notice that the four rules are specified in an array within the rules element.
List information types for a category
Demonstrates listing information types for a category.
Client library quickstart
Demonstrates inspecting a string with the Cloud DLP API.
Inspect a local file
Demonstrates finding sensitive data in a local text or image file.
Inspect a string for sensitive data by using a custom hotword
Increase the likelihood of a PERSON_NAME match if there is the hotword "patient" nearby. Illustrates using the InspectConfig property for the purpose of scanning a medical database for patient names. You can use Cloud DLP's built-in PERSON_NAME infoType detector, but that causes Cloud DLP to match on all names of people, not just names of patients. To fix this, you can include a hotword rule that looks for the word "patient" within a certain character proximity from the first character of potential matches. You can then assign findings that match this pattern a likelihood of "very likely," since they correspond to your special criteria. Setting the minimum likelihood to VERY_LIKELY within InspectConfig ensures that only matches to this configuration are returned in findings.
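The configuration described above might look roughly like the following, expressed as a Python dictionary that can be serialized to JSON. The field names are intended to mirror the REST representation of InspectConfig and should be checked against the API reference. The 50-character window is an assumed value, since the description only says "a certain character proximity".

```python
import json

inspect_config = {
    "infoTypes": [{"name": "PERSON_NAME"}],
    # Only findings raised to VERY_LIKELY by the hotword rule are returned.
    "minLikelihood": "VERY_LIKELY",
    "ruleSet": [
        {
            "infoTypes": [{"name": "PERSON_NAME"}],
            "rules": [
                {
                    "hotwordRule": {
                        "hotwordRegex": {"pattern": "patient"},
                        # Assumed window: look up to 50 characters before a match.
                        "proximity": {"windowBefore": 50},
                        "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
                    }
                }
            ],
        }
    ],
}

print(json.dumps(inspect_config, indent=2))
```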
Get an inspection job
Gets a Cloud DLP inspection job.
Inspect a string for sensitive data, omitting overlapping matches on person and email
Omit matches on a PERSON_NAME detector if also matched by an EMAIL_ADDRESS detector.
Inspect BigQuery for sensitive data with sampling
The following examples demonstrate using the Cloud Data Loss Prevention API to scan a 1000-row subset of a BigQuery table. The scan starts from a random row.
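A sampled scan of this kind is driven by the storage configuration of the inspection job. The sketch below builds such a configuration as a Python dictionary. The field names are intended to mirror the REST representation and should be verified against the API reference; the bigquery_sampling_config helper and the project, dataset, and table names are hypothetical.

```python
def bigquery_sampling_config(project_id, dataset_id, table_id, rows_limit=1000):
    """Build a storage configuration that samples rows_limit rows from a random start."""
    return {
        "bigQueryOptions": {
            "tableReference": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
            "rowsLimit": rows_limit,  # size of the scanned subset
            "sampleMethod": "RANDOM_START",  # begin at a random row
        }
    }


# Placeholder identifiers for illustration only.
storage_config = bigquery_sampling_config("my-project", "my_dataset", "my_table")
print(storage_config)
```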
Compute l-diversity
Compute l-diversity with Cloud DLP. L-diversity, which is an extension of k-anonymity, measures the diversity of sensitive values for each column in which they occur. A dataset has l-diversity if, for every set of rows with identical quasi-identifiers, there are at least l distinct values for each sensitive attribute.
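The definition above can be checked on a small table with a few lines of plain Python. This is a local sketch of the metric itself, not the Cloud DLP risk analysis job; the l_diversity helper and the sample rows are made up for illustration.

```python
from collections import defaultdict


def l_diversity(rows, quasi_identifiers, sensitive_attribute):
    """Return the smallest count of distinct sensitive values in any equivalence class.

    Rows sharing the same quasi-identifier values form one equivalence class.
    Assumes rows is non-empty.
    """
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].add(row[sensitive_attribute])
    return min(len(values) for values in classes.values())


rows = [
    {"age_range": "30-39", "zip": "941**", "diagnosis": "flu"},
    {"age_range": "30-39", "zip": "941**", "diagnosis": "asthma"},
    {"age_range": "40-49", "zip": "100**", "diagnosis": "flu"},
    {"age_range": "40-49", "zip": "100**", "diagnosis": "flu"},
]
l_value = l_diversity(rows, ["age_range", "zip"], "diagnosis")
print(l_value)
```

Here the second equivalence class contains only one distinct diagnosis, so the dataset as a whole is only 1-diverse even though the first class has two distinct values.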
Scan content using a large custom dictionary detector
This sample scans the given text using the specified stored infoType detector.
Computing k-map estimates
You can estimate k-map values using Cloud DLP, which uses a statistical model to estimate a re-identification dataset.
List triggers
List all job triggers for the current project.
Inspect BigQuery for sensitive data
Demonstrates finding sensitive data that is stored in BigQuery.
Redact only certain sensitive data from an image using infoTypes
Redact only certain sensitive data from an image.
Inspect a Cloud Storage file
Demonstrates finding sensitive data in a file that is located in Cloud Storage.
Inspect data for phone numbers
Demonstrates a simple scan request to the Cloud DLP API. Notice that the PHONE_NUMBER detector is specified in inspectConfig, which instructs Cloud DLP to scan the given string for a phone number.
Inspect a string
Demonstrates finding sensitive data in a string.
De-identify data: Redacting with matched input values
Uses the Data Loss Prevention API to de-identify sensitive data in a string by redacting matched input values.
Delete an inspection template
Delete an inspection template from Cloud DLP.
De-identify table data with format-preserving encryption
Demonstrates encrypting sensitive data in a table while maintaining format.
Redact all detected text in an image
Redact all detected text in an image.
Compute numerical statistics
You can determine minimum, maximum, and quantile values for an individual BigQuery column. To calculate these values, you configure a DlpJob, setting the NumericalStatsConfig privacy metric to the name of the column to scan. When you run the job, Cloud DLP computes statistics for the given column, returning its results in the NumericalStatsResult object.
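To make the returned values concrete, the sketch below computes the same kinds of statistics locally with Python's standard statistics module. It is an illustration of minimum, maximum, and quantile values only, not the NumericalStatsConfig job; the numerical_stats helper, the choice of four buckets, and the sample column are assumptions.

```python
import statistics


def numerical_stats(values, buckets=4):
    """Return the min, max, and the cut points dividing values into equal-size buckets."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        # buckets - 1 cut points; "inclusive" treats the data as the full population.
        "quantiles": statistics.quantiles(ordered, n=buckets, method="inclusive"),
    }


column = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # made-up column values
stats = numerical_stats(column)
print(stats)
```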
Re-identify table data with FPE
Re-identify table data with format-preserving encryption.
Inspect a string for sensitive data, using exclusion dictionary
Omit a specific email address from an EMAIL_ADDRESS detector scan with an exclusion dictionary.
Create an inspection job
Creates an inspection job with the Cloud Data Loss Prevention API.
Re-identify content encrypted by deterministic encryption
Re-identify content that was previously de-identified through deterministic encryption.