Classification, redaction, and de-identification

Cloud Data Loss Prevention (Cloud DLP) helps you understand, manage, and protect sensitive data. With Cloud DLP, you can classify and redact sensitive data in text-based content and images, including content stored in Google Cloud Platform storage repositories.

Text classification

Given the following text input:

Please update my records with the following information:
Email address: foo@example.com

National Provider Identifier: 1245319599

Driver's license: AC333991

The output is a list of findings. Each finding reports the infoType that was matched, the likelihood of the match, and the offset at which the match begins in the input. Example output is shown in the table below.

InfoType                   Likelihood     Offset
US_HEALTHCARE_NPI          VERY_LIKELY    122
EMAIL_ADDRESS              LIKELY         72
US_DRIVERS_LICENSE_NUMBER  LIKELY         155
CANADA_BC_PHN              VERY_UNLIKELY  122
UK_TAXPAYER_REFERENCE      VERY_UNLIKELY  122
CANADA_PASSPORT            VERY_UNLIKELY  155
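
As a rough illustration, a text inspection request might be sent with the google-cloud-dlp Python client along the following lines. This is a minimal sketch, not an official sample: the helper names build_inspect_request and inspect_text are hypothetical, and the project ID and infoType list are supplied by the caller. It assumes the google-cloud-dlp package is installed and application default credentials are configured.

```python
def build_inspect_request(project_id, text, info_type_names,
                          min_likelihood="VERY_UNLIKELY"):
    """Build the request dict for DlpServiceClient.inspect_content.

    Hypothetical helper. min_likelihood is set low here so that weak
    matches (such as the VERY_UNLIKELY rows in the table) are returned too.
    """
    return {
        "parent": f"projects/{project_id}",
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names],
            "min_likelihood": min_likelihood,
        },
        "item": {"value": text},
    }


def inspect_text(project_id, text, info_type_names):
    """Return (infoType, likelihood, offset) tuples for each finding."""
    # Imported lazily so the request builder can be used without the library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.inspect_content(
        request=build_inspect_request(project_id, text, info_type_names)
    )
    return [
        (f.info_type.name, f.likelihood.name, f.location.byte_range.start)
        for f in response.result.findings
    ]
```

Calling inspect_text("my-project", text, ["EMAIL_ADDRESS", "US_HEALTHCARE_NPI", "US_DRIVERS_LICENSE_NUMBER"]) with the input above would be expected to return rows similar to the table, where "my-project" stands in for a real project ID.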

Automatic text redaction

Instead of returning a list of findings, automatic redaction returns the input with each sensitive data match replaced by a placeholder.

Example automatic redaction input:

Please update my records with the following information:
Email address: foo@example.com

National Provider Identifier: 1245319599

Driver's license: AC333991

Example output using a placeholder of "***":

Please update my records with the following information:
Email address: ***

National Provider Identifier: ***

Driver's license: ***
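
One way to produce output like this with the google-cloud-dlp Python client is to call deidentify_content with a replace transformation, sketched below. The helper names build_redact_request and redact_text are hypothetical, and the sketch assumes the google-cloud-dlp package is installed and credentials are configured; it is an illustration rather than the only way to redact text.

```python
def build_redact_request(project_id, text, info_type_names, placeholder="***"):
    """Build the request dict for DlpServiceClient.deidentify_content.

    Hypothetical helper: every match of the listed infoTypes is replaced
    with the same placeholder string.
    """
    return {
        "parent": f"projects/{project_id}",
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names],
        },
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "replace_config": {
                                "new_value": {"string_value": placeholder}
                            }
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def redact_text(project_id, text, info_type_names, placeholder="***"):
    """Return the input text with matches replaced by the placeholder."""
    # Imported lazily so the request builder can be used without the library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_redact_request(project_id, text, info_type_names, placeholder)
    )
    return response.item.value
```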

Image classification

For images, Cloud DLP uses Optical Character Recognition (OCR) technology to recognize text prior to classification. As with text classification, it returns findings, and each finding also includes a bounding box that shows where in the image the text was found.
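
The sketch below shows how the bounding boxes might be read with the google-cloud-dlp Python client. It assumes a PNG file, and the helper names build_image_inspect_request and inspect_png are hypothetical. The field layout (content_locations, image_location, bounding_boxes) follows the v2 API as understood here, so verify it against the current reference before relying on it.

```python
def build_image_inspect_request(project_id, image_bytes, info_type_names,
                                bytes_type):
    """Build an inspect_content request for image bytes.

    Hypothetical helper. bytes_type identifies the image format, for
    example dlp_v2.ByteContentItem.BytesType.IMAGE_PNG.
    """
    return {
        "parent": f"projects/{project_id}",
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names],
        },
        # The Python client exposes the proto field "type" as "type_".
        "item": {"byte_item": {"type_": bytes_type, "data": image_bytes}},
    }


def inspect_png(project_id, path, info_type_names):
    """Return (infoType, left, top, width, height) for each bounding box."""
    # Imported lazily so the request builder can be used without the library.
    from google.cloud import dlp_v2

    with open(path, "rb") as image_file:
        image_bytes = image_file.read()

    client = dlp_v2.DlpServiceClient()
    response = client.inspect_content(
        request=build_image_inspect_request(
            project_id,
            image_bytes,
            info_type_names,
            dlp_v2.ByteContentItem.BytesType.IMAGE_PNG,
        )
    )
    boxes = []
    for finding in response.result.findings:
        for content_location in finding.location.content_locations:
            for box in content_location.image_location.bounding_boxes:
                boxes.append(
                    (finding.info_type.name, box.left, box.top, box.width, box.height)
                )
    return boxes
```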

Storage classification

Storage classification scans data stored in Cloud Storage, Cloud Datastore, and BigQuery. Instead of streaming data into Cloud DLP, you specify in your request the storage location for the Cloud Storage bucket, Cloud Datastore kind, or BigQuery table you want Cloud DLP to scan.

A list of file extensions for the file types that Cloud DLP can scan is available on the reference page for FileType. Files of unrecognized types are scanned as binary files.

The results of the scan can be either saved to a new BigQuery table or published to a Cloud Pub/Sub topic. From there, you can use built-in BigQuery tools to run rich SQL analytics or tools such as Google Data Studio to generate reports.
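
As a sketch of what such a request could look like, the following builds an inspection job for a Cloud Storage bucket and saves the findings to a BigQuery table, using the google-cloud-dlp Python client. The helper names build_gcs_inspect_job and start_gcs_scan are hypothetical, as are the bucket, dataset, and table values a caller would pass in. Publishing to Cloud Pub/Sub instead would use a pub_sub action with a topic name in place of save_findings.

```python
def build_gcs_inspect_job(project_id, bucket, info_type_names,
                          dataset_id, table_id):
    """Build the request dict for DlpServiceClient.create_dlp_job.

    Hypothetical helper: scans the top-level objects of a bucket and
    writes findings to the given BigQuery table.
    """
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_job": {
            "storage_config": {
                "cloud_storage_options": {
                    "file_set": {"url": f"gs://{bucket}/*"}
                }
            },
            "inspect_config": {
                "info_types": [{"name": name} for name in info_type_names],
            },
            "actions": [
                {
                    "save_findings": {
                        "output_config": {
                            "table": {
                                "project_id": project_id,
                                "dataset_id": dataset_id,
                                "table_id": table_id,
                            }
                        }
                    }
                }
            ],
        },
    }


def start_gcs_scan(project_id, bucket, info_type_names, dataset_id, table_id):
    """Create the inspection job and return its resource name."""
    # Imported lazily so the request builder can be used without the library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    job = client.create_dlp_job(
        request=build_gcs_inspect_job(
            project_id, bucket, info_type_names, dataset_id, table_id
        )
    )
    return job.name
```

The job runs asynchronously, so the returned name identifies a job whose findings appear in the output table only after the scan completes.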

For more information about scanning storage repositories for sensitive data using Cloud DLP, see Inspecting storage and databases for sensitive data.

For more information about visualizing scan results using other Google Cloud Platform tools, see Analyzing and reporting on Cloud DLP findings.
