Classification and Redaction

The Data Loss Prevention (DLP) API helps you understand and manage sensitive data. With the DLP API, you can classify textual and image-based information, redact sensitive data from text files, and classify data stored in Google Cloud Storage or Google Cloud Datastore.

Text Classification

Given the following text input:

Please update my records with the following information:
Email address: foo@example.com

National Provider Identifier: 1245319599

Driver's license: AC333991

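The output is a list of findings. Each finding reports the information type (InfoType) that matched, the likelihood that the match is correct, and the offset in the input text where the match begins. The same text can be reported under several InfoTypes with different likelihoods, as happens at offsets 122 and 155.

Example output is shown in the table below.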

InfoType                    Likelihood     Offset
US_HEALTHCARE_NPI           VERY_LIKELY    122
EMAIL_ADDRESS               LIKELY         72
US_DRIVERS_LICENSE_NUMBER   LIKELY         155
CANADA_BC_PHN               VERY_UNLIKELY  122
UK_TAXPAYER_REFERENCE       VERY_UNLIKELY  122
CANADA_PASSPORT             VERY_UNLIKELY  155
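
To act on findings like these, a caller usually keeps only matches at or above some likelihood threshold. The following Python sketch copies the example table into a list and filters it. The helper name `filter_findings` is hypothetical, and the sketch assumes the likelihood levels are ordered from VERY_UNLIKELY to VERY_LIKELY, with UNLIKELY and POSSIBLE (not shown in the table) in between.

```python
# Assumed ordering of the likelihood scale, from least to most confident.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: i for i, name in enumerate(LIKELIHOOD_ORDER)}

# Findings copied from the example table: (info_type, likelihood, offset).
FINDINGS = [
    ("US_HEALTHCARE_NPI", "VERY_LIKELY", 122),
    ("EMAIL_ADDRESS", "LIKELY", 72),
    ("US_DRIVERS_LICENSE_NUMBER", "LIKELY", 155),
    ("CANADA_BC_PHN", "VERY_UNLIKELY", 122),
    ("UK_TAXPAYER_REFERENCE", "VERY_UNLIKELY", 122),
    ("CANADA_PASSPORT", "VERY_UNLIKELY", 155),
]


def filter_findings(findings, min_likelihood="POSSIBLE"):
    """Keep findings at or above min_likelihood, sorted by offset (hypothetical helper)."""
    threshold = RANK[min_likelihood]
    kept = [f for f in findings if RANK[f[1]] >= threshold]
    return sorted(kept, key=lambda f: f[2])


if __name__ == "__main__":
    for info_type, likelihood, offset in filter_findings(FINDINGS):
        print(f"{offset:>4}  {info_type:<26} {likelihood}")
```

With the default threshold of POSSIBLE, the three VERY_UNLIKELY candidates at offsets 122 and 155 are dropped, leaving one finding each for the email address, the NPI, and the driver's license number.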

Automatic Text Redaction

Instead of returning a list of findings, automatic redaction returns the input text with sensitive data matches removed, for example by replacing each match with a placeholder string.

Example automatic redaction input:

Please update my records with the following information:
Email address: foo@example.com

National Provider Identifier: 1245319599

Driver's license: AC333991

Example output using a placeholder of "***":

Please update my records with the following information:
Email address: ***

National Provider Identifier: ***

Driver's license: ***
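
Once the location of each match is known, producing redacted text comes down to splicing the placeholder into the input. This minimal sketch assumes findings are available as hypothetical (start, end) character ranges; here `locate` derives them with `str.find` instead of calling the API, and the exact whitespace of the sample text is an assumption, so its later offsets need not match the table above. Overlapping ranges (as when several InfoTypes match the same span) are merged first, and replacements run right to left so each splice leaves the offsets of the remaining ranges unchanged.

```python
# Example input from this page; the exact whitespace is an assumption.
SAMPLE = (
    "Please update my records with the following information:\n"
    "Email address: foo@example.com\n"
    "\n"
    "National Provider Identifier: 1245319599\n"
    "\n"
    "Driver's license: AC333991\n"
)


def locate(text, values):
    """Hypothetical stand-in for findings: the (start, end) range of each value."""
    spans = []
    for value in values:
        start = text.find(value)
        if start != -1:
            spans.append((start, start + len(value)))
    return spans


def merge_spans(spans):
    """Merge overlapping or adjacent [start, end) ranges."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text, spans, placeholder="***"):
    """Replace each merged range with placeholder, working right to left."""
    for start, end in reversed(merge_spans(spans)):
        text = text[:start] + placeholder + text[end:]
    return text


if __name__ == "__main__":
    found = locate(SAMPLE, ["foo@example.com", "1245319599", "AC333991"])
    print(redact(SAMPLE, found))
```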

Image Classification

We use optical character recognition (OCR) to extract text from the image before classifying it. As with text classification, we return a list of findings; in addition, each finding includes a bounding box that marks where in the image the text was found.
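
One way to picture the bounding box: OCR yields words, each with a position in the extracted text and a box in the image, and a finding that spans several words can be reported with the box that encloses all of them. The sketch below is an assumption about how such a box could be computed, not a description of the API's implementation; the `Box` and `OcrWord` types and all pixel coordinates are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image pixels (hypothetical representation)."""
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class OcrWord:
    """A word recognized by OCR: its text, offset in the extracted text, and box."""
    text: str
    start: int
    box: Box


def enclosing_box(boxes: List[Box]) -> Box:
    """Smallest box that contains every box in the list."""
    left = min(b.left for b in boxes)
    top = min(b.top for b in boxes)
    right = max(b.left + b.width for b in boxes)
    bottom = max(b.top + b.height for b in boxes)
    return Box(left, top, right - left, bottom - top)


def box_for_match(words: List[OcrWord], start: int, end: int) -> Optional[Box]:
    """Box around all OCR words that overlap the [start, end) match, if any."""
    hits = [w.box for w in words if w.start < end and w.start + len(w.text) > start]
    return enclosing_box(hits) if hits else None


# Made-up OCR output for the line "Email address: foo@example.com".
WORDS = [
    OcrWord("Email", 0, Box(10, 20, 50, 12)),
    OcrWord("address:", 6, Box(65, 20, 70, 12)),
    OcrWord("foo@example.com", 15, Box(140, 18, 120, 14)),
]
```

For the email address (characters 15 to 30), only the third word overlaps, so its box is returned unchanged; a match covering "Email address:" would get one box spanning both words.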

Storage Classification

Storage classification scans textual data stored in Google Cloud Storage or Google Cloud Datastore. Instead of streaming the data into the API, your API call specifies where the data is stored: a Google Cloud Storage bucket or a Datastore kind. Scan results are kept in temporary internal storage for 30 days and are linked to the project that called the API. Any authenticated and authorized API caller in the project that ran the scan can read the results.
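
The 30-day retention window is simple date arithmetic. This sketch assumes the window starts when the scan finishes, which the text above does not specify; `results_expire_at` and `results_available` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Retention period for storage scan results, as stated on this page.
RETENTION = timedelta(days=30)


def results_expire_at(scan_finished: datetime) -> datetime:
    """Assumption: the 30 days are counted from when the scan finishes."""
    return scan_finished + RETENTION


def results_available(scan_finished: datetime, now: datetime) -> bool:
    """True while the results are still inside the retention window."""
    return now < results_expire_at(scan_finished)


if __name__ == "__main__":
    finished = datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc)
    print(results_expire_at(finished).isoformat())
```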

Google Cloud Datastore Example:

Project ID: “sample-project”, Namespace: “library”, Kind name: “books”
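
The Datastore location above can be written as a request fragment. This sketch uses the field names of the DLP v2 `StorageConfig` JSON (`datastoreOptions`, `partitionId`, `kind`). That choice is an assumption: the API version described on this page may name them differently, so check the reference before sending it. `datastore_storage_config` is a hypothetical helper.

```python
import json


def datastore_storage_config(project_id, namespace, kind):
    """Describe a Datastore kind as a storage location (hypothetical helper).

    Field names are assumed from the DLP v2 StorageConfig JSON.
    """
    return {
        "datastoreOptions": {
            "partitionId": {"projectId": project_id, "namespaceId": namespace},
            "kind": {"name": kind},
        }
    }


config = datastore_storage_config("sample-project", "library", "books")
print(json.dumps(config, indent=2))
```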

Cloud Storage Example:

File pattern: gs://dataflow-samples/shakespeare/*.txt
File path: gs://dataflow-samples/shakespeare/kinglear.txt
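
The file pattern selects every object whose name matches it, so the file path above is one of the files the pattern covers. The sketch below splits `gs://` URLs into bucket and object name and checks a path against a pattern. It assumes that `*` matches any characters within a single path segment and does not cross `/`; confirm the wildcard rules the API actually applies before relying on them. `split_gcs_url` and `matches` are hypothetical helpers.

```python
import re


def split_gcs_url(url):
    """Split 'gs://bucket/object' into (bucket, object_name)."""
    if not url.startswith("gs://"):
        raise ValueError(f"not a Cloud Storage URL: {url!r}")
    bucket, _, name = url[len("gs://"):].partition("/")
    return bucket, name


def matches(pattern, path):
    """Return True if path matches pattern.

    Assumption: '*' matches any run of characters inside one path segment
    (it does not cross '/'); every other character matches literally.
    """
    p_bucket, p_name = split_gcs_url(pattern)
    bucket, name = split_gcs_url(path)
    regex = "[^/]*".join(re.escape(part) for part in p_name.split("*"))
    return bucket == p_bucket and re.fullmatch(regex, name) is not None


if __name__ == "__main__":
    print(matches("gs://dataflow-samples/shakespeare/*.txt",
                  "gs://dataflow-samples/shakespeare/kinglear.txt"))
```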
