Sensitive Data Protection overview

Sensitive Data Protection helps you discover, classify, and de-identify sensitive data inside and outside Google Cloud. This page describes the services that make up Sensitive Data Protection.

Sensitive data discovery

The discovery service lets you generate profiles for your data across an organization, folder, or project. Data profiles contain metrics and metadata about your tables and help you determine where sensitive and high-risk data reside. Sensitive Data Protection reports these metrics at various levels of detail. For information about the types of data you can profile, see Supported resources.

You use a scan configuration to specify the resource to scan, the types of information (infoTypes) to look for, the profiling frequency, and the actions to take when profiling is complete.

For more information about the discovery service, see Data profiles overview.

Sensitive data inspection

The inspection service lets you perform a deep scan of an individual resource to find instances of sensitive data. You specify the infoType that you want to search for, and the inspection service generates a report about every instance of data that matches that infoType. For example, the report tells you how many credit card numbers are in a Cloud Storage bucket and the exact location of each instance.

There are two ways to perform an inspection:

  • Create an inspection or hybrid job through the Google Cloud console or through the Cloud Data Loss Prevention API (DLP API), which is the API of Sensitive Data Protection.
  • Send a content.inspect request to the DLP API.

Inspection through a job

You can configure inspection and hybrid jobs through the Google Cloud console or through the Cloud Data Loss Prevention API. The results of inspection and hybrid jobs are stored in Google Cloud.

You can specify actions that you want Sensitive Data Protection to take when the inspection or hybrid job is complete. For example, you can configure a job to save the findings to a BigQuery table or send a Pub/Sub notification.

Inspection jobs

Sensitive Data Protection has built-in support for select Google Cloud products. You can inspect a BigQuery table, a Cloud Storage bucket or folder, or a Datastore kind. For more information, see Inspect Google Cloud storage and databases for sensitive data.
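
As a rough illustration of an inspection job with actions, the following sketch builds a request body for creating a DlpJob that scans a Cloud Storage location, saves findings to BigQuery, and publishes a Pub/Sub notification. The function name, bucket, table, and topic values are hypothetical, and the field names are intended to mirror the REST representation of the DLP API; verify them against the API reference before use. The sketch only constructs the body; sending it and authenticating are out of scope.

```python
import json


def build_inspect_job_request(bucket_url, info_types, findings_table, topic):
    """Return a JSON-serializable body for creating an inspection job.

    All arguments are caller-supplied; nothing here contacts Google Cloud.
    """
    return {
        "inspectJob": {
            # Where to scan: a Cloud Storage location.
            "storageConfig": {
                "cloudStorageOptions": {"fileSet": {"url": bucket_url}}
            },
            # What to look for: one entry per infoType name.
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types]
            },
            # What to do when the job completes.
            "actions": [
                {"saveFindings": {"outputConfig": {"table": findings_table}}},
                {"pubSub": {"topic": topic}},
            ],
        }
    }


# Hypothetical example values, for illustration only.
example_body = build_inspect_job_request(
    bucket_url="gs://example-bucket/**",
    info_types=["CREDIT_CARD_NUMBER"],
    findings_table={
        "projectId": "example-project",
        "datasetId": "dlp_findings",
        "tableId": "inspection_results",
    },
    topic="projects/example-project/topics/dlp-job-done",
)
print(json.dumps(example_body, indent=2))
```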

Hybrid jobs

A hybrid job lets you scan payloads of data sent from any source, and then store the inspection findings in Google Cloud. For more information, see Hybrid jobs and job triggers.

Inspection through a content.inspect request

The content.inspect method of the DLP API lets you send data directly to the DLP API for inspection. The response contains the inspection findings. Use this approach if you require a synchronous operation or if you don't want to store the findings in Google Cloud.
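
To make the synchronous path concrete, here is a minimal sketch that assembles the URL and body for a content.inspect request. The helper name and example values are hypothetical, and the endpoint and field names are assumptions based on the REST form of the DLP API; confirm them in the API reference. The sketch does not send the request, so no credentials are needed to run it.

```python
import json

# Assumed base URL of the DLP API REST surface.
DLP_ENDPOINT = "https://dlp.googleapis.com/v2"


def build_content_inspect_call(project_id, text, info_types):
    """Return (url, body) for a content.inspect request.

    The data to inspect travels inside the request itself, and the
    findings come back in the response rather than being stored.
    """
    url = f"{DLP_ENDPOINT}/projects/{project_id}/content:inspect"
    body = {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_types],
            # Ask for the matched text to be included with each finding.
            "includeQuote": True,
        },
    }
    return url, body


# Hypothetical example values, for illustration only.
url, body = build_content_inspect_call(
    "example-project",
    "Contact me at jane@example.com",
    ["EMAIL_ADDRESS"],
)
print(url)
print(json.dumps(body, indent=2))
```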

Sensitive data de-identification

The de-identification service lets you obfuscate instances of sensitive data. Various transformation methods are available, including masking, redaction, bucketing, date shifting, and tokenization.
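
The following local sketch illustrates what four of these transformation methods do to a value. It is a conceptual illustration in plain Python, not the service's implementation, and it does not call the DLP API; the function names and default parameters are hypothetical. Tokenization is omitted because it depends on key management.

```python
from datetime import date, timedelta


def mask(value, visible=4, mask_char="#"):
    """Masking: replace all but the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


def redact(text, sensitive):
    """Redaction: remove the sensitive substring entirely."""
    return text.replace(sensitive, "")


def bucket(age, width=10):
    """Bucketing: replace an exact number with the range it falls in."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def shift_date(d, days):
    """Date shifting: move a date by a fixed offset."""
    return d + timedelta(days=days)


print(mask("4111111111111111"))
print(redact("Call 555-0100 now", "555-0100"))
print(bucket(37))
print(shift_date(date(2024, 1, 31), 3))
```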

You can de-identify data synchronously by sending it to the DLP API, as described in Synchronous operations.

Risk analysis

The risk analysis service lets you analyze structured BigQuery data to identify and visualize the risk that sensitive information will be revealed (re-identified).

You can use risk analysis methods before de-identification to help determine an effective de-identification strategy, or after de-identification to monitor for any changes or outliers.

You perform risk analysis by creating a risk analysis job. For more information, see Re-identification risk analysis.
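
As a sketch of what a risk analysis job might look like, the following builds a request body that points at a BigQuery table and selects a privacy metric. The choice of k-anonymity as the metric, the helper name, and the table and column names are assumptions made for illustration; the source text does not specify a metric. Field names are intended to follow the REST representation of the DLP API and should be checked against the reference.

```python
import json


def build_risk_job_request(source_table, quasi_id_columns):
    """Return a JSON-serializable body for creating a risk analysis job.

    `quasi_id_columns` are the columns that, in combination, could
    help re-identify an individual.
    """
    return {
        "riskJob": {
            # The structured BigQuery data to analyze.
            "sourceTable": source_table,
            # Assumed metric: k-anonymity over the given columns.
            "privacyMetric": {
                "kAnonymityConfig": {
                    "quasiIds": [{"name": col} for col in quasi_id_columns]
                }
            },
        }
    }


# Hypothetical example values, for illustration only.
example_body = build_risk_job_request(
    {
        "projectId": "example-project",
        "datasetId": "patients",
        "tableId": "visits",
    },
    ["age", "zip_code"],
)
print(json.dumps(example_body, indent=2))
```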

Cloud Data Loss Prevention API

The Cloud Data Loss Prevention API lets you use the Sensitive Data Protection services programmatically. Through the DLP API, you can inspect data from inside and outside Google Cloud and build custom workloads on or off cloud. For more information, see Service method types.

Asynchronous operations

If you want to asynchronously inspect or analyze data at rest, you can use the DLP API to create a DlpJob. Creating a DlpJob is the equivalent of creating an inspection job, hybrid job, or risk analysis job through the Google Cloud console. The results of a DlpJob are stored in Google Cloud.

Synchronous operations

If you want to inspect, de-identify, or re-identify data synchronously, use the inline content methods of the DLP API. To de-identify data in images, you can use the image.redact method. You send the data in an API request and the DLP API responds with the inspection, de-identification, or re-identification results. The results of content methods and the image.redact method aren't stored in Google Cloud.
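
A synchronous de-identification call could be assembled along these lines. The sketch builds the URL and body for a content.deidentify request that masks every finding of the selected infoTypes. The helper name and example values are hypothetical, and the endpoint and field names are assumptions based on the REST form of the DLP API. The request is not sent, so the code runs without credentials.

```python
import json


def build_deidentify_call(project_id, text, info_types, mask_char="#"):
    """Return (url, body) for a content.deidentify request."""
    url = (
        "https://dlp.googleapis.com/v2/"
        f"projects/{project_id}/content:deidentify"
    )
    selected = [{"name": name} for name in info_types]
    body = {
        "item": {"value": text},
        # Which infoTypes to find in the item.
        "inspectConfig": {"infoTypes": selected},
        # How to transform each finding: mask its characters.
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": selected,
                        "primitiveTransformation": {
                            "characterMaskConfig": {
                                "maskingCharacter": mask_char
                            }
                        },
                    }
                ]
            }
        },
    }
    return url, body


# Hypothetical example values, for illustration only.
url, body = build_deidentify_call(
    "example-project", "My phone is 555-0100", ["PHONE_NUMBER"]
)
print(url)
print(json.dumps(body, indent=2))
```

Because the data and the results both travel in the request and response, nothing from this call is stored in Google Cloud, which matches the behavior described above.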

What's next