Take charge of your data: Scan for sensitive data in just a few clicks
Dominique Elkind Houts
Interaction Designer, Google Cloud
Jordanna Chord
Engineer, Google Cloud
Preventing the exposure of sensitive data is critically important for many businesses, particularly those in industries like finance and healthcare. Cloud Data Loss Prevention (DLP) lets you protect sensitive data by building an additional layer of data security and privacy into your data workloads. It also provides native services for large-scale inspection, discovery, and classification of data in storage repositories like Cloud Storage and BigQuery.
Originally released as an API, Cloud DLP now includes a user interface (UI), which helps extend these capabilities to security, privacy, and compliance teams. Using the Cloud DLP UI, now generally available in the Google Cloud Console, you can discover, inspect, and classify sensitive data in just a few clicks by creating jobs, job triggers, and configuration templates.
In addition, Cloud DLP now features simplified storage inspection pricing based on bytes scanned, making costs more predictable.
Interacting with Cloud DLP through the UI provides many of the same features and benefits as the API. For example, you can:
Inspect Cloud Storage, BigQuery, and Cloud Datastore repositories for sensitive data using one-off jobs, or create a job trigger to automate and monitor resources on a schedule you define.
Detect and classify common infoTypes (sensitive data type detectors such as email addresses or credit card numbers) or custom infoTypes you define to protect internal identifiers or company secrets.
Create data inspection templates to re-use configuration settings across multiple scan jobs or job triggers.
Publish Cloud DLP scan findings to BigQuery, Data Catalog, and Cloud Security Command Center for further analysis and reporting.
Include Cloud DLP as part of Google Cloud’s fully automated and scalable service suite to help meet regulatory compliance requirements.
Let’s take a deeper look at the Google Cloud Platform Console user interface and show how you can start to inspect your enterprise data with just a few clicks.
Getting started with the Cloud DLP UI
The Cloud DLP UI lets you perform the most common data protection tasks: scanning Cloud Storage buckets, BigQuery, and Cloud Datastore; configuring timespans; and setting up monitoring with periodic scans. Let’s take a closer look.
Scanning Cloud Storage Buckets
Cloud Storage is a highly scalable object storage service for developers and enterprises, which use it as an integral part of their applications and data workloads. These workloads can include sensitive data such as credit card numbers, medical information, Social Security numbers, driver's license numbers, addresses, full names, and service account credentials, all of which need strong protection. This is where using Cloud DLP with Cloud Storage can help.
Using Cloud DLP with your Cloud Storage repositories lets you identify where sensitive data is stored, and then use tools to redact those sensitive identifiers. Cloud DLP uses more than 100 predefined detectors to help you better discover, classify, and govern your data. With the DLP UI in Cloud Console, you can now discover and inspect your data in a few steps.
1. Define what you want to scan, such as a Cloud Storage bucket, folder, or individual file.
2. Then, filter that data by adding include or exclude patterns to narrow down the files you want to inspect.
3. Scale your scans by turning on sampling to increase efficiency and reduce cost:
Sample storage objects
Sample bytes per object
You can also take advantage of our integration with the Cloud Storage UI, where you can select a bucket and simply click “Scan with DLP.” (More details on that here.)
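For teams that prefer to script the same scan, the three steps above map roughly onto the storage configuration of a DLP inspection job. The sketch below only builds a plain Python dict; the field names follow the Cloud DLP API's CloudStorageOptions message as we understand it, and the bucket name, patterns, and limits are hypothetical values chosen for illustration. It makes no API call.

```python
def build_gcs_storage_config(bucket_name, include_regex, exclude_regex,
                             files_limit_percent=10,
                             bytes_limit_per_file=1_000_000):
    """Return a StorageConfig-shaped dict for a sampled Cloud Storage scan."""
    return {
        "cloud_storage_options": {
            # Steps 1 and 2: what to scan, narrowed by include/exclude patterns.
            "file_set": {
                "regex_file_set": {
                    "bucket_name": bucket_name,
                    "include_regex": list(include_regex),
                    "exclude_regex": list(exclude_regex),
                }
            },
            # Step 3, "sample storage objects": percentage of files to scan.
            "files_limit_percent": files_limit_percent,
            # Step 3, "sample bytes per object": cap on bytes read per file.
            "bytes_limit_per_file": bytes_limit_per_file,
            "sample_method": "RANDOM_START",
        }
    }


# Hypothetical bucket and patterns, for illustration only.
gcs_config = build_gcs_storage_config(
    bucket_name="example-bucket",
    include_regex=[r"reports/.*\.csv"],
    exclude_regex=[r"reports/archive/.*"],
)
```

A dict of this shape would typically be supplied as the storage_config of an inspection job; check the current API reference before relying on any individual field.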
Scanning BigQuery
BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. It can help you analyze your company's most critical data assets and natively delivers powerful features like BI Engine and machine learning. As with Cloud Storage, this data may contain sensitive or regulated information such as personally identifiable information (PII). Using Cloud DLP with BigQuery can help you discover and classify this information. Here’s how.
2. Decide whether to perform an exhaustive or sampled scan... For BigQuery, you can sample a random percentage or a fixed number of rows:
3. Since BigQuery data is “structured” tabular data, your findings will include additional metadata such as column names. You can optionally specify an identifying field such as a row or record number so that you can pinpoint findings and map them back to your source tables.
You can also take advantage of our integration into the BigQuery UI, where you can select a table and click “Scan with DLP.” (More details on that here.)
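As a rough illustration of the sampling and identifying-field choices described above, here is a minimal sketch that builds the BigQuery portion of a job's storage configuration as a plain dict. Field names follow the Cloud DLP API's BigQueryOptions message as we understand it; the project, dataset, table, and column names are hypothetical.

```python
def build_bigquery_storage_config(project_id, dataset_id, table_id,
                                  rows_limit_percent=None, rows_limit=None,
                                  identifying_fields=()):
    """Return a StorageConfig-shaped dict for a BigQuery table scan.

    Leave both limits unset for an exhaustive scan; set exactly one of
    them for a sampled scan.
    """
    if rows_limit_percent is not None and rows_limit is not None:
        raise ValueError("Set rows_limit_percent or rows_limit, not both.")

    options = {
        "table_reference": {
            "project_id": project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        },
        # Columns copied into each finding so it can be mapped back to a row.
        "identifying_fields": [{"name": name} for name in identifying_fields],
    }
    if rows_limit_percent is not None:
        options["rows_limit_percent"] = rows_limit_percent  # random percentage
        options["sample_method"] = "RANDOM_START"
    elif rows_limit is not None:
        options["rows_limit"] = rows_limit  # fixed number of rows
        options["sample_method"] = "RANDOM_START"
    return {"big_query_options": options}


# Hypothetical table and key column, for illustration only.
bq_config = build_bigquery_storage_config(
    "example-project", "sales", "orders",
    rows_limit_percent=5, identifying_fields=["order_id"],
)
```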
Scanning Cloud Datastore
Cloud Datastore is a highly scalable NoSQL database for web and mobile applications. Cloud DLP enables you to inspect data stored in Datastore by simply specifying the namespace and kind.
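In API terms, the namespace and kind correspond to two small fields in the job's storage configuration. A minimal sketch, assuming the field names of the Cloud DLP API's DatastoreOptions message and using hypothetical project, namespace, and kind values:

```python
def build_datastore_storage_config(project_id, namespace_id, kind_name):
    """Return a StorageConfig-shaped dict for scanning one Datastore kind."""
    return {
        "datastore_options": {
            "partition_id": {
                "project_id": project_id,
                "namespace_id": namespace_id,
            },
            "kind": {"name": kind_name},
        }
    }


# Hypothetical values, for illustration only.
datastore_config = build_datastore_storage_config(
    "example-project", "customers-eu", "UserProfile"
)
```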
Configuring timespans
When managing data at scale, it’s critical that you can scan only what you need to scan. TimespanConfig enables you to narrow a scan based on when data was created or modified. For example, you might want to scan only data that was created within a two-week window, or only data that has been created or modified since the last scan.
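The two examples above can be sketched as TimespanConfig-shaped dicts. The field names (start_time, end_time, enable_auto_population_of_timespan_config) follow the Cloud DLP API as we understand it; the helper names and the dates are our own, and our understanding is that the automatic "since the last scan" behavior applies only to jobs started by a job trigger.

```python
from datetime import datetime, timedelta, timezone


def two_week_window(end):
    """Timespan covering the two weeks that end at `end` (a tz-aware datetime)."""
    start = end - timedelta(weeks=2)
    # RFC 3339 timestamps, as used by the API's JSON representation.
    return {"start_time": start.isoformat(), "end_time": end.isoformat()}


def since_last_scan():
    """Timespan that lets a job trigger pick up only new or changed data."""
    return {"enable_auto_population_of_timespan_config": True}


# Hypothetical end date, for illustration only.
window = two_week_window(datetime(2019, 6, 15, tzinfo=timezone.utc))
incremental = since_last_scan()
```

Either dict would sit under the timespan_config key of the job's storage configuration, next to the Cloud Storage, BigQuery, or Datastore options.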
Set up monitoring with periodic scans
Using DLP job triggers, you can configure inspection scans to run on a periodic schedule.
These job runs can scan all content, sample from all content, or be limited to content created or modified since the last run. Jobs started by a trigger are grouped under that trigger, which lets you see how findings trend over time from job to job.
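For reference, a periodic scan can be described in the API as a job trigger that wraps an inspection job and a schedule. The sketch below builds such a dict; the field names follow the Cloud DLP API's JobTrigger message as we understand it, the helper and its arguments are hypothetical, and the 1 to 60 day bound reflects our reading of the documented schedule limits, so verify it before depending on it.

```python
def build_job_trigger(inspect_job, every_days, display_name):
    """Return a JobTrigger-shaped dict that reruns `inspect_job` periodically."""
    if not 1 <= every_days <= 60:
        raise ValueError("every_days is assumed to be between 1 and 60.")
    seconds = every_days * 24 * 60 * 60
    return {
        "display_name": display_name,
        "inspect_job": inspect_job,
        # Durations are written as a number of seconds followed by "s".
        "triggers": [
            {"schedule": {"recurrence_period_duration": f"{seconds}s"}}
        ],
        "status": "HEALTHY",
    }


# Hypothetical job: scan only content created or modified since the last run.
weekly_trigger = build_job_trigger(
    inspect_job={
        "storage_config": {
            "timespan_config": {
                "enable_auto_population_of_timespan_config": True
            }
        }
    },
    every_days=7,
    display_name="weekly-incremental-scan",
)
```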
Tailor data detectors to your needs
Cloud DLP makes it easy to find sensitive data, with over 100 pre-defined infoTypes that you can turn on and use instantly from the UI. You can also take advantage of a rich set of customizations to tailor detection to your needs, help reduce false positives, and improve overall quality.
Selecting built-in infoTypes
You can select from a long list of built-in infoTypes.
Building custom infoTypes
You can create custom infoTypes based on patterns, word lists, or dictionaries.
You can also create word lists inline or pull dictionary lists from Cloud Storage paths.
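To make the three flavors of custom infoType concrete, here is a minimal sketch that builds one of each as a plain dict. The structure follows the Cloud DLP API's CustomInfoType message as we understand it; the infoType names, pattern, words, and Cloud Storage path are all hypothetical.

```python
import re


def regex_info_type(name, pattern):
    """Custom infoType that matches a regular expression."""
    # Local sanity check only; Cloud DLP evaluates patterns with RE2 syntax.
    re.compile(pattern)
    return {"info_type": {"name": name}, "regex": {"pattern": pattern}}


def word_list_info_type(name, words):
    """Custom infoType backed by an inline word list."""
    return {
        "info_type": {"name": name},
        "dictionary": {"word_list": {"words": list(words)}},
    }


def gcs_dictionary_info_type(name, gcs_path):
    """Custom infoType backed by a dictionary file in Cloud Storage."""
    return {
        "info_type": {"name": name},
        "dictionary": {"cloud_storage_path": {"path": gcs_path}},
    }


# Hypothetical internal identifiers and secrets, for illustration only.
custom_info_types = [
    regex_info_type("EMPLOYEE_ID", r"EMP-\d{6}"),
    word_list_info_type("PROJECT_CODENAME", ["bluebird", "nightjar"]),
    gcs_dictionary_info_type("PARTNER_NAME", "gs://example-bucket/partners.txt"),
]
```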
Create inspection rulesets
Rulesets can help tune results from both predefined and custom infoTypes. For example, maybe you want to find all EMAIL_ADDRESS infoTypes but exclude your own employees’ email addresses. With a simple exclusion list, you can do this.
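The email example above might look like the following in an inspection configuration. This is a sketch under the assumption that the field names match the Cloud DLP API's InspectionRuleSet and ExclusionRule messages; the helper name and the addresses are hypothetical.

```python
def exclude_words_rule_set(info_type_name, excluded_words):
    """Rule set that drops findings of one infoType matching an exclusion list."""
    return {
        "info_types": [{"name": info_type_name}],
        "rules": [
            {
                "exclusion_rule": {
                    "dictionary": {
                        "word_list": {"words": list(excluded_words)}
                    },
                    # Exclude a finding only when it matches a list entry exactly.
                    "matching_type": "MATCHING_TYPE_FULL_MATCH",
                }
            }
        ],
    }


# Hypothetical employee addresses, for illustration only.
inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}],
    "rule_set": [
        exclude_words_rule_set(
            "EMAIL_ADDRESS",
            ["alice@example.com", "bob@example.com"],
        )
    ],
}
```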
Persist configuration with templates
Finally, you can also share inspection configuration across multiple jobs using templates.
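One way to picture template reuse: each job carries a reference to the template instead of its own copy of the inspection settings. The sketch below assumes the inspect_template_name field and the projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID resource name format from the Cloud DLP API; the project, template ID, and storage configurations are hypothetical placeholders.

```python
def build_job_from_template(project_id, template_id, storage_config):
    """Return an inspection job dict that takes its settings from a template."""
    template_name = f"projects/{project_id}/inspectTemplates/{template_id}"
    return {
        "inspect_template_name": template_name,
        "storage_config": storage_config,
    }


# Two hypothetical jobs over different repositories sharing one template.
gcs_job = build_job_from_template(
    "example-project", "pii-baseline",
    {"cloud_storage_options": {"file_set": {"url": "gs://example-bucket/*"}}},
)
bq_job = build_job_from_template(
    "example-project", "pii-baseline",
    {"big_query_options": {"table_reference": {
        "project_id": "example-project",
        "dataset_id": "sales",
        "table_id": "orders",
    }}},
)
```

Updating the template then changes what both jobs look for, without editing either job.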
View findings and take action
Whether you want to generate detailed findings to power an audit report, conduct an investigation, or use summary findings to trigger automated actions and alerts, it’s easy to do so from the Cloud DLP UI.
View job status and findings summaries directly in the UI.
Take action
When an inspection job is completed, Cloud DLP can automatically trigger actions.
Save to BigQuery: Write detailed findings (see more details below).
Publish to Cloud Pub/Sub: Emit a pub/sub notification when a job is completed. This can trigger custom logic in, for instance, a Cloud Function. See Automating the Classification of Data Uploaded to Cloud Storage for an example.
Publish to Cloud Security Command Center (currently only available for Cloud Storage scans)
Publish to Data Catalog (currently only available for BigQuery scans)
Notify by email: Send an email with job completion details.
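The actions listed above can be sketched as a list attached to an inspection job. Action names follow the Cloud DLP API's Action message as we understand it; the helper, its storage_kind argument, and all resource names are hypothetical. The branch at the end mirrors the availability notes above.

```python
def build_actions(project_id, dataset_id, table_id, topic_id, storage_kind):
    """Return a list of Action-shaped dicts to run when a job completes."""
    actions = [
        # Save to BigQuery: write detailed findings to a table.
        {"save_findings": {"output_config": {"table": {
            "project_id": project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        }}}},
        # Publish to Cloud Pub/Sub: notify a topic when the job is done.
        {"pub_sub": {"topic": f"projects/{project_id}/topics/{topic_id}"}},
        # Notify by email.
        {"job_notification_emails": {}},
    ]
    if storage_kind == "cloud_storage":
        actions.append({"publish_summary_to_cscc": {}})
    elif storage_kind == "bigquery":
        actions.append({"publish_findings_to_cloud_data_catalog": {}})
    return actions


# Hypothetical resource names, for illustration only.
gcs_actions = build_actions(
    "example-project", "dlp_results", "findings", "dlp-done", "cloud_storage"
)
```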
Detailed Findings
Detailed findings can be turned on and written directly to BigQuery, enabling:
Cloud Storage object-level findings - Includes the object path for every finding
BigQuery column-level findings - Includes the table field name with every finding
Run analytics in SQL
Generate custom dashboards or audit reports in tools like Data Studio (watch a demo of this from a recent Cloud OnAir webinar here).
Export findings to your SIEM
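As an example of the SQL analytics mentioned above, the sketch below assembles a query that counts findings per infoType. It assumes the findings table exposes an info_type.name column, which matches the output schema as we understand it; the helper name and the table coordinates are hypothetical, and nothing is sent to BigQuery.

```python
def findings_by_info_type_query(project_id, dataset_id, table_id):
    """Return a SQL string that counts findings per infoType."""
    table = f"`{project_id}.{dataset_id}.{table_id}`"
    return (
        "SELECT info_type.name AS info_type_name, COUNT(*) AS finding_count\n"
        f"FROM {table}\n"
        "GROUP BY info_type_name\n"
        "ORDER BY finding_count DESC"
    )


# Hypothetical findings table, for illustration only.
query = findings_by_info_type_query("example-project", "dlp_results", "findings")
print(query)
```

The resulting string can be pasted into the BigQuery console or used as the source of a dashboard.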
Include Quote
You can optionally turn on “include quote” when writing detailed findings. This writes a copy of the finding alongside the metadata in the BigQuery output. The quote can help you:
Analyze findings to inform tuning rules and reduce unwanted results (e.g., to decide what to add to exclusion rules).
Help build a customer or user inventory so you know where data exists per subject, which can help inform your privacy and compliance processes for efforts like data access and deletion requests.
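In the API, this option corresponds to a single boolean in the inspection configuration. A minimal sketch, assuming the include_quote field name from the Cloud DLP API's InspectConfig message; the helper and the infoType selection are illustrative.

```python
def build_inspect_config(info_type_names, include_quote=False):
    """Return an InspectConfig-shaped dict, optionally keeping matched text."""
    return {
        "info_types": [{"name": name} for name in info_type_names],
        # When True, each detailed finding also carries the matched content.
        "include_quote": include_quote,
    }


quoted_config = build_inspect_config(["EMAIL_ADDRESS"], include_quote=True)
```

Because the quote is a copy of the sensitive value itself, the output table deserves the same access controls as the source data.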
Now, everyone can protect sensitive data
Cloud DLP is a powerful, flexible service that brings state-of-the-art data protection capabilities to a variety of workloads and storage formats. And now, with an easy-to-use UI, those capabilities are available to broader security, compliance, and legal teams. To learn more, visit our Cloud Data Loss Prevention page for resources on getting started.
Special thanks to Scott Ellis, Product Manager, Noël Bankston, User Experience Researcher, and Jesse Flemming, Engineer, who contributed to this post.