
Scan your Cloud Storage buckets for sensitive data using Cloud DLP

June 21, 2019
Adam Gavish

Product Manager, VPC Service Controls

Subhasish Chakraborty

Product Manager, Cloud Storage

A critical mission for businesses worldwide is to prevent the exposure of sensitive data, especially in highly regulated industries such as finance and healthcare, where meeting compliance requirements is a top priority. We talked recently about scanning BigQuery, our data warehouse, using Cloud Data Loss Prevention (DLP) to protect sensitive data through data discovery, classification, and redaction. These capabilities are just as essential for the other Google Cloud Platform (GCP) services you use to store data, such as Cloud Storage. We are delighted to announce that scanning with Cloud DLP is now available in beta directly from the Cloud Storage UI, so you can scan Cloud Storage buckets, folders, and objects for sensitive data with a few clicks.

Using Cloud DLP with Cloud Storage means you can identify where sensitive data is stored, then use tools to redact those sensitive identifiers. Cloud DLP uses more than 90 predefined detectors that identify sensitive data by pattern, format, and checksum, and it offers de-identification techniques such as masking, secure hashing, and tokenization to redact that data, all without replicating customer data.
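To make the idea of pattern-plus-checksum detection followed by masking concrete, here is a minimal local sketch in Python. It is not how Cloud DLP is implemented. The regular expression, the function names `luhn_valid` and `mask_card_numbers`, and the `#` mask character are all illustrative assumptions. It only shows why a checksum (here the Luhn check used by payment card numbers) helps separate real card numbers from arbitrary digit runs before masking them.

```python
import re

# Loose candidate pattern (an assumption for this sketch): 13 to 19 digits,
# optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card_numbers(text: str, mask_char: str = "#") -> str:
    """Mask the digits of candidates that also pass the checksum."""

    def _mask(match: re.Match) -> str:
        candidate = match.group()
        digits = re.sub(r"\D", "", candidate)
        if not luhn_valid(digits):
            return candidate  # pattern matched, checksum failed: leave as-is
        return re.sub(r"\d", mask_char, candidate)

    return CARD_CANDIDATE.sub(_mask, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number that passes Luhn.
    print(mask_card_numbers("Paid with 4111-1111-1111-1111 yesterday"))
```

In this sketch a digit run that matches the pattern but fails the checksum is left untouched, which is the kind of false-positive reduction a checksum-based detector is meant to provide.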

Cloud DLP scans on Cloud Storage support text, binary, and image files. Some common Cloud Storage use cases include content storage and serving; storage for general computing, analytics, and AI/ML; and storing data for backup, archival, and disaster recovery purposes, among others. Data stored in Cloud Storage can include sensitive items such as credit card numbers, medical information, social security numbers, driver's license numbers, addresses, full names, and service account credentials, all of which need strong protection.

Here are some key benefits you’ll see when using Cloud DLP with Cloud Storage:

  • Detect common sensitive data types such as credit card numbers or custom sensitive data types to highlight intellectual property or company secrets.

  • Deploy a fully automated and scalable service that helps meet compliance requirements.

  • Create triggers for automatic Cloud DLP scan scheduling.

  • Publish Cloud DLP scan findings to BigQuery and Cloud Security Command Center for further analysis and reporting.

  • De-identify and redact sensitive data.

Getting started with Cloud DLP for Cloud Storage 
It’s straightforward to start scanning your Cloud Storage buckets with Cloud DLP, and you can set up this new scan job to run regularly.

Browse to Cloud Storage in the GCP console, then click on the three-dot menu icon to the right of a relevant bucket. Click on the “Scan with Data Loss Prevention” menu item:

https://storage.googleapis.com/gweb-cloudblog-publish/images/Scan_with_Data_Loss_Prevention.max-1900x1900.png

Complete the Cloud DLP scan creation by clicking the “Create” button or, optionally, specify custom configurations such as what info types to inspect for, what sampling options to use, what actions to take, and more, as shown here:

https://storage.googleapis.com/gweb-cloudblog-publish/images/Create_job.max-1900x1900.png
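The same choices the UI offers (info types, sampling, and actions) can also be expressed programmatically. The following is a hedged sketch, not the configuration the console generates. The field names follow the Cloud DLP v2 API as we understand it and should be checked against the current reference. The bucket, project, dataset, and table names, the helper names `build_inspect_job` and `submit_inspect_job`, and the `POSSIBLE` likelihood threshold are placeholders chosen for illustration. Submitting the job assumes the `google-cloud-dlp` package is installed and credentials are configured.

```python
def build_inspect_job(bucket, info_types, project_id, dataset_id, table_id,
                      files_limit_percent=100):
    """Build an inspect-job dict for a Cloud Storage bucket (illustrative)."""
    return {
        "storage_config": {
            "cloud_storage_options": {
                # Trailing wildcard: scan objects under the bucket root.
                "file_set": {"url": f"gs://{bucket}/*"},
                # Sampling option: percentage of files to scan.
                "files_limit_percent": files_limit_percent,
            }
        },
        "inspect_config": {
            # What to look for, e.g. CREDIT_CARD_NUMBER, EMAIL_ADDRESS.
            "info_types": [{"name": name} for name in info_types],
            "min_likelihood": "POSSIBLE",  # assumed threshold for this sketch
        },
        "actions": [
            # Action: save findings to a BigQuery table for later analysis.
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": project_id,
                            "dataset_id": dataset_id,
                            "table_id": table_id,
                        }
                    }
                }
            }
        ],
    }


def submit_inspect_job(project_id, inspect_job):
    """Submit the job; requires google-cloud-dlp and valid credentials."""
    from google.cloud import dlp_v2  # imported lazily so the builder runs offline

    client = dlp_v2.DlpServiceClient()
    return client.create_dlp_job(
        request={"parent": f"projects/{project_id}", "inspect_job": inspect_job}
    )


if __name__ == "__main__":
    job = build_inspect_job(
        bucket="example-bucket",                      # hypothetical names
        info_types=["CREDIT_CARD_NUMBER", "EMAIL_ADDRESS"],
        project_id="example-project",
        dataset_id="dlp_results",
        table_id="findings",
        files_limit_percent=10,
    )
    print(job["storage_config"]["cloud_storage_options"]["file_set"]["url"])
```

Keeping the configuration builder separate from the API call makes it easy to review or version the scan settings before any job is actually created.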

Once Cloud DLP scans are completed, you’ll get emails with links to the “Scan details” page, where you can analyze findings and take further actions. From there, click on “View Findings in BigQuery” to analyze the results.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Scan_details.max-1900x1900.png

Use simple SQL queries to aggregate DLP findings and export them:

https://storage.googleapis.com/gweb-cloudblog-publish/images/simple_SQL_queries.max-1900x1900.png
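As a rough illustration of the kind of aggregation query mentioned above, the sketch below builds a query that counts findings per info type. It is not necessarily the query in the screenshot. It assumes the findings table has an `info_type.name` field, as in the DLP findings schema we are familiar with. The table name and the helper names `findings_by_info_type_query` and `run_query` are hypothetical. Running the query assumes the `google-cloud-bigquery` package and configured credentials.

```python
def findings_by_info_type_query(table: str) -> str:
    """Return SQL counting DLP findings per info type, most frequent first."""
    return (
        "SELECT info_type.name AS info_type_name, COUNT(*) AS findings\n"
        f"FROM `{table}`\n"
        "GROUP BY info_type_name\n"
        "ORDER BY findings DESC"
    )


def run_query(sql: str):
    """Run the query; requires google-cloud-bigquery and valid credentials."""
    from google.cloud import bigquery  # imported lazily so the builder runs offline

    client = bigquery.Client()
    return list(client.query(sql).result())


if __name__ == "__main__":
    # Hypothetical fully qualified table name: project.dataset.table
    print(findings_by_info_type_query("example-project.dlp_results.findings"))
```

The result rows can then be exported or charted with whatever reporting tool you already use with BigQuery.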

For more details, check out the Cloud DLP documentation and see how GCP customers are using Cloud DLP in their organizations today.
