Security & Identity

Automatic data risk management for BigQuery using DLP

April 14, 2022
Scott Ellis

Senior Product Manager

Protecting sensitive data and preventing unintended data exposure is critical for businesses. However, many organizations lack the tools to stay on top of where sensitive data resides across their enterprise. It’s particularly concerning when sensitive data shows up in unexpected places: for example, in logs generated by services, in messages customers inadvertently send through a customer support chat, or in unstructured analytical workloads. This is where Automatic Data Loss Prevention (DLP) for BigQuery can help.

Data discovery and classification is often implemented as a manual, on-demand process, and as a result it happens less frequently than many organizations would like. With large amounts of data being created on the fly, a more modern, proactive approach is to build discovery and classification into existing data analytics tools. By making it automatic, you ensure that a key way to surface risk happens continuously, an example of Google Cloud's invisible security strategy. Automatic DLP is a fully managed service that continuously scans data across your entire organization to give you general awareness of what data you have, and specific visibility into where sensitive data is stored and processed. This awareness is a critical first step in protecting and governing your data, and acts as a key control to help improve your security, privacy, and compliance posture.

In October of last year, we announced the public preview of Automatic DLP for BigQuery. Since that announcement, our customers have scanned and processed both structured and unstructured BigQuery data at multi-petabyte scale to identify where sensitive data resides and gain visibility into their data risk. Today, we’re happy to announce that Automatic DLP is now Generally Available. As part of this release, we’ve also added several new features to make it even easier to understand your data and to use the insights in more Cloud workflows. These features include:

  • Premade Data Studio dashboards to give you more advanced summary, reporting, and investigation tools that you can customize to your business needs.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Easy_to_understand_dashboards_give_a_quick.max-1200x1200.jpg
Easy-to-understand dashboards give a quick overview of data in BigQuery

  • Finer-grained controls to adjust the frequency and conditions under which data is profiled or reprofiled, including the ability to have certain subsets of your data scanned more frequently, scanned less frequently, or excluded from profiling entirely.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Granular_settings_for_how_often_data_is_scan.max-600x600.jpg
Granular settings for how often data is scanned

  • Automatic sync of DLP profiler insights and risk scores for each table into Chronicle, our security analytics platform. We aim to build synergy across our security portfolio, and with this integration, analysts using Chronicle gain immediate insight into whether the BigQuery data involved in a potential incident is high value. This can significantly enhance threat detection, prioritization, and security investigations. For example, if Chronicle detects several attacks, knowing that one is targeting highly sensitive data helps you prioritize, investigate, and remediate the most urgent threats first.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Deep_native_integration_into_Chronicle_hel.max-1600x1600.jpg
Deep native integration into Chronicle helps speed up detection and response

Managing data risk with data classification

Examples of sensitive data elements that typically need special attention include credit card numbers, medical information, Social Security numbers, government-issued IDs, addresses, full names, and account credentials. Automatic DLP leverages machine learning and provides more than 150 predefined detectors to help discover, classify, and govern this sensitive data, allowing you to make sure the right protections are in place.
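
To make the detector concept concrete, here is a minimal sketch (not the Automatic DLP profiler itself, which requires no code) that calls the Cloud DLP API directly through the google-cloud-dlp Python client to inspect a sample string against a few of the predefined infoTypes mentioned above. The project ID and sample text are placeholders.

```python
# A minimal sketch: inspecting text with a few predefined Cloud DLP infoTypes.
# Assumes the google-cloud-dlp client library and a placeholder project ID.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/your-project-id/locations/global"  # placeholder project

item = {"value": "Contact Jane at jane@example.com, card 4111 1111 1111 1111"}

inspect_config = {
    "info_types": [
        {"name": "EMAIL_ADDRESS"},
        {"name": "CREDIT_CARD_NUMBER"},
        {"name": "US_SOCIAL_SECURITY_NUMBER"},
    ],
    "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
    "include_quote": True,
}

response = dlp.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)

for finding in response.result.findings:
    # Each finding reports the detector that matched, the likelihood, and the quoted text.
    print(finding.info_type.name, finding.likelihood.name, finding.quote)
```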

Once you have visibility into your sensitive data, there are many options to help remediate issues or reduce your overall data risk. For example, you can use IAM to restrict access to datasets or tables, or leverage BigQuery Policy Tags to set fine-grained access policies at the column level. Cloud DLP also provides tools to run deep, exhaustive on-demand inspections of your data, and can help you obfuscate, mask, or tokenize data to reduce overall risk. This capability is particularly important if you’re using data for analytics and machine learning, since sensitive data must be handled appropriately to ensure your users’ privacy and compliance with privacy regulations.
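
As one illustration of the obfuscation path, here is a hedged sketch that uses the same Python client to mask any credit card numbers found in a string via the Cloud DLP de-identification API. The project ID and sample value are placeholders, and in practice you would wire a transformation like this into your own data pipeline.

```python
# A sketch of masking sensitive values with the Cloud DLP de-identification API.
# Placeholder project ID; detected credit card numbers are replaced with '#'.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/your-project-id/locations/global"  # placeholder project

item = {"value": "Payment received from card 4111 1111 1111 1111"}

inspect_config = {"info_types": [{"name": "CREDIT_CARD_NUMBER"}]}

deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
                "primitive_transformation": {
                    # Mask every character of the detected card number.
                    "character_mask_config": {"masking_character": "#"}
                },
            }
        ]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": inspect_config,
        "item": item,
    }
)

# The card number in the output is replaced with '#' characters.
print(response.item.value)
```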

How to get started

Automatic DLP can be turned on for your entire organization, selected organization folders, or individual projects. To learn more about these new capabilities or to get started today, open the Cloud DLP page in the Cloud Console and check out our documentation.
