Knowing where your sensitive data exists is often the first step in ensuring that it is properly secured and managed. This knowledge can help reduce the risk of exposing sensitive details such as credit card numbers, medical information, Social Security numbers, driver's license numbers, addresses, full names, and company-specific secrets. Periodic scanning of your data can also help with compliance requirements and ensure best practices are followed as your data grows and changes with use. To help meet compliance requirements, use Cloud Data Loss Prevention (Cloud DLP) to scan your BigQuery tables and to protect your sensitive data.
Cloud DLP is a fully managed service that lets Google Cloud Platform customers identify and protect sensitive data at scale. Cloud DLP uses more than 100 predefined detectors to identify patterns, formats, and checksums. Cloud DLP also provides a set of tools to de-identify your data, including masking, tokenization, pseudonymization, and date shifting, all without replicating customer data.
To learn more about Cloud DLP, see the Cloud DLP documentation.
Before you begin
- Get familiar with Cloud DLP pricing and how to keep Cloud DLP costs under control.
- Enable the Cloud DLP API.
- Ensure that the user creating your Cloud DLP jobs is granted an appropriate predefined Cloud DLP IAM role or sufficient permissions to run Cloud DLP jobs.
Scanning BigQuery data using the GCP Console
To scan BigQuery data, you create a Cloud DLP job that analyzes a table. You can scan a BigQuery table quickly by using the Scan with DLP option in the BigQuery GCP Console.
To scan a BigQuery table using Cloud DLP:
Open the BigQuery web UI in the GCP Console.
In the Resources section, expand your project and dataset, and select the BigQuery table that you want to scan.
Click Export > Scan with DLP (Beta). The Cloud DLP job creation page opens in a new tab.
For Step 1: Choose input data, the values in the Name and Location sections are automatically generated. Also, the Sampling section is automatically configured to run a sample scan against your data. You can adjust the number of rows in the sample by choosing Percentage of rows for the Limit rows by field. You can also change the number of rows sampled by adjusting the value in the Maximum number of rows field.
(Optional) For Step 2: Configure detection, you can configure what types of data to look for, called infoTypes. You can select from the list of predefined infoTypes, or you can select a template if one exists. For more information on infoTypes, see InfoTypes and infoType detectors in the Cloud DLP documentation.
(Optional) For Step 3: Add actions, enable Save to BigQuery to publish your Cloud DLP findings to a BigQuery table. If you don't store findings, the completed job will only contain statistics about the number of findings and their infoTypes. Saving findings to BigQuery saves details about the precise location and confidence of each individual finding.
(Optional) If you enabled Save to BigQuery, in the Save to BigQuery section:
- For Project ID enter the project ID where your results are stored.
- For Dataset ID enter the name of the dataset that stores your results.
- (Optional) For Table ID enter the name of the table that stores your results. If no table ID is specified, a default name is assigned to a new table similar to the following: dlp_googleapis_[DATE]_1234567890. If you specify an existing table, findings are appended to it.
(Optional) For Step 4: Schedule, configure a time span or schedule by selecting either Specify time span or Create a trigger to run the job on a periodic schedule.
(Optional) On the Review page, examine the details of your job.
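The console steps above correspond roughly to the fields of a Cloud DLP inspection job. The following Python sketch only illustrates that mapping by building the job configuration as plain dictionaries. The function names (`build_inspect_job`, `build_job_trigger`, `submit_job`) are hypothetical helpers, and the snake_case field names are assumed to match the `google-cloud-dlp` client library, so verify them against the Cloud DLP API reference before relying on them.

```python
def build_inspect_job(project_id, dataset_id, table_id, info_types,
                      rows_limit=None, rows_limit_percent=None,
                      findings_table=None):
    """Build an inspect_job dict for one BigQuery table (hypothetical helper)."""
    # Step 1: the API accepts a row count or a percentage for sampling, not both.
    if rows_limit is not None and rows_limit_percent is not None:
        raise ValueError("Set rows_limit or rows_limit_percent, not both.")

    big_query_options = {
        "table_reference": {
            "project_id": project_id,
            "dataset_id": dataset_id,
            "table_id": table_id,
        }
    }
    if rows_limit is not None:
        big_query_options["rows_limit"] = rows_limit
    if rows_limit_percent is not None:
        big_query_options["rows_limit_percent"] = rows_limit_percent

    job = {
        "storage_config": {"big_query_options": big_query_options},
        # Step 2: the infoTypes to look for, for example "EMAIL_ADDRESS".
        "inspect_config": {"info_types": [{"name": n} for n in info_types]},
    }

    # Step 3: optionally save detailed findings to a BigQuery table.
    if findings_table is not None:
        out_project, out_dataset, out_table = findings_table
        job["actions"] = [{
            "save_findings": {
                "output_config": {
                    "table": {
                        "project_id": out_project,
                        "dataset_id": out_dataset,
                        "table_id": out_table,
                    }
                }
            }
        }]
    return job


def build_job_trigger(inspect_job, every_seconds):
    """Step 4: wrap an inspect job in a periodic trigger (assumed field names)."""
    return {
        "inspect_job": inspect_job,
        "triggers": [
            {"schedule": {"recurrence_period_duration": {"seconds": every_seconds}}}
        ],
        "status": "HEALTHY",  # enum name as written in the REST API
    }


def submit_job(project_id, inspect_job):
    """Submit the job; needs the google-cloud-dlp package and credentials."""
    from google.cloud import dlp_v2  # imported lazily so the builders run without it

    client = dlp_v2.DlpServiceClient()
    return client.create_dlp_job(
        request={"parent": f"projects/{project_id}", "inspect_job": inspect_job}
    )
```

For example, `build_inspect_job("my-project", "my_dataset", "customers", ["EMAIL_ADDRESS"], rows_limit=1000, findings_table=("my-project", "dlp_results", "findings"))` describes a 1,000-row sample scan that saves its findings; the project, dataset, and table names here are placeholders.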
After the Cloud DLP job completes, you are redirected to the job details page and notified by email. You can view the results of the scan on the job details page, or you can open that page later from the link in the job completion email.
If you chose to publish Cloud DLP findings to BigQuery, on the Job details page, click View Findings in BigQuery to open the table in the BigQuery web UI. You can then query the table and analyze your findings. For more information on querying your results in BigQuery, see Querying Cloud DLP findings in BigQuery in the Cloud DLP documentation.
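As a starting point for analyzing saved findings, the following Python sketch builds a summary query that counts findings per infoType and likelihood. The helper names (`findings_summary_sql`, `run_summary`) are hypothetical, and the column names `info_type.name` and `likelihood` are assumed to match the findings table that Cloud DLP writes; check the schema of your own table first.

```python
import re

# Simplified pattern for "project.dataset.table"; real naming rules are broader.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def findings_summary_sql(table):
    """Return standard SQL that counts findings per infoType and likelihood."""
    # Reject anything that is not a plain table path, because the name is
    # interpolated into the query text.
    if not _TABLE_RE.match(table):
        raise ValueError(f"Expected project.dataset.table, got: {table!r}")
    return (
        "SELECT info_type.name AS info_type_name, likelihood, "
        "COUNT(*) AS findings\n"
        f"FROM `{table}`\n"
        "GROUP BY info_type_name, likelihood\n"
        "ORDER BY findings DESC"
    )


def run_summary(table):
    """Run the query; needs the google-cloud-bigquery package and credentials."""
    from google.cloud import bigquery  # imported lazily so the builder runs without it

    client = bigquery.Client()
    return list(client.query(findings_summary_sql(table)).result())
```

Calling `findings_summary_sql("my-project.dlp_results.findings")` returns the query text, which you can also paste into the BigQuery web UI; the table path is a placeholder.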
To learn more about inspecting BigQuery and other storage repositories for sensitive data using Cloud DLP, see the following topics in the Cloud DLP documentation:
- Inspecting storage and databases for sensitive data
- Creating Cloud DLP inspection jobs and job triggers
If you want to redact or otherwise de-identify the sensitive data that the Cloud DLP scan found, see:
- De-identifying sensitive data in the Cloud DLP documentation
- AEAD encryption concepts in standard SQL for information on encrypting individual values within a table
- Protecting data with Cloud KMS keys for information on creating and managing your own encryption keys in Cloud KMS to encrypt BigQuery tables
- Identity and Security blog post: Taking charge of your data: using Cloud DLP to de-identify and obfuscate sensitive information