Using Cloud DLP to scan BigQuery data

Knowing where your sensitive data exists is often the first step in ensuring that it is properly secured and managed. This knowledge can help reduce the risk of exposing sensitive details such as credit card numbers, medical information, Social Security numbers, driver's license numbers, addresses, full names, and company-specific secrets. Periodic scanning of your data can also help with compliance requirements and ensure best practices are followed as your data grows and changes with use. To help meet compliance requirements, use Cloud Data Loss Prevention (Cloud DLP) to scan your BigQuery tables and to protect your sensitive data.

Cloud DLP is a fully managed service that lets Google Cloud customers identify and protect sensitive data at scale. Cloud DLP uses more than 100 predefined detectors to identify patterns, formats, and checksums. Cloud DLP also provides a set of tools to de-identify your data including masking, tokenization, pseudonymization, date shifting, and more, all without replicating customer data.
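As an illustration of one of those de-identification tools, masking can be expressed as a `content.deidentify` request. The sketch below builds the request body as a plain Python dict using DLP v2 REST API field names; the sample text and infoType choice are placeholder assumptions, not part of this guide's scenario:

```python
# Sketch of a Cloud DLP content.deidentify request body that masks
# credit card numbers with "#". Field names follow the DLP v2 REST API;
# the item value below is a placeholder for illustration.
deidentify_request = {
    "item": {"value": "My card number is 4111-1111-1111-1111."},
    # Tell Cloud DLP what to look for.
    "inspectConfig": {
        "infoTypes": [{"name": "CREDIT_CARD_NUMBER"}],
    },
    # Replace every character of each finding with the masking character.
    "deidentifyConfig": {
        "infoTypeTransformations": {
            "transformations": [
                {
                    "infoTypes": [{"name": "CREDIT_CARD_NUMBER"}],
                    "primitiveTransformation": {
                        "characterMaskConfig": {"maskingCharacter": "#"}
                    },
                }
            ]
        }
    },
}
```

Because the transformation is declared in the request, Cloud DLP returns the masked text directly; your original data is never copied elsewhere.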

To learn more about Cloud DLP, see the Cloud DLP documentation.

Before you begin

  1. Get familiar with Cloud DLP pricing and how to keep Cloud DLP costs under control.
  2. Enable the DLP API.
  3. Ensure that the user creating your Cloud DLP jobs is granted an appropriate predefined Cloud DLP IAM role or sufficient permissions to run Cloud DLP jobs.

Scanning BigQuery data using the Cloud Console

To scan BigQuery data, you create a Cloud DLP job that analyzes a table. You can scan a BigQuery table quickly by using the Scan with DLP option in the BigQuery Cloud Console.

To scan a BigQuery table using Cloud DLP:

  1. In the Cloud Console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and dataset, then select the table.

  3. Click Export > Scan with DLP (beta). The Cloud DLP job creation page opens in a new tab.

  4. For Step 1: Choose input data, the values in the Name and Location sections are automatically generated, and the Sampling section is automatically configured to run a sample scan against your data. To adjust the sample size, set the Limit rows by field to either Percentage of rows or Maximum number of rows, and then enter the percentage or row count to sample.

  5. Click Continue.

  6. (Optional) For Step 2: Configure detection, you can configure what types of data to look for, called infoTypes. You can select from the list of predefined infoTypes, or you can select a template if one exists. For more information on infoTypes, see InfoTypes and infoType detectors in the Cloud DLP documentation.

  7. Click Continue.

  8. (Optional) For Step 3: Add actions, enable Save to BigQuery to publish your Cloud DLP findings to a BigQuery table. If you don't store findings, the completed job contains only statistics about the number of findings and their infoTypes. Saving findings to BigQuery saves details about the precise location and confidence of each individual finding.

  9. (Optional) If you enabled Save to BigQuery, in the Save to BigQuery section, enter the following information:

    • Project ID: the project ID where your results are stored.
    • Dataset ID: the name of the dataset that stores your results.
    • (Optional) Table ID: the name of the table that stores your results. If no table ID is specified, a default name is assigned to a new table similar to the following: dlp_googleapis_date_1234567890. If you specify an existing table, findings are appended to it.
  10. Click Continue.

  11. (Optional) For Step 4: Schedule, configure a time span or schedule by selecting either Specify time span or Create a trigger to run the job on a periodic schedule.

  12. Click Continue.

  13. (Optional) On the Review page, examine the details of your job.

  14. Click Create.

  15. After the Cloud DLP job completes, you are redirected to the job details page, and you're notified by email. You can view the results of the scan on the job details page, or you can click the link to the Cloud DLP job details page in the job completion email.

  16. If you chose to publish Cloud DLP findings to BigQuery, on the Job details page, click View Findings in BigQuery to open the table in the Cloud Console. You can then query the table and analyze your findings. For more information on querying your results in BigQuery, see Querying Cloud DLP findings in BigQuery in the Cloud DLP documentation.
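The console steps above correspond to a single `projects.dlpJobs.create` call in the DLP v2 REST API. As a sketch, the request body below (a plain Python dict) samples 10% of a BigQuery table, inspects it for two predefined infoTypes, and saves findings back to BigQuery; every project, dataset, and table name shown is a placeholder assumption:

```python
# Sketch of a projects.dlpJobs.create request body mirroring the
# console workflow: choose input data (with sampling), configure
# detection, and add a Save to BigQuery action. Field names follow
# the DLP v2 REST API; all resource names are placeholders.
inspect_job_request = {
    "inspectJob": {
        # Step 1: the input table and a 10% random sample of its rows.
        "storageConfig": {
            "bigQueryOptions": {
                "tableReference": {
                    "projectId": "my-project",   # placeholder
                    "datasetId": "my_dataset",   # placeholder
                    "tableId": "my_table",       # placeholder
                },
                "rowsLimitPercent": 10,
                "sampleMethod": "RANDOM_START",
            }
        },
        # Step 2: which infoTypes to detect.
        "inspectConfig": {
            "infoTypes": [
                {"name": "CREDIT_CARD_NUMBER"},
                {"name": "US_SOCIAL_SECURITY_NUMBER"},
            ],
        },
        # Step 3: the Save to BigQuery action. Omitting tableId lets
        # Cloud DLP assign a default table name for the findings.
        "actions": [
            {
                "saveFindings": {
                    "outputConfig": {
                        "table": {
                            "projectId": "my-project",   # placeholder
                            "datasetId": "dlp_results",  # placeholder
                        }
                    }
                }
            }
        ],
    }
}
```

Sending this body to the API (for example, with the `google-cloud-dlp` client library or an authenticated REST call) creates the same one-off inspection job that the Create button produces in the console.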

Next steps

To learn more about inspecting BigQuery and other storage repositories for sensitive data using Cloud DLP, see the following topics in the Cloud DLP documentation:

If you want to redact or otherwise de-identify the sensitive data that the Cloud DLP scan found, see:

Additional resources