Sending Cloud DLP scan results to Data Catalog

This guide walks you through using Cloud Data Loss Prevention (DLP) to scan specific Google Cloud resources and send results to Data Catalog.

Data Catalog is a scalable metadata management service that empowers you to quickly discover, manage, and understand all your data in Google Cloud.

Cloud DLP integrates natively with Data Catalog. When you use a Cloud DLP action to scan your BigQuery tables for sensitive data, it can send results directly to Data Catalog in the form of a tag template.

By completing the steps in this guide, you'll do the following:

  • Enable Data Catalog and Cloud DLP.
  • Set up Cloud DLP to scan a BigQuery table.
  • Configure a Cloud DLP scan to send scan results to Data Catalog.

For more information about Data Catalog, see the Data Catalog documentation.

Costs

When you follow the instructions in this topic, you use billable components of Google Cloud, including:

  • Cloud DLP
  • BigQuery

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

New Google Cloud users might be eligible for a free trial.

Before you begin

Before you can send Cloud DLP scan results to Data Catalog, do the following:

  • Step 1: Set up billing.
  • Step 2: Create a new project and populate a new BigQuery table (optional).
  • Step 3: Enable Data Catalog.
  • Step 4: Enable Cloud DLP.

The following subsections cover each step in detail.

Step 1: Set up billing

You must first set up a billing account if you don't already have one.

Learn how to enable billing

Step 2: Create a new project and populate a new BigQuery table (Optional)

If you are setting up this feature for production work or already have a BigQuery table that you want to scan, open the Google Cloud project that contains the table and skip to Step 3.

If you are trying out this feature and want to scan a test or "dummy" dataset, create a new project. To complete this step, you must have the Project Creator IAM role. Learn more about IAM roles.

  1. Go to the New Project page in the Google Cloud Console.

    New Project

  2. In the Billing account drop-down list, select the billing account that the project should be billed to.
  3. In the Organization drop-down list, select the organization in which you want to create the project.
  4. In the Location drop-down list, select the organization or folder in which you want to create the project.
  5. Click Create to create the project.

Next, download and store the sample data:

  1. Go to the Cloud Functions tutorials repository on GitHub.
  2. Select one of the CSV files that has example data, and then download the file.
  3. Next, go to BigQuery in the Cloud Console.
  4. Select your project.
  5. Click Create Dataset.
  6. Click Create Table.
  7. Click Upload, and then select the file you want to upload.
  8. Give the table a name, and then click Create Table.
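
If you prefer to script this step instead of using the BigQuery UI, the following is a minimal sketch using the google-cloud-bigquery Python client. The project ID, dataset and table names, and CSV file name are placeholders; substitute the file you downloaded.

    from google.cloud import bigquery

    client = bigquery.Client(project="project-id")  # placeholder project ID

    # Create the dataset that will hold the sample table.
    client.create_dataset("dlp_test_dataset", exists_ok=True)

    # Load the downloaded CSV into a new table, letting BigQuery infer the schema.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    with open("sample_data.csv", "rb") as source_file:  # placeholder file name
        load_job = client.load_table_from_file(
            source_file,
            "project-id.dlp_test_dataset.dlp_test_table",
            job_config=job_config,
        )
    load_job.result()  # wait for the load job to finish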

Step 3: Enable Data Catalog

Next, enable Data Catalog for the project that contains the BigQuery table you want to scan using Cloud DLP.

To enable Data Catalog using the Cloud Console:

  1. Register your application for Data Catalog.

    Register your application for Data Catalog

  2. On the registration page, from the Select a project drop-down list, select the project that you want to use with Data Catalog. If you only want to test out this feature, choose Create a project at the bottom of the list to create a new project.
  3. After you've selected the project, click Continue.

Data Catalog is now enabled for your project.

Step 4: Enable Cloud DLP

Enable Cloud DLP for the same project you enabled Data Catalog for.

To enable Cloud DLP using the Cloud Console:

  1. Register your application for Cloud DLP.

    Register your application for Cloud DLP

  2. On the registration page, from the Select a project drop-down list, select the same project you chose in the previous step.
  3. After you've selected the project, click Continue.

Cloud DLP is now enabled for your project.

Configure and run a Cloud DLP inspection scan

You can configure and run a Cloud DLP inspection scan using either the Cloud Console or the DLP API.

Cloud Console

To set up a scan job of a BigQuery table using Cloud DLP:

  1. In the Cloud Console, open Cloud DLP.

    Go to Cloud DLP

  2. From the Create menu, choose Job or job trigger.

    Screenshot of Create new job or job trigger menu choice.

  3. Enter the Cloud DLP job information and click Continue to complete each step:

    • For Step 1: Choose input data, name the job by entering a value in the Name field. In Location, choose BigQuery from the Storage type menu, and then enter the information for the table to scan. The Sampling section is pre-configured to run a sample scan against your data. You can adjust the Limit rows by and Maximum number of rows fields to save resources if you have a large amount of data. For more details, see Choose input data.

    • (Optional) In Step 2: Configure detection, you configure what types of data to look for, called "infoTypes." For the purposes of this walkthrough, keep the default infoTypes selected. For more details, see Configure detection.

    • For Step 3: Add actions, enable Save to Data Catalog.

    • (Optional) For Step 4: Schedule, for the purposes of this walkthrough, leave the menu set to None so that the scan runs just once. To learn more about scheduling repeating scans, see Schedule.

  4. Click Create. The job runs immediately.

DLP API

In this section, you configure and run a Cloud DLP scan job.

The inspection job that you configure here instructs Cloud DLP to scan either the sample BigQuery data described in Step 2 above or your own BigQuery data. The job configuration that you specify is also where you instruct Cloud DLP to save its scan results to Data Catalog.

Step 1: Note your project identifier

  1. Go to the Cloud Console.

    Go to the Cloud Console

  2. Click Select.

  3. In the Select from drop-down list, select the organization for which you enabled Data Catalog.

  4. Under ID, copy the project ID for the project that contains the data you want to scan. This is the project that contains the BigQuery table, described in Step 2 of the Before you begin section earlier on this page.

  5. Under Name, click the project to select it.

Step 2: Open APIs Explorer and configure the job

  1. Go to APIs Explorer on the reference page for the dlpJobs.create method. To keep these instructions available, right-click the following link and open it in a new tab or window:

    Open APIs Explorer

  2. In the parent box, enter the following, where project-id is the project ID you noted in the previous step:

    projects/project-id

    Next, copy the following JSON, select the contents of the Request body field in APIs Explorer, and then paste the JSON to replace those contents. Be sure to replace the project-id, bigquery-dataset-name, and bigquery-table-name placeholders with your actual project ID and BigQuery dataset and table names, respectively.

    {
      "inspectJob":
      {
        "storageConfig":
        {
          "bigQueryOptions":
          {
            "tableReference":
            {
              "projectId": "project-id",
              "datasetId": "bigquery-dataset-name",
              "tableId": "bigquery-table-name"
            }
          }
        },
        "inspectConfig":
        {
          "infoTypes":
          [
            {
              "name": "EMAIL_ADDRESS"
            },
            {
              "name": "PERSON_NAME"
            },
            {
              "name": "US_SOCIAL_SECURITY_NUMBER"
            },
            {
              "name": "PHONE_NUMBER"
            }
          ],
          "includeQuote": true,
          "minLikelihood": "UNLIKELY",
          "limits":
          {
            "maxFindingsPerRequest": 100
          }
        },
        "actions":
        [
          {
            "publishFindingsToCloudDataCatalog": {}
          }
        ]
      }
    }
    

To learn more about the available scan options, see Inspecting storage and databases for sensitive data. For a full list of information types that Cloud DLP can scan for and detect, see InfoTypes reference.
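
If you prefer to create the job programmatically rather than through APIs Explorer, the following is a minimal sketch using the google-cloud-dlp Python client; it mirrors the JSON request body above. The project, dataset, and table IDs are placeholders.

    from google.cloud import dlp_v2

    dlp = dlp_v2.DlpServiceClient()

    inspect_job = {
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": "project-id",             # placeholder
                    "dataset_id": "bigquery-dataset-name",  # placeholder
                    "table_id": "bigquery-table-name",      # placeholder
                }
            }
        },
        "inspect_config": {
            "info_types": [
                {"name": "EMAIL_ADDRESS"},
                {"name": "PERSON_NAME"},
                {"name": "US_SOCIAL_SECURITY_NUMBER"},
                {"name": "PHONE_NUMBER"},
            ],
            "include_quote": True,
            "min_likelihood": dlp_v2.Likelihood.UNLIKELY,
            "limits": {"max_findings_per_request": 100},
        },
        # The action that sends findings to Data Catalog.
        "actions": [{"publish_findings_to_cloud_data_catalog": {}}],
    }

    response = dlp.create_dlp_job(
        request={"parent": "projects/project-id", "inspect_job": inspect_job}
    )
    print(response.name)  # the job ID; you use it to check status later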

Step 3: Execute the request to start the scan job

After you configure the job by following the preceding steps, click Execute to send the request. If the request is successful, a response appears with a success code and a JSON object that indicates the status of the Cloud DLP job you just created.

The response to your scan request includes the job ID of your inspection scan job in the "name" key, and the job's current state in the "state" key. Because you just submitted the request, the job's state at that moment is "PENDING".

Check the status of the Cloud DLP inspection scan

After you submit the scan request, the scan of your content begins immediately.

Cloud Console

To check the status of the inspection scan job:

  1. In the Cloud Console, open Cloud DLP.

    Go to Cloud DLP

  2. Click the Jobs & job triggers tab, and then click All jobs.

The job you just ran will likely be at the top of the list. Check the State column to be sure its status is Done.

You can click on the Job ID of the job to see its results. Each infoType detector listed on the Job details page is followed by the number of matches that were found in the content.

DLP API

To check the status of the inspection scan job:

  1. Go to APIs Explorer on the reference page for the dlpJobs.get method by clicking the following button:

    Open APIs Explorer

  2. In the name box, type the name of the job from the JSON response to the scan request, in the following form:

    projects/project-id/dlpJobs/job-id

    The job ID is in the form i-1234567890123456789.

  3. To submit the request, click Execute.

If the response JSON object's "state" key indicates that the job is "DONE", then the scan job has finished.

To view the rest of the response JSON, scroll down the page. Under "result" > "infoTypeStats", each information type listed should have a corresponding "count". If not, make sure that you entered the JSON accurately, and that the path or location to your data is correct.
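
If you prefer to poll the job programmatically, the following is a minimal sketch using the google-cloud-dlp Python client; the job name is a placeholder taken from the "name" key of the create response.

    import time

    from google.cloud import dlp_v2

    dlp = dlp_v2.DlpServiceClient()

    # Placeholder: the "name" value returned when you created the job.
    job_name = "projects/project-id/dlpJobs/i-1234567890123456789"

    # Poll until the job finishes or fails.
    job = dlp.get_dlp_job(request={"name": job_name})
    while job.state not in (
        dlp_v2.DlpJob.JobState.DONE,
        dlp_v2.DlpJob.JobState.FAILED,
    ):
        time.sleep(30)
        job = dlp.get_dlp_job(request={"name": job_name})

    # Print the number of findings for each infoType.
    for stats in job.inspect_details.result.info_type_stats:
        print(stats.info_type.name, stats.count)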

After the scan job is done, you can continue to the next section of this guide to view scan results in Data Catalog.

View Cloud DLP scan results in Data Catalog

Because you instructed Cloud DLP to send its inspection scan job results to Data Catalog, you can now view the automatically created tags and tag template in the Data Catalog UI:

  1. Go to the Data Catalog page in the Cloud Console.

    Go to Data Catalog

  2. Search for the table that you inspected.
  3. Click the result that matches your table to view the table's metadata.

The following screenshot shows the Data Catalog metadata view of an example table:

DLP detail in Data Catalog.

Cloud DLP Data Discovery

Findings from Cloud DLP are included in summary form for the table that you scanned. This summary includes total infoType counts, along with summary data about the inspection job, such as its dates and the job resource ID.

Any infoTypes that were inspected for are listed. Those with findings show a count greater than zero.
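
If you want to read these tags programmatically instead of in the UI, the following is a minimal sketch using the google-cloud-datacatalog Python client; it looks up the scanned table's catalog entry and lists the tags attached to it. The linked-resource path is a placeholder.

    from google.cloud import datacatalog_v1

    catalog = datacatalog_v1.DataCatalogClient()

    # Placeholder: the BigQuery table that Cloud DLP scanned.
    resource = (
        "//bigquery.googleapis.com/projects/project-id"
        "/datasets/bigquery-dataset-name/tables/bigquery-table-name"
    )

    # Look up the table's Data Catalog entry, then list its tags.
    entry = catalog.lookup_entry(request={"linked_resource": resource})
    for tag in catalog.list_tags(parent=entry.name):
        print(tag.template)
        for field_name, field in tag.fields.items():
            print(" ", field_name, field.display_name)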

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this topic, do one of the following, depending on whether you used sample data or your own data:

Deleting the project

The easiest way to eliminate billing is to delete the project you created while following the instructions provided in this topic.

To delete the project:

  1. In the Cloud Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the checkbox next to the project you want to delete, and then click Delete project.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

When you delete your project using this method, the Cloud DLP job and the BigQuery dataset and table you created are also deleted, and you're done. It's not necessary to follow the instructions in the following sections.

Deleting the Cloud DLP job or job trigger

If you scanned your own data, delete the inspection scan job or job trigger that you just created.

Cloud Console

  1. In the Cloud Console, open Cloud DLP.

    Go to Cloud DLP

  2. Click the Jobs & job triggers tab, and then click the Job triggers tab.

  3. In the Actions column for the job trigger you want to delete, click the more actions menu (displayed as three vertically arranged dots), and then click Delete.

Optionally, you can also delete the job details for the job that you ran. Click the All jobs tab, and then, in the Actions column for the job you want to delete, click the more actions menu (displayed as three vertically arranged dots), and then click Delete.

DLP API

  1. Go to APIs Explorer on the reference page for the dlpJobs.delete method by clicking the following button:

    Open APIs Explorer

  2. In the name box, type the name of the job from the JSON response to the scan request, which has the following form:

    projects/project-id/dlpJobs/job-id

    The job ID is in the form i-1234567890123456789.

  3. To submit the request, click Execute.

If you created additional scan jobs or if you want to make sure you've deleted the job successfully, you can list all of the existing jobs:

  1. Go to APIs Explorer on the reference page for the dlpJobs.list method by clicking the following button:

    Open APIs Explorer

  2. In the parent box, type the following, where project-id is your project ID:

    projects/project-id

  3. Click Execute.

If there are no jobs listed in the response, you've deleted all of the jobs. If jobs are listed in the response, repeat the deletion procedure above for those jobs.
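
A scripted equivalent of this cleanup, as a minimal sketch using the google-cloud-dlp Python client (the job and project names are placeholders):

    from google.cloud import dlp_v2

    dlp = dlp_v2.DlpServiceClient()

    # Placeholder: the job to delete, from the earlier create response.
    dlp.delete_dlp_job(
        request={"name": "projects/project-id/dlpJobs/i-1234567890123456789"}
    )

    # List any remaining jobs to confirm the deletion.
    for job in dlp.list_dlp_jobs(request={"parent": "projects/project-id"}):
        print(job.name, job.state)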

What's next