Profile data in an organization or folder

This page describes how to configure profiling at the level of an organization or folder. If you want to profile a project, see Profile data in a single project.

For more information about data profiles, see Data profiles for BigQuery data.

To start profiling data, you create a scan configuration.

Before you begin

  1. Confirm that you have the IAM permissions that are required to configure data profiles at the organization level.

    If you don't have the Organization Administrator (roles/resourcemanager.organizationAdmin) or Security Admin (roles/iam.securityAdmin) role, you can still create a scan configuration. However, after you create the scan configuration, someone with either of those roles must grant data profiling access to your service agent.

  2. You can configure Cloud DLP to send notifications to Pub/Sub when certain events occur, such as when Cloud DLP profiles a new table. If you want to use this feature, you must first create a Pub/Sub topic.
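Pub/Sub topic IDs must follow the service's documented naming rules: 3–255 characters, starting with a letter, not beginning with `goog`, and containing only letters, digits, and the characters `-_.~+%`. As a quick sketch, you can sanity-check a topic ID and build the full resource name before creating the topic. The helper names below are illustrative, not part of any Google Cloud client library:

```python
import re

# Documented Pub/Sub resource-naming rules: 3-255 characters, starts with a
# letter, and contains only letters, digits, and the characters - _ . ~ + %
_TOPIC_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9\-_.~+%]{2,254}$")

def is_valid_topic_id(topic_id: str) -> bool:
    """Return True if topic_id satisfies Pub/Sub's naming rules."""
    # IDs must also not begin with the reserved string "goog".
    return bool(_TOPIC_ID_RE.match(topic_id)) and not topic_id.startswith("goog")

def topic_path(project_id: str, topic_id: str) -> str:
    """Build the full resource name that Cloud DLP expects for a topic."""
    if not is_valid_topic_id(topic_id):
        raise ValueError(f"invalid Pub/Sub topic ID: {topic_id!r}")
    return f"projects/{project_id}/topics/{topic_id}"
```

For example, `topic_path("my-project", "dlp-profile-events")` returns `projects/my-project/topics/dlp-profile-events`, the format used later in the Publish to Pub/Sub section.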

Create a scan configuration

To create a scan configuration, perform the steps in the following sections. At the end of each section, click Continue.

  1. Go to the Create scan configuration page.

    Go to Create scan configuration

  2. If needed, switch to your organization: on the toolbar, click the project selector, and then select your organization.


The following sections provide more information about the steps in the Create scan configuration page.

Select resource to scan

Do one of the following:

  • To configure profiling at the organization level, select Scan entire organization.
  • To configure profiling at the level of a folder, select Scan selected folder. Then, click Browse and select the folder.

Manage schedules

If the default profiling frequency suits your needs, you can skip this section of the Create scan configuration page. This section is useful if you want to make fine-grained adjustments to the profiling frequency of all your data or certain subsets of your data. It's also useful if you don't want certain tables to ever be profiled, or you want them to be profiled once and then never again.

In this section, you create filters to specify certain subsets of your data that are of interest. For these subsets, you define whether Cloud DLP should profile the tables, and how often. Here, you also specify the types of changes that should cause a table to be reprofiled. Finally, you specify any conditions that each table in the subsets must meet before Cloud DLP starts profiling the table.

To make fine-grained adjustments to profiling frequency, follow these steps:

  1. Click Add schedule.
  2. In the Filters section, define one or more filters that specify which tables are in the schedule's scope.

    Specify at least one of the following:

    • A project ID or a regular expression that specifies one or more projects.
    • A dataset ID or a regular expression that specifies one or more datasets.
    • A table ID or a regular expression that specifies one or more tables.

    Regular expressions must follow the RE2 syntax.

    For example, if you want all tables in a project to be included in the filter, specify that project's ID and leave the two other fields blank.

    If you want to add more filters, click Add filter and repeat this step.

  3. Click Frequency.

  4. In the Frequency section, specify whether Cloud DLP should profile the tables that you defined in your filters and, if so, how often:

    • If you never want the tables to be profiled, turn off Profile the tables.

    • If you want the tables to be profiled at least once, leave Profile the tables on, and follow these steps:

      1. In the When schema changes field, specify when you want the tables to be reprofiled if they undergo schema changes after they were last profiled.

        • Do not reprofile: Never reprofile after the initial profiles are generated.
        • Reprofile daily: Reprofile once every 24 hours.
        • Reprofile monthly: Reprofile once every 30 days.
      2. For Types of schema change, specify which types of schema change should trigger a reprofile operation:

        • New columns: Reprofile the tables that gained new columns.
        • Removed columns: Reprofile the tables that had columns removed.

        Suppose you want reprofile operations to run every 24 hours. Also, you want to reprofile only the tables that gained new columns after they were last profiled. In this case, set When schema changes to Reprofile daily, and set Types of schema change to New columns.

      3. In the When table changes field, specify when you want the tables to be reprofiled if they undergo any change after they were last profiled. Examples of table changes are row deletions and schema changes.

        • Do not reprofile: Never reprofile after the initial profiles are generated.
        • Reprofile daily: Reprofile once every 24 hours.
        • Reprofile monthly: Reprofile once every 30 days.

      You must select a value that is the same as, or less frequent than, the value you set in the When schema changes field.

  5. Click Conditions.

  6. In the Conditions section, specify any conditions that the tables defined in your filters must meet before Cloud DLP profiles them. If you set both minimum conditions and the time condition, Cloud DLP profiles only the tables that meet both types of conditions.

    • Minimum conditions: These conditions are useful if you want to delay profiling of a table until it has enough rows or until it reaches a certain age. Turn on the conditions you want to apply, and specify the minimum row count or duration.
    • Time condition: This condition is useful if you don't want old tables to ever be profiled. Turn on the time condition, and pick a date and time. Any table created on or before that date is excluded from profiling.

    Suppose you have the following configuration:

    • Minimum conditions

      • Minimum row count: 10 rows
      • Minimum duration: 24 hours
    • Time condition

      • Timestamp: 5/4/22, 11:59 PM

    In this case, Cloud DLP excludes any table created on or before May 4, 2022, 11:59 PM. Among the tables created after that date and time, Cloud DLP profiles only the tables that have at least 10 rows or are at least 24 hours old.

  7. Click Done.

  8. If you want to add more schedules, click Add schedule and repeat the previous steps.

  9. To reorder the schedules according to priority, use the up and down arrows. For example, if the filters in two different schedules match Table A, the schedule higher on the priority list takes precedence.

    The last schedule in the list is always the one labeled Default schedule. This default schedule covers the tables in your selected resource (organization or folder) that don't match any of the schedules you created. This default schedule follows the system default profiling frequency.

  10. If you want to adjust the default schedule, click Edit schedule, and adjust the settings as needed.
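The scheduling logic in the steps above can be modeled in a short sketch. The class and field names below are hypothetical, not part of any Cloud DLP API: filters use RE2-style regular expressions (Python's `re` module accepts the simple patterns shown here, although the two syntaxes aren't identical), the first matching schedule in priority order wins, and a table is profiled only once it satisfies the schedule's minimum and time conditions. The condition logic encodes the worked example from the Conditions section.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Schedule:
    """Hypothetical model of one schedule (not a Cloud DLP API type)."""
    project_regex: str = ".*"
    dataset_regex: str = ".*"
    table_regex: str = ".*"
    min_row_count: int = 0
    min_age: timedelta = timedelta(0)
    created_on_or_before_excluded: Optional[datetime] = None

    def matches(self, project: str, dataset: str, table: str) -> bool:
        """A table is in scope when all three filter regexes match."""
        return all(re.fullmatch(p, v) for p, v in [
            (self.project_regex, project),
            (self.dataset_regex, dataset),
            (self.table_regex, table),
        ])

    def should_profile(self, row_count: int, created: datetime, now: datetime) -> bool:
        # Time condition: tables created on or before the cutoff are excluded.
        if (self.created_on_or_before_excluded is not None
                and created <= self.created_on_or_before_excluded):
            return False
        # Minimum conditions: profile once the table has enough rows or is
        # old enough (either condition suffices, per the worked example).
        return row_count >= self.min_row_count or now - created >= self.min_age

def pick_schedule(schedules, project, dataset, table):
    """Schedules are ordered by priority; the first match takes precedence."""
    return next((s for s in schedules if s.matches(project, dataset, table)), None)
```

For example, a schedule with `min_row_count=10`, `min_age=timedelta(hours=24)`, and a cutoff of May 4, 2022, 11:59 PM reproduces the behavior described above: tables created on or before the cutoff are never profiled, and newer tables are profiled once they reach 10 rows or 24 hours of age.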

Select inspection template

Depending on how you want to provide an inspection configuration, choose one of the following options. Regardless of which option you choose, Cloud DLP scans your data in the region where you configured BigQuery to store that data. Your BigQuery data doesn't leave its region of origin.

Option 1: Create an inspection template

Choose this option if you want to create a new inspection template in the global region.

  1. Click Create new inspection template.
  2. Optional: To modify the default selection of infoTypes, click Manage infoTypes. Use the filter to find and select infoTypes. Then, click Done.

  3. Optional: Configure the inspection template further by adding rulesets and setting a confidence threshold. For more information, see Configure detection.

    When Cloud DLP creates the scan configuration, it stores this new inspection template in the global region.

Option 2: Use an existing inspection template

Choose this option if you have existing inspection templates that you want to use.

  1. Click Select existing inspection template.

  2. Enter the full resource name of the inspection template that you want to use. The Region field is autopopulated with the name of the region where your inspection template is stored.

    The inspection template you enter must be in the same region as the data to be profiled. To respect data residency, Cloud DLP doesn't use an inspection template outside its own region.

    To find the full resource name of an inspection template, follow these steps:

    1. Go to your inspection templates list. This page opens on a separate tab.

      Go to inspection templates

    2. If needed, switch to the project that contains the inspection template that you want to use.

    3. On the Templates tab, click the template ID of the template that you want to use.

    4. On the page that opens, copy the full resource name of the template. The full resource name follows this format:

      projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
    5. On the Create scan configuration page, in the Template name field, paste the full resource name of the template.

  3. If you have data in another region, and you have an inspection template that you want to use for that region, follow these steps:

    1. Click Add inspection template.
    2. Enter the inspection template's full resource name.

    Repeat these steps for each region where you have a dedicated inspection template.

  4. Optional: Add an inspection template that's stored in the global region. Cloud DLP automatically uses that template for data in regions where you don't have a dedicated inspection template.
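If you script around these steps, the full resource name format shown above can be split into its components with a short helper. This is an illustrative sketch, not part of any Cloud DLP client library; it assumes only the `projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID` format from the steps above:

```python
import re

# Matches projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
_TEMPLATE_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/inspectTemplates/(?P<template>[^/]+)$"
)

def parse_template_name(name: str) -> dict:
    """Split a full inspection template resource name into its components."""
    m = _TEMPLATE_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a full inspection template resource name: {name!r}")
    return m.groupdict()
```

A helper like this makes it easy to check, for example, that a template's `region` component matches the region of the data you intend to profile, which is the residency constraint described above.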

Manage scan outcome

In the following sections, you specify actions that you want Cloud DLP to take after it generates the data profiles.

Publish to Chronicle

Metrics gathered from data profiles can add context to your Chronicle findings. The added context can help you determine the most important security issues to address. For example, if you're investigating a particular service agent in Chronicle, data profiles can provide insight into whether that service agent has access to tables that have high data risk levels.

To send your data profiles to your Chronicle account, turn on Publish to Chronicle.

If Chronicle isn't enabled for your organization, turning on this option has no effect.

Save data profile copies to BigQuery

Turning on Save data profile copies to BigQuery lets you keep a saved copy or history of all of your generated profiles. Doing so can be useful for creating audit reports and visualizing data profiles. You can also load this information into other systems.

Also, this option lets you see all of your data profiles in a single view, regardless of which region your data resides in. If you turn off this option, you can still view the data profiles in your dashboard. However, in your dashboard, you select one region at a time, and see only the data profiles for that region.

To export copies of the data profiles to a BigQuery table, follow these steps:

  1. Turn on Save data profile copies to BigQuery.

  2. Enter the details of the BigQuery table where you want to save the data profiles:

    • For Project ID, enter the ID of an existing project where you want to export the data profiles.

    • For Dataset ID, enter the name of an existing dataset in that project.

    • For Table ID, enter a name for the BigQuery table to export the data profiles to. If this table doesn't exist, Cloud DLP automatically creates it using the name that you provide.

Cloud DLP starts exporting profiles from the time you turn on this option. Profiles that were generated before you turned on exporting aren't saved to BigQuery.

Publish to Pub/Sub

Turning on Publish to Pub/Sub lets you take programmatic actions based on profiling results. You can use Pub/Sub notifications to develop a workflow for catching and remediating findings with significant data risk or sensitivity.

To send notifications to a Pub/Sub topic, follow these steps:

  1. Turn on Publish to Pub/Sub.

    A list of options appears. Each option describes an event that causes Cloud DLP to send a notification to Pub/Sub.

  2. Select the events that should trigger a Pub/Sub notification.

    If you select Send a Pub/Sub notification each time a profile is updated, Cloud DLP sends a notification when there's a change in the following table-level metrics:

    • Data risk
    • Sensitivity
    • Predicted infoTypes
    • Other infoTypes
    • Public
    • Encryption
  3. For each event you select, follow these steps:

    1. Enter the name of the topic. The name must be in the following format:

      projects/PROJECT_ID/topics/TOPIC_ID
      

      Replace the following:

      • PROJECT_ID: the ID of the project associated with the Pub/Sub topic.
      • TOPIC_ID: the ID of the Pub/Sub topic.
    2. Specify whether to include the full table profile in the notification, or just the full resource name of the table that was profiled.

    3. Set the minimum data risk and sensitivity levels that must be met for Cloud DLP to send a notification.

    4. Specify whether only one or both of the data risk and sensitivity conditions must be met. For example, if you choose AND, then both the data risk and the sensitivity conditions must be met before Cloud DLP sends a notification.
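The AND/OR threshold logic in the last step can be sketched as follows. The level names and function below are illustrative, not Cloud DLP API types; the sketch only assumes that data risk and sensitivity levels are ordered and that the two threshold conditions combine with AND or OR as described above:

```python
from enum import IntEnum

class Level(IntEnum):
    """Ordered severity levels (illustrative; not a Cloud DLP API enum)."""
    LOW = 1
    MODERATE = 2
    HIGH = 3

def should_notify(data_risk: Level, sensitivity: Level,
                  min_risk: Level, min_sensitivity: Level,
                  combine: str = "OR") -> bool:
    """Decide whether a profile update should trigger a Pub/Sub notification.

    combine="AND": both thresholds must be met; combine="OR": either suffices.
    """
    risk_ok = data_risk >= min_risk
    sens_ok = sensitivity >= min_sensitivity
    return (risk_ok and sens_ok) if combine == "AND" else (risk_ok or sens_ok)
```

With `combine="AND"`, a table whose data risk is HIGH but whose sensitivity is below the minimum produces no notification; with `combine="OR"`, the same table does.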

Manage service agent container and billing

In this section, you specify the project to use as a service agent container. You can have Cloud DLP automatically create a new project, or you can choose an existing project.

  • If you're creating a scan configuration for the first time, click Create a new project as a service agent container.

    Cloud DLP creates a new project named DLP Service Agent Container. This project is a regular Google Cloud project that contains a new service agent. Cloud DLP prompts you to select the billing account to charge for all billable operations related to this project, including operations that aren't related to data profiling.

  • If you have an existing service agent container that you want to reuse, click Select an existing service agent container. Then, click Browse to select the service agent container's project ID.

Regardless of whether you create a new service agent container or reuse an existing one, make sure that the service agent has read access to the data to be profiled. If you're exporting profiles to BigQuery, also make sure that the service agent has write access to the output table.

Set location to store configuration

Click the Resource location list, and select the region where you want to store this scan configuration. All scan configurations that you later create will also be stored in this location.

Where you choose to store your scan configuration doesn't affect the data to be scanned. Also, it doesn't affect where the data profiles are stored. Your data is scanned in the same region where that data is stored (as set in BigQuery). For more information, see Data residency considerations.

Review and create

  1. If you don't want profiling to begin shortly after you create the scan configuration, select Create scan in paused mode.

    This option is useful in the following cases:

    • Your Google Cloud admin still needs to grant data profiling access to the service agent.
    • You want to create multiple scan configurations and you want some configurations to override others.
    • You opted to save data profiles to BigQuery, and you want to make sure the service agent has write access to your output table.
    • You configured Pub/Sub notifications and you want to grant publishing access to the service agent.
  2. Review your settings and click Create.

    Cloud DLP creates the scan configuration and adds it to the Configurations list.

To view or manage your scan configurations, go to the data profile configurations list.

Go to data profile configurations

If your service agent has the roles needed to access and profile your data, then Cloud DLP starts scanning your data shortly after you create the scan configuration. Otherwise, Cloud DLP shows an error when you view the scan configuration details.

What's next