Profile Cloud SQL data in a single project

This page describes how to configure Cloud SQL data discovery at the project level. If you want to profile an organization or folder, see Profile Cloud SQL data in an organization or folder.

For more information about the discovery service, see Data profiles.

How it works

The following is a high-level workflow for profiling Cloud SQL data:

  1. Create a scan configuration.

    After you create a scan configuration, Sensitive Data Protection starts identifying your Cloud SQL instances and creating a default connection for each instance. Depending on the number of instances in scope of discovery, this process can take a few hours. You can exit the Google Cloud console and check your connections later.

  2. Grant the required IAM roles to the service agent associated with your scan configuration.

  3. When the default connections are ready, give Sensitive Data Protection access to your Cloud SQL instances by updating each connection with the proper database user credentials. You can provide existing database user accounts or create database users.

  4. Recommended: Increase the maximum number of connections that Sensitive Data Protection can use to profile your data. Increasing the connections can speed up discovery.

Supported services

This feature supports the following:

  • Cloud SQL for MySQL
  • Cloud SQL for PostgreSQL

Cloud SQL for SQL Server isn't supported.

Processing and storage regions

Sensitive Data Protection is a regional and multi-regional service; it doesn't distinguish between zones. When Sensitive Data Protection profiles a Cloud SQL instance, the data is processed in its current region, but not necessarily its current zone. For example, if a Cloud SQL instance is stored in the us-central1-a zone, then Sensitive Data Protection processes and stores the data profiles in the us-central1 region.

For more information, see Data residency considerations.
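
As a rough illustration of the zone-to-region relationship described above, the following sketch derives a region name from a zone name. It assumes that a zone name is the region name followed by a hyphen and a short suffix (as in us-central1-a). The helper name `region_of` is hypothetical and isn't part of any Google Cloud library.

```python
def region_of(zone: str) -> str:
    """Return the region part of a zone name.

    Assumption: a zone name is "<region>-<suffix>", for example
    "us-central1-a" belongs to the region "us-central1".
    """
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


# An instance in us-central1-a is profiled in us-central1.
print(region_of("us-central1-a"))
```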

Before you begin

  1. Make sure the Cloud Data Loss Prevention API is enabled on your project:

    1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
    2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

      Go to project selector

    3. Make sure that billing is enabled for your Google Cloud project.

    4. Enable the required API.

      Enable the API

  2. Confirm that you have the IAM permissions that are required to configure data profiles at the project level.

  3. You must have an inspection template in each region where you have data to be profiled. If you want to use a single template for multiple regions, you can use a template that is stored in the global region. If organizational policies prevent you from creating a global inspection template, you must set a dedicated inspection template for each region. For more information, see Data residency considerations.

    This task lets you create an inspection template in the global region only. If you need dedicated inspection templates for one or more regions, you must create those templates before performing this task.

  4. You can configure Sensitive Data Protection to send notifications to Pub/Sub when certain events occur, such as when Sensitive Data Protection profiles a new table. If you want to use this feature, you must first create a Pub/Sub topic.

Create a scan configuration

  1. Go to the Create scan configuration page.

    Go to Create scan configuration

  2. Go to your project. On the toolbar, click the project selector and select your project.

The following sections provide more information about the steps in the Create scan configuration page. At the end of each section, click Continue.

Select a discovery type

Select Cloud SQL.

Select scope

Do one of the following:

  • If you want to scan a single table in test mode, select Scan one table (test mode).

    The number of free table scans available is displayed. Free table scans apply only to tables that are less than or equal to 1 TB in size. For each table, you can have only one table-level scan configuration. For more information, see Profile a table in test mode.

    Fill in the details of the table that you want to profile.

  • If you want to perform standard project-level profiling, select Scan entire project.

Manage schedules

If the default profiling frequency suits your needs, you can skip this section of the Create scan configuration page. This section is useful if you want to make fine-grained adjustments to the profiling frequency of all your data or certain subsets of your data. It's also useful if you don't want certain tables to ever be profiled, or you want them to be profiled once and then never again.

In this section, you create filters to specify certain subsets of your data that are of interest. For these subsets, you define whether Sensitive Data Protection should profile the tables and how often. Here, you also specify the types of changes that should cause a table to be reprofiled. Finally, you specify any conditions that each table in the subsets must meet before Sensitive Data Protection starts profiling the table.

To make fine-grained adjustments to profiling frequency, follow these steps:

  1. Click Add schedule.
  2. In the Filters section, define one or more filters that specify which tables are in the schedule's scope.

    Specify at least one of the following:

    • A project ID or a regular expression that specifies one or more projects.
    • An instance ID or a regular expression that specifies one or more instances.
    • A database ID or a regular expression that specifies one or more databases.
    • A table ID or a regular expression that specifies one or more tables. Enter this value in the Database resource name or regular expression field.

    Regular expressions must follow the RE2 syntax.

    For example, if you want all tables in a database to be included in the filter, enter the database ID in the Database ID field.

    If you want to add more filters, click Add filter and repeat this step.

  3. Click Frequency.

  4. In the Frequency section, specify whether the discovery service should profile the tables you selected, and if so, how often:

    • If you never want the tables to be profiled, turn off Do profile this data.

    • If you want the tables to be profiled at least once, leave Do profile this data on.

      In the succeeding fields in this section, you specify whether the system should reprofile your data and what events should trigger a reprofile operation. For more information, see Frequency of data profile generation.

      1. For On a schedule, specify how often you want the tables to be reprofiled. The tables are reprofiled regardless of whether they underwent any changes.
      2. For When schema changes, specify how often Sensitive Data Protection should check if the selected tables had schema changes after they were last profiled. Only tables with schema changes will be reprofiled.
      3. For Types of schema change, specify which types of schema changes should trigger a reprofile operation. Select one of the following:
        • New columns: Reprofile the tables that gained new columns.
        • Removed columns: Reprofile the tables that had columns removed.

        For example, suppose you have tables that gain new columns every day, and you need to profile their contents each time. You can set When schema changes to Reprofile daily, and set Types of schema change to New columns.

      4. For When inspect template changes, specify whether you want your data to be reprofiled when the associated inspection template is updated, and if so, how often.

        An inspection template change is detected when either of the following occurs:

        • The name of an inspection template changes in your scan configuration.
        • The updateTime of an inspection template changes.

        For example, if you set an inspection template for the us-west1 region and you update that inspection template, then only data in the us-west1 region will be reprofiled. However, if you delete that inspection template instead, then the data in us-west1 isn't reprofiled, because there's no inspection template to use to reprofile it.

  5. Click Conditions.

    In the Conditions section, you specify the types of database resources that you want to profile. By default, Sensitive Data Protection is set to profile all supported database resource types. When Sensitive Data Protection adds support for more database resource types, those types will automatically be profiled, too.

  6. Optional: If you want to explicitly set the database resource types that you want to profile, follow these steps:

    1. Click the Database resource types field.
    2. Select the database resource types that you want to profile.

    If Sensitive Data Protection later adds discovery support for more Cloud SQL database resource types, those types will only be profiled if you return to this list and select them.

  7. Click Done.

  8. If you want to add more schedules, click Add schedule and repeat the previous steps.

  9. To reorder the schedules according to priority, use the up and down arrows. For example, if the filters in two different schedules match Table A, the schedule higher on the priority list takes precedence.

    The last schedule in the list is always the one labeled Default schedule. This default schedule covers the tables in your project that don't match any of the schedules you created. This default schedule follows the system default profiling frequency.

  10. If you want to adjust the default schedule, click Edit schedule, and adjust the settings as needed.
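
The priority rule described in the preceding steps can be pictured as a first-match lookup: schedules are checked from the top of the list, and a table that matches none of them falls through to the default schedule. The following is a minimal sketch of that idea, not the service's implementation. The `Schedule` class and `pick_schedule` function are hypothetical, each schedule is simplified to a single filter, and the sketch assumes that a pattern must match the whole ID. It also uses Python's `re` module, which accepts a different syntax than RE2, so treat the patterns as illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Schedule:
    """A hypothetical schedule with one filter.

    Each filter field is a regular expression; None means "match anything".
    """
    name: str
    project: Optional[str] = None
    instance: Optional[str] = None
    database: Optional[str] = None
    table: Optional[str] = None

    def matches(self, project: str, instance: str, database: str, table: str) -> bool:
        pairs = [
            (self.project, project),
            (self.instance, instance),
            (self.database, database),
            (self.table, table),
        ]
        # Assumption: a pattern must match the entire ID.
        return all(p is None or re.fullmatch(p, v) is not None for p, v in pairs)


def pick_schedule(schedules: Sequence[Schedule], project: str, instance: str,
                  database: str, table: str) -> str:
    """Return the name of the first schedule, in priority order, that matches."""
    for schedule in schedules:
        if schedule.matches(project, instance, database, table):
            return schedule.name
    # Tables that match no custom schedule use the default schedule.
    return "Default schedule"


schedules = [
    Schedule("skip-temp-tables", table=r"tmp_.*"),
    Schedule("sales-daily", database=r"sales"),
]
# Both schedules match this table; the one higher on the list wins.
print(pick_schedule(schedules, "my-project", "my-instance", "sales", "tmp_orders"))
```

In this sketch, reordering the list changes which schedule applies to a table that matches more than one filter, which mirrors the effect of the up and down arrows.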

Select inspection template

Depending on how you want to provide an inspection configuration, choose one of the following options. Regardless of which option you choose, Sensitive Data Protection scans your data in the region where that data is stored. That is, your data doesn't leave its region of origin.

Option 1: Create an inspection template

Choose this option if you want to create a new inspection template in the global region.

  1. Click Create new inspection template.
  2. Optional: To modify the default selection of infoTypes, click Manage infoTypes.

    For more information about how to manage built-in and custom infoTypes in this section, see Manage infoTypes through the Google Cloud console.

    You must have at least one infoType selected to continue.

  3. Optional: Configure the inspection template further by adding rulesets and setting a confidence threshold. For more information, see Configure detection.

    When Sensitive Data Protection creates the scan configuration, it stores this new inspection template in the global region.

Option 2: Use an existing inspection template

Choose this option if you have existing inspection templates that you want to use.

  1. Click Select existing inspection template.

  2. Enter the full resource name of the inspection template that you want to use. The Region field is autopopulated with the name of the region where your inspection template is stored.

    The inspection template you enter must be in the same region as the data to be profiled. To respect data residency, Sensitive Data Protection doesn't use an inspection template outside its own region.

    To find the full resource name of an inspection template, follow these steps:

    1. Go to your inspection templates list. This page opens on a separate tab.

      Go to inspection templates

    2. Switch to the project that contains the inspection template that you want to use.

    3. On the Templates tab, click the template ID of the template that you want to use.

    4. On the page that opens, copy the full resource name of the template. The full resource name follows this format:

      projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID

    5. On the Create scan configuration page, in the Template name field, paste the full resource name of the template.

  3. If you have data in another region, and you have an inspection template that you want to use for that region, follow these steps:

    1. Click Add inspection template.
    2. Enter the inspection template's full resource name.

    Repeat these steps for each region where you have a dedicated inspection template.

  4. Optional: Add an inspection template that's stored in the global region. Sensitive Data Protection automatically uses that template for data in regions where you don't have a dedicated inspection template.
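
To make the region rules above concrete, the following sketch reads the region from an inspection template's full resource name and then picks a template for a given data region: a dedicated template if one exists, otherwise a template in the global region, otherwise none. This is only an illustration of the rules as stated on this page. The function names are hypothetical, and the parsing assumes that the resource name follows the format shown in the preceding steps exactly.

```python
import re
from typing import Optional, Sequence

# Format taken from this page:
# projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
_TEMPLATE_NAME = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/inspectTemplates/(?P<template>[^/]+)"
)


def template_region(full_name: str) -> str:
    """Extract the REGION segment from a template's full resource name."""
    match = _TEMPLATE_NAME.fullmatch(full_name)
    if match is None:
        raise ValueError(f"unexpected template name: {full_name!r}")
    return match.group("region")


def template_for(data_region: str, template_names: Sequence[str]) -> Optional[str]:
    """Pick the template for data_region, falling back to the global region."""
    by_region = {template_region(name): name for name in template_names}
    return by_region.get(data_region, by_region.get("global"))


templates = [
    "projects/my-project/locations/us-west1/inspectTemplates/west-template",
    "projects/my-project/locations/global/inspectTemplates/fallback-template",
]
print(template_for("us-west1", templates))      # the dedicated us-west1 template
print(template_for("europe-west1", templates))  # the global template
```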

Add actions

In the following sections, you specify actions that you want Sensitive Data Protection to take after it generates the data profiles.

For information about how other Google Cloud services may charge you for configuring actions, see Pricing for exporting data profiles.

Publish to Security Command Center

This action lets you send the calculated data risk and sensitivity levels of table data profiles to Security Command Center.

Security Command Center is Google Cloud's centralized vulnerability and threat reporting service. You can use insights from data profiles when you triage and develop response plans for your vulnerability and threat findings in Security Command Center.

Before you can use this action, Security Command Center must be activated at the organization level. Turning on Security Command Center at the organization level enables the flow of findings from integrated services like Sensitive Data Protection. Sensitive Data Protection works with Security Command Center in all service tiers.

If Security Command Center isn't activated at the organization level, Sensitive Data Protection findings won't appear in Security Command Center. For more information, see Check the activation level of Security Command Center.

To send the results of your data profiles to Security Command Center, make sure the Publish to Security Command Center option is turned on.

For more information, see Publish data profiles to Security Command Center.

Save data profile copies to BigQuery

Turning on Save data profile copies to BigQuery lets you keep a saved copy or history of all of your generated profiles. Doing so can be useful for creating audit reports and visualizing data profiles. You can also load this information into other systems.

Also, this option lets you see all of your data profiles in a single view, regardless of which region your data resides in. If you turn off this option, you can still view the data profiles in the Google Cloud console. However, in the Google Cloud console, you select one region at a time, and see only the data profiles for that region.

To export copies of the data profiles to a BigQuery table, follow these steps:

  1. Turn on Save data profile copies to BigQuery.

  2. Enter the details of the BigQuery table where you want to save the data profiles:

    • For Project ID, enter the ID of an existing project where you want to export the data profiles.

    • For Dataset ID, enter the name of an existing dataset in the project where you want to export the data profiles.

    • For Table ID, enter a name for the BigQuery table that will receive the exported data profiles. If this table doesn't exist yet, Sensitive Data Protection automatically creates it for you using the name you provide.

Sensitive Data Protection starts exporting profiles from the time you turn on this option. Profiles that were generated before you turned on exporting aren't saved to BigQuery.

Publish to Pub/Sub

Turning on Publish to Pub/Sub lets you take programmatic actions based on profiling results. You can use Pub/Sub notifications to develop a workflow for catching and remediating findings with significant data risk or sensitivity.

To send notifications to a Pub/Sub topic, follow these steps:

  1. Turn on Publish to Pub/Sub.

    A list of options appears. Each option describes an event that causes Sensitive Data Protection to send a notification to Pub/Sub.

  2. Select the events that should trigger a Pub/Sub notification.

    If you select Send a Pub/Sub notification each time a profile is updated, Sensitive Data Protection sends a notification when there's a change in the following table-level metrics:

    • Data risk
    • Sensitivity
    • Predicted infoTypes
    • Other infoTypes
    • Public
    • Encryption

  3. For each event you select, follow these steps:

    1. Enter the name of the topic. The name must be in the following format:

      projects/PROJECT_ID/topics/TOPIC_ID

      Replace the following:

      • PROJECT_ID: the ID of the project associated with the Pub/Sub topic.
      • TOPIC_ID: the ID of the Pub/Sub topic.

    2. Specify whether to include the full table profile in the notification, or just the full resource name of the table that was profiled.

    3. Set the minimum data risk and sensitivity levels that must be met for Sensitive Data Protection to send a notification.

    4. Specify whether only one or both of the data risk and sensitivity conditions must be met. For example, if you choose AND, then both the data risk and the sensitivity conditions must be met before Sensitive Data Protection sends a notification.
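
The notification conditions in the preceding steps can be read as a simple threshold check. The following sketch shows one way to think about it. It is not how the service is implemented. The level names (LOW, MODERATE, HIGH), the OR operator as the counterpart of AND, and both function names are assumptions made for this example. The topic name check uses only the format shown above.

```python
import re

# Assumed ordering of levels, for illustration only.
LEVELS = {"LOW": 1, "MODERATE": 2, "HIGH": 3}

# Format taken from this page: projects/PROJECT_ID/topics/TOPIC_ID
_TOPIC_NAME = re.compile(r"projects/[^/]+/topics/[^/]+")


def is_topic_name(name: str) -> bool:
    """Check that a topic name has the projects/.../topics/... shape."""
    return _TOPIC_NAME.fullmatch(name) is not None


def should_notify(risk: str, sensitivity: str, min_risk: str,
                  min_sensitivity: str, operator: str = "AND") -> bool:
    """Decide whether a profile meets the configured minimum levels."""
    risk_ok = LEVELS[risk] >= LEVELS[min_risk]
    sensitivity_ok = LEVELS[sensitivity] >= LEVELS[min_sensitivity]
    if operator == "AND":
        # Both conditions must be met.
        return risk_ok and sensitivity_ok
    if operator == "OR":
        # Assumption: with OR, meeting either condition is enough.
        return risk_ok or sensitivity_ok
    raise ValueError(f"unknown operator: {operator!r}")


# High risk but low sensitivity, with both minimums set to MODERATE.
print(should_notify("HIGH", "LOW", "MODERATE", "MODERATE", "AND"))
print(should_notify("HIGH", "LOW", "MODERATE", "MODERATE", "OR"))
```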

Set location to store configuration

Click the Resource location list, and select the region where you want to store this scan configuration. All scan configurations that you later create will also be stored in this location.

Where you choose to store your scan configuration doesn't affect the data to be scanned. Also, it doesn't affect where the data profiles are stored. Your data is scanned in the same region where that data is stored. For more information, see Data residency considerations.

Review and create

  1. If you want to make sure that profiling doesn't start automatically after you create the scan configuration, select Create scan in paused mode.

    This option is useful in the following cases:

    • You opted to save data profiles to BigQuery, and you want to make sure the service agent has write access to your output table.
    • You configured Pub/Sub notifications and you want to grant publishing access to the service agent.

  2. Review your settings and click Create.

    Sensitive Data Protection creates the scan configuration and adds it to the discovery scan configurations list.

To view or manage your scan configurations, see Manage scan configurations.

Sensitive Data Protection starts identifying your Cloud SQL instances and creating a default connection for each instance. Depending on the number of instances in scope of discovery, this process can take a few hours. You can exit the Google Cloud console and check your connections later.

When the default connections are ready, update those connections with the database user credentials that you want Sensitive Data Protection to use to profile your Cloud SQL instances. For more information, see Manage connections for use with discovery.

What's next

Learn how to update your connections.