This page describes how to configure Cloud Storage data discovery at the project level. If you want to profile an organization or folder, see Profile Cloud Storage data in an organization or folder.
For more information about the discovery service, see Data profiles.
Before you begin
Make sure the Cloud Data Loss Prevention API is enabled on your project:
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the required API.
Confirm that you have the IAM permissions that are required to configure data profiles at the project level.
You must have an inspection template in each region where you have data to be profiled. If you want to use a single template for multiple regions, you can use a template that is stored in the global region. If organizational policies prevent you from creating an inspection template in the global region, then you must set a dedicated inspection template for each region. For more information, see Data residency considerations.
This task lets you create an inspection template in the global region only. If you need dedicated inspection templates for one or more regions, you must create those templates before performing this task.
You can configure Sensitive Data Protection to send notifications to Pub/Sub when certain events occur, such as when Sensitive Data Protection profiles a new bucket. If you want to use this feature, you must first create a Pub/Sub topic.
You can configure Sensitive Data Protection to automatically attach tags to your resources. This feature lets you conditionally grant access to those resources based on their calculated sensitivity levels. If you want to use this feature, you must first complete the tasks in Control IAM access to resources based on data sensitivity.
Create a scan configuration
Go to the Create scan configuration page.
Go to your project. On the toolbar, click the project selector and select your project.
The following sections provide more information about the steps in the Create scan configuration page. At the end of each section, click Continue.
Select a discovery type
Select Cloud Storage.
Select scope
Do one of the following:
If you want to scan a single bucket, select Scan one bucket.
For each bucket, you can have only one single-resource scan configuration. For more information, see Profile a single data resource.
Fill in the details of the bucket that you want to profile.
If you want to perform standard project-level profiling, select Scan selected project.
Manage schedules
If the default profiling frequency suits your needs, you can skip this section of the Create scan configuration page.
Configure this section for the following reasons:
- To make fine-grained adjustments to the profiling frequency of all your data or certain subsets of your data.
- To specify the buckets that you don't want to profile.
- To specify the buckets that you don't want profiled more than once.
To make fine-grained adjustments to profiling frequency, follow these steps:
- Click Add schedule.
In the Filters section, you define one or more filters that specify which buckets are in the schedule's scope.
Specify at least one of the following:
- A project ID or a regular expression that specifies one or more projects
- A bucket name or a regular expression that specifies one or more buckets
Regular expressions must follow the RE2 syntax.
For example, if you want all buckets in a project to be included in the filter, enter the project ID in the Project ID field.
If you want to add more filters, click Add filter and repeat this step.
Click Frequency.
In the Frequency section, specify whether the discovery service should profile the buckets that you selected, and if so, how often:
If you never want the buckets to be profiled, turn off Do profile this data.
If you want the buckets to be profiled at least once, leave Do profile this data on.
In the succeeding fields in this section, you specify whether the system should reprofile your data and what events should trigger a reprofile operation. For more information, see Frequency of data profile generation.
- For On a schedule, specify how often you want the buckets to be reprofiled. The buckets are reprofiled regardless of whether they underwent any changes.
- For When inspect template changes, specify whether you want your data to be reprofiled when the associated inspection template is updated, and if so, how often.
An inspection template change is detected when either of the following occurs:
- The name of an inspection template changes in your scan configuration.
- The updateTime of an inspection template changes.
For example, if you set an inspection template for the us-west1 region and you update that inspection template, then only data in the us-west1 region will be reprofiled.
Optional: Click Conditions.
In the Conditions section, you specify any conditions that the buckets defined in your filters must meet before Sensitive Data Protection profiles them.
If needed, set the following:
Minimum conditions: If you want to delay profiling of a bucket until it reaches a certain age, turn on this option. Then, enter the minimum duration.
Bucket attribute conditions: By default, Sensitive Data Protection doesn't scan buckets that have Autoclass enabled. If you want to scan those buckets, click Scan buckets with Autoclass enabled.
Object attribute conditions: By default, Sensitive Data Protection scans only objects that are in the Standard storage class. If you want to scan objects in other storage classes, select those storage classes individually or click Scan all objects regardless of the attribute.
Time condition: If you don't want old buckets to ever be profiled, turn on this option. Then, use the date picker to select a date and time. Any bucket created on or before your selected timestamp is excluded from profiling.
Example conditions
Suppose that you have the following configuration:
Minimum conditions
- Minimum duration: 24 hours
Bucket attribute conditions
- None selected
Object attribute conditions
- Scan objects with the Standard storage class
- Scan objects with the Nearline storage class
Time condition
- Timestamp: 05/4/22, 11:59 PM
In this case, Sensitive Data Protection excludes any bucket that was created on or before May 4, 2022, 11:59 PM. Among the buckets that were created after that date and time, Sensitive Data Protection profiles only the buckets that are at least 24 hours old and have Autoclass disabled. Within those buckets, Sensitive Data Protection profiles only the objects that are in the Standard and Nearline storage classes.
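The example configuration above can be sketched as a pair of small checks. This is only an illustration of the described behavior, not how the discovery service is implemented: the `Bucket` class, the function names, and the storage class strings are hypothetical, and timestamps are treated as naive datetimes for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Bucket:
    """Hypothetical stand-in for the bucket attributes the conditions look at."""
    name: str
    created: datetime
    autoclass_enabled: bool


# Values taken from the example configuration above.
MIN_AGE = timedelta(hours=24)                        # Minimum conditions
SCAN_AUTOCLASS_BUCKETS = False                       # Bucket attribute conditions: none selected
ALLOWED_STORAGE_CLASSES = {"STANDARD", "NEARLINE"}   # Object attribute conditions
CUTOFF = datetime(2022, 5, 4, 23, 59)                # Time condition


def bucket_is_profiled(bucket: Bucket, now: datetime) -> bool:
    """Return True if the bucket passes every bucket-level condition."""
    if bucket.created <= CUTOFF:
        return False  # created on or before the cutoff: excluded
    if now - bucket.created < MIN_AGE:
        return False  # not yet 24 hours old
    if bucket.autoclass_enabled and not SCAN_AUTOCLASS_BUCKETS:
        return False  # Autoclass buckets are skipped by default
    return True


def object_is_profiled(storage_class: str) -> bool:
    """Return True if an object's storage class is one of the selected classes."""
    return storage_class.upper() in ALLOWED_STORAGE_CLASSES
```

With these settings, a bucket created on May 20, 2022 with Autoclass disabled passes the bucket-level checks when evaluated on June 1, 2022, and only its Standard and Nearline objects pass the object-level check.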
Click Done.
If you want to add more schedules, click Add schedule and repeat the previous steps.
To specify precedence between schedules, reorder them using the up and down arrows.
The order of the schedules specifies how conflicts between schedules are resolved. If a bucket matches the filters of two different schedules, the schedule higher in the schedules list dictates the profiling frequency for that bucket.
The last schedule in the list is always the one labeled Default schedule. This default schedule covers the buckets in your selected scope that don't match any of the schedules that you created. This default schedule follows the system default profiling frequency.
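The precedence rule can be pictured as a first-match lookup over the ordered list. The following is a minimal sketch under several assumptions: the `Schedule` class and schedule labels are hypothetical, each schedule carries a single filter, patterns are matched against the whole project ID or bucket name, and Python's `re` module stands in for RE2 (the two agree on simple patterns like the ones shown, but are not identical).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Schedule:
    """Hypothetical schedule with one filter; None means 'match anything'."""
    label: str
    project_pattern: Optional[str] = None
    bucket_pattern: Optional[str] = None

    def matches(self, project_id: str, bucket: str) -> bool:
        # Assumption: a pattern must match the entire value.
        if self.project_pattern is not None and not re.fullmatch(self.project_pattern, project_id):
            return False
        if self.bucket_pattern is not None and not re.fullmatch(self.bucket_pattern, bucket):
            return False
        return True


def pick_schedule(schedules: List[Schedule], project_id: str, bucket: str) -> str:
    """Return the label of the first schedule, top to bottom, that matches."""
    for schedule in schedules:
        if schedule.matches(project_id, bucket):
            return schedule.label
    # Nothing matched: the bucket falls through to the default schedule.
    return "Default schedule"


schedules = [
    Schedule("logs-weekly", bucket_pattern=r"logs-.*"),
    Schedule("prod-daily", project_pattern=r"prod-project"),
]
```

Here a bucket named `logs-2024` in `prod-project` matches both schedules, so the one listed first (`logs-weekly`) applies. Swapping the order of the two entries would hand that bucket to `prod-daily` instead.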
If you want to adjust the default schedule, click Edit schedule, and adjust the settings as needed.
Select inspection template
Depending on how you want to provide an inspection configuration, choose one of the following options. Regardless of which option you choose, Sensitive Data Protection scans your data in the region where that data is stored. That is, your data doesn't leave its region of origin.
Option 1: Create an inspection template
Choose this option if you want to create a new inspection template in the global region.
- Click Create new inspection template.
Optional: To modify the default selection of infoTypes, click Manage infoTypes.
For more information about how to manage built-in and custom infoTypes, see Manage infoTypes through the Google Cloud console.
You must have at least one infoType selected to continue.
Optional: Configure the inspection template further by adding rulesets and setting a confidence threshold. For more information, see Configure detection.
When Sensitive Data Protection creates the scan configuration, it stores this new inspection template in the global region.
Option 2: Use an existing inspection template
Choose this option if you have existing inspection templates that you want to use.
Click Select existing inspection template.
Enter the full resource name of the inspection template that you want to use. The Region field is automatically populated with the name of the region where your inspection template is stored.
The inspection template that you enter must be in the same region as the data to be profiled.
To respect data residency, Sensitive Data Protection doesn't use an inspection template outside the region where that template is stored.
To find the full resource name of an inspection template, follow these steps:
Go to your inspection templates list. This page opens on a separate tab.
Switch to the project that contains the inspection template that you want to use.
On the Templates tab, click the template ID of the template that you want to use.
On the page that opens, copy the full resource name of the template. The full resource name follows this format:
projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
On the Create scan configuration page, in the Template name field, paste the full resource name of the template.
To add an inspection template for another region, click Add inspection template and enter the template's full resource name. Repeat this for each region where you have a dedicated inspection template.
Optional: Add an inspection template that's stored in the global region. Sensitive Data Protection automatically uses that template for data in regions where you don't have a dedicated inspection template.
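The region matching and global fallback described in this option can be sketched as follows. This is an illustration only: the function names and the sample project and template IDs are hypothetical, and the pattern simply mirrors the `projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID` format shown earlier.

```python
import re
from typing import Iterable, Optional

# Mirrors projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID.
_TEMPLATE_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<region>[^/]+)"
    r"/inspectTemplates/(?P<template>[^/]+)"
)


def region_of(resource_name: str) -> str:
    """Extract the region segment from a full inspection template resource name."""
    match = _TEMPLATE_NAME.fullmatch(resource_name)
    if match is None:
        raise ValueError(f"not a full inspection template resource name: {resource_name!r}")
    return match.group("region")


def template_for_region(templates: Iterable[str], data_region: str) -> Optional[str]:
    """Pick the dedicated template for a region, else the global one, else None."""
    by_region = {region_of(name): name for name in templates}
    return by_region.get(data_region, by_region.get("global"))


# Hypothetical template names for illustration.
templates = [
    "projects/my-project/locations/us-west1/inspectTemplates/west-template",
    "projects/my-project/locations/global/inspectTemplates/fallback-template",
]
```

With this list, data in `us-west1` resolves to the dedicated template, while data in any other region resolves to the template stored in `global`.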
Add actions
In the following sections, you specify actions that you want Sensitive Data Protection to take after it generates the data profiles.
For information about how other Google Cloud services may charge you for configuring actions, see Pricing for exporting data profiles.
Publish to Security Command Center
Findings from data profiles provide context when you triage and develop response plans for your vulnerability and threat findings in Security Command Center.
Before you can use this action, Security Command Center must be activated at the organization level. Turning on Security Command Center at the organization level enables the flow of findings from integrated services like Sensitive Data Protection. Sensitive Data Protection works with Security Command Center in all service tiers.
If Security Command Center isn't activated at the organization level, Sensitive Data Protection findings won't appear in Security Command Center. For more information, see Check the activation level of Security Command Center.
To send the results of your data profiles to Security Command Center, make sure the Publish to Security Command Center option is turned on.
For more information, see Publish data profiles to Security Command Center.
Save data profile copies to BigQuery
Turning on Save data profile copies to BigQuery lets you keep a saved copy or history of all of your generated profiles. Doing so can be useful for creating audit reports and visualizing data profiles. You can also load this information into other systems.
Also, this option lets you see all of your data profiles in a single view, regardless of which region your data resides in. If you turn off this option, you can still view the data profiles in the Google Cloud console. However, in the Google Cloud console, you select one region at a time, and see only the data profiles for that region.
To export copies of the data profiles to a BigQuery table, follow these steps:
Turn on Save data profile copies to BigQuery.
Enter the details of the BigQuery table where you want to save the data profile copies:
For Project ID, enter the ID of an existing project where you want data profiles to be exported to.
For Dataset ID, enter the name of an existing dataset in the project where you want data profiles to be exported to.
For Table ID, enter a name for the BigQuery table where data profiles will be exported to. If this table doesn't exist, Sensitive Data Protection automatically creates it for you using the name you provide.
Sensitive Data Protection starts exporting profiles from the time you turn on this option. Profiles that were generated before you turned on exporting aren't saved to BigQuery.
For example queries that you can use when analyzing data profiles, see Analyze data profiles.
Save sample discovery findings to BigQuery
Sensitive Data Protection can add sample findings to a BigQuery table of your choice. Sample findings represent a subset of all findings and might not represent all infoTypes that were discovered. Normally, the system generates around 10 sample findings per bucket, but this number can vary for each discovery run.
Each finding includes the actual string (also called quote) that was detected and its exact location.
This action is useful if you want to evaluate whether your inspection configuration is correctly matching the type of information that you want to flag as sensitive. Using the exported data profiles and the exported sample findings, you can run queries to get more information about the specific items that were flagged, the infoTypes they matched, their exact locations, their calculated sensitivity levels, and other details.
Example query: Show sample findings related to file store data profiles
This example requires both Save data profile copies to BigQuery and Save sample discovery findings to BigQuery to be enabled.
The following query uses an INNER JOIN operation on both the table of exported data profiles and the table of exported sample findings. In the resulting table, each record shows the finding's quote, the infoType that it matched, the resource that contains the finding, and the calculated sensitivity level of the resource.
SELECT
  findings_table.quote,
  findings_table.infotype.name,
  findings_table.location.container_name,
  profiles_table.file_store_profile.file_store_path AS bucket_name,
  profiles_table.file_store_profile.sensitivity_score AS bucket_sensitivity_score
FROM `FINDINGS_TABLE_PROJECT_ID.FINDINGS_TABLE_DATASET_ID.FINDINGS_TABLE_ID_latest_v1` AS findings_table
INNER JOIN `PROFILES_TABLE_PROJECT_ID.PROFILES_TABLE_DATASET_ID.PROFILES_TABLE_ID_latest_v1` AS profiles_table
ON findings_table.data_profile_resource_name = profiles_table.file_store_profile.name
To save sample findings to a BigQuery table, follow these steps:
Turn on Save sample discovery findings to BigQuery.
Enter the details of the BigQuery table where you want to save the sample findings.
For Project ID, enter the ID of an existing project where you want to export the findings to.
For Dataset ID, enter the name of an existing dataset in the project.
For Table ID, enter the name of the BigQuery table where you want to save the findings. If this table doesn't exist, Sensitive Data Protection automatically creates it for you using the name that you provide.
For information about the contents of each finding that is saved in the BigQuery table, see DataProfileFinding.
Attach tags to resources
Turning on Attach tags to resources instructs Sensitive Data Protection to automatically tag your data according to its calculated sensitivity level. This section requires you to first complete the tasks in Control IAM access to resources based on data sensitivity.
To automatically tag a resource according to its calculated sensitivity level, follow these steps:
- Turn on the Tag resources option.
For each sensitivity level (high, moderate, low, and unknown), enter the path of the tag value that you created for the given sensitivity level.
If you skip a sensitivity level, no tag is attached for it.
To automatically lower the data risk level of a resource when the sensitivity level tag is present, select When a tag is applied to a resource, lower the data risk of its profile to LOW. This option helps you measure the improvement in your data security and privacy posture.
Select one or both of the following options:
- Tag a resource when it is profiled for the first time.
Tag a resource when its profile is updated. Select this option if you want Sensitive Data Protection to overwrite the sensitivity level tag value on succeeding discovery runs. Consequently, a principal's access to a resource changes automatically as the calculated data sensitivity level for that resource increases or decreases.
Don't select this option if you plan to manually update the sensitivity level tag values that the discovery service attached to your resources. If you select this option, Sensitive Data Protection can overwrite your manual updates.
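How the tagging options interact can be sketched as a small decision function. This is a simplified model of the behavior described above, not the service's implementation: the function name, parameters, and tag value paths are hypothetical placeholders.

```python
from typing import Dict, Optional


def tag_to_attach(
    sensitivity: str,
    tag_values: Dict[str, str],
    is_first_profile: bool,
    tag_on_first_profile: bool,
    tag_on_update: bool,
) -> Optional[str]:
    """Return the tag value path to attach, or None if no tag should be written."""
    # Choose the option that governs this discovery run.
    enabled = tag_on_first_profile if is_first_profile else tag_on_update
    if not enabled:
        return None
    # A sensitivity level with no configured tag value gets no tag.
    return tag_values.get(sensitivity)


# Hypothetical tag value paths; the "moderate" and "unknown" levels were skipped.
tag_values = {
    "high": "example-org/sensitivity/high",
    "low": "example-org/sensitivity/low",
}
```

In this model, selecting only the first option means a resource keeps the tag from its first profile even if its sensitivity later changes, which is the behavior you want when you plan to edit tag values manually.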
Publish to Pub/Sub
Turning on Publish to Pub/Sub lets you take programmatic actions based on profiling results. You can use Pub/Sub notifications to develop a workflow for catching and remediating findings with significant data risk or sensitivity.
To send notifications to a Pub/Sub topic, follow these steps:
Turn on Publish to Pub/Sub.
A list of options appears. Each option describes an event that causes Sensitive Data Protection to send a notification to Pub/Sub.
Select the events that should trigger a Pub/Sub notification.
If you select Send a Pub/Sub notification each time a profile is updated, Sensitive Data Protection sends a notification when there's a change in the sensitivity level, data risk level, detected infoTypes, public access, and other important metrics in the profile.
For each event you select, follow these steps:
Enter the name of the topic. The name must be in the following format:
projects/PROJECT_ID/topics/TOPIC_ID
Replace the following:
- PROJECT_ID: the ID of the project associated with the Pub/Sub topic.
- TOPIC_ID: the ID of the Pub/Sub topic.
Specify whether to include the full bucket profile in the notification, or just the full resource name of the bucket that was profiled.
Set the minimum data risk and sensitivity levels that must be met for Sensitive Data Protection to send a notification.
Specify whether only one or both of the data risk and sensitivity conditions must be met. For example, if you choose AND, then both the data risk and the sensitivity conditions must be met before Sensitive Data Protection sends a notification.
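The topic name format and the threshold logic above can be sketched like this. It is an illustration under assumptions: the function names are hypothetical, the levels are assumed to be ordered low < moderate < high, the unknown level is left out, and `OR` is used as the label for the "only one condition must be met" choice.

```python
# Assumed ordering of levels, lowest to highest.
LEVELS = {"LOW": 1, "MODERATE": 2, "HIGH": 3}


def topic_name(project_id: str, topic_id: str) -> str:
    """Build a topic name in the projects/PROJECT_ID/topics/TOPIC_ID format."""
    return f"projects/{project_id}/topics/{topic_id}"


def should_notify(
    data_risk: str,
    sensitivity: str,
    min_data_risk: str,
    min_sensitivity: str,
    operator: str = "AND",
) -> bool:
    """Decide whether a profile meets the configured minimum levels."""
    risk_ok = LEVELS[data_risk] >= LEVELS[min_data_risk]
    sensitivity_ok = LEVELS[sensitivity] >= LEVELS[min_sensitivity]
    if operator == "AND":
        return risk_ok and sensitivity_ok   # both conditions must be met
    if operator == "OR":
        return risk_ok or sensitivity_ok    # either condition is enough
    raise ValueError(f"unsupported operator: {operator!r}")
```

For a profile with high data risk but low sensitivity and both minimums set to moderate, this sketch sends a notification under `OR` but not under `AND`.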
Set location to store configuration
Click the Resource location list, and select the region where you want to store this scan configuration. All scan configurations that you later create will also be stored in this location.
Where you choose to store your scan configuration doesn't affect the data to be scanned. Your data is scanned in the same region where that data is stored. For more information, see Data residency considerations.
Review and create
- If you want to make sure that profiling doesn't start automatically after you create the scan configuration, select Create scan in paused mode.
This option is useful in the following cases:
- You opted to save data profiles to BigQuery and you want to make sure the service agent has write access to the BigQuery table where the data profile copies will be saved.
- You opted to save sample discovery findings to BigQuery and you want to make sure that the service agent has write access to the BigQuery table where the sample findings will be saved.
- You configured Pub/Sub notifications and you want to grant publishing access to the service agent.
- You enabled the Attach tags to resources action and you need to grant the service agent access to the sensitivity level tag.
- Review your settings and click Create.
Sensitive Data Protection creates the scan configuration and adds it to the discovery scan configurations list.
To view or manage your scan configurations, see Manage scan configurations.