Profile Amazon S3 data

This page describes how to configure Sensitive Data Protection discovery for Amazon S3.

For more information about the discovery service, see Data profiles.

This feature is available only to customers who have activated Security Command Center at the Enterprise tier. If you are interested in enabling this feature, send an email to cloud-dlp-feedback@google.com.

Before you begin

  1. In Security Command Center, create a connector for Amazon Web Services (AWS). Don't clear the Grant permissions for Sensitive Data Protection discovery checkbox. Sensitive Data Protection needs those permissions to profile your Amazon S3 data.

    If you already have a connector that doesn't have Grant permissions for Sensitive Data Protection discovery selected, edit the connector to select the checkbox. Download the updated CloudFormation templates and upload them to your AWS environment. For more information, see Connect to AWS for vulnerability detection and risk assessment.

  2. Confirm that you have the IAM permissions that are required to configure data profiles at the organization level.

    If you don't have the Organization Administrator (roles/resourcemanager.organizationAdmin) or Security Admin (roles/iam.securityAdmin) role, you can still create a scan configuration. However, after you create the scan configuration, someone with either of those roles must grant data profiling access to your service agent.

  3. Confirm that you have an inspection template in the global region or the region where you plan to store the discovery scan configuration and all generated data profiles.

    This task lets you automatically create an inspection template in the global region only. If organizational policies prevent you from creating an inspection template in the global region, then before you perform this task, you must create an inspection template in the region where you plan to store the discovery scan configuration.

  4. To send Pub/Sub notifications to a topic when certain events occur—such as when Sensitive Data Protection profiles a new bucket—create a Pub/Sub topic.

To generate data profiles, you need a service agent container and a service agent within it. This task lets you create them automatically.

Create a scan configuration

  1. Go to the Create scan configuration page.

  2. Go to your organization. On the toolbar, click the project selector and select your organization.

The following sections provide more information about the steps on the Create scan configuration page. At the end of each section, click Continue.

Select a discovery type

Select Amazon S3.

Select scope

Do one of the following:

  • To scan all S3 data that your AWS connector has access to, select Scan all available organizations.
  • To scan the S3 data in a single AWS account, select Scan selected account. Then, enter the AWS account ID.
  • To scan a single S3 bucket, select Scan bucket (test mode). Enter the ID of the AWS account that contains the bucket, and enter the bucket name.

Manage schedules

If the default profiling frequency suits your needs, you can skip this section of the Create scan configuration page.

Configure this section for the following reasons:

  • To make fine-grained adjustments to the profiling frequency of all your data or certain subsets of your data.
  • To specify the buckets that you don't want to profile.
  • To specify the buckets that you don't want profiled more than once.

To make fine-grained adjustments to profiling frequency, follow these steps:

  1. Click Add schedule.

  2. In the Filters section, define one or more filters that specify which buckets are in the schedule's scope.

    Specify at least one of the following:

    • An account ID or a regular expression that specifies one or more account IDs
    • A bucket name or a regular expression that specifies one or more buckets

    Regular expressions must follow the RE2 syntax.

    For example, if you want all buckets in an account to be included in the filter, enter the account ID in the Account ID field.

    To add more filters, click Add filter and repeat this step.

  3. Click Frequency.

  4. In the Frequency section, specify whether to profile the buckets that you selected, and if so, how often:

    • If you never want the buckets to be profiled, turn off Do profile this data.

    • If you want the buckets to be profiled at least once, leave Do profile this data on.

      Specify whether to reprofile your data and what events should trigger a reprofile operation. For more information, see Frequency of data profile generation.

      1. For On a schedule, specify how often you want the buckets to be reprofiled. The buckets are reprofiled regardless of whether they underwent any changes.
      2. For When inspect template changes, specify whether you want your data to be reprofiled when the associated inspection template is updated, and if so, how often.

        An inspection template change is detected when either of the following occurs:

        • The name of an inspection template changes in your scan configuration.
        • The updateTime of an inspection template changes.

  5. Optional: Click Conditions.

    In the Conditions section, specify any conditions that the buckets—defined in your filters—must meet before Sensitive Data Protection profiles them.

    If needed, set the following:

    • Minimum conditions: If you want to delay profiling a bucket until it reaches a certain age, turn on this option. Then, enter the minimum duration.

    • Object storage class conditions: By default, Sensitive Data Protection scans all objects in a bucket. If you want to scan only objects that have specific attributes, select those attributes.

    Example conditions

    Suppose that you have the following configuration:

    • Minimum conditions

      • Minimum duration: 24 hours
    • Object storage class conditions

      • Scan objects with the S3 Standard object storage class
      • Scan objects with the S3 Glacier Instant Retrieval storage class

    In this case, Sensitive Data Protection considers only the buckets that are at least 24 hours old. Within those buckets, Sensitive Data Protection profiles only the objects that are in the Amazon S3 Standard or Amazon S3 Glacier Instant Retrieval storage class.

  6. Click Done.

  7. If you want to add more schedules, click Add schedule and repeat the previous steps.

  8. To reorder the schedules according to priority, use the up and down arrows. For example, if the filters in two different schedules match the same bucket, the schedule higher on the priority list takes precedence.

    The last schedule in the list is always the one labeled Default schedule. This default schedule covers the buckets in your selected resource (organization or folder) that don't match any of the schedules that you created. This default schedule follows the system default profiling frequency.

  9. If you want to adjust the default schedule, click Edit schedule, and adjust the settings as needed.
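
The precedence rule for schedules can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the service's implementation. The Filter and Schedule classes and the pick_schedule function are hypothetical names. The sketch assumes that a filter must match the whole account ID or bucket name, that a schedule applies when any one of its filters matches, and that Python's re module behaves like RE2 for the simple patterns shown.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Filter:
    # Hypothetical model of one filter: each field is an optional regular expression.
    account_id_regex: Optional[str] = None
    bucket_name_regex: Optional[str] = None

    def matches(self, account_id: str, bucket_name: str) -> bool:
        # Assumption: an unset pattern matches anything, and a set pattern
        # must match the entire value.
        if self.account_id_regex is not None:
            if not re.fullmatch(self.account_id_regex, account_id):
                return False
        if self.bucket_name_regex is not None:
            if not re.fullmatch(self.bucket_name_regex, bucket_name):
                return False
        return True


@dataclass
class Schedule:
    name: str
    filters: List[Filter] = field(default_factory=list)
    profile: bool = True  # False models turning off "Do profile this data"


DEFAULT_SCHEDULE = Schedule(name="Default schedule")


def pick_schedule(schedules: List[Schedule], account_id: str, bucket_name: str) -> Schedule:
    # Schedules are checked in priority order; the first match takes precedence.
    for schedule in schedules:
        if any(f.matches(account_id, bucket_name) for f in schedule.filters):
            return schedule
    # Buckets that match no schedule fall through to the default schedule.
    return DEFAULT_SCHEDULE


# Example data (account IDs and bucket names are made up).
schedules = [
    Schedule("skip-logs", [Filter(bucket_name_regex=r".*-logs")], profile=False),
    Schedule("prod-account", [Filter(account_id_regex=r"123456789012")]),
]

for account, bucket in [
    ("123456789012", "app-logs"),
    ("123456789012", "app-data"),
    ("999999999999", "app-data"),
]:
    print(account, bucket, "->", pick_schedule(schedules, account, bucket).name)
```

In this example, the bucket app-logs matches both schedules, but skip-logs is higher in the list, so it takes precedence and the bucket is never profiled.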

Select an inspection template

Depending on how you want to provide an inspection configuration, choose one of the following options. Regardless of which option you choose, Sensitive Data Protection scans your data in the region where that data is stored. That is, your data doesn't leave its region of origin.

Option 1: Create an inspection template

Choose this option if you want to create a new inspection template in the global region.

  1. Click Create new inspection template.
  2. Optional: To modify the default selection of infoTypes, click Manage infoTypes.

    For more information about how to manage built-in and custom infoTypes, see Manage infoTypes through the Google Cloud console.

    You must have at least one infoType selected to continue.

  3. Optional: Configure the inspection template further by adding rulesets and setting a confidence threshold. For more information, see Configure detection.

    When Sensitive Data Protection creates the scan configuration, it stores this new inspection template in the global region.

Option 2: Use an existing inspection template

Choose this option if you have existing inspection templates that you want to use.

  1. Click Select existing inspection template.

  2. Enter the full resource name of the inspection template that you want to use. The Region field is automatically populated with the name of the region where your inspection template is stored.

    The inspection template that you enter must be in the same region where you plan to store this discovery scan configuration and all the generated data profiles.

    To respect data residency, Sensitive Data Protection doesn't use an inspection template outside the region where that template is stored.

    To find the full resource name of an inspection template, follow these steps:

    1. Go to your inspection templates list. This page opens on a separate tab.

    2. Switch to the project that contains the inspection template that you want to use.

    3. On the Templates tab, click the template ID of the template that you want to use.

    4. On the page that opens, copy the full resource name of the template. The full resource name follows this format:

      projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
    5. On the Create scan configuration page, in the Template name field, paste the full resource name of the template.
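
Before you paste a template name, it can help to check that it follows the format shown above and to see which region it points to. The following Python sketch does only that. The parse_template_name function and the TemplateName type are hypothetical helpers, not part of any Google Cloud library, and the sketch assumes the project-level format given on this page.

```python
import re
from typing import NamedTuple

# Pattern for the format shown on this page:
# projects/PROJECT_ID/locations/REGION/inspectTemplates/TEMPLATE_ID
_TEMPLATE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/inspectTemplates/(?P<template>[^/]+)$"
)


class TemplateName(NamedTuple):
    project_id: str
    region: str
    template_id: str


def parse_template_name(name: str) -> TemplateName:
    """Split a full inspection template resource name into its parts."""
    match = _TEMPLATE_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"Not a full inspection template resource name: {name!r}")
    return TemplateName(match["project"], match["region"], match["template"])


# Example with made-up IDs.
parsed = parse_template_name(
    "projects/my-project/locations/us-west1/inspectTemplates/my-template"
)
print(parsed.region)
```

You can compare the extracted region with the region where you plan to store the scan configuration before you continue.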

    Add actions

    In the following sections, you specify actions that you want Sensitive Data Protection to take after it generates the data profiles.

    For information about how other Google Cloud services may charge you for configuring actions, see Pricing for exporting data profiles.

    Publish to Google Security Operations

    Metrics gathered from data profiles can add context to your security operations. The added context can help you determine the most important security issues to address.

    For example, if you're investigating a particular service agent, you can determine what resources the service agent accessed and whether any of those resources had high-sensitivity data.

    To send your data profiles to the Google Security Operations component of Security Command Center Enterprise, turn on Publish to Chronicle.

    Publish to Security Command Center

    Findings from data profiles provide context when you triage and develop response plans for your vulnerability and threat findings in Security Command Center.

    To send the results of your data profiles to Security Command Center, make sure the Publish to Security Command Center option is turned on.

    For more information, see Publish data profiles to Security Command Center.

    Save data profile copies to BigQuery

    Turning on Save data profile copies to BigQuery lets you keep a saved copy or history of all of your generated profiles. Doing so can be useful for creating audit reports and visualizing data profiles. You can also load this information into other systems.

    Also, this option lets you see all of your data profiles in a single view, regardless of which region your data resides in. If you turn off this option, you can still view the data profiles in the Google Cloud console. However, in the Google Cloud console, you select one region at a time, and see only the data profiles for that region.

    To export copies of the data profiles to a BigQuery table, follow these steps:

    1. Turn on Save data profile copies to BigQuery.

    2. Enter the details of the BigQuery table where you want to save the data profiles:

      • For Project ID, enter the ID of the existing project to export the data profiles to.

      • For Dataset ID, enter the name of an existing dataset in that project.

      • For Table ID, enter a name for the BigQuery table to export the data profiles to. If the table doesn't exist, Sensitive Data Protection creates it automatically using the name that you provide.

    Sensitive Data Protection starts exporting profiles from the time you turn on this option. Profiles that were generated before you turned on exporting aren't saved to BigQuery.

    Publish to Pub/Sub

    Turning on Publish to Pub/Sub lets you take programmatic actions based on profiling results. You can use Pub/Sub notifications to develop a workflow for catching and remediating findings with significant data risk or sensitivity.

    To send notifications to a Pub/Sub topic, follow these steps:

    1. Turn on Publish to Pub/Sub.

      A list of options appears. Each option describes an event that causes Sensitive Data Protection to send a notification to Pub/Sub.

    2. Select the events that should trigger a Pub/Sub notification.

    3. If you select Send a Pub/Sub notification each time a profile is updated, Sensitive Data Protection sends a notification when there's a change in the following file store metrics:

      • Data risk
      • Sensitivity
      • File extensions scanned
      • File extensions seen
      • InfoTypes
      • Public
    4. For each event you select, follow these steps:

      1. Enter the name of the topic. The name must be in the following format:

        projects/PROJECT_ID/topics/TOPIC_ID
        

        Replace the following:

        • PROJECT_ID: the ID of the project associated with the Pub/Sub topic.
        • TOPIC_ID: the ID of the Pub/Sub topic.
      2. Specify whether to include the full bucket profile in the notification, or just the full resource name of the bucket that was profiled.

      3. Set the minimum data risk and sensitivity levels that must be met for Sensitive Data Protection to send a notification.

      4. Specify whether only one or both of the data risk and sensitivity conditions must be met. For example, if you choose AND, then both the data risk and the sensitivity conditions must be met before Sensitive Data Protection sends a notification.
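
The way the minimum levels and the AND/OR choice combine can be sketched as follows. This is an illustration of the rule described in the steps above, not the service's code. The Level enum, the should_notify function, and the topic_name helper are hypothetical, and the sketch assumes three ordered levels (low, moderate, high) for both data risk and sensitivity.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed ordering of data risk and sensitivity levels.
    LOW = 1
    MODERATE = 2
    HIGH = 3


def topic_name(project_id: str, topic_id: str) -> str:
    # Builds a topic name in the format required by the Pub/Sub action.
    return f"projects/{project_id}/topics/{topic_id}"


def should_notify(
    data_risk: Level,
    sensitivity: Level,
    min_data_risk: Level,
    min_sensitivity: Level,
    operator: str = "AND",
) -> bool:
    """Return True if a profile meets the configured notification conditions."""
    risk_met = data_risk >= min_data_risk
    sensitivity_met = sensitivity >= min_sensitivity
    if operator == "AND":
        # Both conditions must be met.
        return risk_met and sensitivity_met
    if operator == "OR":
        # Either condition is enough.
        return risk_met or sensitivity_met
    raise ValueError(f"Unknown operator: {operator!r}")


# Example: high data risk but low sensitivity, with both minimums set to MODERATE.
print(topic_name("my-project", "profile-events"))
print(should_notify(Level.HIGH, Level.LOW, Level.MODERATE, Level.MODERATE, "AND"))
print(should_notify(Level.HIGH, Level.LOW, Level.MODERATE, Level.MODERATE, "OR"))
```

With AND, the example profile doesn't trigger a notification because only the data risk condition is met. With OR, it does.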

    Manage service agent container and billing

    In this section, you specify the project to use as a service agent container. You can have Sensitive Data Protection automatically create a new project, or you can choose an existing project.

    Regardless of whether you're using a newly created service agent or reusing an existing one, make sure it has read access to the data to be profiled.

    Automatically create a project

    If you don't have the permissions needed to create a project in the organization, you need to select an existing project instead or obtain the required permissions. For information about the required permissions, see Roles required to work with data profiles at the organization or folder level.

    To automatically create a project to use as your service agent container, follow these steps:

    1. In the Service agent container field, review the suggested project ID and edit it as needed.
    2. Click Create.
    3. Optional: Update the default project name.
    4. Select the account to bill for all billable operations related to this new project, including operations that aren't related to discovery.

    5. Click Create.

    Sensitive Data Protection creates the new project. The service agent within this project will be used to authenticate to Sensitive Data Protection and other APIs.

    Select an existing project

    To select an existing project as your service agent container, click the Service agent container field and select the project.

    Set the location to store the configuration

    Click the Resource location list, and select the region where you want to store this scan configuration. All scan configurations that you later create will also be stored in this location.

    Where you choose to store your scan configuration doesn't affect the data to be scanned. Your data is scanned in the same region where that data is stored. For more information, see Data residency considerations.

    Review and create the configuration

    1. If you want to make sure that profiling doesn't start automatically after you create the scan configuration, select Create scan in paused mode.

      This option is useful in the following cases:

      • Your Google Cloud administrator still needs to grant data profiling access to the service agent.
      • You want to create multiple scan configurations and you want some configurations to override others.
      • You opted to save data profiles to BigQuery, and you want to make sure the service agent has write access to your output table.
      • You configured Pub/Sub notifications and you want to grant publishing access to the service agent.
    2. Review your settings and click Create.

      Sensitive Data Protection creates the scan configuration and adds it to the discovery scan configurations list.

    To view or manage your scan configurations, see Manage scan configurations.

    What's next