Sensitive data discovery for Amazon S3

This page describes Sensitive Data Protection discovery for use with Amazon S3. This feature is available only to customers who have activated Security Command Center at the Enterprise tier. If you are interested in enabling this feature, send an email to cloud-dlp-feedback@google.com.

Sensitive Data Protection discovery helps you learn about the types of data that you're storing in S3 and the sensitivity levels of your data. When you profile your S3 data, you generate file store data profiles, which provide insights and metadata about your S3 buckets. For each S3 bucket, a file store data profile includes the following information:

  • The types of files that you're storing in the bucket, categorized into file clusters
  • The sensitivity level of the data in the bucket
  • A summary about each detected file cluster, including the types of sensitive information found

For a full list of insights and metadata in each file store data profile, see File store data profiles.

For more information about the discovery service, see Data profiles.

Workflow

The high-level workflow for profiling Amazon S3 data is as follows:

  1. In Security Command Center, create a connector for Amazon Web Services (AWS). Make sure that you select the Grant permissions for Sensitive Data Protection discovery checkbox and follow the instructions to configure the connector with sensitive data discovery permissions.

    If you already have an AWS connector, edit the connector to select the Grant permissions for Sensitive Data Protection discovery checkbox. Then download the updated CloudFormation templates and upload them to your AWS environment. If you configured your AWS accounts manually, follow the instructions to configure the IAM policy for the delegated role, the Sensitive Data Protection collector policy, and the Sensitive Data Protection collector role.

  2. Create an inspection template in the global region or the region where you plan to store the discovery scan configuration and all generated data profiles.

  3. Create a discovery scan configuration for Amazon S3.

    Sensitive Data Protection profiles your data according to the schedule that you specify.
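The location rule in step 2 can be sketched as a small pre-flight check. This is only an illustration: the function name is hypothetical, the location identifier `"global"` is assumed to be how the global region is written, and the region names in the usage lines are example values, not recommendations.

```python
def template_location_is_valid(template_location: str, scan_config_location: str) -> bool:
    """Return True if the inspection template can be used by the scan configuration.

    Per step 2, the template must live either in the global region or in the
    same region as the discovery scan configuration (and its data profiles).
    The literal "global" is an assumed identifier for the global region.
    """
    return template_location in ("global", scan_config_location)


# Example values only: a template in the global region works with any
# scan configuration region; otherwise the two regions must match.
print(template_location_is_valid("global", "us-west1"))
print(template_location_is_valid("europe-west1", "us-west1"))
```

Running a check like this before you create the scan configuration catches a mismatched template location early, before any profiling is scheduled.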

Pricing

When you profile Amazon S3 data, you incur the Sensitive Data Protection charges that are listed in Discovery pricing. In addition, AWS charges you for requests that Sensitive Data Protection makes and for data transfers from S3 to the internet.

Requests from Sensitive Data Protection

Sensitive Data Protection performs the following operations in the process of profiling your S3 buckets:

  • Around 50 LIST requests per day per profiled S3 bucket.
  • Around 10 GET requests per file in a profiled bucket. In general, Sensitive Data Protection makes fewer than 100,000 GET calls. Don't rely on this value when optimizing for cost; it might change in the future.

The price that AWS charges per 1,000 requests differs based on the region of the S3 bucket. For more information, see Requests & data retrievals in the Amazon S3 pricing documentation.
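As a rough illustration of how these figures combine, the following sketch estimates AWS request charges from the approximate request counts listed above. The function and class names are hypothetical, and the per-1,000-request prices are inputs that you must look up for your bucket's region; the values in the usage example are placeholders, not actual AWS prices. The sketch multiplies the per-file GET figure by the file count without any upper bound, so it might overestimate for large buckets, given that GET calls generally stay under 100,000.

```python
from dataclasses import dataclass

# Approximate figures from the documentation above; they might change.
LIST_REQUESTS_PER_BUCKET_PER_DAY = 50
GET_REQUESTS_PER_FILE = 10


@dataclass
class RequestEstimate:
    list_requests: int
    get_requests: int
    cost: float


def estimate_request_cost(
    buckets: int,
    files: int,
    days: int,
    list_price_per_1000: float,
    get_price_per_1000: float,
) -> RequestEstimate:
    """Estimate request counts and charges for profiling S3 buckets.

    buckets: number of profiled buckets
    files:   total number of files across those buckets
    days:    number of days over which LIST requests accumulate
    Prices are per 1,000 requests and vary by AWS region (caller-supplied).
    """
    list_requests = LIST_REQUESTS_PER_BUCKET_PER_DAY * buckets * days
    get_requests = GET_REQUESTS_PER_FILE * files
    cost = (
        list_requests / 1000 * list_price_per_1000
        + get_requests / 1000 * get_price_per_1000
    )
    return RequestEstimate(list_requests, get_requests, cost)


# Placeholder prices for illustration only; check the Amazon S3 pricing page.
estimate = estimate_request_cost(
    buckets=2, files=5000, days=30,
    list_price_per_1000=0.005, get_price_per_1000=0.0004,
)
print(estimate.list_requests, estimate.get_requests)  # prints "3000 50000"
print(f"Estimated request cost: {estimate.cost:.4f}")
```

Treat the result as an order-of-magnitude estimate: it excludes data transfer charges and the Sensitive Data Protection charges themselves.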

Data transfers from S3 to the internet

When Sensitive Data Protection profiles S3 data, the data that it reads is considered to be transferred from S3 to the internet, so AWS data transfer charges might apply. For more information, see Data Transfer OUT From Amazon S3 To Internet in the Amazon S3 pricing documentation.

Data residency considerations

Consider the following when you plan to profile Amazon S3 data:

  • The data profiles are stored alongside the discovery scan configuration. In contrast, when you profile Google Cloud data, the profiles are stored in the same region as the data to be profiled.

  • If you store your inspection template in the global region, an in-memory copy of that template is read in the region where you store the discovery scan configuration.

  • Your S3 data is not modified. An in-memory copy of your data is read in the region where you store the discovery scan configuration. However, Sensitive Data Protection makes no guarantees about where the data passes through after it reaches the public internet. The data is encrypted with SSL.

What's next