Share sensitive data with data clean rooms

Data clean rooms provide a security-enhanced environment in which multiple parties can share, join, and analyze their data assets without moving or revealing the underlying data.

BigQuery data clean rooms are built on the Analytics Hub platform. While standard Analytics Hub data exchanges provide a way to share data across organizational boundaries at scale, data clean rooms help you address sensitive and protected data-sharing use cases. Data clean rooms provide additional security controls to help protect the underlying data and enforce analysis rules that the data owner defines.

The following are primary use cases:

  • Campaign planning and audience insights. Let two parties (such as sellers and buyers) mix first-party data and improve data enrichment in a privacy-centric way.
  • Measurement and attribution. Match customer and media performance data to better understand the effectiveness of marketing efforts and make more informed business decisions.
  • Activation. Combine customer data with data from other parties to enrich understanding of customers, enabling improved segmentation capabilities and more effective media activation.

There are also several data clean room use cases beyond the marketing industry:

  • Retail and consumer packaged goods (CPG). Optimize marketing and promotional activities by combining point-of-sale data from retailers and marketing data from CPG companies.
  • Financial services. Improve fraud detection by combining sensitive data from other financial and government agencies. Build credit risk scoring by aggregating customer data across multiple banks.
  • Healthcare. Share data between doctors and pharmaceutical researchers to learn how patients are reacting to treatments.
  • Supply chain, logistics, and transportation. Combine data from suppliers and marketers to get a complete picture of how products perform throughout their lifecycle.

Roles

There are three main roles in BigQuery data clean rooms:

  • Data clean room owner: a user that manages permissions, visibility, and membership of one or more data clean rooms within a project. This role is analogous to the Analytics Hub Admin.
  • Data contributor: a user that is assigned by the data clean room owner to publish data to a data clean room. In many cases, a data clean room owner is also a data contributor. This role is analogous to the Analytics Hub Publisher.
  • Subscriber: a user that is assigned by the data clean room owner to subscribe to the data published in a data clean room, letting them run queries on the data. This role is analogous to a combination of the Analytics Hub Subscriber and Analytics Hub Subscription Owner. Subscribers must have non-edition offerings or the Enterprise Plus edition.

Architecture

BigQuery data clean rooms are built on a publish and subscribe model of BigQuery data. BigQuery architecture provides a separation between compute and storage, enabling data contributors to share data without having to make multiple copies of the data. In this architecture, data contributors publish data to the data clean room, which subscribers can query with privacy filters.

Data clean room

A data clean room is an environment to share sensitive data where raw access is prevented and query restrictions are enforced. Only users or groups that are added as subscribers to a data clean room can subscribe to the shared data. Data clean room owners can create as many data clean rooms as they want in Analytics Hub.

Shared resources

A shared resource is the unit of data sharing in a data clean room. The resource must be a BigQuery table or view. As a data contributor, you create or use an existing BigQuery resource in your project that you want to share with your subscribers.

Listings

A listing is created when a data contributor adds data into a data clean room. It contains a reference to the data contributor's shared resource along with descriptive information that helps subscribers use the data. As a data contributor, you can create a listing and include information such as a description, sample queries, and links to documentation for your subscribers.

Linked datasets

A linked dataset is a read-only BigQuery dataset that serves as a symbolic link to all data in a data clean room. When subscribers query resources in a linked dataset, data from the shared resources is returned in accordance with the analysis rules set by the data contributor. When you subscribe to a data clean room, a linked dataset is created inside your project. No copy of the data is created, and subscribers can't see certain metadata, such as view definitions.

Analysis rules

As a data contributor, you configure analysis rules on the resources that you share in the data clean room. Analysis rules prevent raw access to underlying data and enforce query restrictions. For example, data clean rooms support the aggregation threshold analysis rule, which lets subscribers analyze data only through aggregation queries.
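To illustrate what an aggregation threshold rule can look like on a view, the following sketch builds a view definition that carries a privacy policy and submits it with the bq tool. The dataset, table, view, and column names and the threshold value are hypothetical placeholders, not values from this document. The script only prints the statement unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholders: replace with your own dataset, table, view, and column.
SOURCE_TABLE="publisher_data.conversions"
RULE_VIEW="publisher_data.conversions_dcr"
PRIVACY_UNIT_COLUMN="user_id"
THRESHOLD=50

# Build a CREATE VIEW statement that attaches an aggregation threshold policy.
build_view_ddl() {
  cat <<EOF
CREATE OR REPLACE VIEW ${RULE_VIEW}
OPTIONS (
  privacy_policy = '{"aggregation_threshold_policy": {"threshold": ${THRESHOLD}, "privacy_unit_columns": "${PRIVACY_UNIT_COLUMN}"}}'
)
AS SELECT * FROM ${SOURCE_TABLE};
EOF
}

# Print the statement by default; set DRY_RUN=0 to submit it to BigQuery.
if [ "${DRY_RUN:-1}" = "1" ]; then
  build_view_ddl
else
  bq query --use_legacy_sql=false "$(build_view_ddl)"
fi
```

With a policy like this, a group appears in query results only if it contains at least the threshold number of distinct values in the privacy unit column.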

Data egress controls

Data egress controls are automatically enabled to help prevent subscribers from copying and exporting raw data from a data clean room. Data contributors can configure additional controls to help prevent the copy and export of query results that are obtained by the subscribers.

Limitations

BigQuery data clean rooms have the following limitations:

  • You can set analysis rules only on views, not on tables or materialized views. Due to this limitation, if a data contributor directly shares tables or materialized views (or views without analysis rules) into a data clean room, then subscribers have raw access to the data in those resources.
  • As data clean rooms are built on the Analytics Hub platform, all Analytics Hub limitations apply.
  • Data clean rooms are only available in Analytics Hub regions.
  • As a subscriber, you can't search for shared resources in Dataplex or Data Catalog.
  • As a subscriber, you can't query INFORMATION_SCHEMA views on linked datasets.
  • As a data contributor, you can't publish an entire dataset directly to a data clean room.
  • As a data contributor, you can't publish models or routines to a data clean room.
  • You can add a maximum of 100 shared resources to a data clean room. If you need to increase this limit, contact bq-dcr-feedback@google.com.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, enable the Analytics Hub API, and assign the Analytics Hub Admin role to your data clean room owner (the user who will create the data clean room).

Required permissions

To get the permissions that you need to use data clean rooms, ask your administrator to grant you the Service Usage Admin (roles/serviceusage.serviceUsageAdmin) IAM role. For more information about granting roles, see Manage access.

This predefined role contains the permissions required to use data clean rooms. The following permissions are required:

  • serviceusage.services.get
  • serviceusage.services.list
  • serviceusage.services.enable

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.

Enable the Analytics Hub API

To enable the Analytics Hub API, select one of the following options:

Console

Open the Analytics Hub API page for your Google Cloud project and enable it.

gcloud

Run the gcloud services enable command:

gcloud services enable analyticshub.googleapis.com

Once you enable the Analytics Hub API, you can access the Analytics Hub page.

Assign the Analytics Hub Admin role

Your data clean room owner must have the Analytics Hub Admin role (roles/analyticshub.admin). To learn how to grant this role to other users, see Create Analytics Hub administrators.
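As one possible command-line route, a project-level grant of this role can be sketched with gcloud. The project ID and email address are hypothetical placeholders, and the run helper is a dry-run wrapper introduced only for this example. It prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholders: replace with your own project and user.
PROJECT_ID="my-project"
OWNER_EMAIL="owner@example.com"

# Print the command by default; set DRY_RUN=0 to run it for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Grant the Analytics Hub Admin role on the project to the clean room owner.
grant_admin() {
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="user:${OWNER_EMAIL}" \
    --role="roles/analyticshub.admin"
}

grant_admin
```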

Data clean room owner workflows

As a data clean room owner, you can do the following:

  • Create a data clean room.
  • Update data clean room properties.
  • Delete a data clean room.
  • Manage data contributors.
  • Manage subscribers.
  • Share a data clean room.

Additional data clean room owner permissions

You must have the Analytics Hub Admin role (roles/analyticshub.admin) on your project to perform data clean room owner tasks.

Create a data clean room

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click Create clean room.

  3. For Project, select the project for the data clean room. The Analytics Hub API must be enabled for the project.

  4. Specify the location, name, primary contact, icon (optional), and description for the data clean room. Only resources that are in the same region as the data clean room can be listed in the data clean room.

  5. Click Create clean room.

  6. Optional: In the Clean Room Permissions section, add other data clean room owners, data contributors, or subscribers.

Update a data clean room

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room that you want to update.

  3. In the Details tab, click Edit clean room details.

  4. Update the data clean room name, primary contact, icon, or description as needed.

  5. Click Save.

Delete a data clean room

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. In the row of the data clean room that you want to delete, click More actions > Delete.

  3. To confirm, enter delete, and then click Delete. You can't undo this action.

When you delete a data clean room, all the listings within it are deleted. However, the shared resources and linked datasets are not deleted. The linked datasets are unlinked from the source datasets, so querying resources in the data clean room starts to fail for subscribers.

Manage data contributors

As a data clean room owner, you manage which users can add data to your data clean rooms (your data contributors). To let a user add data to a data clean room, grant them the Analytics Hub Publisher role (roles/analyticshub.publisher) on a specific data clean room:

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room that you want to grant permissions to.

  3. In the Details tab, click Set permissions.

  4. Click Add principal.

  5. For New principals, enter the usernames or emails of the data contributors that you're adding.

  6. For Select a role, select Analytics Hub > Analytics Hub Publisher.

  7. Click Save.

You can delete and update data contributors at any time by clicking Set permissions.

You can grant the Analytics Hub Publisher role for an entire project from the IAM page, which gives a user permission to add data to any data clean room in a project. However, we don't recommend this action, as it might result in users having overly permissive access.

Manage subscribers

As a data clean room owner, you manage which users can subscribe to your data clean rooms (your subscribers). To allow a user to subscribe to a data clean room, grant them the Analytics Hub Subscriber (roles/analyticshub.subscriber) and Analytics Hub Subscription Owner (roles/analyticshub.subscriptionOwner) roles on a specific data clean room:

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room that you want to grant permissions to.

  3. In the Details tab, click Set permissions.

  4. Click Add principal.

  5. For New principals, enter the usernames or emails of the subscribers that you're adding.

  6. For Select a role, select Analytics Hub > Analytics Hub Subscriber.

  7. Click Add another role.

  8. For Select a role, select Analytics Hub > Analytics Hub Subscription Owner.

  9. Click Save.

You can delete and update subscribers at any time by clicking Set permissions.

You can grant the Analytics Hub Subscriber and Analytics Hub Subscription Owner roles for an entire project from the IAM page, which gives a user permission to subscribe to any data clean room in a project. However, we don't recommend this action, as it might result in users having overly permissive access.
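For a scripted alternative to the console steps, the following sketch builds a request for the Analytics Hub REST API that binds both subscriber roles on a single data clean room. It assumes that the data clean room is addressable as a data exchange resource and that the setIamPolicy method applies to it. The project, location, clean room ID, and principal are hypothetical placeholders. Because setIamPolicy replaces the whole policy, a real script would first read the existing policy and merge the new bindings into it. By default the script only prints the request.

```shell
# Hypothetical placeholders: replace with your own values.
PROJECT_ID="my-project"
LOCATION="us"
CLEAN_ROOM_ID="campaign_analysis"
SUBSCRIBER="user:analyst@example.com"

# Build a policy body that binds both subscriber roles to one principal.
# Caution: setIamPolicy overwrites the existing policy on the resource.
build_policy() {
  cat <<EOF
{
  "policy": {
    "bindings": [
      {"role": "roles/analyticshub.subscriber", "members": ["${SUBSCRIBER}"]},
      {"role": "roles/analyticshub.subscriptionOwner", "members": ["${SUBSCRIBER}"]}
    ]
  }
}
EOF
}

# Assumption: the clean room is exposed under the dataExchanges collection.
resource="projects/${PROJECT_ID}/locations/${LOCATION}/dataExchanges/${CLEAN_ROOM_ID}"
url="https://analyticshub.googleapis.com/v1/${resource}:setIamPolicy"

# Print the request by default; set DRY_RUN=0 to send it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "POST ${url}"
  build_policy
else
  curl -sS -X POST "${url}" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$(build_policy)"
fi
```

Scoping the bindings to one data clean room, rather than the project, matches the recommendation above to avoid overly permissive access.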

Share a data clean room

You can directly share a data clean room with subscribers:

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. In the row of the data clean room that you want to share, click More actions > Copy share link.

  3. Share the copied link with subscribers to let them view and subscribe to the data clean room.

Data contributor workflows

As a data contributor, you can do the following:

  • Add data to a data clean room by creating a listing.
  • Update a listing.
  • Delete a listing.
  • Share a data clean room.
  • Monitor listings.

Additional data contributor permissions

To perform data contributor tasks, you must have the Analytics Hub Publisher role (roles/analyticshub.publisher) on a data clean room.

In addition, you need the bigquery.datasets.link permission for the datasets that contain the resources that you want to list in a data clean room. You also need the resourcemanager.organizations.get permission if you want to view data clean rooms in your organization that are not in your current project.

Create a listing (add data)

To prepare data with analysis rules and publish to a data clean room as a listing, do the following:

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room that you want to create a listing in.

    If you're in a different organization than your data clean room owner and the data clean room is not visible to you, ask the data clean room owner for a direct link.

  3. Click Add data.

  4. For Select dataset and Table/view name, enter the table or view that you want to list in the data clean room and its corresponding dataset. You add analysis rules that prevent raw access to this underlying data in a later step.

  5. Select the columns of your resource that you want to publish.

  6. Set the view name, primary contact, and description (optional) for the listing.

  7. Click Next.

  8. Choose an analysis rule for your listing and configure the details.

  9. Set data egress controls for the listing.

  10. Click Next.

  11. Review the data and analysis rule that you're adding to the data clean room.

  12. Click Add data. A view is created for your data and is added as a listing to the data clean room. The source table or view itself isn't added.

By listing a resource in a data clean room, you grant all current and future data clean room subscribers access to the data in your shared resource.

If you try to create a listing with a shared resource that doesn't have an analysis rule, you're shown a warning that subscribers will be able to access the raw data for that resource. If you confirm that you're willingly publishing such resources without analysis rules, you can still create the listing.

If you get the Failed to save listing error, ensure that you have the necessary permissions to perform data contributor tasks.

Update a listing

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room that contains the listing.

  3. In the row of the listing that you want to update, click More actions > Edit listing.

  4. Update the primary contact or description as needed.

  5. Click Next.

  6. Update the analysis rule as needed. You can only update the parameters of the chosen rule. You can't switch to a different rule.

  7. Click Next.

  8. Review the listing and click Add data.

You can't change the source resource or data egress controls for a listing after it's created.

Delete a listing

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room that contains the listing.

  3. In the row of the listing that you want to delete, click More actions > Delete listings.

  4. To confirm, enter delete, and then click Delete. You can't undo this action.

When you delete a listing, the shared resources and linked datasets are not deleted. The linked datasets are unlinked from the source datasets, so querying data in that listing starts to fail for subscribers.

Share a data clean room

You can directly share a data clean room with subscribers:

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. In the row of the data clean room that you want to share, click More actions > Copy share link.

  3. Share the copied link with subscribers to let them view and subscribe to the data clean room.

Monitor listings

You can view the usage metrics on the source datasets of the resources that you share in a data clean room by querying the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
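A usage query against this view might look like the following sketch, which totals rows and bytes processed per shared table. The region is a hypothetical placeholder, and the column names (dataset_id, table_id, num_rows_processed, total_bytes_processed) are assumptions about the view's schema that you should check against the view reference. The script only prints the query unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholder: the region that holds your shared datasets.
REGION="us"

# Build a query that totals usage per shared table.
# Assumption: these column names exist in SHARED_DATASET_USAGE.
build_usage_query() {
  cat <<EOF
SELECT
  dataset_id,
  table_id,
  SUM(num_rows_processed) AS rows_processed,
  SUM(total_bytes_processed) AS bytes_processed
FROM \`region-${REGION}\`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
GROUP BY dataset_id, table_id
ORDER BY bytes_processed DESC;
EOF
}

# Print the query by default; set DRY_RUN=0 to run it in BigQuery.
if [ "${DRY_RUN:-1}" = "1" ]; then
  build_usage_query
else
  bq query --use_legacy_sql=false "$(build_usage_query)"
fi
```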

To view your listing subscribers, do the following:

  1. In the Google Cloud console, go to the Analytics Hub page.

  2. Click the display name of the data clean room.

  3. In the row of a listing that you want to view, click More actions > View subscriptions.

Subscriber workflows

A subscriber can view and subscribe to a data clean room. Subscribing to a data clean room creates one linked dataset in the subscriber's project. Each linked dataset has the same name as the data clean room.

You can't subscribe to a specific listing within a data clean room. You can only subscribe to the data clean room itself.

Additional subscriber permissions

You must have the Analytics Hub Subscriber (roles/analyticshub.subscriber) and Analytics Hub Subscription Owner (roles/analyticshub.subscriptionOwner) roles on a data clean room to perform subscriber tasks.

In addition, you need the bigquery.datasets.create permission in a project to create a linked dataset when you subscribe to a clean room.

Subscribe to a data clean room

Subscribing to a data clean room gives you query access to the data in the listings by creating a linked dataset in your project. To subscribe to a data clean room, do the following:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, click Add.

  3. Select Analytics Hub. A discovery page opens.

  4. To display the data clean rooms that you have access to, in the filters list, select Clean rooms.

  5. Click the data clean room that you want to subscribe to. A description page of the data clean room opens.

  6. Click Subscribe.

  7. Select the destination project for the subscription and click Subscribe.

A linked dataset is now added to the project that you specified and is available for query.

As a subscriber, you can edit some metadata of your linked datasets, such as description and labels. You can also set permissions on your linked datasets. However, changes to linked datasets don't affect the source datasets. You also can't see view definitions.

Resources that are contained in linked datasets are read-only. As a subscriber, you can't edit data or metadata for resources in linked datasets. You also can't specify permissions for individual resources within the linked dataset.

To unsubscribe from the data clean room, delete your linked dataset.
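Deleting the linked dataset can also be done from the command line. The following is a minimal sketch that assumes the bq rm command treats a linked dataset like any other dataset. The project and dataset names are hypothetical placeholders, and the run helper is a dry-run wrapper introduced only for this example.

```shell
# Hypothetical placeholders: replace with your own project and linked dataset.
PROJECT_ID="my-project"
LINKED_DATASET="campaign_analysis"

# Print the command by default; set DRY_RUN=0 to run it for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# -d tells bq that the target is a dataset; bq asks for confirmation before deleting.
unsubscribe() {
  run bq rm -d "${PROJECT_ID}:${LINKED_DATASET}"
}

unsubscribe
```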

Query data in a linked dataset

To query data in a linked dataset, use the SELECT WITH AGGREGATION_THRESHOLD syntax, which lets you run queries on analysis rule-enforced views. For an example of this syntax, see Query an aggregation threshold analysis rule–enforced view.
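As a rough illustration of this syntax, the following sketch counts conversions per campaign from a view in a linked dataset. The linked dataset, view, and column names are hypothetical placeholders. The script only prints the query unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholder: a view inside your linked dataset.
LINKED_VIEW="campaign_analysis.publisher_conversion_data"

# Build an aggregation query; the threshold itself comes from the view's analysis rule.
build_query() {
  cat <<EOF
SELECT WITH AGGREGATION_THRESHOLD
  campaign_id,
  COUNT(*) AS conversions
FROM ${LINKED_VIEW}
GROUP BY campaign_id;
EOF
}

# Print the query by default; set DRY_RUN=0 to run it in BigQuery.
if [ "${DRY_RUN:-1}" = "1" ]; then
  build_query
else
  bq query --use_legacy_sql=false "$(build_query)"
fi
```

Groups that don't meet the threshold set by the data contributor are omitted from the results, so a result with fewer rows than expected isn't necessarily an error.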

Example scenario: Advertiser and publisher attribution analysis

An advertiser wants to track the effectiveness of its marketing campaigns. The advertiser has first-party data on its customers, including their purchase history, demographics, and interests. The publisher has data from its website, including which ads were shown to visitors and their conversions.

The advertiser and publisher agree to use a data clean room to combine data and measure the results of their campaigns. In this case, the publisher creates the data clean room and makes their data available for the advertiser to perform the analysis. The result is an attribution report that shows the advertiser which ads were most effective in driving sales. The advertiser can then use this information to improve its future marketing campaigns.

The advertiser and publisher orchestrate the BigQuery data clean room through the following process:

Create the data clean room (publisher)

  1. An administrator in the publisher organization enables the Analytics Hub API in their BigQuery project and assigns User A as the data clean room owner (Analytics Hub Admin).
  2. User A creates a data clean room called Campaign Analysis and assigns the following permissions:
    • Data contributor (Analytics Hub Publisher): User B, a data engineer in the publisher organization.
    • Subscriber (Analytics Hub Subscriber and Subscription Owner): User C, a marketing analyst in the advertiser organization.

Add data to the data clean room (publisher)

  1. User B creates a new listing in the data clean room called Publisher Conversion Data. As part of listing creation, a new view with analysis rules is created.

Subscribe to the data clean room (advertiser)

  1. User C subscribes to the data clean room, which creates a linked dataset for all listings in the data clean room, including the Publisher Conversion Data listing.
  2. User C can now run aggregation queries to combine the data from this linked dataset with their first-party data to measure the campaign effectiveness.

Entity resolution

Data clean room use cases often require linking entities across data contributor and subscriber datasets that don't include a common identifier. Subscribers and data contributors might represent the same records differently in multiple datasets, either because datasets originate from different data sources or because datasets use identifiers from different namespaces.

As a part of data preparation, entity resolution in BigQuery does the following:

  • For data contributors, it deduplicates and resolves records in their shared resources by using identifiers from a common provider of their choice. This process enables cross-contributor joins.
  • For subscribers, it deduplicates and resolves records in their first-party datasets and links to entities in data contributor datasets. This process enables joins between subscriber and data contributor data.

To set up entity resolution with the identity provider of your choice, see Configure and use entity resolution in BigQuery.

Discover data clean room assets

To find all the data clean rooms that you have access to, do the following:

  • For data clean room owners and data contributors, in the Google Cloud console, go to the Analytics Hub page.

    All the data clean rooms that you can access are listed.

  • For subscribers, do the following:

    1. In the Google Cloud console, go to the BigQuery page.

    2. In the Explorer pane, click Add.

    3. Select Analytics Hub. A discovery page opens.

    4. To display the data clean rooms that you have access to, in the filters list, select Clean rooms.

To find all the linked datasets created by data clean rooms in your project, run the following command in a command-line environment:

PROJECT=PROJECT_ID
# Skip the two header lines of the bq ls output, then print each linked dataset.
for dataset in $(bq ls --project_id "$PROJECT" | tail -n +3); do
  if bq show -d --project_id "$PROJECT" "$dataset" | grep -q LINKED; then
    echo "$dataset"
  fi
done

Replace PROJECT_ID with the project that contains your linked datasets.

Pricing

Data contributors are only charged for data storage. Subscribers are only charged for compute (analysis) when they run queries.

Subscribers must have non-edition offerings or the Enterprise Plus edition.

What's next