Share sensitive data with data clean rooms

Data clean rooms provide a security-enhanced environment in which multiple parties can share, join, and analyze their data assets without moving or revealing the underlying data.

BigQuery data clean rooms are built on the Analytics Hub platform. While standard Analytics Hub data exchanges provide a way to share data across organizational boundaries at scale, data clean rooms help you address sensitive and protected data-sharing use cases. Data clean rooms provide additional security controls to help protect the underlying data and enforce privacy policies that the data owner defines.

The following are the primary use cases for data clean rooms in marketing:

  • Campaign planning and audience insights. Let two parties (such as sellers and buyers) combine first-party data and enrich it in a privacy-centric way.
  • Measurement and attribution. Match customer and media performance data to better understand the effectiveness of marketing efforts and make more informed business decisions.
  • Activation. Combine customer data with data from other parties to enrich understanding of customers, enabling improved segmentation capabilities and more effective media activation.

There are also several data clean room use cases beyond the marketing industry:

  • Retail and consumer packaged goods (CPG). Optimize marketing and promotional activities by combining point-of-sale data from retailers and marketing data from CPG companies.
  • Financial services. Improve fraud detection by combining sensitive data from other financial institutions and government agencies. Build credit risk scoring by aggregating customer data across multiple banks.
  • Healthcare. Share data between doctors and pharmaceutical researchers to learn how patients respond to treatments.
  • Supply chain, logistics, and transportation. Combine data from suppliers and marketers to get a complete picture of how products perform throughout their lifecycle.

Roles

There are three main roles in BigQuery data clean rooms:

  • Data clean room owner: a user who manages permissions, visibility, and membership of one or more data clean rooms within a project. This role is analogous to the Analytics Hub Admin.
  • Data contributor: a user who is assigned by the data clean room owner to publish data to a data clean room. In many cases, a data clean room owner is also a data contributor. This role is analogous to the Analytics Hub Publisher.
  • Subscriber: a user who is assigned by the data clean room owner to subscribe to the data published in a data clean room, letting them run queries on the data. This role is analogous to a combination of the Analytics Hub Subscriber and Analytics Hub Subscription Owner. Subscribers must use non-edition (on-demand) offerings or the Enterprise Plus edition.

Architecture

BigQuery data clean rooms are built on a publish and subscribe model of BigQuery datasets. The BigQuery architecture separates compute and storage, which lets data contributors share data without making multiple copies of it. At a high level, data contributors publish data to the data clean room, and subscribers query that data with the contributors' privacy policies enforced.

Data clean room

A data clean room is an environment to share sensitive data with privacy policies. Only users or groups that are added as subscribers to a data clean room can subscribe to the shared data. Data clean room owners can create as many data clean rooms as they want in the Analytics Hub.

Shared datasets

A shared dataset is the unit of data sharing in a data clean room. As a data contributor, you create or use an existing BigQuery dataset in your project with the collection of resources, such as views with privacy policies, that you want to share with your subscribers.

Listings

A listing is created when a data contributor adds data into a data clean room. It contains a reference to the data contributor's shared dataset along with descriptive information that helps subscribers use the data. As a data contributor, you can create a listing and include information such as a dataset description, sample queries, and links to documentation for your subscribers.

Linked datasets

A linked dataset is a read-only BigQuery dataset that serves as a symbolic link to a shared dataset. When subscribers query resources in a linked dataset, data from the shared dataset is returned. When you subscribe to a shared dataset through a data clean room, the linked dataset is created in your project. No copies of the data are created, and subscribers can't see the underlying metadata, such as view definitions.

Privacy policies

As a data contributor, you can configure privacy policies on the views shared in the data clean room. Privacy policies prevent raw access to underlying data and enforce query restrictions. Data clean rooms support the aggregation threshold privacy policy, which lets subscribers analyze data only through aggregation queries. For more information, see Prepare data.

Data egress controls

Data egress controls are automatically enabled to help prevent subscribers from copying and exporting raw data from a data clean room. Data contributors can configure additional controls to help prevent the copy and export of aggregated query results that are obtained by the subscribers.

Limitations

BigQuery data clean rooms have the following limitations:

  • You can set privacy policies only on views, not on tables. Due to this limitation, if a data contributor shares a dataset into a data clean room that contains tables (or views without privacy policies), then subscribers have raw access to the data in those resources. To enforce privacy protection on your data, you must share a dataset that only contains authorized views with an attached privacy policy.
  • As data clean rooms are built on the Analytics Hub platform, all Analytics Hub limitations apply.
  • Data clean rooms are only available in Analytics Hub regions.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, enable the Analytics Hub API, and assign the Analytics Hub Admin role to your data clean room owner (the user who will create the data clean room).

Required permissions

To get the permissions that you need to use data clean rooms, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role. For more information about granting roles, see Manage access.

This predefined role contains the following permissions, which are required to use data clean rooms:

  • serviceusage.services.get
  • serviceusage.services.list
  • serviceusage.services.enable

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.

Enable the Analytics Hub API

To enable the Analytics Hub API, select one of the following options:

Console

Open the Analytics Hub API page for your Google Cloud project and enable it.

Enable the Analytics Hub API

gcloud

Run the gcloud services enable command:

gcloud services enable analyticshub.googleapis.com

Once you enable the Analytics Hub API, you can access the Analytics Hub page.
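
Optionally, you can confirm that the API is enabled by listing the enabled services for your project:

gcloud services list --enabled | grep analyticshub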

Assign the Analytics Hub Admin role

Your data clean room owner must have the Analytics Hub Admin role (roles/analyticshub.admin). To learn how to grant this role to other users, see Create Analytics Hub administrators.
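
For example, the following is a minimal gcloud sketch that grants the role at the project level; PROJECT_ID and CLEAN_ROOM_OWNER_EMAIL are placeholders, and a project-level grant applies to every data clean room in the project:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:CLEAN_ROOM_OWNER_EMAIL" \
    --role="roles/analyticshub.admin"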

Data clean room owner workflows

As a data clean room owner, you can do the following:

  • Create a data clean room.
  • Update data clean room properties.
  • Delete a data clean room.
  • Manage data contributors.
  • Manage subscribers.
  • Share a data clean room.

Additional data clean room owner permissions

You must have the Analytics Hub Admin role (roles/analyticshub.admin) on your project to perform data clean room owner tasks.

Create a data clean room

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click Create clean room (preview).

  3. For Project, select the project for the data clean room. The Analytics Hub API must be enabled for the project.

  4. Specify the location, name, primary contact, icon (optional), and description for the data clean room. Only datasets that are in the same region as the data clean room can be listed in the data clean room.

  5. Click Create clean room.

  6. Optional: In the Clean Room Permissions section, add other data clean room owners, data contributors, or subscribers.


Update a data clean room

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room that you want to update.

  3. In the Details tab, click Edit clean room details.

  4. Update the data clean room name, primary contact, icon, or description as needed.

  5. Click Save.

Delete a data clean room

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. In the row of the data clean room that you want to delete, click More actions > Delete.

  3. To confirm, enter delete, and then click Delete. You cannot undo this action.

When you delete a data clean room, all the listings within it are deleted. However, the shared datasets and linked datasets are not deleted. The linked datasets are unlinked from the source datasets, so querying resources in the data clean room will start to fail for subscribers.

Manage data contributors

As a data clean room owner, you manage which users can add data to your data clean rooms (your data contributors). To let a user add data to a data clean room, grant them the Analytics Hub Publisher role (roles/analyticshub.publisher) on a specific data clean room:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room that you want to grant permissions to.

  3. In the Details tab, click Set permissions.

  4. Click Add principal.

  5. For New Principals, enter the usernames or emails of the data contributors that you're adding.

  6. For Select a role, select Analytics Hub > Analytics Hub Publisher.

  7. Click Save.

You can delete and update data contributors at any time by clicking Set permissions.

You can grant the Analytics Hub Publisher role for an entire project from the IAM page, which gives a user permission to add data to any data clean room in a project. However, we don't recommend this action, as it might result in users having overly permissive access.
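
If you do use a project-level grant (for example, in a project that is dedicated to a single data clean room), it might look like the following gcloud sketch; PROJECT_ID and DATA_CONTRIBUTOR_EMAIL are placeholders:

# Grants Analytics Hub Publisher on the entire project. Prefer per-clean-room
# grants through Set permissions, as described earlier.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:DATA_CONTRIBUTOR_EMAIL" \
    --role="roles/analyticshub.publisher"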

Manage subscribers

As a data clean room owner, you manage which users can subscribe to your data clean rooms (your subscribers). To allow a user to subscribe to a data clean room, grant them the Analytics Hub Subscriber (roles/analyticshub.subscriber) and Analytics Hub Subscription Owner (roles/analyticshub.subscriptionOwner) roles on a specific data clean room:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room that you want to grant permissions to.

  3. In the Details tab, click Set permissions.

  4. Click Add principal.

  5. For New Principals, enter the usernames or emails of the subscribers that you're adding.

  6. For Select a role, select Analytics Hub > Analytics Hub Subscriber.

  7. Click Add another role.

  8. For Select a role, select Analytics Hub > Analytics Hub Subscription Owner.

  9. Click Save.

You can delete and update subscribers at any time by clicking Set permissions.

You can grant the Analytics Hub Subscriber and Analytics Hub Subscription Owner roles for an entire project from the IAM page, which gives a user permission to subscribe to any data clean room in a project. However, we don't recommend this action, as it might result in users having overly permissive access.

Share a data clean room

You can directly share a data clean room with subscribers:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. In the row of the data clean room that you want to share, click More actions > Copy share link.

  3. Share the copied link with subscribers to let them view and subscribe to the data clean room.

Data contributor workflows

As a data contributor, you can do the following:

  • Add data to a data clean room by creating a listing.
  • Update a listing.
  • Delete a listing.
  • Share a data clean room.
  • Monitor listings.

Additional data contributor permissions

To perform data contributor tasks, you must have the Analytics Hub Publisher role (roles/analyticshub.publisher) on a data clean room.

In addition, you need the bigquery.datasets.link permission for the datasets that you want to list in a data clean room. You also need the resourcemanager.organization.get permission if you want to view data clean rooms in your organization that are not in your current project.

Create a listing (add data)

To create a listing, prepare the data with privacy policies, and then publish the data to a data clean room.

Prepare data

Data contributors generally don't want subscribers to view, query, copy, or share the raw data that they publish to a data clean room. To help prevent subscribers from accessing raw data, the data contributor must add privacy policies to every authorized view that they plan to publish to a data clean room. Privacy policies are only supported on views, not tables.

To prepare a dataset to add to a data clean room, do the following (a sketch of steps 1 through 3 with the bq tool follows the list):

  1. Create a dataset to be added to the data clean room.
  2. Create authorized views in your dataset with the data that you want to add.
  3. Add privacy policies to every authorized view in your dataset. Privacy policies are not supported on tables.
  4. If your collaboration environment requires common identifiers to join data across data contributor and subscriber datasets, configure entity resolution.
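
The following is a minimal bq sketch of steps 1 through 3; all project, dataset, table, and column names are hypothetical, and you should verify the privacy_policy JSON keys against the current aggregation threshold syntax:

# Step 1: create the dataset to share into the data clean room.
bq mk --dataset --location=US my_project:conversions_shared

# Steps 2 and 3: create a view with an aggregation threshold privacy policy.
# The view must also be authorized on the dataset that holds the raw table.
bq query --nouse_legacy_sql << 'EOF'
CREATE OR REPLACE VIEW `my_project.conversions_shared.conversions_v`
OPTIONS (
  privacy_policy = '{"aggregation_threshold_policy": {"threshold": 50, "privacy_unit_column": "user_id"}}'
)
AS
SELECT user_id, campaign_id, conversion_timestamp
FROM `my_project.raw_data.conversions`;
EOF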

Publish data

To publish data to a data clean room as a listing, do the following:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room that you want to create a listing in.

  3. Click Add data.

  4. Set the display name, dataset, primary contact, description (optional), and data egress controls for the listing.

  5. Click Next.

  6. Review the data (and privacy policies) that you're adding to the data clean room.

  7. Click Add data.

By listing a dataset in a data clean room, you grant all current and future data clean room subscribers access to the data in your shared dataset.

If you try to create a listing with a shared dataset that contains tables or views without a privacy policy, you're shown a warning that subscribers will be able to access the raw data in those resources. If you confirm that you intend to publish those resources without privacy policies, you can still create the listing.

If you get the Failed to save listing error, ensure that you have the necessary permissions to perform data contributor tasks.

Update a listing

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room that contains the listing.

  3. In the row of the listing that you want to update, click More actions > Edit listing.

  4. Update the listing name, primary contact, or description as needed.

  5. Click Next.

  6. Review the listing and click Add data.

You can't change the source dataset or data egress controls for a listing after it is created.

Delete a listing

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room that contains the listing.

  3. In the row of the listing that you want to delete, click More actions > Delete listings.

  4. To confirm, enter delete, and then click Delete. You cannot undo this action.

When you delete a listing, the shared datasets and linked datasets are not deleted. The linked datasets are unlinked from the source datasets, so querying data in that listing will start to fail for subscribers.

Share a data clean room

You can directly share a data clean room with subscribers:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. In the row of the data clean room that you want to share, click More actions > Copy share link.

  3. Share the copied link with subscribers to let them view and subscribe to the data clean room.

Monitor listings

You can view the usage metrics on the source datasets that you share in a data clean room by querying the INFORMATION_SCHEMA.SHARED_DATASET_USAGE view.
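
For example, the following bq sketch returns the usage rows for one shared dataset; the region qualifier and dataset name are hypothetical:

bq query --nouse_legacy_sql << 'EOF'
SELECT *
FROM `region-us`.INFORMATION_SCHEMA.SHARED_DATASET_USAGE
WHERE dataset_id = 'conversions_shared';
EOF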

To view your listing subscribers, do the following:

  1. In the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

  2. Click the display name of the data clean room.

  3. In the row of a listing that you want to view, click More actions > View subscriptions.

Subscriber workflows

A subscriber can view and subscribe to a data clean room. Subscribing to a data clean room creates one linked dataset in the subscriber's project for each listing in the data clean room. Each linked dataset has a common prefix, which is derived from the data clean room name.

You can't subscribe to a specific listing within a data clean room. You can only subscribe to the data clean room itself.

Additional subscriber permissions

You must have the Analytics Hub Subscriber (roles/analyticshub.subscriber) and Analytics Hub Subscription Owner (roles/analyticshub.subscriptionOwner) roles on a data clean room to perform subscriber tasks.

In addition, you need the bigquery.datasets.create permission in a project to create a linked dataset when you subscribe to a clean room.

Subscribe to a data clean room

Subscribing to a data clean room gives you query access to the data in the listings by creating linked datasets in your project. To subscribe to a data clean room, do the following:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, click Add.

  3. Select Analytics Hub. A discovery page opens.

  4. To display the data clean rooms that you have access to, in the filters list, select Clean rooms (Preview).


  5. Click the data clean room that you want to subscribe to. A description page of the data clean room opens.

  6. Click Subscribe.

  7. Select the destination project for the subscription and click Subscribe.

Linked datasets are now added to the project that you specified and are available for query.

As a subscriber, you can edit some metadata of your linked datasets, such as description and labels. You can also set permissions on your linked dataset. However, changes to linked datasets don't affect the source or shared datasets. You also can't see the source dataset metadata or view definitions.

Resources that are contained in linked datasets are read-only. As a subscriber, you can't edit data or metadata for resources in linked datasets. You also can't specify permissions for individual resources within the linked dataset.

To unsubscribe from the data clean room, delete your linked dataset.
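
For example, the following bq sketch updates the description of a linked dataset and then deletes it to unsubscribe; the project and dataset names are hypothetical:

# Update mutable metadata on the linked dataset (this doesn't affect the
# contributor's shared dataset).
bq update --description "Campaign Analysis clean room (linked)" \
    my_project:campaign_analysis_linked

# Unsubscribe by deleting the linked dataset.
bq rm -r -d my_project:campaign_analysis_linked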

Query data in a linked dataset

To query data in a linked dataset, use the SELECT WITH AGGREGATION_THRESHOLD syntax, which lets you run queries on privacy-policy enforced views. For an example of this syntax, see Use a privacy policy-enforced view.
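
For example, a minimal aggregation threshold query against a view in a linked dataset might look like the following bq sketch; the project, dataset, view, and column names are hypothetical:

bq query --nouse_legacy_sql << 'EOF'
SELECT WITH AGGREGATION_THRESHOLD
  campaign_id,
  COUNT(DISTINCT user_id) AS converted_users
FROM `my_project.campaign_analysis_linked.conversions_v`
GROUP BY campaign_id;
EOF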

Example scenario: Advertiser and publisher attribution analysis

An advertiser wants to track the effectiveness of its marketing campaigns. The advertiser has first-party data about its customers, including their purchase history, demographics, and interests. The publisher has data from its website, including which ads were shown to visitors and which visitors converted.

The advertiser and publisher agree to use a data clean room to combine data and measure the results of their campaigns. In this case, the publisher creates the data clean room and makes their data available for the advertiser to perform the analysis. The result is an attribution report that shows the advertiser which ads were most effective in driving sales. The advertiser can then use this information to improve its future marketing campaigns.

The advertiser and publisher orchestrate the BigQuery data clean room through the following process:

Create the data clean room (publisher)

  1. An administrator in the publisher organization enables the Analytics Hub API in their BigQuery project and assigns User A as the data clean room owner (Analytics Hub Admin).
  2. User A creates a data clean room called Campaign Analysis and assigns the following permissions:
    • Data contributor (Analytics Hub Publisher): User B, a data engineer in the publisher organization.
    • Subscriber (Analytics Hub Subscriber and Subscription Owner): User C, a marketing analyst in the advertiser organization.

Add data to the data clean room (publisher)

  1. User B creates a new shared dataset with the website conversion data. The shared dataset contains authorized views with privacy policies.
  2. User B creates a new listing in the data clean room called Publisher Conversion Data.

Subscribe to the data clean room (advertiser)

  1. User C subscribes to the data clean room, which creates a linked dataset for the Publisher Conversion Data listing.
  2. User C can now run aggregation queries that combine data from this linked dataset with their first-party data to measure campaign effectiveness, as shown in the sketch that follows.
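
The following bq sketch shows what such a query could look like; all identifiers are hypothetical, and it assumes that a common join key (such as a resolved user ID) exists in both datasets:

bq query --nouse_legacy_sql << 'EOF'
SELECT WITH AGGREGATION_THRESHOLD
  conversions.campaign_id,
  COUNT(DISTINCT purchases.customer_id) AS attributed_customers,
  SUM(purchases.order_value) AS attributed_revenue
FROM `advertiser_project.campaign_analysis_linked.conversions_v` AS conversions
JOIN `advertiser_project.first_party.purchases` AS purchases
  ON conversions.user_id = purchases.customer_id
GROUP BY conversions.campaign_id;
EOF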

Entity resolution with LiveRamp

Data clean room use cases often require linking entities across data contributor and subscriber datasets that don't include a common identifier. Subscribers and data contributors might represent the same records differently in multiple datasets, either because datasets originate from different data sources or because datasets use identifiers from different namespaces.

As part of data preparation, entity resolution in BigQuery does the following:

  • For data contributors, it deduplicates and resolves records in their shared datasets by using identifiers from a common provider of their choice. This process enables cross-contributor joins.
  • For subscribers, it deduplicates and resolves records in their first-party datasets and links to entities in data contributor datasets. This process enables joins between subscriber and data contributor datasets.

Entity resolution in BigQuery is supported through an embedded integration with LiveRamp.

LiveRamp uses a process called RampID Transcoding, which transcodes a RampID (the pseudonymous person-based identifier) in one domain to a RampID in another domain. Transcoding translates the RampID for use by another party.

Prerequisites

Setup

The following steps are required when you use LiveRamp Embedded Identity for the first time. After setup is complete, only the input table and metadata table need to be modified between runs.

Create an input dataset and an output dataset

This step is optional. To keep your data fully isolated for the interaction with LiveRamp and to control what the LiveRamp service account can access, we recommend that you create two new datasets: an input dataset and an output dataset. The input dataset holds the input table and the metadata table. The output dataset is the location that LiveRamp writes to.
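
For example, with hypothetical project and dataset names:

# Holds the input table and the metadata table.
bq mk --dataset my_project:liveramp_input

# LiveRamp writes its output here.
bq mk --dataset my_project:liveramp_output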

Create an input table

Create a table (within the input dataset, if you created one) with the RampIDs, the target domain, and the target type. For details and examples, see Input Table Columns and Descriptions.
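
The following bq sketch shows a hypothetical input table with the three fields named above; the actual column names and types are defined in LiveRamp's Input Table Columns and Descriptions reference:

# Hypothetical schema; replace with the columns that LiveRamp documents.
bq mk --table my_project:liveramp_input.input_table \
    rampid:STRING,target_domain:STRING,target_type:STRING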

Create a metadata table

The metadata table is used to control the execution of LiveRamp Embedded Identity on BigQuery. Create a table (within the input dataset, if you created one) with the client IDs, execution mode, target domain, and target type. For details and examples, see Metadata Table Columns and Descriptions.

Share tables with LiveRamp

Grant the LiveRamp Google Cloud service account access to view and process data in your input dataset. For details and examples, see Share Tables and Datasets with LiveRamp.

Run an embedded identity job

To run an embedded identity job with LiveRamp in BigQuery, do the following:

  1. Insert all the RampIDs that were encoded in your domain into your input table.
  2. Confirm that your metadata table is still accurate before you run the job.
  3. Contact LiveRampIdentitySupport@liveramp.com with a job process request. Include the project ID, dataset ID, and table ID (if applicable) for your input table, metadata table, and output dataset. For more information, see Notify LiveRamp to Initiate Transcoding.

Results are generally delivered to your output dataset within 3 business days.

LiveRamp Support

For support issues, contact LiveRampIdentitySupport@liveramp.com.

Discover data clean room assets

To find all the data clean rooms that you have access to, do the following:

  • For data clean room owners and data contributors, in the Google Cloud console, go to the Analytics Hub page.

    Go to Analytics Hub

    All the data clean rooms that you can access are listed.

  • For subscribers, do the following:

    1. In the Google Cloud console, go to the BigQuery page.

      Go to BigQuery

    2. In the Explorer pane, click Add.

    3. Select Analytics Hub. A discovery page opens.

    4. To display the data clean rooms that you have access to, in the filters list, select Clean rooms (Preview).

To find all the linked datasets created by data clean rooms in your project, run the following command in a command-line environment:

PROJECT=PROJECT_ID
for dataset in $(bq ls --project_id=$PROJECT | tail -n +3); do
  [ "$(bq show -d --project_id=$PROJECT $dataset | grep LINKED)" ] \
    && echo $dataset
done

Replace PROJECT_ID with the project that contains your linked datasets.

Pricing

Data contributors are only charged for data storage. Subscribers are only charged for compute (analysis) when they run queries.

Subscribers must use non-edition (on-demand) offerings or the Enterprise Plus edition.

Billing for entity resolution with LiveRamp is done by LiveRamp.

What's next