
Get started with differential privacy and privacy budgeting in BigQuery data clean rooms

April 5, 2024
Magda Gianola

Group Product Manager, Google Cloud

Anurag Peshne

Software Engineer, Google Cloud

We are excited to announce that differential privacy enforcement with privacy budgeting is now available in BigQuery data clean rooms to help organizations prevent data from being re-identified when it is shared.

Differential privacy is an anonymization technique that limits the personal information that is revealed in a query output. Differential privacy is considered to be one of the strongest privacy protections that exist today because it:

  • is provably private
  • supports multiple differentially private queries on the same dataset
  • can be applied to many data types
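To make "provably private" concrete: a mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in one person's data and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. The Python sketch below shows the classic Laplace mechanism for a count query, assuming each person contributes at most one record. It illustrates the general technique rather than BigQuery's implementation, the function names are hypothetical, and it uses ordinary floating-point sampling, which production libraries avoid because it can leak information.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records: list, epsilon: float, rng: random.Random) -> float:
    """Return an epsilon-differentially private count of `records`.

    Assumes each person contributes at most one record, so adding or
    removing one person changes the true count by at most 1 (sensitivity 1).
    Laplace noise with scale sensitivity / epsilon then gives epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return len(records) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    patients = [f"patient-{i}" for i in range(1000)]
    # Smaller epsilon adds more noise and gives a stronger privacy guarantee.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: noisy count ~ {dp_count(patients, eps, rng):.1f}")
```

Each run returns a different noisy count, so an analyst learns roughly how many records there are without being able to tell whether any one person is included.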

Differential privacy is used by advertisers, healthcare companies, and education companies to perform analysis without exposing individual records. It is also used by public sector organizations that comply with the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and the California Consumer Privacy Act (CCPA).

What can I do with differential privacy?

With differential privacy, you can:

  • protect individual records from re-identification without moving or copying your data
  • protect against privacy leaks and re-identification
  • use one of the anonymization standards most favored by regulators

BigQuery customers can use differential privacy to:

  • share data in BigQuery data clean rooms while preserving privacy
  • anonymize query results on AWS and Azure data with BigQuery Omni
  • share anonymized results with Apache Spark stored procedures and Dataform pipelines so they can be consumed by other applications
  • enhance differential privacy implementations with technology from Google Cloud partners Gretel.ai and Tumult Analytics
  • call frameworks like PipelineDP.io

So what is BigQuery differential privacy exactly?

BigQuery differential privacy consists of three capabilities:

  • Differential privacy in GoogleSQL – You can use differential privacy aggregate functions directly in GoogleSQL.

  • Differential privacy enforcement in BigQuery data clean rooms – You can apply a differential privacy analysis rule to enforce that all queries on your shared data use differential privacy in GoogleSQL with the parameters that you specify.

  • Parameter-driven privacy budgeting in BigQuery data clean rooms – When you apply a differential privacy analysis rule, you also set a privacy budget to limit the data that is revealed when your shared data is queried. BigQuery uses parameter-driven privacy budgeting to give you more granular control over your data than query thresholds do and to prevent further queries on that data when the budget is exhausted.
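Differentially private aggregate functions add noise only after limiting how much any one privacy unit (for example, a single user) can contribute. A minimal Python sketch of that idea for a sum follows; the names `dp_sum`, `lower`, and `upper` are hypothetical and do not reflect BigQuery's API or internals. Each unit's rows are combined, the unit's total is clamped to `[lower, upper]`, and Laplace noise is scaled to the largest amount one unit can then move the sum.

```python
import random
from collections import defaultdict


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_sum(rows, lower: float, upper: float, epsilon: float,
           rng: random.Random) -> float:
    """Epsilon-DP sum over `rows`, an iterable of (privacy_unit, value) pairs."""
    if epsilon <= 0 or lower >= upper:
        raise ValueError("need epsilon > 0 and lower < upper")
    # 1. Combine all rows that belong to the same privacy unit.
    per_unit = defaultdict(float)
    for unit, value in rows:
        per_unit[unit] += value
    # 2. Clamp each unit's total so no single unit can dominate the result.
    clamped_sum = sum(min(max(t, lower), upper) for t in per_unit.values())
    # 3. Adding or removing one unit now shifts the sum by at most this much.
    sensitivity = max(abs(lower), abs(upper))
    return clamped_sum + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    purchases = [("alice", 30.0), ("alice", 500.0), ("bob", 12.5), ("carol", 80.0)]
    # alice's 530.0 total is clamped to 100.0, so the pre-noise sum is 192.5.
    print(dp_sum(purchases, lower=0.0, upper=100.0, epsilon=1.0, rng=rng))
```

Clamping is what keeps the noise finite: without a bound, one outlier such as alice's 530.0 could move the sum arbitrarily far, and no finite amount of noise would hide it. The trade-off is bias whenever real totals fall outside the bounds.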

BigQuery differential privacy enforcement in action

Here’s how to enable the differential privacy analysis rule and configure a privacy budget when you add data to a BigQuery data clean room.

https://storage.googleapis.com/gweb-cloudblog-publish/images/figure_1.max-2200x2200.png

Subscribers of that clean room must then use differential privacy to query your shared data.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_NP4i0TM.max-1200x1200.png

Subscribers of that clean room cannot query your shared data once the privacy budget is exhausted.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_gO4XfWR.max-1200x1200.png
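The enforcement flow above behaves like a ledger: the data owner sets a total budget, each differentially private query spends part of it, and further queries are refused once it runs out. The hypothetical Python sketch below (`PrivacyBudget`, `BudgetExhausted`, and `charge` are illustrative names, not BigQuery APIs) assumes basic sequential composition, under which the epsilons and deltas of queries on the same data simply add up. BigQuery's actual budget accounting may be more refined than this.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


@dataclass
class PrivacyBudget:
    """Hypothetical (epsilon, delta) budget ledger using basic sequential composition."""
    epsilon: float
    delta: float
    spent_epsilon: float = 0.0
    spent_delta: float = 0.0

    def remaining(self) -> tuple[float, float]:
        return self.epsilon - self.spent_epsilon, self.delta - self.spent_delta

    def charge(self, epsilon: float, delta: float = 0.0) -> None:
        """Reserve budget for one query before it runs, or refuse the query."""
        if epsilon <= 0 or delta < 0:
            raise ValueError("each query needs epsilon > 0 and delta >= 0")
        rem_eps, rem_delta = self.remaining()
        # Small tolerances keep floating-point rounding from rejecting an exact fit.
        if epsilon > rem_eps + 1e-12 or delta > rem_delta + 1e-18:
            raise BudgetExhausted(
                f"query needs epsilon={epsilon}, delta={delta}; remaining "
                f"epsilon={rem_eps:.6g}, delta={rem_delta:.6g}"
            )
        self.spent_epsilon += epsilon
        self.spent_delta += delta


if __name__ == "__main__":
    budget = PrivacyBudget(epsilon=2.0, delta=1e-6)  # set by the data owner
    budget.charge(1.0, 5e-7)  # first subscriber query
    budget.charge(1.0, 5e-7)  # second query spends the rest
    try:
        budget.charge(0.5)  # third query is refused
    except BudgetExhausted as err:
        print("rejected:", err)
```

Charging before a query runs, rather than after, means a refused query never releases any noisy output, so the ledger can never be overspent.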

Get started with BigQuery differential privacy

BigQuery differential privacy is configured when a data owner or contributor shares data in a BigQuery data clean room. A data owner or contributor can share data using any compute pricing model and does not incur compute charges when a subscriber queries that data. Subscribers of a data clean room incur compute charges when querying shared data that is protected with a differential privacy analysis rule. Those subscribers are required to use on-demand pricing (charged per TB) or the Enterprise Plus edition (charged per slot hour).

Create a clean room today where all queries are protected with differential privacy, and let us know where you need help.
