Using Sensitive Data Protection with BigQuery

This page lists guides and other resources that show how to use Sensitive Data Protection with BigQuery.

Quickstart guides

Quickstart: Scheduling a Sensitive Data Protection inspection scan
Schedule periodic inspection of a Cloud Storage bucket, a BigQuery table, or a Datastore kind. For detailed instructions, see Creating and scheduling Sensitive Data Protection inspection jobs.

How-to guides

This section provides a categorized list of task-based guides that demonstrate how to use Sensitive Data Protection with BigQuery.

Inspection

Inspecting storage and databases for sensitive data
Create a one-time job that searches for sensitive data in a Cloud Storage bucket, a BigQuery table, or a Datastore kind.
Creating and scheduling Sensitive Data Protection inspection jobs
Create and schedule a job trigger that searches for sensitive data in a Cloud Storage bucket, a BigQuery table, or a Datastore kind. A job trigger automates the creation of Sensitive Data Protection jobs on a periodic basis. An illustrative API sketch follows below.
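As a minimal sketch only, and not a substitute for the guides above, the following shows how a one-time inspection job for a BigQuery table might be created with the Python client library (google-cloud-dlp). The project, dataset, table, and infoType values are hypothetical placeholders.

# Minimal sketch: create a one-time Sensitive Data Protection inspection job
# that scans a BigQuery table. All resource names below are placeholders.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # hypothetical project

inspect_job = {
    "storage_config": {
        "big_query_options": {
            "table_reference": {
                "project_id": "my-project",  # hypothetical
                "dataset_id": "my_dataset",  # hypothetical
                "table_id": "my_table",      # hypothetical
            }
        }
    },
    "inspect_config": {
        # InfoTypes to search for; adjust to match your data.
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
        "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
    },
}

response = dlp.create_dlp_job(
    request={"parent": parent, "inspect_job": inspect_job}
)
print("Created job:", response.name)

To run the same scan on a schedule, a similar configuration can be supplied as a job trigger through create_job_trigger instead of a one-time job; the guides above cover the full set of options.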

Working with scan results

Sending Sensitive Data Protection scan results to Data Catalog
Scan a BigQuery table, and then send the findings to Data Catalog to automatically create tags based on Sensitive Data Protection findings.
Sending Sensitive Data Protection scan results to Security Command Center
Scan a Cloud Storage bucket, a BigQuery table, or a Datastore kind, and then send the findings to Security Command Center.
Analyzing and reporting on Sensitive Data Protection findings
Use BigQuery to run analytics on Sensitive Data Protection findings.
Querying Sensitive Data Protection findings in BigQuery
Look through sample queries that you can use in BigQuery to analyze findings that Sensitive Data Protection identified. An illustrative query follows below.
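As a rough illustration, the following sketch assumes that inspection findings were saved to a hypothetical BigQuery table named my-project.dlp_output.findings and that its schema includes the info_type.name and likelihood columns produced when findings are saved to BigQuery. It counts findings per infoType with the BigQuery Python client.

# Minimal sketch: summarize exported findings by infoType and likelihood.
# The project and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

query = """
    SELECT info_type.name AS info_type_name, likelihood, COUNT(*) AS finding_count
    FROM `my-project.dlp_output.findings`
    GROUP BY info_type_name, likelihood
    ORDER BY finding_count DESC
"""

for row in client.query(query).result():
    print(row.info_type_name, row.likelihood, row.finding_count)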

Re-identification risk analysis

Measuring re-identification and disclosure risk
Analyze structured data stored in a BigQuery table and compute re-identification risk metrics, including k-anonymity, l-diversity, k-map, and δ-presence. A minimal job configuration sketch follows this list.
Computing numerical and categorical statistics
Determine minimum, maximum, and quantile values for an individual BigQuery column.
Visualizing re-identification risk using Looker Studio
Measure the k-anonymity of a dataset, and then visualize it in Looker Studio.
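As a sketch only, and assuming hypothetical table and column names, a k-anonymity risk analysis job over a BigQuery table could be requested like this with the Python client; quasi_ids should list the columns that could plausibly identify individuals.

# Minimal sketch: request a k-anonymity risk analysis of a BigQuery table.
# All resource and column names are placeholders.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # hypothetical project

risk_job = {
    "privacy_metric": {
        "k_anonymity_config": {
            # Hypothetical quasi-identifier columns.
            "quasi_ids": [{"name": "zip_code"}, {"name": "age"}]
        }
    },
    "source_table": {
        "project_id": "my-project",  # hypothetical
        "dataset_id": "my_dataset",  # hypothetical
        "table_id": "my_table",      # hypothetical
    },
}

response = dlp.create_dlp_job(request={"parent": parent, "risk_job": risk_job})
print("Created risk analysis job:", response.name)

When the job finishes, the computed metrics can be read back with get_dlp_job; for this configuration the results appear under the job's risk_details.k_anonymity_result field.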

Tutorials

De-identify BigQuery data at query time
Follow a step-by-step tutorial that uses BigQuery remote functions to de-identify and re-identify data in real-time query results.
De-identification and re-identification of PII in large-scale datasets using Sensitive Data Protection
Review a reference architecture for creating an automated data transformation pipeline that de-identifies sensitive data like personally identifiable information (PII).

Best practices

Secure a BigQuery data warehouse that stores confidential data
Review an architectural overview and best practices for data governance when you create, deploy, and operate a data warehouse in Google Cloud, including data de-identification, differential handling of confidential data, and column-level access controls.

Community contributions

The following resources are owned and managed by community members, not by the Sensitive Data Protection team. For questions about these items, contact their respective owners.

Create Data Catalog tags by inspecting BigQuery data with Sensitive Data Protection
Inspect BigQuery data using the Cloud Data Loss Prevention API, and then use the Data Catalog API to create column-level tags according to the sensitive elements that Sensitive Data Protection found.
Event-driven serverless scheduling architecture with Sensitive Data Protection
Set up an event-driven, serverless scheduling application that uses the Cloud Data Loss Prevention API to inspect BigQuery data.
Real-time anomaly detection using Google Cloud stream analytics and AI services
Walk through a real-time artificial intelligence (AI) pattern for detecting anomalies in log files. This proof-of-concept uses Pub/Sub, Dataflow, BigQuery ML, and Sensitive Data Protection.
Relational database import to BigQuery with Dataflow and Sensitive Data Protection
Use Dataflow and Sensitive Data Protection to securely tokenize and import data from a relational database to BigQuery. This example describes how to tokenize PII before it's persisted.

Pricing

When you inspect a BigQuery table, you incur Sensitive Data Protection costs according to storage inspection job pricing.

In addition, when you save inspection findings to a BigQuery table, BigQuery charges apply.