This page lists resources that explain how to use Sensitive Data Protection with BigQuery.
Quickstart guides
- Quickstart: Scheduling a Sensitive Data Protection inspection scan
- Schedule periodic inspection of a Cloud Storage bucket, a BigQuery table, or a Datastore kind. For detailed instructions, see Creating and scheduling Sensitive Data Protection inspection jobs.
How-to guides
This section provides a categorized list of task-based guides that demonstrate how to use Sensitive Data Protection with BigQuery.
Inspection
- Inspecting storage and databases for sensitive data
- Create a one-time job that searches for sensitive data in a Cloud Storage bucket, a BigQuery table, or a Datastore kind.
- Creating and scheduling Sensitive Data Protection inspection jobs
- Create and schedule a job trigger that searches for sensitive data in a Cloud Storage bucket, a BigQuery table, or a Datastore kind. A job trigger automates the creation of Sensitive Data Protection jobs on a periodic basis.
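As a rough illustration of the two guides above, the following Python sketch builds the configuration for a one-time inspection job on a BigQuery table and then wraps it in a job trigger with a periodic schedule. The field names follow the snake_case form of the Cloud Data Loss Prevention API (v2) `InspectJobConfig` and `JobTrigger` messages; the helper names and the project, dataset, and table IDs are hypothetical, chosen only for this example.

```python
def build_inspect_job(project_id, dataset_id, table_id, info_types):
    """Build an inspection job config for a single BigQuery table.

    Hypothetical helper: it only assembles a dict shaped like the
    DLP API's InspectJobConfig and does not call any service.
    """
    return {
        "storage_config": {
            "big_query_options": {
                "table_reference": {
                    "project_id": project_id,
                    "dataset_id": dataset_id,
                    "table_id": table_id,
                }
            }
        },
        "inspect_config": {
            # Each infoType is referenced by name, for example "EMAIL_ADDRESS".
            "info_types": [{"name": name} for name in info_types],
        },
    }


def build_job_trigger(inspect_job, period_seconds=86400):
    """Wrap an inspection job in a trigger that reruns it on a schedule.

    Hypothetical helper: the default period of 86400 seconds (one day)
    is an assumption made for this sketch.
    """
    return {
        "inspect_job": inspect_job,
        "triggers": [
            {"schedule": {"recurrence_period_duration": {"seconds": period_seconds}}}
        ],
        # REST-style enum name; the Python client library also exposes it
        # as google.cloud.dlp_v2.JobTrigger.Status.HEALTHY.
        "status": "HEALTHY",
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    job = build_inspect_job(
        "my-project", "my_dataset", "my_table", ["EMAIL_ADDRESS", "PHONE_NUMBER"]
    )
    trigger = build_job_trigger(job)
    print(trigger["triggers"][0]["schedule"])
```

With the `google-cloud-dlp` client library installed, dictionaries like these would typically be passed to `DlpServiceClient.create_dlp_job` (as `inspect_job`) or `DlpServiceClient.create_job_trigger` (as `job_trigger`), together with a `parent` such as `projects/PROJECT_ID/locations/global`. The linked guides remain the authoritative reference for the full set of options.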
Working with scan results
- Sending Sensitive Data Protection scan results to Data Catalog
- Scan a BigQuery table, and then send the findings to Data Catalog to automatically create tags based on Sensitive Data Protection findings.
- Sending Sensitive Data Protection scan results to Security Command Center
- Scan a Cloud Storage bucket, a BigQuery table, or a Datastore kind, and then send the findings to Security Command Center.
- Analyzing and reporting on Sensitive Data Protection findings
- Use BigQuery to run analytics on Sensitive Data Protection findings.
- Querying Sensitive Data Protection findings in BigQuery
- Look through sample queries that you can use in BigQuery to analyze findings that Sensitive Data Protection identified.
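To give a sense of what analyzing findings in BigQuery can look like, here is a minimal Python sketch that builds a SQL query counting findings per infoType. It assumes the findings were saved to a BigQuery table that has an `info_type.name` field, as in the standard saved-findings schema; the helper name and the table name are hypothetical.

```python
def findings_by_info_type_query(table):
    """Return SQL that counts findings per infoType.

    `table` is a fully qualified name such as "project.dataset.table".
    Hypothetical helper: it assumes the findings table has an
    `info_type.name` field.
    """
    if "`" in table:
        # Reject backticks so the name cannot break out of the quoted identifier.
        raise ValueError("table name must not contain backticks")
    return (
        "SELECT info_type.name AS info_type_name, COUNT(*) AS finding_count\n"
        f"FROM `{table}`\n"
        "GROUP BY info_type_name\n"
        "ORDER BY finding_count DESC"
    )


if __name__ == "__main__":
    # Placeholder table name, not a real resource.
    print(findings_by_info_type_query("my-project.my_dataset.dlp_findings"))
```

The resulting string could be pasted into the BigQuery console or, with the `google-cloud-bigquery` library installed, run through `bigquery.Client().query(sql).result()`.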
Re-identification risk analysis
- Measuring re-identification and disclosure risk
- Analyze structured data stored in a BigQuery table and compute re-identification risk metrics such as k-anonymity, l-diversity, k-map, and δ-presence.
- Computing numerical and categorical statistics
- Determine minimum, maximum, and quantile values for an individual BigQuery column.
- Visualizing re-identification risk using Looker Studio
- Measure the k-anonymity of a dataset, and then visualize it in Looker Studio.
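For readers unfamiliar with k-anonymity, the following self-contained Python sketch shows the idea on in-memory rows: records are grouped by their quasi-identifier values, and k is the size of the smallest group. This is only a conceptual illustration under simplified assumptions, not how Sensitive Data Protection computes the metric; the function name, column names, and sample rows are all made up for the example.

```python
from collections import Counter


def k_anonymity(rows, quasi_ids):
    """Return (k, class_sizes) for a list of dict rows.

    Rows that share the same values for every column in `quasi_ids`
    form one equivalence class; k is the size of the smallest class.
    """
    classes = Counter(tuple(row[col] for col in quasi_ids) for row in rows)
    if not classes:
        raise ValueError("k-anonymity is undefined for an empty dataset")
    return min(classes.values()), dict(classes)


if __name__ == "__main__":
    # Made-up sample data with generalized age and truncated ZIP code.
    rows = [
        {"age": "30-39", "zip": "941", "diagnosis": "A"},
        {"age": "30-39", "zip": "941", "diagnosis": "B"},
        {"age": "40-49", "zip": "941", "diagnosis": "A"},
    ]
    k, sizes = k_anonymity(rows, ["age", "zip"])
    print(k, sizes)
```

In this sample, the single record in the `("40-49", "941")` class makes the dataset 1-anonymous with respect to `age` and `zip`, which means that record is unique on those quasi-identifiers. Grouping on `zip` alone would put all three records in one class.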
Tutorials
- De-identify BigQuery data at query time
- Follow a step-by-step tutorial that uses BigQuery remote functions to de-identify and re-identify data in real-time query results.
- De-identification and re-identification of PII in large-scale datasets using Sensitive Data Protection
- Review a reference architecture for creating an automated data transformation pipeline that de-identifies sensitive data like personally identifiable information (PII).
Best practices
- Secure a BigQuery data warehouse that stores confidential data
- Review an architectural overview and best practices for data governance when creating, deploying, and operating a data warehouse in Google Cloud, including data de-identification, differential handling of confidential data, and column-level access controls.
Community contributions
The following resources are owned and managed by community members, not by the Sensitive Data Protection team. For questions about these resources, contact their respective owners.
- Create Data Catalog tags by inspecting BigQuery data with Sensitive Data Protection
- Inspect BigQuery data using the Cloud Data Loss Prevention API, and then use the Data Catalog API to create column-level tags according to the sensitive elements that Sensitive Data Protection found.
- Event-driven serverless scheduling architecture with Sensitive Data Protection
- Set up an event-driven, serverless scheduling application that uses the Cloud Data Loss Prevention API to inspect BigQuery data.
- Real-time anomaly detection using Google Cloud stream analytics and AI services
- Walk through a real-time artificial intelligence (AI) pattern for detecting anomalies in log files. This proof of concept uses Pub/Sub, Dataflow, BigQuery ML, and Sensitive Data Protection.
- Relational database import to BigQuery with Dataflow and Sensitive Data Protection
- Use Dataflow and Sensitive Data Protection to securely tokenize data from a relational database and import it into BigQuery. This example describes how to tokenize PII before it's persisted.
Pricing
When you inspect a BigQuery table, you incur Sensitive Data Protection costs according to the storage inspection job pricing.
In addition, when you save inspection findings to a BigQuery table, BigQuery charges apply.