Automate data governance, extend your data fabric with Dataplex-BigLake integration

December 16, 2022
Uri Gilad

Group Product Manager - Data Governance, Google Cloud

Unlocking the full potential of data requires breaking down the silos between open-source data formats and data warehouses. At the same time, it is critical to enable data governance teams to apply policies regardless of where the data resides, whether in file or columnar storage.

Today, data governance teams have to become subject matter experts on each storage system the corporate data happens to reside on. Since February 2022, Dataplex has offered a unified place to apply policies, which are propagated across both lake storage and data warehouses in GCP. Rather than specifying policies in multiple places and bearing the cognitive load of translating between “what you want the storage system to do” and “how your data should behave,” you can use Dataplex as a single point for unambiguous policy management. Now, we are making it easier for you to use BigLake.

Earlier this year, we launched BigLake into general availability. BigLake unifies the data fabric between data lakes and data warehouses by extending BigQuery storage to open file formats. Today, we announce BigLake integration with Dataplex (available in preview). This integration eliminates the configuration steps for admins who want to take advantage of BigLake, and lets them manage policies across GCS and BigQuery from a unified console.

Previously, you could point Dataplex at a Google Cloud Storage (GCS) bucket, and Dataplex would discover and extract all metadata from the data lake and register it in BigQuery (as well as Dataproc Metastore and Data Catalog) for analysis and search. The BigLake integration builds on this capability by allowing an “upgrade” of a bucket asset: instead of just creating external tables in BigQuery for analysis, Dataplex will create policy-capable BigLake tables!

The immediate implication is that admins can now assign column, row, and table policies to the BigLake tables auto-created by Dataplex, because with BigLake the infrastructure layer (GCS) is separate from the analysis layer (BigQuery). Dataplex will handle the creation of a BigQuery connection and a BigQuery publishing dataset, and ensure the BigQuery service account has the correct permissions on the bucket.
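To make the row-policy case concrete, here is a minimal sketch of what assigning a row-level policy to one of these tables could look like. It only builds a BigQuery `CREATE ROW ACCESS POLICY` statement as a string. The helper `row_access_policy_ddl`, the policy name, the table path, the group, and the `region` column are all hypothetical names chosen for illustration and do not come from Dataplex.

```python
def row_access_policy_ddl(policy_name, table, grantees, filter_expr):
    """Build a BigQuery CREATE ROW ACCESS POLICY statement as a string.

    All arguments are illustrative; nothing here is validated against
    a real project, dataset, or table.
    """
    if not grantees:
        raise ValueError("at least one grantee is required")
    # Each grantee is an IAM principal such as "group:name@example.com".
    grantee_list = ", ".join(f'"{g}"' for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr})"
    )


# Hypothetical BigLake table assumed to have been created by Dataplex.
ddl = row_access_policy_ddl(
    policy_name="us_only",
    table="my_project.my_lake_zone.orders",
    grantees=["group:us-analysts@example.com"],
    filter_expr='region = "US"',
)
print(ddl)
```

The resulting statement could then be run in the BigQuery console or through a BigQuery client library. Whether a given table accepts the policy depends on your project setup, so treat this as a starting point rather than a recipe.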

https://storage.googleapis.com/gweb-cloudblog-publish/images/Dataplex-BigLake_121622.max-900x900.jpg

But wait - there’s more.

With this release of Dataplex, we are also introducing advanced logging called governance logs. Governance logs allow tracking the exact state of policy propagation to tables and columns, adding a level of detail that goes beyond the high-level “status” for the bucket and into fine-grained status and logs for tables and columns.

What’s next? 

  • We have updated our documentation for managing buckets with additional detail on policy propagation and the upgrade process.

  • Stay tuned for an exciting roadmap ahead, with more automation around policy management.
