BigLake
BigLake is a storage engine that unifies data warehouses and lakes by enabling BigQuery and open source frameworks like Spark to access data with fine-grained access control. BigLake provides accelerated query performance across multi-cloud storage and open formats such as Apache Iceberg.
- Store a single copy of data with uniform features across data warehouses & lakes.
- Fine-grained access control and multi-cloud governance over distributed data.
- Seamless integration with open source analytics tools and open data formats.
Benefits
Freedom of choice
Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud native, over a single copy of data.
Secure and performant data lakes
Fine-grained access control across open source engines like Apache Spark, Presto, and Trino, and open formats such as Parquet. Performant queries over data lakes, powered by BigQuery.
Unified governance & management at scale
Integrates with Dataplex to provide management at scale, including logical data organization, centralized policy & metadata management, quality and lifecycle management for consistency across distributed data.
Key features
Fine-grained security controls
BigLake eliminates the need to grant file-level access to end users. Apply table-, row-, and column-level security policies on object store tables, just as on existing BigQuery tables.
Multi-compute analytics
Maintain a single copy of data and make it uniformly accessible across Google Cloud and open-source engines, including BigQuery, Vertex AI, Dataflow, Spark, Presto, Trino, and Hive, using BigLake connectors. Manage security policies centrally and have them enforced consistently across query engines by the API interface built into the connectors.
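The idea of defining access policies once and having every engine enforce them through a shared read path can be illustrated with a small, purely conceptual model. This is not the BigLake or BigQuery API: the `Policy` class, the `read_table` function, the two "engine" functions, and the sample rows are all hypothetical. The sketch applies a per-user row filter (row-level security) and hides denied columns (column-level security) inside one read function, so both engines see the same restricted result. As an assumption for simplicity, denied columns are returned as `None`; real column-level security may instead reject the query or apply a masking rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical policy record, defined once and shared by all engines."""
    # Row-level security: per-user predicate deciding which rows are visible.
    row_filters: Dict[str, Callable[[Row], bool]] = field(default_factory=dict)
    # Column-level security: per-user set of columns that must not be exposed.
    denied_columns: Dict[str, Set[str]] = field(default_factory=dict)


def read_table(rows: List[Row], policy: Policy, user: str) -> List[Row]:
    """Central read path: every engine must go through here.

    Users without a row filter see no rows (deny by default).
    Denied columns are replaced with None in this sketch.
    """
    row_filter = policy.row_filters.get(user)
    if row_filter is None:
        return []
    denied = policy.denied_columns.get(user, set())
    return [
        {col: (None if col in denied else val) for col, val in row.items()}
        for row in rows
        if row_filter(row)
    ]


# Two hypothetical engines; neither reads the rows directly, so neither
# can bypass the policy.
def spark_like_engine(rows: List[Row], policy: Policy, user: str) -> List[Row]:
    return read_table(rows, policy, user)


def bigquery_like_engine(rows: List[Row], policy: Policy, user: str) -> List[Row]:
    return read_table(rows, policy, user)


orders = [
    {"region": "EU", "customer": "alice", "amount": 120},
    {"region": "US", "customer": "bob", "amount": 75},
]
policy = Policy(
    row_filters={"eu_analyst": lambda r: r["region"] == "EU"},
    denied_columns={"eu_analyst": {"customer"}},
)

print(spark_like_engine(orders, policy, "eu_analyst"))
# [{'region': 'EU', 'customer': None, 'amount': 120}]
```

Routing every engine through one enforcement function is the design point: policies live in a single place, and adding an engine does not mean re-implementing the access rules.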
Multi-cloud governance
Discover all BigLake tables in Data Catalog, including those defined over Amazon S3 and Azure Data Lake Storage Gen2. Configure fine-grained access control and have it enforced across clouds when querying with BigQuery Omni.
Performance acceleration
Achieve industry-leading performance over data lake tables on Google Cloud, AWS, and Azure, powered by proven BigQuery infrastructure.
Built on open formats
Access the most popular open data formats, including Parquet, Avro, ORC, CSV, and JSON. The API serves multiple compute engines through Apache Arrow.
"As a rapidly growing e-commerce company, we have seen rapid growth in data. BigLake allows us to unlock the value of data lakes by enabling access control on our views while providing a unified interface to our users and keeping data storage costs low. This in turn allows quicker analysis on our datasets by our users."
What's new
Documentation
Introduction to BigLake
Learn BigLake concepts and how BigLake can simplify your analytics experience.
Getting started with BigLake
Learn how to create and manage BigLake tables, and how to query them through BigQuery or, using connectors, through other open source engines.
Pricing
BigLake pricing is based on querying BigLake tables:
1. BigQuery pricing applies to queries over BigLake tables defined on Google Cloud Storage.
2. BigQuery Omni pricing applies to queries over BigLake tables defined on Amazon S3 and Azure Data Lake Storage Gen2.
3. Queries from open-source engines using BigLake connectors: BigLake connectors use the BigQuery Storage API, and the corresponding prices apply, billed on bytes read and egress.
For example, the first 1 TB of data processed with BigQuery each month is free.
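The pricing rules on this page can be summarized as a small lookup, plus the free-tier arithmetic from the 1 TB example. This is a simplified sketch under stated assumptions: the storage labels `"gcs"`, `"s3"`, and `"adls_gen2"` are hypothetical, the connector case is checked first because the page lists it independently of storage location, and `price_per_tb` deliberately has no default. The `5.0` used below is a made-up figure for illustration, not a published rate; take real rates from the current BigQuery price list.

```python
from enum import Enum


class PricingModel(Enum):
    BIGQUERY = "BigQuery pricing"
    BIGQUERY_OMNI = "BigQuery Omni pricing"
    STORAGE_API = "BigQuery Storage API pricing (bytes read, plus egress)"


# Hypothetical labels for the storage locations named on this page.
GOOGLE_CLOUD_STORAGE = {"gcs"}
OTHER_CLOUD_STORAGE = {"s3", "adls_gen2"}


def pricing_model(storage: str, via_connector: bool) -> PricingModel:
    """Map a query path to the pricing rule it falls under (simplified)."""
    if via_connector:
        # Open-source engines read through BigLake connectors -> Storage API.
        return PricingModel.STORAGE_API
    if storage in GOOGLE_CLOUD_STORAGE:
        return PricingModel.BIGQUERY
    if storage in OTHER_CLOUD_STORAGE:
        return PricingModel.BIGQUERY_OMNI
    raise ValueError(f"unknown storage location: {storage!r}")


def monthly_query_cost(tb_processed: float, price_per_tb: float,
                       free_tb: float = 1.0) -> float:
    """Estimate one month's BigQuery query charge after the free allowance.

    free_tb defaults to the 1 TB monthly allowance quoted on this page;
    price_per_tb must be supplied by the caller.
    """
    if tb_processed < 0 or price_per_tb < 0:
        raise ValueError("inputs must be non-negative")
    billable_tb = max(0.0, tb_processed - free_tb)
    return billable_tb * price_per_tb


print(pricing_model("s3", via_connector=False).value)  # BigQuery Omni pricing
print(monthly_query_cost(3.5, price_per_tb=5.0))        # 12.5
```

With 3.5 TB processed in a month, the first 1 TB is free, so only 2.5 TB is billable; at the illustrative rate of 5.0 per TB that gives 12.5.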