BigLake

BigLake is a storage engine that unifies data warehouses and lakes by enabling BigQuery and open source frameworks like Spark to access data with fine-grained access control. BigLake provides accelerated query performance across multi-cloud storage and open formats such as Apache Iceberg.

  • Store a single copy of data with uniform features across data warehouses & lakes.

  • Fine-grained access control and multi-cloud governance over distributed data.

  • Seamless integration with open source analytics tools and open data formats.

Benefits

Freedom of choice

Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud native, over a single copy of data.

Secure and performant data lakes

Fine-grained access control across open source engines like Apache Spark, Presto and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery.

Unified governance & management at scale

Integrates with Dataplex to provide management at scale, including logical data organization, centralized policy & metadata management, quality and lifecycle management for consistency across distributed data. 

Key features

Fine-grained security controls

BigLake eliminates the need to grant file-level access to end users. Apply table-, row-, and column-level security policies on object store tables, similar to existing BigQuery tables.
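As a rough illustration of a row-level policy, the sketch below builds a BigQuery `CREATE ROW ACCESS POLICY` statement as a string. It assumes the BigLake table already exists and accepts the same row access policy DDL as a regular BigQuery table. The helper name `row_access_policy_ddl`, the table `my_project.lake.orders`, the policy name, the grantee, and the `region` column are all hypothetical.

```python
from typing import Sequence


def row_access_policy_ddl(
    policy: str, table: str, grantees: Sequence[str], filter_expr: str
) -> str:
    """Build a CREATE ROW ACCESS POLICY statement (hypothetical helper).

    Inputs are interpolated as-is, with no escaping or validation, so this
    is only suitable for trusted, hand-written values.
    """
    if not grantees:
        raise ValueError("at least one grantee is required")
    # Each grantee is a quoted IAM principal such as "user:..." or "group:...".
    grantee_list = ", ".join(f'"{g}"' for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr})"
    )


# Hypothetical example: analysts in one group only see rows for the US region.
ddl = row_access_policy_ddl(
    policy="us_only",
    table="my_project.lake.orders",
    grantees=["group:us-analysts@example.com"],
    filter_expr='region = "US"',
)
print(ddl)
```

The resulting statement could then be run in BigQuery like any other DDL. Users named in the policy would query the table directly and see only matching rows, without needing access to the underlying files.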

Multi-compute analytics

Maintain a single copy of data and make it uniformly accessible across Google Cloud and open source engines, including BigQuery, Vertex AI, Dataflow, Spark, Presto, Trino, and Hive, using BigLake connectors. Centrally manage security policies in one place, and have them consistently enforced across the query engines by the API interface built into the connectors.

Multi-cloud governance

Discover all BigLake tables in Data Catalog, including those defined over Amazon S3 and Azure Data Lake Storage Gen2. Configure fine-grained access control and have it enforced across clouds when querying with BigQuery Omni.

Performance acceleration

Achieve industry-leading performance over data lake tables on Google Cloud, AWS, and Azure, powered by proven BigQuery infrastructure.

Built on open formats

Gain access to the most popular open data formats, including Parquet, Avro, ORC, CSV, and JSON. The API serves multiple compute engines through Apache Arrow.

logo for bol.com

"As a rapidly growing e-commerce company, we have seen rapid growth in data. BigLake allows us to unlock the value of data lakes by enabling access control on our views while providing a unified interface to our users and keeping data storage costs low. This in turn allows quicker analysis on our datasets by our users."

Documentation

Google Cloud Basics
Introduction to BigLake

Get introduced to BigLake concepts and learn how it can simplify your analytics experience.

Quickstart
Getting started with BigLake

Learn how to create and manage BigLake tables, and how to query a BigLake table through BigQuery or open source engines using connectors.

Pricing

BigLake pricing is based on querying BigLake tables, including:

1. BigQuery pricing applies for queries over BigLake tables defined on Google Cloud Storage. 

2. BigQuery Omni pricing applies for queries over BigLake tables defined on Amazon S3 and Azure Data Lake Storage Gen2.

3. Queries from open source engines using BigLake connectors: BigLake connectors use the BigQuery Storage API, and the corresponding prices apply, billed on bytes read and egress.

Example: The first 1 TB of data processed with BigQuery each month is free.
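To see how the free tier affects a monthly bill, here is a minimal sketch of the arithmetic. It assumes a flat per-TB rate applied to everything beyond the first free 1 TB. The function names and the `price_per_tb` value are hypothetical placeholders, not actual BigQuery prices, so check the current pricing page for real rates.

```python
def billable_tb(processed_tb: float, free_tb: float = 1.0) -> float:
    """Return the TB that are charged after subtracting the monthly free tier."""
    if processed_tb < 0 or free_tb < 0:
        raise ValueError("processed_tb and free_tb must be non-negative")
    return max(0.0, processed_tb - free_tb)


def estimate_monthly_cost(
    processed_tb: float, price_per_tb: float, free_tb: float = 1.0
) -> float:
    """Estimate the monthly query cost under a flat per-TB rate (an assumption)."""
    return billable_tb(processed_tb, free_tb) * price_per_tb


# Placeholder rate of 5.0 per TB, for illustration only.
print(estimate_monthly_cost(3.5, price_per_tb=5.0))  # 12.5
print(estimate_monthly_cost(0.4, price_per_tb=5.0))  # 0.0
```

With 3.5 TB processed in a month, only 2.5 TB is billable, and any month at or below 1 TB costs nothing for queries under this model. Storage API reads and egress from open source engines are billed separately and are not covered by this sketch.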