BigLake is a storage engine that provides a unified interface for analytics and AI engines to query multiformat, multicloud, and multimodal data in a secure, governed, and performant manner. Build a single-copy AI lakehouse designed to reduce the need to build and manage custom data infrastructure.
Continuous innovation, including new research, "BigQuery's Evolution toward a Multi-Cloud Lakehouse," to be presented at SIGMOD 2024.
Deploy a Google-recommended solution that unifies data lakes and data warehouses for storing, processing, and analyzing both structured and unstructured data
Store a single copy of structured and unstructured data and query it with analytics and AI engines
Fine-grained access control and multicloud governance over distributed data
Fully managed experience with automatic data management for your open-format lakehouse
Benefits
Freedom of choice
Unlock analytics on distributed data regardless of where and how it’s stored, while choosing the best analytics tools, open source or cloud native, over a single copy of data.
Secure and performant data lakes
Fine-grained access control across open source engines like Apache Spark, Presto and Trino, and open formats such as Parquet. Performant queries over data lakes powered by BigQuery.
Unified governance & management at scale
Integrates with Dataplex to provide management at scale, including logical data organization, centralized policy and metadata management, and quality and lifecycle management, for consistency across distributed data.
Key features
BigLake eliminates the need to grant file-level access to end users. Apply table-, row-, and column-level security policies on object store tables, just as you do on existing BigQuery tables.
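As an illustration, row-level security on a BigLake table uses the same DDL as on native BigQuery tables; the dataset, table, column, and group below are hypothetical:

    -- Restrict the hypothetical mydataset.sales BigLake table so that
    -- members of analysts@example.com only see rows where country = 'US'.
    CREATE ROW ACCESS POLICY us_only
    ON mydataset.sales
    GRANT TO ('group:analysts@example.com')
    FILTER USING (country = 'US');

Column-level security works the same way as for native BigQuery tables, by attaching policy tags to columns.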
Maintain a single copy of structured and unstructured data and make it uniformly accessible across Google Cloud and open source engines, including BigQuery, Vertex AI, Dataflow, Spark, Presto, Trino, and Hive, using BigLake connectors. Centrally manage security policies in one place and have them consistently enforced across the query engines by the API interface built into the connectors.
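As a minimal sketch (the connection ID, bucket path, and table name are hypothetical), a BigLake table over Parquet files in Cloud Storage is defined once with a Cloud resource connection; BigQuery queries it directly, and open source engines reach the same table, with the same policies, through the BigLake connectors:

    -- Define a BigLake table over Parquet files in Cloud Storage.
    -- `us.my-gcs-connection` is an assumed Cloud resource connection.
    CREATE EXTERNAL TABLE mydataset.orders
    WITH CONNECTION `us.my-gcs-connection`
    OPTIONS (
      format = 'PARQUET',
      uris = ['gs://my-bucket/orders/*.parquet']
    );

Because the connection's service account reads the bucket, end users never need direct Cloud Storage access.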
Discover all BigLake tables, including those defined over Amazon S3 and Azure Data Lake Storage Gen2, in Data Catalog. Configure fine-grained access control and have it enforced across clouds when querying with BigQuery Omni.
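For example, assuming a hypothetical AWS connection and bucket (and a dataset created in the matching AWS region), a BigLake table over Amazon S3 is defined with the same DDL and then governed and queried through BigQuery Omni:

    -- Sketch of a BigLake table over Amazon S3 via BigQuery Omni.
    -- `aws-us-east-1.my-aws-connection` is an assumed connection.
    CREATE EXTERNAL TABLE aws_dataset.clickstream
    WITH CONNECTION `aws-us-east-1.my-aws-connection`
    OPTIONS (
      format = 'PARQUET',
      uris = ['s3://my-aws-bucket/clickstream/*']
    );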
Object tables enable the use of multimodal data for governed AI workloads. Easily build AI use cases using BigQuery SQL and its Vertex AI integrations.
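A minimal sketch of an object table over unstructured files follows; the connection, bucket, and table name are hypothetical. The table exposes object metadata such as uri, content_type, and size to SQL, which AI workflows and the Vertex AI integrations can then build on:

    -- Object table over images in Cloud Storage (names are assumptions).
    CREATE EXTERNAL TABLE mydataset.product_images
    WITH CONNECTION `us.my-gcs-connection`
    OPTIONS (
      object_metadata = 'SIMPLE',
      uris = ['gs://my-bucket/images/*'],
      metadata_cache_mode = 'AUTOMATIC',
      max_staleness = INTERVAL 1 DAY
    );

    -- Query the object metadata with standard SQL.
    SELECT uri, content_type, size
    FROM mydataset.product_images
    WHERE content_type = 'image/jpeg';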
Supports open file formats including Parquet, Avro, ORC, CSV, and JSON. The API serves multiple compute engines through Apache Arrow. The table format natively supports Apache Iceberg, with Delta Lake and Hudi supported through manifest files.
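For instance, a read-oriented BigLake table for Apache Iceberg can be sketched by pointing the table at an Iceberg metadata file (the connection, bucket, and metadata path are assumptions):

    -- BigLake table over an Iceberg table's metadata JSON (paths assumed).
    CREATE EXTERNAL TABLE mydataset.iceberg_orders
    WITH CONNECTION `us.my-gcs-connection`
    OPTIONS (
      format = 'ICEBERG',
      uris = ['gs://my-bucket/iceberg/orders/metadata/v1.metadata.json']
    );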
Documentation
Get an introduction to BigLake concepts and learn how BigLake can simplify your analytics experience.
Learn how to create and manage BigLake tables, and how to query a BigLake table through BigQuery or other open source engines using connectors.
Learn how to query data stored in a Cloud Storage BigLake table.
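For example (table and column names hypothetical), a Cloud Storage BigLake table is queried with standard GoogleSQL, just like a native table:

    -- Aggregate over the hypothetical BigLake table defined earlier.
    SELECT customer_id, SUM(amount) AS total_spend
    FROM mydataset.orders
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10;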
Pricing
BigLake pricing is based on querying BigLake tables, including:
1. BigQuery pricing applies for queries over BigLake tables defined on Google Cloud Storage.
2. BigQuery Omni pricing applies for queries over BigLake tables defined on Amazon S3 and Azure Data Lake Storage Gen2.
3. Queries from open-source engines using BigLake connectors: the connectors use the BigQuery Storage API, and the corresponding prices apply, billed on bytes read and egress.
4. Additional costs apply for query acceleration using metadata caching, object tables, and BigLake Metastore.
For example, the first 1 TB of data processed with BigQuery each month is free.