Lakehouse for Apache Iceberg (formerly BigLake)

Open, cross-cloud lakehouse for the agentic era

Enterprise storage, governance, and performance to build scalable analytical, operational, and real-time AI use cases on a unified, cross-cloud, and multimodal open lakehouse.

Apache Iceberg is a trademark of The Apache Software Foundation.



Features

Fully managed Iceberg with read/write interoperability

Apache Iceberg tables managed through the Lakehouse Iceberg REST catalog provide read and write interoperability between BigQuery, Google Cloud Managed Service for Apache Spark, Iceberg-compatible OSS engines such as Spark, Trino, and Flink, and third-party engines such as Snowflake and Databricks (Preview). You can connect your Iceberg tables directly to engines like BigQuery and Google managed Spark to accelerate your AI workloads.

Google’s AI connected to your cross-cloud Iceberg data

Use cross-cloud interconnect and caching (Preview) to get fast, low-latency access to S3 Iceberg data. Run BigQuery and Spark jobs, and Gemini Enterprise jobs through the conversational analytics API, on AWS data with price-performance comparable to native data platform solutions. In addition, the new Lakehouse runtime catalog federation (Preview) unites your ecosystem, letting BigQuery and Google managed Spark discover and analyze enterprise data across Snowflake, Databricks, and AWS Glue.

Price-performance acceleration for Iceberg

BigQuery’s enhanced vectorized execution is now the default for Lakehouse Iceberg REST catalog tables as well as Iceberg and Parquet tables in BigQuery catalog. Offload routine Iceberg maintenance such as compaction, clustering, and garbage collection directly to Google Lakehouse. New automated features, including table management, partitioning, clustering, and history-based optimization (GA for Iceberg tables in BigQuery catalog; Preview for REST catalog), improve price-performance with no manual overhead.

Differentiated BigQuery and Spark

Power real-time insights with Iceberg using BigQuery streaming for high-throughput ingestion with zero read latency. Build complex processing pipelines with multi-statement transactions and BigQuery change data replication to Iceberg tables (GA for BigQuery catalog; Preview for REST catalog). Unlock multimodal, vector, and graph analytics by uniting structured and unstructured data using BigQuery ObjectRefs. Speed up Spark data science workloads with Lightning Engine, which delivers up to 4.5x faster performance.

Real-time context and governance for agents

Power AI agents with real-time transactional data. Stream operational data from Spanner, AlloyDB, and Cloud SQL into BigQuery and managed Iceberg tables for instant analysis, then push the resulting analytical insights back into AlloyDB or Spanner to serve them with sub-millisecond latency at high QPS. Get unified governance with lineage, profiling, and data quality through the Knowledge Catalog (formerly Dataplex) integration. Map transactional, unstructured, and Iceberg data to your business logic, giving your agents the context they need to deliver accurate, reliable, and fully governed results.

How It Works

The Lakehouse REST catalog acts as a central hub for your Iceberg tables. It provides universal read/write access across BigQuery, Managed Service for Apache Spark, OSS engines, and partners, seamlessly connecting your data to any engine to accelerate AI.
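
As a rough illustration of the "central hub" idea, an Iceberg-compatible Spark session typically reaches a REST catalog through a handful of catalog properties. The sketch below only builds those properties as a Python dictionary. The property keys follow Apache Iceberg's Spark catalog configuration; the catalog name, endpoint URI, and warehouse path are placeholders invented for this example, and authentication settings are deliberately left out, so treat it as a starting point rather than the documented Lakehouse setup.

```python
def iceberg_rest_catalog_conf(catalog_name: str, uri: str, warehouse: str) -> dict[str, str]:
    """Build Spark properties that register an Iceberg REST catalog.

    The keys are standard Apache Iceberg Spark catalog settings. The values
    passed in by the caller (name, URI, warehouse) are illustrative only.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Iceberg's Spark catalog implementation class.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Talk to the catalog over the Iceberg REST protocol.
        f"{prefix}.type": "rest",
        # REST endpoint of the catalog (placeholder supplied by the caller).
        f"{prefix}.uri": uri,
        # Warehouse location or identifier (placeholder supplied by the caller).
        f"{prefix}.warehouse": warehouse,
    }


conf = iceberg_rest_catalog_conf(
    catalog_name="lakehouse",                   # hypothetical catalog name
    uri="https://example.invalid/iceberg/v1",   # placeholder endpoint, not a real URL
    warehouse="gs://example-bucket/warehouse",  # placeholder bucket path
)

# Print the properties in a form that could be passed to Spark as --conf options.
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Because every engine points at the same catalog endpoint, a table created by one engine becomes visible to the others without copying data. The real endpoint and credential settings should come from the product documentation.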

Common Uses

Lakehouse foundation and modernization

Modernize to an open, unified lakehouse architecture

Modernize your data foundation with Google’s Lakehouse. Shift legacy Hadoop to serverless Cloud Storage and unify cross-cloud data by querying Iceberg and Delta Lake directly in BigQuery. The Lakehouse Iceberg REST catalog eliminates silos, offering an interoperable runtime for Spark, Trino, and Flink. With Hive catalog support, you can modernize Hadoop workloads to Iceberg.

Multi-engine interoperability

Seamless read/write sharing between BigQuery and OSS engines

Bring your existing Iceberg pipelines and seamlessly read or write to those tables using BigQuery or managed Spark, while easily modernizing with advanced BigQuery capabilities. Supercharge data science by running Spark ETL and BigQuery AI on the exact same Iceberg tables with zero data movement. Build conversational analytics agents in BigQuery that work with your data in S3.

Bring Iceberg data into AI workflows

Multimodal data analysis and accelerated AI workflows

Power multimodal analysis with BigQuery AI by combining structured Iceberg tables with unstructured data using BigQuery ObjectRefs for single-SQL inference. Train Gemini Enterprise Agent Platform models using time-travel to debug data drift. Federate global REST catalogs into a unified data mesh, analyze massive-scale logs affordably, and build models directly in integrated notebooks to accelerate your AI workflows.


Best-in-class Spark experience

Power data science workloads across developer environments

Unlock a frictionless Spark experience. Run SQL, Spark, and Python on a single copy of Iceberg data using unified IDEs. The new Antigravity VS Code extension acts as an AI partner to generate pipelines, debug code, and automate CI/CD from natural language. In addition, the vectorized Lightning Engine accelerates Spark execution up to 4.5x with zero code changes.

High-performance analytics with BigQuery

Performance optimization with BigQuery

Leverage BigQuery’s scale while maintaining flexible storage. Execute multi-statement transactions in BigQuery to update multiple Iceberg tables as a single atomic unit, ensuring financial-grade consistency. Use BigQuery’s advanced runtime and partitioning support for Iceberg to create partitioned and clustered tables that use block pruning for fast, cost-effective query execution.
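
To make the "single atomic unit" idea concrete, here is a minimal sketch that wraps several DML statements in BigQuery's multi-statement transaction syntax (BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION inside an exception handler). The helper name and the table and column names are hypothetical, and the sketch only builds the SQL text. It does not submit anything to BigQuery, and whether a particular Iceberg table supports transactional writes depends on its catalog and release stage.

```python
def build_transaction_script(statements: list[str]) -> str:
    """Wrap DML statements so they commit together or not at all.

    Hypothetical helper: it returns a BigQuery script as a string and leaves
    execution to whichever client the reader already uses.
    """
    # Normalise each statement to end with exactly one semicolon.
    body = "\n".join(f"  {s.strip().rstrip(';').rstrip()};" for s in statements)
    return (
        "BEGIN\n"
        "  BEGIN TRANSACTION;\n"
        f"{body}\n"
        "  COMMIT TRANSACTION;\n"
        "EXCEPTION WHEN ERROR THEN\n"
        "  -- Undo every statement above if any one of them fails.\n"
        "  ROLLBACK TRANSACTION;\n"
        "  RAISE;\n"
        "END;"
    )


# Hypothetical tables: move 100 units between accounts and log the transfer.
script = build_transaction_script([
    "UPDATE ledger.accounts SET balance = balance - 100 WHERE account_id = 'A'",
    "UPDATE ledger.accounts SET balance = balance + 100 WHERE account_id = 'B'",
    "INSERT INTO ledger.transfers (from_id, to_id, amount) VALUES ('A', 'B', 100)",
])
print(script)
```

Either all three statements take effect or none do, which is what keeps the two account balances and the transfer log consistent with each other.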

Real-time intelligence

Combined transactional and analytical data for agentic AI

Fuel event-driven AI agents by unifying your transactional and analytical data. Automate continuous CDC replication from Spanner and AlloyDB directly into Lakehouse Iceberg tables. Next, use SQL continuous queries to monitor this streaming data, run AI inference, and trigger downstream actions, delivering real-time intelligence for your most critical operational workloads.

End-to-end lakehouse governance

Govern your lakehouse with Knowledge Catalog

Knowledge Catalog provides a unified governance layer by automatically discovering Iceberg tables in Cloud Storage and registering their metadata directly in the Lakehouse runtime catalog. This integration lets you define centralized security policies that ensure consistent row- and column-level access control across both BigQuery and open-source processing engines.

Pricing

How Lakehouse (BigLake) pricing works

Lakehouse (BigLake) pricing is based on table management, metadata storage, and metadata access.

Lakehouse (BigLake) table management
Description: Compute resources used for automatic table storage optimization.
Price (USD): Starting at $0.12 per DCU-hour

Lakehouse (BigLake) metadata storage
Description: The Lakehouse for Apache Iceberg metastore (Lakehouse runtime catalog) charges for metadata stored. The free tier includes 1 GiB of metadata storage per month.
Price (USD): Starting at $0.04 per GiB per month

Lakehouse (BigLake) metadata access
Description (Class A operations): Charges for writes, updates, list, create, and config operations, with a free tier of 5,000 operations per month.
Price (USD): Starting at $6.00 per million operations
Description (Class B operations): Charges for reads, get, and delete operations, with a free tier of 50,000 operations per month.
Price (USD): Starting at $0.90 per million operations
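
The listed rates and free tiers can be combined into a rough monthly estimate. The sketch below assumes the "starting at" prices apply as-is and that each free tier is simply subtracted from the month's usage before billing. Both are assumptions, since actual prices vary by region, and the function and parameter names are made up for this example.

```python
# "Starting at" list prices taken from the pricing section above (USD).
DCU_HOUR_USD = 0.12
STORAGE_GIB_MONTH_USD = 0.04
CLASS_A_PER_MILLION_USD = 6.00
CLASS_B_PER_MILLION_USD = 0.90

# Monthly free tiers taken from the pricing section above.
FREE_STORAGE_GIB = 1
FREE_CLASS_A_OPS = 5_000
FREE_CLASS_B_OPS = 50_000


def estimate_monthly_cost(dcu_hours: float, metadata_gib: float,
                          class_a_ops: int, class_b_ops: int) -> dict[str, float]:
    """Estimate one month's charges; usage below a free tier costs nothing."""
    billable_gib = max(metadata_gib - FREE_STORAGE_GIB, 0)
    billable_a = max(class_a_ops - FREE_CLASS_A_OPS, 0)
    billable_b = max(class_b_ops - FREE_CLASS_B_OPS, 0)
    costs = {
        "table_management": dcu_hours * DCU_HOUR_USD,
        "metadata_storage": billable_gib * STORAGE_GIB_MONTH_USD,
        "class_a_access": billable_a / 1_000_000 * CLASS_A_PER_MILLION_USD,
        "class_b_access": billable_b / 1_000_000 * CLASS_B_PER_MILLION_USD,
    }
    costs["total"] = sum(costs.values())
    return costs


# Hypothetical month: 100 DCU-hours, 11 GiB of metadata,
# 1,005,000 Class A operations, and 2,050,000 Class B operations.
estimate = estimate_monthly_cost(100, 11, 1_005_000, 2_050_000)
for item, usd in estimate.items():
    print(f"{item}: ${usd:.2f}")
```

For region-specific figures, the pricing calculator remains the authoritative source. This sketch only shows how the four line items and their free tiers fit together.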

Pricing calculator

Estimate your monthly Lakehouse costs, including region-specific pricing and fees.

Custom quote

Connect with our sales team to get a custom quote for your organization.
