Introduction to materialized views
This document provides an overview of BigQuery support for materialized views. Before you read this document, familiarize yourself with BigQuery and BigQuery's logical views.
Overview
In BigQuery, materialized views are precomputed views that periodically cache the results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only changes from the base tables to compute up-to-date results. Materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.
Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base tables. Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.
The following are key characteristics of materialized views:
- Zero maintenance. Materialized views are precomputed in the background when the base tables change. Any incremental data changes from the base tables are automatically added to the materialized views, with no user action required.
- Fresh data. Materialized views return fresh data. If changes to base tables might invalidate the materialized view, then data is read directly from the base tables. If the changes to the base tables don't invalidate the materialized view, then rest of the data is read from the materialized view and only the changes are read from the base tables.
- Smart tuning. If any part of a query against the base table can be resolved by querying the materialized view, then BigQuery reroutes the query to use the materialized view for better performance and efficiency.
Use cases
Materialized views can optimize queries with high computation cost and small dataset results. Processes that benefit from materialized views include online analytical processing (OLAP) operations that require significant processing with predictable and repeated queries like those in from extract, transform, load (ETL) processes or business intelligence (BI) pipelines.
The following use cases highlight the value of materialized views. Materialized views can improve query performance if you frequently require the following:
- Pre-aggregate data. Aggregation of streaming data.
- Pre-filter data. Run queries that only read a particular subset of the table.
- Pre-join data. Query joins, especially between large and small tables.
- Recluster data. Run queries that would benefit from a clustering scheme that differs from the base tables.
Comparison to other BigQuery techniques
The following table summarizes the similarities and differences between BigQuery caching, scheduled queries, logical views, and materialized views.
Component | Caching | Scheduled queries | Logical views | Materialized views |
---|---|---|---|---|
Optimize compute | Yes | No | No | Yes |
Query support | All | All | All | Limited1 |
Partitioning and clustering | No | Yes | N/A | Yes |
Incremental refresh | No | No | No | Yes |
Additional storage | No | Yes | No | Yes |
Query rewrite | No | No | No | Yes |
Maintenance costs | No | Yes | N/A | Yes |
Data staleness | Never | Yes | Never | Optional 2 |
1 The --allow_non_incremental_definition
option
supports an expanded range of SQL queries to create materialized views.
2 The --max_staleness
option
provides consistently high performance with controlled costs when
processing large, frequently changing datasets.
Interaction with other BigQuery features
The following BigQuery features work transparently with materialized views:
Query plan explanation: The query plan reflects which materialized views are scanned (if any), and shows how many bytes are read from the materialized views and base tables combined.
Query caching: The results of a query that BigQuery rewrites using a materialized view can be cached subject to the usual limitations (using of deterministic functions, no streaming into the base tables, etc.).
Cost restriction: If you have set a value for maximum bytes billed, and a query would read a number of bytes beyond the limit, the query fails without incurring a charge, whether the query uses materialized views, the base tables, or both.
Cost estimation using dry run: A dry run repeats query rewrite logic using the available materialized views and provides a cost estimate. You can use this feature as a way to test whether a specific query uses any materialized views.
BigLake metadata cache-enabled tables
Materialized views over BigLake metadata cache-enabled tables can reference structured data stored in Cloud Storage and Amazon Simple Storage Service (Amazon S3). These materialized views function like materialized views over BigQuery-managed storage tables, including the benefits of automatic refresh and smart tuning. Other benefits include the pre-aggregating, pre-filtering, and pre-joining of data stored outside of BigQuery. Materialized views over BigLake tables are stored in and have all of the characteristics of BigQuery managed storage.
When you create a materialized view over an Amazon S3 BigLake table, the data in the materialized view isn't available for joins with BigQuery data. To make Amazon S3 data in a materialized view available for joins, create a replica of the materialized view. You can only create materialized view replicas over authorized material views.
Materialized view replicas
BigQuery lets you create materialized views on BigLake metadata cache-enabled tables over Amazon Simple Storage Service (Amazon S3), Apache Iceberg, or Salesforce Data Cloud data.
A materialized view replica lets you use the Amazon S3, Iceberg, or Data Cloud materialized view data in queries while avoiding data egress costs and improving query performance. A materialized view replica does this by replicating the Amazon S3, Iceberg, or Data Cloud data to a dataset in a supported BigQuery region, so that the data is available locally in BigQuery.
Learn how to create materialized view replicas.
Data freshness
After you create the materialized view replica, the replication process
polls the source materialized view for changes and replicates data to the
materialized view replica. The data is replicated at the interval you specified
in the replication_interval_seconds
option of the
CREATE MATERIALIZED VIEW AS REPLICA OF
statement.
In addition to the replication interval, the freshness of the materialized view replica data is also affected by how often the source materialized view refreshes, and how often the metadata cache of the Amazon S3, Iceberg, or Data Cloud table used by the materialized view refreshes.
You can check the data freshness for the materialized view replica and the resources it is based on by using the Google Cloud console:
- For materialized view replica freshness, look at the Last modified field in the materialized view replica's Details pane.
- For source materialized view freshness, look at the Last modified field in the materialized view's Details pane.
- For source Amazon S3, Iceberg, or Data Cloud table metadata cache freshness, look at the Max staleness field in the materialized view's Details pane.
Supported regions
Use the location mappings in the following table when creating materialized view replicas:
Location of the source materialized view | Location of the materialized view replica |
---|---|
aws-us-east-1 |
The US
multi-region,
or any of the following
regions:
|
aws-us-west-2 |
The US
multi-region,
or any of the following
regions:
|
aws-eu-west-1 |
The EU
multi-region,
or any of the following
regions:
|
aws-ap-northeast-2 |
Any of the following
regions:
|
aws-ap-southeast-2 |
Any of the following
regions:
|
Limitations
- Limits on base table references and other restrictions might apply. For more information about materialized view limits, see Quotas and limits.
- The data of a materialized view cannot be updated or manipulated directly
using operations such as
COPY
,EXPORT
,LOAD
,WRITE
, or data manipulation language (DML) statements. - You cannot replace an existing materialized view with a materialized view of the same name.
- The view SQL cannot be updated after the materialized view is created.
- A materialized view must reside in the same organization as its base tables, or in the same project if the project does not belong to an organization.
- Only materialized views from the same dataset are considered for smart tuning.
- Materialized views use a restricted SQL syntax and a limited set of aggregation functions. For more information, see Supported materialized views.
- Materialized views cannot be nested on other materialized views.
- Materialized views cannot query external or wildcard tables, logical views1, snapshots, or change data capture-enabled tables.
- Only the GoogleSQL dialect is supported for materialized views.
- You can set descriptions for materialized views, but you cannot set descriptions for the individual columns in the materialized view.
- If you delete a base table without first deleting the materialized view, queries and refreshes of the materialized view fail. If you recreate the base table, you must also recreate the materialized view.
1Logical view reference support is in preview. For more information, see Reference logical views.
Limitations of materialized views over BigLake tables
- Partitioning of the materialized view is not supported. The base tables can use hive partitioning but the materialized view storage cannot be partitioned in BigLake tables. This means that any deletion in a base table causes a full refresh of the materialized view. For more details see Incremental updates.
- The
-max_staleness
option value of the materialized view must be greater than that of the BigLake base table. - Joins between BigQuery managed tables and BigLake tables are not supported in a single materialized view definition.
Limitations of materialized view replicas
- You can't create materialized view replicas for materialized views that are based on any tables that use row-level security or column-level security.
- You can't use customer-managed encryption keys (CMEKs) with either the source materialized view or the materialized view replica.
- You can only create materialized view replicas for materialized views that are based on any tables that use metadata caching.
- You can create only one materialized view replica for a given source materialized view.
- You can only create materialized view replicas for authorized materialized views.
Materialized views pricing
Costs are associated with the following aspects of materialized views:
- Querying materialized views.
- Maintaining materialized views, such as when materialized views are refreshed. The cost for automatic refresh is billed to the project where the view resides. The cost for manual refresh is billed to the project in which the manual refresh job is run. For more information about controlling maintenance cost, see Refresh job maintenance.
- Storing materialized view tables.
Component | On-demand pricing | Capacity-based pricing |
---|---|---|
Querying | Bytes processed by materialized views and any necessary portions of the base tables.1 | Slots are consumed during query time. |
Maintenance | Bytes processed during refresh time. | Slots are consumed during refresh time. |
Storage | Bytes stored in materialized views. | Bytes stored in materialized views. |
1Where possible, BigQuery reads only the changes since the last time the view was refreshed. For more information, see Incremental updates.
Storage cost details
For AVG
, ARRAY_AGG
, and APPROX_COUNT_DISTINCT
aggregate values in a
materialized view, the final value is not directly stored. Instead,
BigQuery internally stores a materialized view as an intermediate
sketch, which is used to produce the final value.
As an example, consider a materialized view that's created with the following command:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT date, AVG(net_paid) AS avg_paid FROM project-id.my_dataset.my_base_table GROUP BY date
While the avg_paid
column is rendered as NUMERIC
or FLOAT64
to the user,
internally it is stored as BYTES
, with its content being an intermediate
sketch in proprietary format. For data size calculation,
the column is treated as BYTES
.
Materialized view replica costs
Use of materialized view replicas incurs compute, outbound data transfer, and storage costs.