This page provides an overview of BigQuery Materialized Views.
Introduction
Materialized views are precomputed views that periodically cache results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results. Materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.
Queries that use materialized views are generally faster and consume less resources than queries that retrieve the same data only from the base table. Materialized views are helpful to significantly boost performance of workloads that have the characteristic of common and repeated queries.
The following are key characteristics of BigQuery Materialized Views:
Zero maintenance: A materialized view is recomputed in background once the base table has changed. All incremental data changes from the base tables are automatically added to the materialized views. No user inputs are required.
Always fresh: A materialized view is always consistent with the base table, including BigQuery streaming tables. If a base table is modified via update, merge, partition truncation, or partition expiration, BigQuery will invalidate the impacted portions of the materialized view and fully re-read the corresponding portion of the base table. For an unpartitioned materialized view, BigQuery will invalidate the entire materialized view and re-read the entire base table. For a partitioned materialized view, BigQuery will invalidate the affected partitions of the materialized view and re-read the entire corresponding partitions from the base table. Partitions that are append-only are not invalidated and are read in delta mode. In other words, there will never be a situation when querying a materialized view results in stale data.
Smart tuning: If a query or part of a query against the source table can instead be resolved by querying the materialized view, BigQuery will rewrite (reroute) the query to use the materialized view for better performance and/or efficiency.
Benefits
BigQuery Materialized Views provides the following benefits:
Reduction in the execution time and cost for queries with aggregate functions. The largest benefit is gained when a query's computation cost is high and the resulting dataset is small.
Automatic and transparent BigQuery optimization because the optimizer uses a materialized view, if available, to improve the query execution plan. This optimization does not require any changes to the queries.
The same resilience and high availability as BigQuery tables.
Use cases
BigQuery Materialized Views targets the following use cases:
Faster query performance: If you have a raw table and you perform online analytical processing (OLAP) style aggregation that requires significant processing, and you have predictable and repeated queries, such as from extract, transform, load (ETL) or business intelligence (BI) pipelines, you should use BigQuery Materialized Views.
Aggregation of real-time data: If you want access to data for decisions in real-time, BigQuery natively supports streaming capabilities. BigQuery Materialized Views integrates with BigQuery streaming and performs aggregations in real-time to provide up-to-date information.
Creating materialized views
You can create a materialized view through the Cloud Console, the bq
command-line tool, or the
BigQuery API. As an example, in the Cloud Console you can use the following DDL
statement:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT product_id, SUM(clicks) AS sum_clicks FROM project-id.my_dataset.my_base_table GROUP BY 1
For more details on how to use the Cloud Console, the bq
command-line tool, or the
BigQuery API to create a materialized view, see
Create and use materialized views.
Costs
Costs are associated with BigQuery Materialized Views for the following components:
- Querying materialized views
- Maintaining materialized views, such as when the view is refreshed
- Storing materialized view tables
Component | On-demand pricing | Flat-rate pricing |
---|---|---|
Querying | Bytes processed by the materialized view and any necessary portion of the base table. 1 | No extra cost. Slots are consumed during query time. |
Maintenance | Bytes processed. Slots are consumed for the incremental view refresh. | No extra cost. Slots are consumed during query time. |
Storage | Bytes stored in a materialized view | Bytes stored in a materialized view |
1 If the base table is append-only, then only the delta since the last view refresh will be processed from the base table. For more information, see Incremental updates.
Storage cost details
For AVG
, ARRAY_AGG
, APPROX_COUNT_DISTINCT
aggregate values in a
materialized view, the final value is not directly stored. Instead,
BigQuery internally stores a materialized view as an intermediate
sketch used to produce the final value.
As an example, consider a materialized view created with the following command:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT date, AVG(net_paid) AS avg_paid FROM project-id.my_dataset.my_base_table GROUP BY date
While avg_paid
column will be rendered as NUMERIC
or FLOAT64
to the user,
internally it will be stored as BYTES
, with its content being an intermediate
sketch in proprietary format. For the purpose of data size calculation, the
column will be treated as BYTES
.
Access control
Access granted to a materialized view is set at the dataset level, the view level, or the column level. You can also set access at a higher level in the IAM resource hierarchy.
Monitoring materialized views
You can determine information about materialized views and materialized view refresh jobs by using the BigQuery API.
For discovery of materialized views, list them by calling the tables.list method. To retrieve the schema and other properties of a table, call the tables.get method.
For discovery of materialized view refresh jobs, list them by calling the
jobs.list method. To retrieve details about the jobs, call the
jobs.get method. Automatic refresh jobs contain the
materialized_view_refresh
prefix within the job ID, and they are started
by a BigQuery admin account.
Comparison to other BigQuery techniques
The following table summarizes the similarities and difference between BigQuery caching, scheduled queries, standard views, and materialized views.
Component | Caching | Scheduled queries | Standard views | Materialized views |
---|---|---|---|---|
Performance | Yes | No | No | Yes |
Query supported | All | All | All | Aggregation — filtering and grouping |
Partitioning and clustering | No | Yes | NA | Yes |
Incremental refresh | No | No | No | Yes |
Additional storage | No | Yes | No | Yes |
Query rewrite | No | No | No | Yes |
Maintenance costs | No | Yes | NA | Yes |
Data staleness | Never | Yes | Never | Never |
Limited to a single table | No | No | No | Yes |
Interaction with other BigQuery features
The following BigQuery features work transparently with materialized views:
Query plan explanation: The query plan will reflect which materialized views were scanned (if any), and will show how many bytes were read from the materialized view and base table combined.
Query caching: Results of a query that gets rewritten using a materialized view can be cached subject to the usual limitations (using of deterministic functions, no streaming into the base table, etc.).
Cost calculation: With on-demand billing, if the query was rewritten using materialized view(s) and ended up scanning less bytes than scanning the base table, the query will end up being cheaper.
Cost restriction: If you have set a value for maximum bytes billed, and a query will read bytes beyond the limit, the query fails without incurring a charge, whether the query used the materialized view, the base table, or both.
Cost estimation via dry run: A dry run repeats query rewrite logic using the available materialized views and should provide an accurate cost estimate. You can use it as a way to test whether a specific query utilizes any materialized views.
Limitations
You cannot directly manipulate, or read raw content from, a materialized view. This means the following actions are not supported:
- Copying a materialized view, either as a source or destination of a copy job.
- Exporting a materialized view.
- Loading data into a materialized view
- Writing a query result into a materialized view
- Running DML statements over a materialized view.
A materialized view is required to be in the same dataset as the base table.
Materialized views support a restricted SQL syntax and a limited set of aggregation functions. For more information, see Supported materialized views.
A materialized view can reference only a single table, and cannot use joins.
Materialized views cannot be nested on other materialized views.
You can use only the standard SQL dialect for materialized views.
You can create a maximum of 20 materialized views per base table.
SHOW
DDL statements are not supported on materialized viewsALTER
DDL statements, other thanALTER
withSET OPTIONS
, are not supported on materialized viewsIf you delete the base table without first deleting the materialized view, queries over the materialized view will fail, as will refreshes. If you decide to re-create the base table, you will also need to recreate the materialized view.
Protecting data using Cloud Key Management Service keys is not supported. You can't protect the data in a materialized view using Cloud KMS keys, nor can you create a materialized view over a base table protected by Cloud KMS keys.
ARRAY_AGG considerations
Selecting ARRAY_AGG
columns from a materialized view carry the same
limitations as using ARRAY_AGG
when selecting from the base table. In the case
of RESPECT NULLS
, which is set by default, output arrays cannot contain NULL
elements. An error is raised if an output array contains NULL
for an element.
For this materialized view:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS
SELECT
column_1, ARRAY_AGG(column_2 ORDER BY column_3 DESC LIMIT 4) column_arr
FROM `project.dataset.base_table`
GROUP BY column_1
This query will fail.
SELECT * FROM project-id.my_dataset.my_mv_table
The failure details:
query: Array cannot have a null element; error in writing field
To avoid this failure, add IGNORE NULLS
to ARRAY_AGG
:
ARRAY_AGG(col_2 ORDER BY col_3 DESC LIMIT 4 IGNORE NULLS) col
As another option, format the output array column with the
'%t'
or '%T'
specifier.
SELECT column_1, FORMAT('%T', column_arr) FROM project-id.my_dataset.my_mv_table
JOIN considerations
While joins are not currently supported, you may be able to leverage materialized views to reduce cost and latency of a query that does aggregation on top of join. For example, consider a case where you are joining a large fact table with few small-dimension tables and then aggregating on top of the join. It might be practical to rewrite the query to first perform the aggregation on top of the fact table with foreign keys as grouping keys, joining the result with the dimension table, and then finally performing a post-aggregation.
To demonstrate this approach, let's use query #52 from the TPC-DS benchmark.
Here's the original query:
SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ss_ext_sales_price) ext_price FROM project.tpcds_10T.date_dim dt, project.tpcds_10T.store_sales, project.tpcds_10T.item WHERE dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manager_id = 1 and dt.d_moy=12 and dt.d_year=1998 GROUP BY dt.d_year, item.i_brand, item.i_brand_id ORDER BY dt.d_year, ext_price DESC, brand_id LIMIT 100
Here's the alternative query:
WITH sales_summary AS ( SELECT ss_sold_date_sk, ss_item_sk, SUM(ss_ext_sales_price) AS ext_price FROM project.tpcds_10T.store_sales GROUP BY ss_sold_date_sk, ss_item_sk ) SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ext_price) AS ext_price FROM sales_summary, project.tpcds_10T.date_dim dt, project.tpcds_10T.item WHERE sales_summary.ss_sold_date_sk = dt.d_date_sk AND sales_summary.ss_item_sk = item.i_item_sk AND item.i_manager_id = 1 AND dt.d_moy=12 AND dt.d_year=1998 GROUP BY dt.d_year, item.i_brand, item.i_brand_id ORDER BY dt.d_year, ext_price desc, brand_id LIMIT 100
Here's how to create the view:
CREATE OR REPLACE MATERIALIZED VIEW project-id.tpcds_10T.sales_summary_mv AS SELECT ss_sold_date_sk, ss_item_sk, SUM(ss_ext_sales_price) AS ext_price FROM project-id.tpcds_10T.store_sales GROUP BY ss_sold_date_sk, ss_item_sk
Then, the following shows how to join the materialized view with the dimension tables:
SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ext_price) as ext_price FROM sales_summary_mv, project.tpcds_10T.date_dim dt, project.tpcds_10T.item WHERE sales_summary.ss_sold_date_sk = dt.d_date_sk AND sales_summary.ss_item_sk = item.i_item_sk AND item.i_manager_id = 1 AND dt.d_moy=12 AND dt.d_year=1998 GROUP BY dt.d_year, item.i_brand, item.i_brand_id ORDER BY dt.d_year, ext_price desc, brand_id LIMIT 100
Next steps
- Read the materialized views FAQ.
- Learn how to Create and use materialized views.
- Learn about BigQuery Materialized Views best practices.