This page provides an overview of BigQuery Materialized View.
BigQuery Materialized View are precomputed views that periodically cache results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results. Materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.
Queries that use materialized views are generally faster and consume less resources than queries that retrieve the same data only from the base table. Materialized views are helpful to significantly boost performance of workloads that have the characteristic of common and repeated queries.
The following are key characteristics of BigQuery Materialized View:
Zero maintenance: A materialized view is recomputed in background once the base table has changed. All incremental data changes from the base tables are automatically added to the materialized views. No user inputs are required.
Always fresh: A materialized view is always consistent with the base table, including BigQuery streaming tables. If a base table is modified via update, merge, partition truncation, or partition expiration, BigQuery will invalidate the impacted portions of the materialized view and fully re-read the corresponding portion of the base table. For an unpartitioned materialized view, BigQuery will invalidate the entire materialized view and re-read the entire base table. For a partitioned materialized view, BigQuery will invalidate the affected partitions of the materialized view and re-read the entire corresponding partitions from the base table. Partitions that are append-only are not invalidated and are read in delta mode. In other words, there will never be a situation when querying a materialized view results in stale data.
Smart tuning: If a query or part of a query against the source table can instead be resolved by querying the materialized view, BigQuery will rewrite (reroute) the query to use the materialized view for better performance and/or efficiency.
BigQuery Materialized View provides the following benefits:
Reduction in the execution time and cost for queries with aggregate functions. The largest benefit is gained when a query's computation cost is high and the resulting data set is small.
Automatic and transparent BigQuery optimization because the optimizer uses a materialized view, if available, to improve the query execution plan. This optimization does not require any changes to the queries.
The same resilience and high availability as BigQuery tables.
BigQuery Materialized View targets the following use cases:
Faster query performance: If you have a raw table and you perform online analytical processing (OLAP) style aggregation that requires significant processing, and you have predictable and repeated queries, such as from extract, transform, load (ETL) or business intelligence (BI) pipelines, you should use BigQuery Materialized View.
Aggregation of real-time data: If you want access to data for decisions in real-time, BigQuery natively supports streaming capabilities. BigQuery Materialized View integrates with BigQuery streaming and performs aggregations in real-time to provide up-to-date information.
Creating materialized views
You can create a materialized view through the BigQuery web UI, the
tool, or the BigQuery API. As an example, in the BigQuery web UI you can use the
following DDL statement:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT product_id, SUM(clicks) AS sum_clicks FROM project-id.my_dataset.my_base_table GROUP BY 1
For more details on how to use the BigQuery web UI, the
bq command-line tool, or the
BigQuery API to create a materialized view, see
Create and use materialized views.
Costs are associated with BigQuery Materialized View for the following components:
- Querying materialized views
- Maintaining materialized views, such as when the view is refreshed
- Storing materialized view tables
|Component||On-demand pricing||Flat-rate pricing|
|Querying||Bytes processed by the materialized view and any necessary portion of the base table. 1||No extra cost. Slots are consumed during query time.|
|Maintenance||Bytes processed. Slots are consumed for the incremental view refresh.||No extra cost. Slots are consumed during query time.|
|Storage||Bytes stored in a materialized view||Bytes stored in a materialized view|
1 If the base table is append-only, then only the delta since the last view refresh will be processed from the base table. For more information, see Incremental updates.
Storage cost details
APPROX_COUNT_DISTINCT aggregate values in a
materialized view, the final value is not directly stored. Instead,
BigQuery internally stores a materialized view as an intermediate
sketch used to produce the final value.
As an example, consider a materialized view created with the following command:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT date, AVG(net_paid) AS avg_paid FROM project-id.my_dataset.my_base_table GROUP BY date
avg_paid column will be rendered as
FLOAT64 to the user,
internally it will be stored as
BYTES, with its content being an intermediate
sketch in proprietary format. For the purpose of data size calculation, the
column will be treated as
Access granted to a materialized view is set at the dataset level. If the user has access to the dataset, the user has access to the materialized views in the dataset.
Monitoring materialized views
You can determine information about materialized views and materialized view refresh jobs by using the BigQuery API.
For discovery of materialized view refresh jobs, list them by calling the
jobs.list method. To retrieve details about the jobs, call the
jobs.get method. Automatic refresh jobs contain the
materialized_view_refresh prefix within the job ID, and they are started
by a BigQuery admin account.
Comparison to other BigQuery techniques
The following table summarizes the similarities and difference between BigQuery caching, scheduled queries, standard views, and materialized views.
|Component||Caching||Scheduled queries||Standard views||Materialized views|
|Query supported||All||All||All||Aggregation — filtering and grouping|
|Partitioning and clustering||No||Yes||NA||Yes|
|Limited to a single table||No||No||No||Yes|
Interaction with other BigQuery features
The following BigQuery features work transparently with materialized views:
Query plan explanation: The query plan will reflect which materialized views were scanned (if any), and will show how many bytes were read from the materialized view and base table combined.
Query caching: Results of a query that gets rewritten using a materialized view can be cached subject to the usual limitations (using of deterministic functions, no streaming into the base table, etc.).
Cost calculation: With on-demand billing, if the query was rewritten using materialized view(s) and ended up scanning less bytes than scanning the base table, the query will end up being cheaper.
Cost restriction: If you have set a value for maximum bytes billed, and a query will read bytes beyond the limit, the query fails without incurring a charge, whether the query used the materialized view, the base table, or both.
Cost estimation via dry run: A dry run repeats query rewrite logic using the available materialized views and should provide an accurate cost estimate. You can use it as a way to test whether a specific query utilizes any materialized views.
You cannot directly manipulate, or read raw content from, a materialized view. This means the following actions are not supported:
- Copying a materialized view, either as a source of destination of a copy job.
- Exporting a materialized view.
- Loading data into a materialized view
- Writing a query result into a materialized view
- Running DML statements over a materialized view.
A materialized view is required to be in the same dataset as the base table.
A materialized view is limited to referencing only a single table, and is not able to use joins or
Materialized views cannot be nested on other materialized views.
You can use only the standard SQL dialect for materialized views.
You can create a maximum of 20 materialized views per base table.
SHOWDDL statements are not supported on materialized views
ALTERDDL statements, other than
SET OPTIONS, are not supported on materialized views
If you delete the base table without first deleting the materialized view, queries over the materialized view will fail, as will refreshes. If you decide to re-create the base table, you will also need to recreate the materialized view.
Protecting data using Cloud Key Management Service keys is not supported. You can't protect the data in a materialized view using Cloud KMS keys, nor can you create a materialized view over a base table protected by Cloud KMS keys.
ARRAY_AGG columns from a materialized view carry the same
limitations as using
ARRAY_AGG when selecting from the base table. In the case
RESPECT NULLS, which is set by default, output arrays cannot contain
elements. An error is raised if an output array contains
NULL for an element.
For this materialized view:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT column_1, ARRAY_AGG(column_2 ORDER BY column_3 DESC LIMIT 4) column_arr FROM `project.dataset.base_table` GROUP BY column_1
This query will fail.
SELECT * FROM project-id.my_dataset.my_mv_table
The failure details:
query: Array cannot have a null element; error in writing field
To avoid this failure, add
IGNORE NULLS to
ARRAY_AGG(col_2 ORDER BY col_3 DESC LIMIT 4 IGNORE NULLS) col
As another option, format the output array column with the
SELECT column_1, FORMAT('%T', column_arr) FROM project-id.my_dataset.my_mv_table
While joins are not currently supported, you may be able to leverage materialized views to reduce cost and latency of a query that does aggregation on top of join. For example, consider a case where you are joining a large fact table with few small-dimension tables and then aggregating on top of the join. It might be practical to rewrite the query to first perform the aggregation on top of the fact table with foreign keys as grouping keys, joining the result with the dimension table, and then finally performing a post-aggregation.
To demonstrate this approach, let's use query #52 from the TPC-DS benchmark.
Here's the original query:
SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ss_ext_sales_price) ext_price FROM project.tpcds_10T.date_dim dt, project.tpcds_10T.store_sales, project.tpcds_10T.item WHERE dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manager_id = 1 and dt.d_moy=12 and dt.d_year=1998 GROUP BY dt.d_year, item.i_brand, item.i_brand_id ORDER BY dt.d_year, ext_price DESC, brand_id LIMIT 100
Here's the alternative query:
WITH sales_summary AS ( SELECT ss_sold_date_sk, ss_item_sk, SUM(ss_ext_sales_price) AS ext_price FROM project.tpcds_10T.store_sales GROUP BY ss_sold_date_sk, ss_item_sk ) SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ext_price) AS ext_price FROM sales_summary, project.tpcds_10T.date_dim dt, project.tpcds_10T.item WHERE sales_summary.ss_sold_date_sk = dt.d_date_sk AND sales_summary.ss_item_sk = item.i_item_sk AND item.i_manager_id = 1 AND dt.d_moy=12 AND dt.d_year=1998 GROUP BY dt.d_year, item.i_brand, item.i_brand_id ORDER BY dt.d_year, ext_price desc, brand_id LIMIT 100
Here's how to create the view:
CREATE OR REPLACE MATERIALIZED VIEW project-id.tpcds_10T.sales_summary_mv AS SELECT ss_sold_date_sk, ss_item_sk, SUM(ss_ext_sales_price) AS ext_price FROM project-id.tpcds_10T.store_sales GROUP BY ss_sold_date_sk, ss_item_sk
Then, the following shows how to join the materialized view with the dimension tables:
SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ext_price) as ext_price FROM sales_summary_mv, project.tpcds_10T.date_dim dt, project.tpcds_10T.item WHERE sales_summary.ss_sold_date_sk = dt.d_date_sk AND sales_summary.ss_item_sk = item.i_item_sk AND item.i_manager_id = 1 AND dt.d_moy=12 AND dt.d_year=1998 GROUP BY dt.d_year, item.i_brand, item.i_brand_id ORDER BY dt.d_year, ext_price desc, brand_id LIMIT 100