Introduction to materialized views

This document provides an overview of BigQuery support for materialized views. Before you read this document, familiarize yourself with BigQuery and BigQuery's standard views.

Overview

In BigQuery, materialized views are precomputed views that periodically cache the results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results. Materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.

Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base table. Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.

Why use materialized views?

BigQuery Materialized Views can be beneficial in the following ways:

  • Reduction in the execution time and cost for queries with aggregate functions. The largest benefit is gained when a query's computation cost is high and the resulting dataset is small. If you have a raw table and you perform online analytical processing (OLAP) style aggregation that requires significant processing, and you have predictable and repeated queries, such as from extract, transform, load (ETL) or business intelligence (BI) pipelines, you should use materialized views.

  • Automatic and transparent BigQuery query optimization. The BigQuery optimizer uses a materialized view, if available, to improve the query execution plan. This optimization does not require any changes to the queries.

  • Aggregation of real-time data. If you want access to data for making decisions in real-time, you can use BigQuery Materialized Views with BigQuery streaming to perform aggregations in real-time, providing up-to-date information.

Materialized views preview features

In addition to generally available materialized views features, the following materialized views features are available in preview:

Required permissions

For information about the permissions that are required to perform operations on materialized views, see required permissions for working with materialized views.

Comparison to other BigQuery techniques

The following table summarizes the similarities and differences between BigQuery caching, scheduled queries, standard views, and materialized views.

Component Caching Scheduled queries Standard views Materialized views
Performance Yes No No Yes
Query supported All All All Aggregation — filtering and grouping
Partitioning and clustering No Yes N/A Yes
Incremental refresh No No No Yes
Additional storage No Yes No Yes
Query rewrite No No No Yes
Maintenance costs No Yes N/A Yes
Data staleness Never Yes Never Never
Limited to a single table No No No Yes

Interaction with other BigQuery features

The following BigQuery features work transparently with BigQuery Materialized Views:

  • Query plan explanation: The query plan reflects which materialized views will be scanned (if any), and shows how many bytes will be read from the materialized views and base table combined.

  • Query caching: The results of a query that BigQuery rewrites using a materialized view can be cached subject to the usual limitations (using of deterministic functions, no streaming into the base table, etc.).

  • Cost calculation: With on-demand billing, if a query is rewritten using one or more materialized views and as a result, scans fewer bytes than scanning the entire base table, then the query is less expensive.

  • Cost restriction: If you have set a value for maximum bytes billed, and a query would read a number of bytes beyond the limit, the query fails without incurring a charge, whether the query uses materialized views, the base table, or both.

  • Cost estimation using dry run: A dry run repeats query rewrite logic using the available materialized views and provides a cost estimate. You can use this feature as a way to test whether a specific query uses any materialized views.

Limitations

  • You cannot directly manipulate, or read raw content from, a materialized view. This means the following actions are not supported:

    • Copying a materialized view, either as a source or destination of a copy job.
    • Exporting a materialized view.
    • Loading data into a materialized view
    • Writing a query result into a materialized view
    • Running DML statements over a materialized view.
  • A materialized view must reside in the same organization as the base table, or in the same project if the project does not belong to an organization.

  • Each base table can be referenced by up to 20 materialized views from the same dataset, up to 100 materialized views from the same project, and up to 500 materialized views from the whole organization.

  • Only materialized views from the same dataset are considered for automatic query rewrite (or smart tuning).

  • Materialized views use a restricted SQL syntax and a limited set of aggregation functions. For more information, see Supported materialized views.

  • A materialized view can reference only a single table, and cannot use joins.

  • Materialized views cannot be nested on other materialized views.

  • You can use only the standard SQL dialect for materialized views.

  • SHOW DDL statements are not supported for materialized views.

  • ALTER DDL statements, other than ALTER with SET OPTIONS, are not supported for materialized views.

  • If you delete the base table without first deleting the materialized view, queries over the materialized view fail, as do refreshes. If you decide to recreate the base table, you must also recreate the materialized view.

  • After a materialized view is created on top of a partitioned base table, the table's partition expiration cannot be changed. To change the partition expiration of the base table, you need to first delete all materialized views that are based on the partitioned table.

  • There are some limitations on the use of ARRAY_AGG and JOIN statements in materialized views. For more information, see ARRAY_AGG considerations and JOIN considerations.

Materialized views pricing

Costs are associated with the following aspects of BigQuery Materialized Views:

  • Querying materialized views.
  • Maintaining materialized views, such as when materialized views are refreshed.
  • Storing materialized view tables.
Component On-demand pricing Flat-rate pricing
Querying Bytes processed by materialized views and any necessary portions of the base tables.1 Slots are consumed during query time.
Maintenance Bytes processed during refresh time. Slots are consumed during refresh time.
Storage Bytes stored in materialized views. Bytes stored in materialized views.

1If the base table is append-only, then only the delta since the last materialized view refresh is processed from the base table. For more information, see Incremental updates.

Storage cost details

For AVG, ARRAY_AGG, and APPROX_COUNT_DISTINCT aggregate values in a materialized view, the final value is not directly stored. Instead, BigQuery internally stores a materialized view as an intermediate sketch, which is used to produce the final value.

As an example, consider a materialized view that's created with the following command:

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS
SELECT date, AVG(net_paid) AS avg_paid
FROM project-id.my_dataset.my_base_table
GROUP BY date

While the avg_paid column is rendered as NUMERIC or FLOAT64 to the user, internally it is stored as BYTES, with its content being an intermediate sketch in proprietary format. For the purpose of data size calculation, the column is treated as BYTES.

View security

To control access to views in BigQuery, see Controlling access to views.

Next steps