Stay organized with collections Save and categorize content based on your preferences.

Use materialized views

This document provides additional information about BigQuery Materialized Views and how to use them. Before you read this document, familiarize yourself with Introduction to materialized views and Create materialized views.

These following topics are available to help you understand how to use materialized views:

Query materialized views

You can query your materialized views directly, the same way you query a regular table or standard view. Queries against materialized views are always consistent with queries against the view's base tables, even if those tables have changed since the last time the materialized view was refreshed. Querying does not automatically trigger a materialized refresh.

Incremental updates

BigQuery combines the cached view data with new data to provide consistent query results while still using the materialized view. For single-table materialized views, this is possible when the base table has not changed since the last refresh, or only new data has been added. For multi-table views, no more than one table may have appends. All others cannot have changes.

If there were any updates or deletes in the base tables since the last refresh, or if more than one of a multi-table materialized view's base tables have changed, BigQuery automatically reverts to the original query. The following are examples of actions that cause an update or deletion:

  • DML UPDATE
  • DML MERGE
  • DML DELETE
  • truncation
  • partition expiration
  • console, bq command-line, and API equivalents of the preceding items in this list

The following metadata operations also cause BigQuery to revert to the original query:

  • changing partition expiration
  • updating or dropping a column

In this situation, the cached view data is not used until the new refresh.

Partition alignment

If a materialized view is partitioned, BigQuery ensures that its partitions are aligned with the partitions of the base table's partitioning column. Aligned means that the data from a particular partition of the base table contributes to the same partition of the materialized view. For example, a row from partition 20220101 of the base table would contribute only to partition 20220101 of the materialized view.

When a materialized view is partitioned, the behavior described in the Incremental updates occurs for each individual partition. For example, if data is deleted in one partition of the base table, then BigQuery can still use the materialized view's other partitions.

Materialized views with inner joins can only be aligned with one of their base tables. If one of the non-aligned base tables changes, it affects the entire view.

Smart-tuning

BigQuery automatically rewrites queries to use materialized views whenever possible. Automatic rewriting improves query performance and cost, and does not change query results. Querying does not automatically trigger a materialized refresh. For a query to be rewritten, the materialized view must meet the following conditions:

  • Belong to the same dataset as one of its base tables.
  • Use the same set of base tables as the query.
  • Include all columns being read.
  • Include all rows being read.

Smart-tuning examples

Consider the following materialized view query example:

SELECT
  store_id,
  CAST(sold_datetime AS DATE) AS sold_date
  SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
  CAST(sold_datetime AS DATE) >= '2021-01-01' AND
  promo_id IS NOT NULL
GROUP BY 1, 2

The following examples show queries and why those queries are or aren't automatically rewritten using this view:

Query Rewrite? Reason
SELECT
SUM(net_paid) AS sum_paid,
SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL
No The view must include all columns being read. The view does not include 'SUM(net_paid)'.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL
Yes
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL AND
customer_id = 12345
No The view must include all columns being read. The view does not include 'customer'.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
sold_datetime= '2021-01-01' AND
promo_id IS NOT NULL
No The view must include all columns being read. 'sold_datetime' is not an output (but 'CAST(sold_datetime AS DATE)' is).
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL AND
store_id = 12345
Yes
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id = 12345
No The view must include all rows being read. 'promo_id' is not an output so the more restrictive filter can't be applied to the view.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE CAST(sold_datetime AS DATE) >= '2020-01-01'
No The view must include all rows being read. The view filter for dates in 2021 and after, but the query reads dates from 2020.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2022-01-01' AND
promo_id IS NOT NULL
Yes

How to tell if a query was rewritten

To see if a query was rewritten by the optimizer to utilize a materialized view, inspect the query plan. If the query was rewritten, then the query plan contains a READ my_materialized_view step, where my_materialized_view is the name of the materialized view used.

Frequently asked questions

When should I use scheduled queries versus materialized views?

Scheduled queries are a convenient way to run arbitrarily complex calculations periodically. Each time the query runs, it is being run fully. The previous results are not used, and you pay the full price for the query. Scheduled queries are great when you don't need the freshest data and you have a high tolerance for data staleness.

Materialized views are suited for when you need to query the latest data while cutting down latency and cost by reusing the previously computed result. You can use materialized views as pseudo-indexes, accelerating queries to the base table without updating any existing workflows.

As a general guideline, whenever possible and if you are not running arbitrarily complex calculations, use materialized views.

Some queries over materialized views are slower than same queries over manually materialized tables. Why is that?

In general, a query over a materialized view isn't always as performant as a query over the equivalent materialized table. The reason is that a materialized view guarantees to always return a fresh result, and it has to account for changes in the base tables that were added since the last view refresh.

Consider this scenario:

CREATE MATERIALIZED VIEW my_dataset.my_mv AS
SELECT date, customer_id, region, SUM(net_paid) as total_paid
FROM my_dataset.sales
GROUP BY 1, 2, 3;

CREATE TABLE my_dataset.my_materialized_table AS
SELECT date, customer_id, region, SUM(net_paid) as total_paid
FROM my_dataset.sales
GROUP BY 1, 2, 3;

For example, this query:

  SELECT * FROM my_dataset.my_mv LIMIT 10
typically runs much more slowly than this query:
  SELECT * FROM my_dataset.my_materialized_table LIMIT 10
In order to provide consistently up-to-date results, BigQuery must query new rows in the base table and merge them into the materialized view before applying the 'LIMIT 10' predicate. As a result, slowness remains, even if the materialized view is fully up-to-date.

On the other hand, aggregations over materialized views are typically as fast as queries against the materialized table. For example, the following:

  SELECT SUM(total_paid) FROM my_dataset.my_mv WHERE date > '2020-12-01'
Should be as fast as this:
  SELECT SUM(total_paid) FROM my_dataset.my_materialized_table WHERE date > '2020-12-01'