Use materialized views

This document provides additional information about materialized views and how to use them. Before you read this document, familiarize yourself with Introduction to materialized views and Create materialized views.

Query materialized views

You can query your materialized views directly, the same way you query a regular table or standard view. Queries against materialized views are always consistent with queries against the view's base tables, even if those tables have changed since the last time the materialized view was refreshed. Querying does not automatically trigger a materialized refresh.

Required roles

To get the permissions that you need to query a materialized view, ask your administrator to grant you the BigQuery Data Viewer (roles/bigquery.dataViewer) IAM role on the base table of the materialized view and the materialized view itself. For more information about granting roles, see Manage access.

This predefined role contains the permissions required to query a materialized view. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to query a materialized view:

  • bigquery.tables.get
  • bigquery.tables.getData

You might also be able to get these permissions with custom roles or other predefined roles.

These permissions are required for queries in order to benefit from smart tuning.

For more information about IAM roles in BigQuery, see Introduction to IAM.

Incremental updates

BigQuery combines the cached view's data with new data to provide consistent query results while still using the materialized view. For single-table materialized views, this is possible if the base table is unchanged since the last refresh, or if only new data was added. For JOIN views, only tables on the left side of the JOIN can have appended data. If one of the tables on the right side of a JOIN has changed, then the view cannot be incrementally updated.

If the base table had updates or deletions since the last refresh, or if the materialized view's base tables on the right side of the JOIN have changed, then BigQuery automatically reverts to the original query. For more information about joins and materialized views, see Joins. The following are examples of Google Cloud console, bq command-line tool, and API actions that can cause an update or deletion:

  • Data manipulation language (DML) UPDATE, MERGE, or DELETE statements
  • Truncation
  • Partition expiration

The following metadata operations also prevent a materialized view from being incrementally updated:

  • Changing partition expiration
  • Updating or dropping a column

If a materialized view cannot be incrementally updated, then its cached data is not used by queries until the view is automatically or manually refreshed. For details about why a job didn't use materialized view data, see Understand why materialized views were rejected.

Partition alignment

If a materialized view is partitioned, BigQuery ensures that its partitions are aligned with the partitions of the base table's partitioning column. Aligned means that the data from a particular partition of the base table contributes to the same partition of the materialized view. For example, a row from partition 20220101 of the base table would contribute only to partition 20220101 of the materialized view.

When a materialized view is partitioned, the behavior described in the Incremental updates occurs for each individual partition. For example, if data is deleted in one partition of the base table, then BigQuery can still use the materialized view's other partitions.

Materialized views with inner joins can only be aligned with one of their base tables. If one of the non-aligned base tables changes, it affects the entire view.

Smart tuning

BigQuery automatically rewrites queries to use materialized views whenever possible. Automatic rewriting improves query performance and cost, and does not change query results. Querying does not automatically trigger a materialized refresh. For a query to be rewritten, the materialized view must meet the following conditions:

  • Belong to the same dataset as one of its base tables.
  • Use the same set of base tables as the query.
  • Include all columns being read.
  • Include all rows being read.

Smart tuning examples

Consider the following materialized view query example:

SELECT
  store_id,
  CAST(sold_datetime AS DATE) AS sold_date
  SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
  CAST(sold_datetime AS DATE) >= '2021-01-01' AND
  promo_id IS NOT NULL
GROUP BY 1, 2

The following examples show queries and why those queries are or aren't automatically rewritten using this view:

Query Rewrite? Reason
SELECT
SUM(net_paid) AS sum_paid,
SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL
No The view must include all columns being read. The view does not include 'SUM(net_paid)'.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL
Yes
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL AND
customer_id = 12345
No The view must include all columns being read. The view does not include 'customer'.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
sold_datetime= '2021-01-01' AND
promo_id IS NOT NULL
No The view must include all columns being read. 'sold_datetime' is not an output (but 'CAST(sold_datetime AS DATE)' is).
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id IS NOT NULL AND
store_id = 12345
Yes
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2021-01-01' AND
promo_id = 12345
No The view must include all rows being read. 'promo_id' is not an output so the more restrictive filter can't be applied to the view.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE CAST(sold_datetime AS DATE) >= '2020-01-01'
No The view must include all rows being read. The view filter for dates in 2021 and after, but the query reads dates from 2020.
SELECT SUM(net_profit) AS sum_profit
FROM dataset.store_sales
WHERE
CAST(sold_datetime AS DATE) >= '2022-01-01' AND
promo_id IS NOT NULL
Yes

Understand whether a query was rewritten

To understand if a query was rewritten by smart tuning to use a materialized view, inspect the query plan. If the query was rewritten, then the query plan contains a READ my_materialized_view step, where my_materialized_view is the name of the materialized view used. To understand why a query didn't use a materialized view, see Understand why materialized views were rejected.

Understand why materialized views were rejected

Queries cannot use a materialized view for a variety of reasons. The steps to understand why a materialized view was rejected depend on the type of query that you used:

  • Direct query of the materialized view
  • Indirect query in which smart tuning might choose to use the materialized view

The following sections provide steps to help you understand why a materialized view was rejected.

Direct query of materialized views

Direct queries of materialized views might not use cached data in certain circumstances. The following steps can help you understand why the materialized view data was not used:

  1. Follow the steps in Monitor materialized view use and find the target materialized view in the materialized_view_statistics field for the query.
  2. If chosen is present in the statistics and its value is TRUE, then the materialized view is used by the query.
  3. Review the rejected_reason field to find next steps. In most cases, you can manually refresh the materialized view or wait for the next automatic refresh.

Query with smart tuning

  1. Follow the steps in Monitor materialized view use and find the target materialized view in the materialized_view_statistics for the query.
  2. Review the rejected_reason to find next steps. For example, if the rejected_reason value is COST, then smart tuning identified more efficient data sources for cost and performance.
  3. If the materialized view is not present, try a direct query of the materialized view and follow the steps in Direct query of materialized views.
  4. If the direct query does not use the materialized view, then the shape of the materialized view does not match the query. For more information about smart tuning and how queries are rewritten using materialized views, see Smart tuning examples.

Frequently asked questions

When should I use scheduled queries versus materialized views?

Scheduled queries are a convenient way to run arbitrarily complex calculations periodically. Each time the query runs, it is being run fully. The previous results are not used, and you pay the full price for the query. Scheduled queries are great when you don't need the freshest data and you have a high tolerance for data staleness.

Materialized views are suited for when you need to query the latest data while cutting down latency and cost by reusing the previously computed result. You can use materialized views as pseudo-indexes, accelerating queries to the base table without updating any existing workflows. The --max_staleness option lets you define acceptable staleness for materialized views, providing consistently high performance with controlled costs when processing large, frequently changing datasets.

As a general guideline, whenever possible and if you are not running arbitrarily complex calculations, use materialized views.

Some queries over materialized views are slower than same queries over manually materialized tables. Why is that?

In general, a query over a materialized view isn't always as performant as a query over the equivalent materialized table. The reason is that a materialized view guarantees to always return a fresh result, and it has to account for changes in the base tables that were added since the last view refresh.

Consider this scenario:

CREATE MATERIALIZED VIEW my_dataset.my_mv AS
SELECT date, customer_id, region, SUM(net_paid) as total_paid
FROM my_dataset.sales
GROUP BY 1, 2, 3;

CREATE TABLE my_dataset.my_materialized_table AS
SELECT date, customer_id, region, SUM(net_paid) as total_paid
FROM my_dataset.sales
GROUP BY 1, 2, 3;

For example, this query:

  SELECT * FROM my_dataset.my_mv LIMIT 10
typically runs much more slowly than this query:
  SELECT * FROM my_dataset.my_materialized_table LIMIT 10
In order to provide consistently up-to-date results, BigQuery must query new rows in the base table and merge them into the materialized view before applying the 'LIMIT 10' predicate. As a result, slowness remains, even if the materialized view is fully up-to-date.

On the other hand, aggregations over materialized views are typically as fast as queries against the materialized table. For example, the following:

  SELECT SUM(total_paid) FROM my_dataset.my_mv WHERE date > '2020-12-01'
Should be as fast as this:
  SELECT SUM(total_paid) FROM my_dataset.my_materialized_table WHERE date > '2020-12-01'