Use materialized views
This document provides additional information about materialized views and how to use them. Before you read this document, familiarize yourself with Introduction to materialized views and Create materialized views.
Query materialized views
You can query your materialized views directly, the same way you query a regular table or standard view. Queries against materialized views are always consistent with queries against the view's base tables, even if those tables have changed since the last time the materialized view was refreshed. Querying does not automatically trigger a materialized refresh.
Required roles
To get the permissions that you need to query a materialized view,
ask your administrator to grant you the
BigQuery Data Viewer (roles/bigquery.dataViewer
) IAM role on the base table of the materialized view and the materialized view itself.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to query a materialized view. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to query a materialized view:
-
bigquery.tables.get
-
bigquery.tables.getData
You might also be able to get these permissions with custom roles or other predefined roles.
These permissions are required for queries in order to benefit from smart tuning.
For more information about IAM roles in BigQuery, see Introduction to IAM.
Incremental updates
BigQuery combines the cached view's data with new data to provide
consistent query results while still using the materialized view. For
single-table materialized views, this is possible if the base table is unchanged
since the last refresh, or if only new data was added. For JOIN
views, only
tables on the left side of the JOIN
can have appended data. If one of the
tables on the right side of a JOIN
has changed, then the view cannot be
incrementally updated.
If the base table had updates or deletions since the last refresh, or if the
materialized view's base tables on the right side of the JOIN
have changed,
then BigQuery automatically reverts to the original query. For
more information about joins and materialized views, see
Joins. The
following are examples of Google Cloud console, bq command-line tool, and API actions that can
cause an update or deletion:
- Data manipulation language (DML)
UPDATE
,MERGE
, orDELETE
statements - Truncation
- Partition expiration
The following metadata operations also prevent a materialized view from being incrementally updated:
- Changing partition expiration
- Updating or dropping a column
If a materialized view cannot be incrementally updated, then its cached data is not used by queries until the view is automatically or manually refreshed. For details about why a job didn't use materialized view data, see Understand why materialized views were rejected. Additionally, materialized views cannot be incrementally updated if its base table has accumulated unprocessed changes for a time period greater than the table's time travel interval.
Partition alignment
If a materialized view is partitioned, BigQuery ensures that its
partitions are aligned with the partitions of the base table's partitioning
column. Aligned means that the data from a particular partition of the
base table contributes to the same partition of the materialized view. For
example, a row from partition 20220101
of the base table would contribute only
to partition 20220101
of the materialized view.
When a materialized view is partitioned, the behavior described in the Incremental updates occurs for each individual partition. For example, if data is deleted in one partition of the base table, then BigQuery can still use the materialized view's other partitions.
Materialized views with inner joins can only be aligned with one of their base tables. If one of the non-aligned base tables changes, it affects the entire view.
Smart tuning
BigQuery automatically rewrites queries to use materialized views whenever possible. Automatic rewriting improves query performance and cost, and does not change query results. Querying does not automatically trigger a materialized refresh. For a query to be rewritten, the materialized view must meet the following conditions:
- Belong to the same dataset as one of its base tables.
- Use the same set of base tables as the query.
- Include all columns being read.
- Include all rows being read.
Smart tuning examples
Consider the following materialized view query example:
SELECT store_id, CAST(sold_datetime AS DATE) AS sold_date SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2021-01-01' AND promo_id IS NOT NULL GROUP BY 1, 2
The following examples show queries and why those queries are or aren't automatically rewritten using this view:
Query | Rewrite? | Reason |
---|---|---|
SELECT SUM(net_paid) AS sum_paid, SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2021-01-01' AND promo_id IS NOT NULL |
No | The view must include all columns being read. The view does not include 'SUM(net_paid)'. |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2021-01-01' AND promo_id IS NOT NULL |
Yes | |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2021-01-01' AND promo_id IS NOT NULL AND customer_id = 12345 |
No | The view must include all columns being read. The view does not include 'customer'. |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE sold_datetime= '2021-01-01' AND promo_id IS NOT NULL |
No | The view must include all columns being read. 'sold_datetime' is not an output (but 'CAST(sold_datetime AS DATE)' is). |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2021-01-01' AND promo_id IS NOT NULL AND store_id = 12345 |
Yes | |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2021-01-01' AND promo_id = 12345 |
No | The view must include all rows being read. 'promo_id' is not an output so the more restrictive filter can't be applied to the view. |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2020-01-01' |
No | The view must include all rows being read. The view filter for dates in 2021 and after, but the query reads dates from 2020. |
SELECT SUM(net_profit) AS sum_profit FROM dataset.store_sales WHERE CAST(sold_datetime AS DATE) >= '2022-01-01' AND promo_id IS NOT NULL |
Yes |
Understand whether a query was rewritten
To understand if a query was rewritten by smart tuning to use a materialized
view, inspect the query plan. If the
query was rewritten, then the query plan contains a READ my_materialized_view
step, where my_materialized_view
is the name of the materialized view used. To
understand why a query didn't use a materialized view, see Understand why
materialized views were rejected.
Understand why materialized views were rejected
If you have disabled automatic refresh for your materialized view and the table has unprocessed changes, then the query might be faster for several days but then start revert to the original query resulting in slower processing speed. To benefit from materialized views, enable automatic refresh or manually refresh regularly, and monitor materialized view refresh jobs to confirm they succeed.
The steps to understand why a materialized view was rejected depend on the type of query that you used:
- Direct query of the materialized view
- Indirect query in which smart tuning might choose to use the materialized view
The following sections provide steps to help you understand why a materialized view was rejected.
Direct query of materialized views
Direct queries of materialized views might not use cached data in certain circumstances. The following steps can help you understand why the materialized view data was not used:
- Follow the steps in Monitor materialized view
use
and find the target materialized view in the
materialized_view_statistics
field for the query. - If
chosen
is present in the statistics and its value isTRUE
, then the materialized view is used by the query. - Review the
rejected_reason
field to find next steps. In most cases, you can manually refresh the materialized view or wait for the next automatic refresh.
Query with smart tuning
- Follow the steps in Monitor materialized view
use
and find the target materialized view in the
materialized_view_statistics
for the query. - Review the
rejected_reason
to find next steps. For example, if therejected_reason
value isCOST
, then smart tuning identified more efficient data sources for cost and performance. - If the materialized view is not present, try a direct query of the materialized view and follow the steps in Direct query of materialized views.
- If the direct query does not use the materialized view, then the shape of the materialized view does not match the query. For more information about smart tuning and how queries are rewritten using materialized views, see Smart tuning examples.
Frequently asked questions
When should I use scheduled queries versus materialized views?
Scheduled queries are a convenient way to run arbitrarily complex calculations periodically. Each time the query runs, it runs fully, with no benefit from previous results, and you pay the full compute cost for the query. Scheduled queries are ideal when you don't need the freshest data and have a high tolerance for data staleness.
Materialized views are best suited for when you need to query the latest data
with minimized latency and cost by reusing the previously computed result.
You can use materialized views as pseudo-indexes, accelerating queries to the
base table without updating any existing workflows. The --max_staleness
option
lets you define acceptable staleness for your materialized views, providing
consistently high performance with controlled costs when processing
large, frequently changing datasets.
As a general guideline, whenever possible and if you are not running arbitrarily complex calculations, use materialized views.
Some queries over materialized views are slower than same queries over manually materialized tables. Why is that?
In general, a query over a materialized view isn't always as performant as a query over the equivalent materialized table. The reason is that materialized views always return fresh results, and must account for changes to their base tables since the last view refresh.
Consider this scenario:
CREATE MATERIALIZED VIEW my_dataset.my_mv AS SELECT date, customer_id, region, SUM(net_paid) as total_paid FROM my_dataset.sales GROUP BY 1, 2, 3; CREATE TABLE my_dataset.my_materialized_table AS SELECT date, customer_id, region, SUM(net_paid) as total_paid FROM my_dataset.sales GROUP BY 1, 2, 3;
For example, this query:
SELECT * FROM my_dataset.my_mv LIMIT 10
SELECT * FROM my_dataset.my_materialized_table LIMIT 10
On the other hand, aggregations over materialized views are typically as fast as queries against the materialized table. For example, the following:
SELECT SUM(total_paid) FROM my_dataset.my_mv WHERE date > '2020-12-01'
SELECT SUM(total_paid) FROM my_dataset.my_materialized_table WHERE date > '2020-12-01'