BigQuery Materialized Views best practices

This page provides best practices for BigQuery Materialized Views to improve query performance.

When you create a materialized view

Ensure your materialized view definition reflects query patterns against the base table. Because there is a maximum of 20 materialized views per table, you should not create a materialized view for every permutation of a query. Instead, create materialized views to serve a broader set of queries.

For example, consider a query on a table where users often filter by columns user_id or department. You can group by these columns and optionally cluster by them, instead of adding filters like user_id = 123 into the materialized view.

As another example, users often use date filters, either by specific date (where order_date = current_date()) or date range (where order_date between '2019-10-01' and '2019-10-31'). As a good practice, add a date range filter in the materialized view that covers expected date ranges in the query:

CREATE MATERIALIZED VIEW  ...
WHERE date > '2019-01-01' GROUP BY date

If the base table is partitioned and its materialized view has significant size, then the materialized view should be partitioned as well. As a general guideline, a materialized view has significant size if it is about the same size as, or larger than, one partition of the base table.

When you maintain a materialized view

Monitor the cost of the refresh job, and adjust the automatic refresh interval if needed. Specifically, monitor total_bytes_processed and total_slot_ms.

For example, if the ingestion rate in the base table is relatively small, it makes sense to refresh the view less often. Otherwise, if the underlying data changes quickly, it makes sense to refresh more often.

If the base table ingests at predefined points in time, such as by a nightly extract, transform, and load (ETL) pipeline, consider taking full ownership of the materialized view maintenance schedule. To do so:

  1. Disable automatic refresh.

  2. Perform a manual refresh, either as part of the ETL pipeline, or configure a scheduled query at specific times of the day.