Materialized views FAQ

This document provides answers to frequently asked questions about BigQuery Materialized Views. Before you read this document, familiarize yourself with Introduction to materialized views, Creating and using materialized views, and Materialized views best practices.

When should I use scheduled queries versus materialized views?

Scheduled queries are a convenient way to run arbitrarily complex calculations periodically. Each time the query runs, it is being run fully. The previous results are not used, and you pay the full price for the query. Scheduled queries are great when you don't need the freshest data and you have a high tolerance for data staleness.

Materialized views are suited for when you need to query the latest data while cutting down latency and cost by reusing the previously computed result. You can use materialized views as pseudo-indexes, accelerating queries to the base table without updating any existing workflows.

As a general guideline, whenever possible and if you are not running arbitrarily complex calculations, use materialized views.

How fast is a change in the base table reflected when querying the materialized view?

Immediately. Whether you stream into the base table, bulk load data into it, or run a query that writes the results into the base table, the new table state is reflected immediately when you perform a query, even if the materialized view hasn't been refreshed yet. This applies when you query a materialized view over the base table, and also when you query the base table and the execution plan leverages the materialized view.

How frequently is a materialized view refreshed?

Each materialized view is automatically refreshed (recomputed) when the base table changes, but not more often than a certain interval. The refresh interval has a default value of 30 minutes, and can be adjusted for each view using the API or DDL. In addition, users can also force a manual refresh. For details about refreshes, see Refreshing materialized views.

How can I tell if a query was rewritten by the optimizer to utilize a materialized view?

Inspect the query plan. If a query was rewritten by BigQuery to leverage a materialized view, then the query plan will contain a READ my_mv_table step, where my_mv_table is the name of the materialized view used by the query rewrite.

How can I manage materialized views through the Cloud Console?

Materialized views are listed on the Dataset page within the Cloud Console. As an example, the following shows how a materialized view's schema appears:

Materialized view schema in Google Cloud Console

How can I disable query rewrite for a particular query?

The BigQuery query optimizer rewrites your query automatically in order to leverage a materialized view. You might want to disable query rewrite, for example, in order to compare results or performance between a query with and without a materialized view.

To disable query rewrite, add a "WHERE RAND()<1" filter to your query over the base table. RAND() is a non-deterministic function, so BigQuery does not rewrite the query. Because RAND()<1 always evaluates to true, it doesn't affect your query result.

Does a query trigger an automatic refresh of the materialized view?

No, a query doesn't automatically trigger a materialized view refresh.

Why am I getting these errors?

The following lists details on common errors.

Error: "Materialized view query contains unsupported feature"

This error can occur during when you create a materialized view. Some causes are:

  • The query contains a HAVING clause:

    -- Results in an error
    SELECT date, COUNT(event) count FROM base_table GROUP BY date HAVING count >= 100
    
  • The query contains a computation (in bold) on top of an aggregation function:

    -- Results in an error
    SELECT date
    , ROUND(SUM(net_paid)) as sum_paid
    , ARRAY_AGG(order_id ORDER BY net_paid DESC LIMIT 1)[OFFSET(0)] top_order
    , APPROX_COUNT_DISTINCT(customer_id) * 10 approx_customers
    FROM base_table GROUP BY date
    

Materialized views must contain only bare aggregation functions and must not use the HAVING clause. Any post-filtering and post-computation must be put on top of a materialized view. For example:

CREATE MATERIALIZED VIEW my_dataset.my_mv AS
SELECT date
  , COUNT(event) count
  , SUM(net_paid) sum_paid
  , ARRAY_AGG(order_id ORDER BY net_paid DESC LIMIT 1) top_order
  , APPROX_COUNT_DISTINCT(customer_id) approx_customers
FROM my_project.my_dataset.my_base_table
GROUP BY date

CREATE VIEW my_dataset.my_view AS
SELECT date
  , count
  , ROUND(sum_paid) sum_paid
  , top_order[OFFSET(0)] top_order
  , approx_customers * 10 approx_customers
FROM my_project.my_dataset.my_mv
WHERE count >= 100

-- Now you can query the regular view
SELECT ... FROM my_dataset.my_view

Some queries over materialized views are slower than same queries over manually materialized tables. Why is that?

In general, a query over a materialized view isn't always as performant as a query over the equivalent materialized table. The reason is that a materialized view guarantees to always return a fresh result, and it has to account for changes in the base table that were added since the last view refresh.

Consider this scenario:

CREATE MATERIALIZED VIEW my_dataset.my_mv AS
SELECT date, customer_id, region, SUM(net_paid) as total_paid
FROM my_dataset.sales
GROUP BY 1, 2, 3;

CREATE TABLE my_dataset.my_materialized_table AS
SELECT date, customer_id, region, SUM(net_paid) as total_paid
FROM my_dataset.sales
GROUP BY 1, 2, 3;

For example, this query:

  SELECT * FROM my_dataset.my_mv LIMIT 10
typically runs much more slowly than this query:
  SELECT * FROM my_dataset.my_materialized_table LIMIT 10
In order to provide consistently up-to-date results, BigQuery must query new rows in the base table and merge them into the materialized view before applying the 'LIMIT 10' predicate. As a result, slowness remains, even if the materialized view is fully up-to-date.

On the other hand, aggregations over materialized views are typically as fast as queries against the materialized table. For example, the following:

  SELECT SUM(total_paid) FROM my_dataset.my_mv WHERE date > '2020-12-01'
Should be as fast as this:
  SELECT SUM(total_paid) FROM my_dataset.my_materialized_table WHERE date > '2020-12-01'

JOIN and UNION ALL

Sometimes the BigQuery query optimizer doesn't leverage the data stored in a materialized view, in which case the materialized view behaves like a standard view. This can happen when a base table scan appears multiple times in a query plan with different materialized views, or without a materialized view. Consider this scenario:

CREATE MATERIALIZED VIEW dataset.mv1 AS
SELECT dt, SUM(x) sum_x
FROM dataset.table
GROUP BY 1;

CREATE MATERIALIZED VIEW dataset.mv2 AS
SELECT dt, COUNT(x) cnt_x
FROM dataset.table
GROUP BY 1;

CREATE MATERIALIZED VIEW dataset.mv3 AS
SELECT dt, COUNT(x) cnt_x
FROM dataset.table2
GROUP BY 1;

SELECT * FROM dataset.mv1 m1
INNER JOIN dataset.mv2 m2
ON m1.dt = m2.dt

This query falls back to a base table scan, because we have a JOIN clause with two aggregate fragments over the same table, with each fragment using a different materialized view.

The following query also falls back to a base table scan, because we have a UNION ALL clause with two aggregate fragments over the same table, and only one of them is using a materialized view:

SELECT sum_x as result, "sum" as what FROM dataset.mv1
UNION ALL
SELECT AVG(x) as result, "avg" as what FROM dataset.table1

If we replace mv2 with mv3 in the JOIN example above, both materialized views are leveraged, because they have different base tables. Similarly, if the second input of the UNION ALL clause in the case above were table2, then mv1 would also be used.

In practice, this means that a single query should reference only one materialized view for a given base table, and should not reference a materialized view and its base table at the same time.