This document provides answers to frequently asked questions about BigQuery Materialized Views. Before you read this document, familiarize yourself with Introduction to materialized views, Creating and using materialized views, and Materialized views best practices.
When should I use scheduled queries versus materialized views?
Scheduled queries are a convenient way to run arbitrarily complex calculations periodically. Each time the query runs, it is being run fully. The previous results are not used, and you pay the full price for the query. Scheduled queries are great when you don't need the freshest data and you have a high tolerance for data staleness.
Materialized views are suited for when you need to query the latest data while cutting down latency and cost by reusing the previously computed result. You can use materialized views as pseudo-indexes, accelerating queries to the base table without updating any existing workflows.
As a general guideline, whenever possible and if you are not running arbitrarily complex calculations, use materialized views.
How fast is a change in the base table reflected when querying the materialized view?
Immediately. Whether you stream into the base table, bulk load data into it, or run a query that writes the results into the base table, the new table state is reflected immediately when you perform a query, even if the materialized view hasn't been refreshed yet. This applies when you query a materialized view over the base table, and also when you query the base table and the execution plan leverages the materialized view.
How frequently is a materialized view refreshed?
Each materialized view is automatically refreshed (recomputed) when the base table changes, but not more often than a certain interval. The refresh interval has a default value of 30 minutes, and can be adjusted for each view using the API or DDL. In addition, users can also force a manual refresh. For details about refreshes, see Refreshing materialized views.
How can I tell if a query was rewritten by the optimizer to utilize a materialized view?
Inspect the query plan. If a query was rewritten by BigQuery
to leverage a materialized view, then the query plan will contain a
my_mv_table step, where
my_mv_table is the name of the materialized view used
by the query rewrite.
How can I manage materialized views through the Cloud Console?
Materialized views are listed on the Dataset page within the Cloud Console. As an example, the following shows how a materialized view's schema appears:
How can I disable query rewrite for a particular query?
The BigQuery query optimizer rewrites your query automatically in order to leverage a materialized view. You might want to disable query rewrite, for example, in order to compare results or performance between a query with and without a materialized view.
To disable query rewrite, add a
"WHERE RAND()<1" filter to your query over the
RAND() is a non-deterministic function, so BigQuery
does not rewrite the query. Because
RAND()<1 always evaluates to true, it
doesn't affect your query result.
Does a query trigger an automatic refresh of the materialized view?
No, a query doesn't automatically trigger a materialized view refresh.
Why am I getting these errors?
The following lists details on common errors.
Error: "Materialized view query contains unsupported feature"
This error can occur during when you create a materialized view. Some causes are:
The query contains a
-- Results in an error SELECT date, COUNT(event) count FROM base_table GROUP BY date HAVING count >= 100
The query contains a computation (in bold) on top of an aggregation function:
-- Results in an error SELECT date , ROUND(SUM(net_paid)) as sum_paid , ARRAY_AGG(order_id ORDER BY net_paid DESC LIMIT 1)[OFFSET(0)] top_order , APPROX_COUNT_DISTINCT(customer_id) * 10 approx_customers FROM base_table GROUP BY date
Materialized views must contain only bare aggregation functions and must not use
HAVING clause. Any post-filtering and post-computation must be put on top
of a materialized view. For example:
CREATE MATERIALIZED VIEW my_dataset.my_mv AS SELECT date , COUNT(event) count , SUM(net_paid) sum_paid , ARRAY_AGG(order_id ORDER BY net_paid DESC LIMIT 1) top_order , APPROX_COUNT_DISTINCT(customer_id) approx_customers FROM my_project.my_dataset.my_base_table GROUP BY date CREATE VIEW my_dataset.my_view AS SELECT date , count , ROUND(sum_paid) sum_paid , top_order[OFFSET(0)] top_order , approx_customers * 10 approx_customers FROM my_project.my_dataset.my_mv WHERE count >= 100 -- Now you can query the regular view SELECT ... FROM my_dataset.my_view
Some queries over materialized views are slower than same queries over manually materialized tables. Why is that?
In general, a query over a materialized view isn't always as performant as a query over the equivalent materialized table. The reason is that a materialized view guarantees to always return a fresh result, and it has to account for changes in the base table that were added since the last view refresh.
Consider this scenario:
CREATE MATERIALIZED VIEW my_dataset.my_mv AS SELECT date, customer_id, region, SUM(net_paid) as total_paid FROM my_dataset.sales GROUP BY 1, 2, 3; CREATE TABLE my_dataset.my_materialized_table AS SELECT date, customer_id, region, SUM(net_paid) as total_paid FROM my_dataset.sales GROUP BY 1, 2, 3;
For example, this query:
SELECT * FROM my_dataset.my_mv LIMIT 10typically runs much more slowly than this query:
SELECT * FROM my_dataset.my_materialized_table LIMIT 10In order to provide consistently up-to-date results, BigQuery must query new rows in the base table and merge them into the materialized view before applying the 'LIMIT 10' predicate. As a result, slowness remains, even if the materialized view is fully up-to-date.
On the other hand, aggregations over materialized views are typically as fast as queries against the materialized table. For example, the following:
SELECT SUM(total_paid) FROM my_dataset.my_mv WHERE date > '2020-12-01'Should be as fast as this:
SELECT SUM(total_paid) FROM my_dataset.my_materialized_table WHERE date > '2020-12-01'
JOIN and UNION ALL
Sometimes the BigQuery query optimizer doesn't leverage the data stored in a materialized view, in which case the materialized view behaves like a standard view. This can happen when a base table scan appears multiple times in a query plan with different materialized views, or without a materialized view. Consider this scenario:
CREATE MATERIALIZED VIEW dataset.mv1 AS SELECT dt, SUM(x) sum_x FROM dataset.table GROUP BY 1; CREATE MATERIALIZED VIEW dataset.mv2 AS SELECT dt, COUNT(x) cnt_x FROM dataset.table GROUP BY 1; CREATE MATERIALIZED VIEW dataset.mv3 AS SELECT dt, COUNT(x) cnt_x FROM dataset.table2 GROUP BY 1; SELECT * FROM dataset.mv1 m1 INNER JOIN dataset.mv2 m2 ON m1.dt = m2.dt
This query falls back to a base table scan, because we have a
clause with two aggregate fragments over the same table, with each fragment
using a different materialized view.
The following query also falls back to a base table scan, because we have a
UNION ALL clause with two aggregate fragments over the same table, and only
one of them is using a materialized view:
SELECT sum_x as result, "sum" as what FROM dataset.mv1 UNION ALL SELECT AVG(x) as result, "avg" as what FROM dataset.table1
If we replace
mv3 in the
JOIN example above, both materialized views
are leveraged, because they have different base tables. Similarly, if the second
input of the
UNION ALL clause in the case above were
would also be used.
In practice, this means that a single query should reference only one materialized view for a given base table, and should not reference a materialized view and its base table at the same time.