This document describes how to create and use materialized views with BigQuery Materialized Views. This document assumes you have already read Introduction to materialized views.
Creating a materialized view
You can create a materialized view through the Google Cloud Console, the bq
command-line tool,
or the BigQuery API.
For purposes of these examples, assume your base table is named
my_base_table
and has this schema:
Column name | Type |
---|---|
product_id | integer |
clicks | integer |
Also assume that you want a materialized view that summarizes the number
of clicks per product_id
. The following steps create
a materialized view named my_mv_table
that has the following schema.
Column name | Type |
---|---|
product_id | integer |
sum_clicks | integer |
To create a materialized view:
Console
Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax.
See more on Using Data Definition Language statements.
To create a materialized view in the Cloud Console by using a DDL statement:
In the Cloud Console, go to the BigQuery page.
Click Compose new query.
Type your
CREATE MATERIALIZED VIEW
DDL statement into the Query editor text area.CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT product_id, SUM(clicks) AS sum_clicks FROM project-id.my_dataset.my_base_table GROUP BY 1
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
product_id
is a column from the base table.clicks
is a column from the base table.sum_clicks
is a column in the materialized view that you are creating.
Click Run.
Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query might return success right away, but the initial refresh might still be running. When the materialized view is successfully created, it appears in the Datasets pane.
bq
Use the bq query
command and supply the DDL statement as the
query parameter.
bq query --use_legacy_sql=false ' CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT product_id, SUM(clicks) AS sum_clicks FROM project-id.my_dataset.my_base_table GROUP BY 1'
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
product_id
is a column from the base table.clicks
is a column from the base table.sum_clicks
is a column in the materialized view that you are creating.
Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query might return success right away, but the initial refresh might still be running. When the materialized view is successfully created, it appears in the Datasets pane.
Alternatively, you can use the bq mk
command with the
--materialized_view
argument to create a materialized view. The following
arguments work with the --materialized_view
argument:
--enable_refresh
: Specifies whether automatic refresh is enabled.--refresh_interval_ms
: Specifies the time, in milliseconds, for the refresh interval.
API
Call the tables.insert
method with a defined materializedView
resource as part of your API request. The materializedView
resource
contains a query
field. For example:
{ "kind": "bigquery#table", "tableReference": { "projectId": "project-id", "datasetId": "my_dataset", "tableId": "my_mv_table" }, "materializedView": { "query": "select product_id,sum(clicks) as sum_clicks from project-id.my_dataset.my_base_table group by 1" } }
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
product_id
is a column from the base table.clicks
is a column from the base table.sum_clicks
is a column in the materialized view that you are creating.
Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query might return success right away, but the initial refresh might still be running. When the materialized view is successfully created, it appears in the Datasets pane.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Querying a materialized view
Query your materialized views directly, the same way you query a regular table or standard view.
When you query a base table that has a materialized view, the query optimizer might automatically reroute to use the cached result that is stored in the materialized view. The query plan will show that the query was rewritten to use the materialized view.
If multiple materialized views can be used to reroute a query, the materialized view with the smallest estimated number of rows to be scanned is used.
If you query a materialized view that is not up-to-date, data from the materialized view is combined with delta changes from the base table to compute up-to-date results. Reasons that a materialized view might not be up-to- date include:
The materialized view is not set up for automatic refresh because the user decided to control refreshes manually.
The base table is updated too frequently, so the frequency refresh cap is being enforced.
The base table was changed only seconds ago, so the materialized view hasn't yet caught up with the base table.
For information about running queries, see Overview of querying BigQuery data.
Altering a materialized view
You can alter a materialized view through the Cloud Console or the
bq
command-line tool, by using DDL with ALTER MATERIALIZED VIEW
and SET OPTIONS
.
The following shows an example that sets enable_refresh
to true
. Adjust as
needed for your use case.
Console
To alter a materialized view in the Cloud Console by using a DDL statement:
Open the BigQuery page in the Cloud Console.
Go to the BigQuery pageClick Compose new query.
Type your
ALTER MATERIALIZED VIEW
DDL statement into the Query editor text area.ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table SET OPTIONS (enable_refresh=true)
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're altering.
Click Run.
bq
Use the bq query
command and supply the DDL statement as the query
parameter.
ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh=true)
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're altering.
Alternatively, you can run the bq update
command.
bq update \ --enable_refresh=true \ --refresh_interval_ms= \ project-id.my_dataset.my_mv_table
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Manipulating a materialized view
A materialized view can be manipulated only by the CREATE
, DROP
, or ALTER
statements, along with their API equivalents, and a
manual refresh.
The following operations are not allowed on a materialized view.
Running Copy, Import, or Export jobs where either the source or the destination is a materialized view.
Writing query results into a materialized view.
Calling tabledata.list.
Using the BigQuery Storage API.
The following example alters a material view:
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Deleting a materialized view
You can delete a materialized view through the Cloud Console, the
bq
command-line tool, or the API.
Console
To delete a materialized view in the Cloud Console by using a DDL statement:
Open the BigQuery page in the Cloud Console.
Go to the BigQuery pageClick Compose new query.
Type your
DELETE MATERIALIZED VIEW
DDL statement into the Query editor text area.DROP MATERIALIZED VIEW project-id.my_dataset.my_mv_table
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're deleting.
Click Run.
bq
Use the bq query
command and supply the DDL statement as the query
parameter.
bq query --use_legacy_sql=false '
DROP MATERIALIZED VIEW
project-id.my_dataset.my_mv_table'
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're deleting.
Alternatively, you can use the bq rm
command to delete the
materialized view.
API
Call the tables.delete
method and specify values for the projectId
,
datasetId
, and tableId
parameters:
- Assign the
projectId
parameter to your project ID. - Assign the
datasetId
parameter to your dataset ID. - Assign the
tableId
parameter to the table ID of the materialized view that you're deleting.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Supported materialized views
Materialized views must be an aggregation over a single table. The use of
GROUP BY
is optional.
Currently, materialized views support a restricted SQL syntax. Queries must use the following pattern:
SELECT expression [ [ AS ] alias ] ] [, ...] FROM from_item [, ...] [ WHERE bool_expression ] [ GROUP BY expression [, ...] ]
Only the following aggregation functions are currently supported:
APPROX_COUNT_DISTINCT
ARRAY_AGG
AVG
COUNT
HLL_COUNT.INIT
MAX
MIN
SUM
Certain subqueries are supported, as long as they don't add any computation or filtering on top of an aggregation. For example, the following SQL queries are supported:
WITH tmp AS (
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, *
FROM dataset.table
)
SELECT ts_hour, COUNT(*) as cnt
FROM tmp
GROUP BY ts_hour
WITH tmp AS (
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
FROM dataset.table
GROUP BY 1
)
SELECT * FROM tmp
SELECT ts_hour, COUNT(*) as cnt
FROM (
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, *
FROM dataset.table
)
GROUP BY ts_hour
SELECT * FROM (
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
FROM dataset.table
GROUP BY 1
)
The following SQL query is not supported for materialized views, because it applies computation on top of an aggregation:
-- Not supported for a materialized view
WITH tmp AS (
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
FROM dataset.table
GROUP BY 1
)
SELECT TIMESTAMP_TRUNC(ts_hour, DAY) as ts_day, cnt+1 as cnt FROM tmp
The following SQL query is not supported for materialized views, because it applies filtering on top of an aggregation:
-- Not supported for a materialized view
SELECT ts_hour, cnt
FROM (
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
FROM dataset.table
GROUP BY 1
)
WHERE ts_hour != TIMESTAMP(DATE('2020-01-01'))
FROM
clause
The FROM
clause must include a single table and can unnest one or more array
expressions.
Examples
FROM mytable
FROM mytable AS t, t.struct_column.array_field AS x
FROM mytable AS t LEFT JOIN UNNEST(t.array_column) AS x
FROM mytable AS t, t.array_column AS x, x.array_field AS y
FROM mytable AS t, UNNEST(SPLIT(t.string_column)) AS x
WITH OFFSET
is not supported.
Supported query rewrite patterns
The following examples show base table queries and the materialized view definitions for which the rewrite is currently supported.
Pattern 1
This pattern shows that grouping keys and aggregators in the query are a subset of those in the materialized view.
For this base table query:
SELECT ss_sold_date_sk, SUM(ss_net_profit) AS sum_profit FROM store_sales GROUP BY 1
The following is a materialized view query that shows the base table query's grouping keys and aggregators as a subset of the materialized view's query.
SELECT ss_store_sk, ss_sold_date_sk, SUM(ss_net_paid) AS sum_paid, SUM(ss_net_profit) AS sum_profit, COUNT(*) AS cnt_sales FROM store_sales GROUP BY 1, 2
Pattern 2
This pattern shows grouping keys in the base table query can be computed from grouping keys in the materialized view.
For this base table query:
SELECT DATE_TRUNC(DATE(l_shipdate) MONTH) as shipmonth, SUM(l_extendedprice) as sum_price FROM lineitem GROUP BY 1
The following is a materialized view query that computes grouping keys in the base table:
SELECT DATE(l_shipdate) as shipdate, SUM(l_extendedprice) as sum_price FROM lineitem GROUP BY 1
Pattern 3
This patterns shows filter expressions in the query can be derived from grouping keys in the materialized view, or match the materialized view exactly.
For this base table query:
SELECT DATE_TRUNC(DATE(l_shipdate) MONTH) as shipmonth, SUM(l_extendedprice) as sum_price FROM lineitem WHERE DATE(l_shipdate) >= DATE(2016, 01, 01) AND l_shipmode = "AIR" GROUP BY 1
The following is a materialized view query that derives a filter expression from the base table:
SELECT DATE(l_shipdate) as shipdate, SUM(l_extendedprice) as sum_price FROM lineitem WHERE l_shipmode = "AIR" GROUP BY 1
Pattern 4
This pattern shows expressions in the base table query select a subset of the data in the materialized view.
For this base table query:
SELECT l_discount, COUNT(*) as cnt FROM lineitem WHERE l_discount > 20.0 GROUP BY 1
The following is a materialized view query whose results are a superset of the base table query:
SELECT l_discount, COUNT(*) as cnt FROM lineitem WHERE l_discount IS NOT NULL GROUP BY 1
Incremental updates
In cases where the base table keeps changing with appends only, the query that uses the materialized view will scan all materialized view data plus a delta in the base table since the last view refresh. This applies whether the materialized view is explicitly referenced or selected by the query optimizer. The result is the queries are faster and cheaper.
If however there were any updates or deletions in the base table since the last materialized view refresh, then the materialized view might not be scanned.
The following are examples of actions that cause an update or deletion:
DML UPDATE
DML MERGE
DML DELETE
- truncation
- partition expiration
- console,
bq
command-line, and API equivalents of the preceding items in this list
If these actions occur, the materialized view query might not achieve any savings, until the next view refresh occurs. In effect, any update or deletion in the base table invalidates some portion of the materialized view state. For unpartitioned views, the entire view is invalidated. For partitioned views, only the affected partitions are invalidated.
Note that data from the BigQuery streaming buffer of the base table is not saved into a materialized view. A streaming buffer is still being scanned fully, regardless of whether a materialized view is used.
Working with partitioned and clustered tables
Materialized views can be partitioned by the same partition column that is configured on the base table.
Materialized views can be clustered by arbitrary columns, subject to the BigQuery clustered table limitations.
The following shows how to create a base table with a DATE
partition on the
transaction_time
column:
CREATE TABLE project-id.my_dataset.my_base_table( employee_id INT64, transaction_time TIMESTAMP) PARTITION BY DATE(transaction_time) OPTIONS ( partition_expiration_days = 2)
The following shows how to create a materialized view partitioned by the same
transaction_time
column:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY DATE(transaction_time)
CLUSTER BY employee_id
AS SELECT
employee_id,
transaction_time,
COUNT(1) AS cnt
FROM my_dataset.my_base_table
GROUP BY 1, 2;
When the base table is partitioned by ingestion time, a materialized view can
group by the _PARTITIONDATE
column of the base table, and also partition by
it.
The following shows how to create a base table that is partitioned by ingestion time:
CREATE TABLE project-id.my_dataset.my_base_table( employee_id INT64) PARTITION BY _PARTITIONDATE OPTIONS ( partition_expiration_days = 2)
The following shows how to create a materialized view that is grouped by, and also partitioned by, the ingestion time from the base table:
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY date
CLUSTER BY employee_id
AS SELECT
employee_id,
_PARTITIONDATE as date,
COUNT(1) AS count
FROM my_dataset.my_base_table
GROUP BY 1, 2;
When the base table is partitioned by TIMESTAMP
, the materialized view can
truncate that timestamp to up to the granularity of a DAY
and then use the
truncated timestamp as a partitioning column.
The following shows how to create a materialized view partitioned on the
transaction_hour
column, which is a truncation of the base table's
transaction_time
column. Use the TIMESTAMP_TRUNC
function to truncate
the timestamp.
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY DATE(transaction_hour)
AS SELECT
employee_id,
TIMESTAMP_TRUNC(transaction_time, HOUR) as transaction_hour,
COUNT(1) AS cnt
FROM my_dataset.my_base_table
GROUP BY 1, 2;
The following shows the granularity that you can use for the timestamp truncation:
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
DAY
For general information about timestamp truncation, seeTIMESTAMP_TRUNC
.
Partition alignment
If a materialized view is partitioned, BigQuery ensures that its
partitions are aligned with the partitions of the base table. For example, a row
from partition 2020-01-01
of the base table is being aggregated to a row from
partition 2020-01-01
of the materialized view.
Partition alignment helps guarantee that a materialized view is maintained efficiently. During refresh, the only partitions of the materialized view that will be updated are those that have appends, updates, or deletions in the corresponding partitions of the base table. At query time, partitioning alignment ensures that a partition filter is applied to both the materialized view scan and the base table scan.
Partition expiration
Partition expiration must not be set on materialized views. A materialized view implicitly inherits the partition expiration time from the base table. Materialized view partitions are aligned with the base table partitions, so they expire synchronously.
Maintaining materialized views
You manage how materialized views are being refreshed to update precomputed results from the base table via automatic and manual refreshes. You can configure the refresh settings when you create a materialized view if you do not want to use default values. You can alter the refresh settings after the materialized view is created.
You can manually refresh a materialized view at any time.
Automatic refresh
By default, materialized views are automatically refreshed within 5 minutes of a change to the base table. Examples of changes include row insertions or row deletions.
Automatic refresh can be enabled or disabled at any time.
To turn automatic refresh off when you create a table, set enable_refresh
to
false
.
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY RANGE_BUCKET(column, buckets)
OPTIONS (enable_refresh = false)
AS SELECT ...
For an existing materialized view, you can modify the enable_refresh
value
using ALTER MATERIALIZED VIEW
.
ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh = true)
Note that turning on automatic refresh will trigger an automatic refresh on the materialized view.
Even if automatic refresh is disabled, the materialized view can be manually refreshed at any time.
Setting the frequency cap
You can configure a frequency cap on how often automatic refresh is run. By default, materialized views are refreshed no more often than every 30 minutes.
The refresh frequency cap can be changed at any time.
To set a refresh frequency cap when you create a materialized view, set
refresh_interval_minutes
in DDL (or refresh_interval_ms
in the API and
bq
command-line tool), to the value you want.
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
OPTIONS (enable_refresh = true, refresh_interval_minutes = 60)
AS SELECT ...
Similarly, you can set the frequency cap when you modify a table. This example assumes you have already enabled automatic refresh, and just want to change the frequency cap:
ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (refresh_interval_minutes = 60)
The minimum refresh frequency cap is 1 minute. The maximum refresh frequency cap is 7 days.
You can perform a manual refresh of a materialized view at any time, and its timing is not subject to the frequency cap.
Manual refresh
To update the data in the materialized view, call the
BQ.REFRESH_MATERIALIZED_VIEW
system procedure at any time. When this procedure
is called, BigQuery identifies changes that have taken place in
the base table and applies those changes to the materialized view. The query to
run BQ.REFRESH_MATERIALIZED_VIEW
will finish when the refresh is complete.
CALL BQ
.REFRESH_MATERIALIZED_VIEW('project-id.my_dataset.my_mv_table')
You should perform no more than one refresh at a time. Although multiple refreshes might be started to run concurrently for the same materialized view, only the first refresh to complete will succeed.
Required permissions
In order to manually refresh a materialized view, you must be granted the following permissions:
bigquery.tables.getData
bigquery.tables.update
bigquery.tables.updateData
Next steps
- Learn about BigQuery Materialized Views best practices.
- Read the materialized views FAQ.