Creating and using materialized views

This document describes how to create and use materialized views with BigQuery Materialized Views. This document assumes you have already read Introduction to materialized views.

Creating a materialized view

You can create a materialized view through the Google Cloud Console, the bq command-line tool, or the BigQuery API.

For purposes of these examples, assume your base table is named my_base_table and has this schema:

Column name Type
product_id integer
clicks integer

Also assume that you want a materialized view that summarizes the number of clicks per product_id. The following steps create a materialized view named my_mv_table that has the following schema.

Column name Type
product_id integer
sum_clicks integer

To create a materialized view:

Console

Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax.

See more on Using Data Definition Language statements.

To create a materialized view in the Cloud Console by using a DDL statement:

  1. In the Cloud Console, go to the BigQuery page.

    Go to the BigQuery page

  2. Click Compose new query.

  3. Type your CREATE MATERIALIZED VIEW DDL statement into the Query editor text area.

    CREATE MATERIALIZED VIEW  project-id.my_dataset.my_mv_table
    AS SELECT product_id, SUM(clicks) AS sum_clicks
    FROM  project-id.my_dataset.my_base_table
    GROUP BY 1
    

    where:

    • project-id is your project ID.
    • my_dataset is the ID of a dataset in your project.
    • my_mv_table is the ID of the materialized view that you're creating.
    • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
    • product_id is a column from the base table.
    • clicks is a column from the base table.
    • sum_clicks is a column in the materialized view that you are creating.
  4. Click Run.

Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query might return success right away, but the initial refresh might still be running. When the materialized view is successfully created, it appears in the Datasets pane.

bq

Use the bq query command and supply the DDL statement as the query parameter.

bq query --use_legacy_sql=false '
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
AS SELECT product_id, SUM(clicks) AS sum_clicks
FROM  project-id.my_dataset.my_base_table
GROUP BY 1'

where:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're creating.
  • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
  • product_id is a column from the base table.
  • clicks is a column from the base table.
  • sum_clicks is a column in the materialized view that you are creating.

Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query might return success right away, but the initial refresh might still be running. When the materialized view is successfully created, it appears in the Datasets pane.

Alternatively, you can use the bq mk command with the --materialized_view argument to create a materialized view. The following arguments work with the --materialized_view argument:

  • --enable_refresh: Specifies whether automatic refresh is enabled.
  • --refresh_interval_ms: Specifies the time, in milliseconds, for the refresh interval.

API

Call the tables.insert method with a defined materializedView resource as part of your API request. The materializedView resource contains a query field. For example:

{
  "kind": "bigquery#table",
  "tableReference": {
    "projectId": "project-id",
    "datasetId": "my_dataset",
    "tableId": "my_mv_table"
  },
  "materializedView": {
    "query": "select product_id,sum(clicks) as
                sum_clicks from project-id.my_dataset.my_base_table
                group by 1"
  }
}

where:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're creating.
  • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
  • product_id is a column from the base table.
  • clicks is a column from the base table.
  • sum_clicks is a column in the materialized view that you are creating.

Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query might return success right away, but the initial refresh might still be running. When the materialized view is successfully created, it appears in the Datasets pane.

Querying a materialized view

Query your materialized views directly, the same way you query a regular table or standard view.

When you query a base table that has a materialized view, the query optimizer might automatically reroute to use the cached result that is stored in the materialized view. The query plan will show that the query was rewritten to use the materialized view.

If multiple materialized views can be used to reroute a query, the materialized view with the smallest estimated number of rows to be scanned is used.

If you query a materialized view that is not up-to-date, data from the materialized view is combined with delta changes from the base table to compute up-to-date results. Reasons that a materialized view might not be up-to- date include:

  • The materialized view is not set up for automatic refresh because the user decided to control refreshes manually.

  • The base table is updated too frequently, so the frequency refresh cap is being enforced.

  • The base table was changed only seconds ago, so the materialized view hasn't yet caught up with the base table.

For information about running queries, see Overview of querying BigQuery data.

Altering a materialized view

You can alter a materialized view through the Cloud Console or the bq command-line tool, by using DDL with ALTER MATERIALIZED VIEW and SET OPTIONS.

The following shows an example that sets enable_refresh to true. Adjust as needed for your use case.

Console

To alter a materialized view in the Cloud Console by using a DDL statement:

  1. Open the BigQuery page in the Cloud Console.
    Go to the BigQuery page

  2. Click Compose new query.

  3. Type your ALTER MATERIALIZED VIEW DDL statement into the Query editor text area.

    ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
    SET OPTIONS (enable_refresh=true)
    

    where:

    • project-id is your project ID.
    • my_dataset is the ID of a dataset in your project.
    • my_mv_table is the ID of the materialized view that you're altering.
  4. Click Run.

bq

Use the bq query command and supply the DDL statement as the query parameter.

ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh=true)

where:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're altering.

Alternatively, you can run the bq update command.

bq update \
--enable_refresh=true \
--refresh_interval_ms= \
project-id.my_dataset.my_mv_table

Manipulating a materialized view

A materialized view can be manipulated only by the CREATE, DROP, or ALTER statements, along with their API equivalents, and a manual refresh.

The following operations are not allowed on a materialized view.

  • Running Copy, Import, or Export jobs where either the source or the destination is a materialized view.

  • Writing query results into a materialized view.

  • Calling tabledata.list.

  • Using the BigQuery Storage API.

Deleting a materialized view

You can delete a materialized view through the Cloud Console, the bq command-line tool, or the API.

Console

To delete a materialized view in the Cloud Console by using a DDL statement:

  1. Open the BigQuery page in the Cloud Console.
    Go to the BigQuery page

  2. Click Compose new query.

  3. Type your DELETE MATERIALIZED VIEW DDL statement into the Query editor text area.

    DROP MATERIALIZED VIEW project-id.my_dataset.my_mv_table
    

    where:

    • project-id is your project ID.
    • my_dataset is the ID of a dataset in your project.
    • my_mv_table is the ID of the materialized view that you're deleting.
  4. Click Run.

bq

Use the bq query command and supply the DDL statement as the query parameter.

bq query --use_legacy_sql=false '
DROP MATERIALIZED VIEW
project-id.my_dataset.my_mv_table'

where:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're deleting.

Alternatively, you can use the bq rm command to delete the materialized view.

API

Call the tables.delete method and specify values for the projectId, datasetId, and tableId parameters:

  • Assign the projectId parameter to your project ID.
  • Assign the datasetId parameter to your dataset ID.
  • Assign the tableId parameter to the table ID of the materialized view that you're deleting.

Supported materialized views

Materialized views must be an aggregation over a single table. The use of GROUP BY is optional.

Currently, materialized views support a restricted SQL syntax. Queries must use the following pattern:

SELECT
  expression [ [ AS ] alias ] ] [, ...]
FROM table_name [ [ AS ] alias ]
[ WHERE bool_expression ]
[ GROUP BY expression [, ...] ]

Only the following aggregation functions are currently supported:

  • APPROX_COUNT_DISTINCT
  • ARRAY_AGG
  • AVG
  • COUNT
  • HLL_COUNT.INIT
  • MAX
  • MIN
  • SUM

Certain subqueries are supported, as long as they don't add any computation or filtering on top of an aggregation. For example, the following SQL queries are supported:

WITH tmp AS (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, *
  FROM dataset.table
)
SELECT ts_hour, COUNT(*) as cnt
FROM tmp
GROUP BY ts_hour

WITH tmp AS (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
  FROM dataset.table
  GROUP BY 1
)
SELECT * FROM tmp

SELECT ts_hour, COUNT(*) as cnt
FROM (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, *
  FROM dataset.table
)
GROUP BY ts_hour

SELECT * FROM (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
  FROM dataset.table
  GROUP BY 1
)

The following SQL query is not supported for materialized views, because it applies computation on top of an aggregation:

-- Not supported for a materialized view
WITH tmp AS (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
  FROM dataset.table
  GROUP BY 1
)
SELECT TIMESTAMP_TRUNC(ts_hour, DAY) as ts_day, cnt+1 as cnt FROM tmp

The following SQL query is not supported for materialized views, because it applies filtering on top of an aggregation:

-- Not supported for a materialized view
SELECT ts_hour, cnt
FROM (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) as cnt
  FROM dataset.table
  GROUP BY 1
)
WHERE ts_hour != TIMESTAMP(DATE('2020-01-01'))

Supported query rewrite patterns

The following examples show base table queries and the materialized view definitions for which the rewrite is currently supported.

Pattern 1

This pattern shows that grouping keys and aggregators in the query are a subset of those in the materialized view.

For this base table query:

SELECT
  ss_sold_date_sk,
  SUM(ss_net_profit) AS sum_profit
FROM store_sales
GROUP BY 1

The following is a materialized view query that shows the base table query's grouping keys and aggregators as a subset of the materialized view's query.

SELECT
  ss_store_sk,
  ss_sold_date_sk,
  SUM(ss_net_paid) AS sum_paid,
  SUM(ss_net_profit) AS sum_profit,
  COUNT(*) AS cnt_sales
FROM store_sales
GROUP BY 1, 2

Pattern 2

This pattern shows grouping keys in the base table query can be computed from grouping keys in the materialized view.

For this base table query:

SELECT
  DATE_TRUNC(DATE(l_shipdate) MONTH) as shipmonth,
  SUM(l_extendedprice) as sum_price
FROM lineitem
GROUP BY 1

The following is a materialized view query that computes grouping keys in the base table:

SELECT
  DATE(l_shipdate) as shipdate,
  SUM(l_extendedprice) as sum_price
FROM lineitem
GROUP BY 1

Pattern 3

This patterns shows filter expressions in the query can be derived from grouping keys in the materialized view, or match the materialized view exactly.

For this base table query:

SELECT
  DATE_TRUNC(DATE(l_shipdate) MONTH) as shipmonth,
  SUM(l_extendedprice) as sum_price
FROM lineitem
WHERE DATE(l_shipdate) >= DATE(2016, 01, 01) AND l_shipmode = "AIR"
GROUP BY 1

The following is a materialized view query that derives a filter expression from the base table:

SELECT
  DATE(l_shipdate) as shipdate,
  SUM(l_extendedprice) as sum_price
FROM lineitem
WHERE l_shipmode = "AIR"
GROUP BY 1

Pattern 4

This pattern shows expressions in the base table query select a subset of the data in the materialized view.

For this base table query:

SELECT
  l_discount,
  COUNT(*) as cnt
FROM lineitem
WHERE l_discount > 20.0
GROUP BY 1

The following is a materialized view query whose results are a superset of the base table query:

SELECT
  l_discount,
  COUNT(*) as cnt
FROM lineitem
WHERE l_discount IS NOT NULL
GROUP BY 1

Incremental updates

In cases where the base table keeps changing with appends only, the query that uses the materialized view will scan all materialized view data plus a delta in the base table since the last view refresh. This applies whether the materialized view is explicitly referenced or selected by the query optimizer. The result is the queries are faster and cheaper.

If however there were any updates or deletions in the base table since the last materialized view refresh, then the materialized view might not be scanned.

The following are examples of actions that cause an update or deletion:

  • DML UPDATE
  • DML MERGE
  • DML DELETE
  • truncation
  • partition expiration
  • console, bq command-line, and API equivalents of the preceding items in this list

If these actions occur, the materialized view query might not achieve any savings, until the next view refresh occurs. In effect, any update or deletion in the base table invalidates some portion of the materialized view state. For unpartitioned views, the entire view is invalidated. For partitioned views, only the affected partitions are invalidated.

Note that data from the BigQuery streaming buffer of the base table is not saved into a materialized view. A streaming buffer is still being scanned fully, regardless of whether a materialized view is used.

Working with partitioned and clustered tables

Materialized views can be partitioned by the same partition column that is configured on the base table.

Materialized views can be clustered by arbitrary columns, subject to the BigQuery clustered table limitations.

The following shows how to create a base table with a DATE partition on the transaction_time column:

CREATE TABLE project-id.my_dataset.my_base_table(
  employee_id INT64,
  transaction_time TIMESTAMP)
PARTITION BY DATE(transaction_time)
OPTIONS ( partition_expiration_days = 2)

The following shows how to create a materialized view partitioned by the same transaction_time column:

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY DATE(transaction_time)
CLUSTER BY employee_id
AS SELECT
  employee_id,
  transaction_time,
  COUNT(1) AS cnt
FROM my_dataset.my_base_table
GROUP BY 1, 2;

When the base table is partitioned by ingestion time, a materialized view can group by the _PARTITIONDATE column of the base table, and also partition by it.

The following shows how to create a base table that is partitioned by ingestion time:

CREATE TABLE project-id.my_dataset.my_base_table(
  employee_id INT64)
PARTITION BY _PARTITIONDATE
OPTIONS ( partition_expiration_days = 2)

The following shows how to create a materialized view that is grouped by, and also partitioned by, the ingestion time from the base table:

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY date
CLUSTER BY employee_id
AS SELECT
  employee_id,
  _PARTITIONDATE as date,
  COUNT(1) AS count
FROM my_dataset.my_base_table
GROUP BY 1, 2;

When the base table is partitioned by TIMESTAMP, the materialized view can truncate that timestamp to up to the granularity of a DAY and then use the truncated timestamp as a partitioning column.

The following shows how to create a materialized view partitioned on the transaction_hour column, which is a truncation of the base table's transaction_time column. Use the TIMESTAMP_TRUNC function to truncate the timestamp.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY DATE(transaction_hour)
AS SELECT
  employee_id,
  TIMESTAMP_TRUNC(transaction_time, HOUR) as transaction_hour,
  COUNT(1) AS cnt
FROM my_dataset.my_base_table
GROUP BY 1, 2;

The following shows the granularity that you can use for the timestamp truncation:

  • MICROSECOND
  • MILLISECOND
  • SECOND
  • MINUTE
  • HOUR
  • DAY

For general information about timestamp truncation, seeTIMESTAMP_TRUNC.

Partition alignment

If a materialized view is partitioned, BigQuery ensures that its partitions are aligned with the partitions of the base table. For example, a row from partition 2020-01-01 of the base table is being aggregated to a row from partition 2020-01-01 of the materialized view.

Partition alignment helps guarantee that a materialized view is maintained efficiently. During refresh, the only partitions of the materialized view that will be updated are those that have appends, updates, or deletions in the corresponding partitions of the base table. At query time, partitioning alignment ensures that a partition filter is applied to both the materialized view scan and the base table scan.

Partition expiration

Partition expiration must not be set on materialized views. A materialized view implicitly inherits the partition expiration time from the base table. Materialized view partitions are aligned with the base table partitions, so they expire synchronously.

Maintaining materialized views

You manage how materialized views are being refreshed to update precomputed results from the base table via automatic and manual refreshes. You can configure the refresh settings when you create a materialized view if you do not want to use default values. You can alter the refresh settings after the materialized view is created.

You can manually refresh a materialized view at any time.

Automatic refresh

By default, materialized views are automatically refreshed within 5 minutes of a change to the base table. Examples of changes include row insertions or row deletions.

Automatic refresh can be enabled or disabled at any time.

To turn automatic refresh off when you create a table, set enable_refresh to false.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
PARTITION BY RANGE_BUCKET(column, buckets)
OPTIONS (enable_refresh = false)
AS SELECT ...

For an existing materialized view, you can modify the enable_refresh value using ALTER MATERIALIZED VIEW.

ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh = true)

Note that turning on automatic refresh will trigger an automatic refresh on the materialized view.

Even if automatic refresh is disabled, the materialized view can be manually refreshed at any time.

Setting the frequency cap

You can configure a frequency cap on how often automatic refresh is run. By default, materialized views are refreshed no more often than every 30 minutes.

The refresh frequency cap can be changed at any time.

To set a refresh frequency cap when you create a materialized view, set refresh_interval_minutes in DDL (or refresh_interval_ms in the API and bq command-line tool), to the value you want.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
OPTIONS (enable_refresh = true, refresh_interval_minutes = 60)
AS SELECT ...

Similarly, you can set the frequency cap when you modify a table. This example assumes you have already enabled automatic refresh, and just want to change the frequency cap:

ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (refresh_interval_minutes = 60)

The minimum refresh frequency cap is 1 minute. The maximum refresh frequency cap is 7 days.

You can perform a manual refresh of a materialized view at any time, and its timing is not subject to the frequency cap.

Manual refresh

To update the data in the materialized view, call the BQ.REFRESH_MATERIALIZED_VIEW system procedure at any time. When this procedure is called, BigQuery identifies changes that have taken place in the base table and applies those changes to the materialized view. The query to run BQ.REFRESH_MATERIALIZED_VIEW will finish when the refresh is complete.

CALL BQ.REFRESH_MATERIALIZED_VIEW('project-id.my_dataset.my_mv_table')

You should perform no more than one refresh at a time. Although multiple refreshes might be started to run concurrently for the same materialized view, only the first refresh to complete will succeed.

Required permissions

In order to manually refresh a materialized view, you must be granted the following permissions:

  • bigquery.tables.getData
  • bigquery.tables.update
  • bigquery.tables.updateData

Next steps