Create materialized views
This document describes how to create materialized views in BigQuery. Before you read this document, familiarize yourself with Introduction to materialized views.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required permissions
To create materialized views, you need the bigquery.tables.create
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to create a materialized view:
bigquery.dataEditor
bigquery.dataOwner
bigquery.admin
For more information about BigQuery Identity and Access Management (IAM), see Access control with IAM.
Create materialized views
You can create BigQuery materialized views
through the Google Cloud console, the
bq
command-line tool, or the BigQuery API.
For purposes of these examples, assume your base table is named
my_base_table
and has this schema:
Column name | Type |
---|---|
product_id | integer |
clicks | integer |
Also assume that you want a materialized view that summarizes the number
of clicks per product_id
. The following steps create
a materialized view named my_mv_table
that has the following schema.
Column name | Type |
---|---|
product_id | integer |
sum_clicks | integer |
To create a materialized view
Console
Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax. For more information about DDL statements, see Data definition language (DDL) statements in standard SQL.
To create a materialized view in the Cloud console by using a DDL statement:
In the Cloud console, go to the BigQuery page.
Click Compose new query.
Type your
CREATE MATERIALIZED VIEW
DDL statement into the Query editor text area.CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT product_id, SUM(clicks) AS sum_clicks FROM project-id.my_dataset.my_base_table GROUP BY 1
Click Run.
bq
Use the bq query
command
and supply the DDL statement as the query parameter.
bq query --use_legacy_sql=false ' CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table AS SELECT product_id, SUM(clicks) AS sum_clicks FROM project-id.my_dataset.my_base_table GROUP BY 1'
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
product_id
is a column from the base table.clicks
is a column from the base table.sum_clicks
is a column in the materialized view that you are creating.
Alternatively, you can use the bq mk
command
with the --materialized_view
argument to create a materialized view. The
following arguments work with the --materialized_view
argument:
--enable_refresh
: Specifies whether automatic refresh is enabled.--refresh_interval_ms
: Specifies the time, in milliseconds, for the maximum refresh frequency. For more information on the meaning of these parameters, see Refresh materialized views.
API
Call the tables.insert
method with a defined materializedView
resource as part of your API
request. The materializedView
resource contains a query
field. For
example:
{ "kind": "bigquery#table", "tableReference": { "projectId": "project-id", "datasetId": "my_dataset", "tableId": "my_mv_table" }, "materializedView": { "query": "select product_id,sum(clicks) as sum_clicks from project-id.my_dataset.my_base_table group by 1" } }
where:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
product_id
is a column from the base table.clicks
is a column from the base table.sum_clicks
is a column in the materialized view that you are creating.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
When the materialized view is successfully created, it appears in the Explorer panel of BigQuery in the Google Cloud console. Materialized views are listed on the Explorer panel within the Cloud console. As an example, the following shows how a materialized view's schema appears:
Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query will finish quickly, but the initial refresh might continue to run.
Access control
You can grant access to a materialized view at the dataset level, the view level, or the column level. You can also set access at a higher level in the IAM resource hierarchy.
Querying a materialized view requires access to the view as well as its base tables. To share a materialized view, you can grant permissions to the base tables or configure a materialized view as an authorized view. For more information, see Authorized views.
To control access to views in BigQuery, see Controlling access to views.
Materialized views query support
Materialized views use a restricted SQL syntax. Queries must use the following pattern:
[ WITH cte [, …]] SELECT [{ ALL | DISTINCT }] expression [ [ AS ] alias ] ] [, ...] FROM from_item [, ...] [ WHERE bool_expression ] [ GROUP BY expression [, ...] ] from_item: { table_name [ as_alias ] | { join_operation | ( join_operation ) } | field_path | unnest_operator | cte_name [ as_alias ] } as_alias: [ AS ] alias
Query limitations
Materialized views have the following limitations.
Aggregate requirements
Aggregates in the materialized view query must be outputs. Computing or
filtering based on an aggregated value is not supported. For example, the
following view is not supported because it produces a value computed from an
aggregate, COUNT(*) / 10 as cnt
.
SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, COUNT(*) / 10 as cnt FROM dataset.table GROUP BY 1
Only the following aggregation functions are currently supported:
ANY_VALUE
(but not overSTRUCT
)APPROX_COUNT_DISTINCT
ARRAY_AGG
(but not overARRAY
orSTRUCT
)AVG
BIT_AND
BIT_OR
BIT_XOR
COUNT
COUNTIF
HLL_COUNT.INIT
LOGICAL_AND
LOGICAL_OR
MAX
MIN
SUM
Unsupported SQL features
The following SQL features are not supported in materialized views:
- Left/right/full outer joins.
- Self-joins (joins using the same table more than once).
- Analytic functions
- Non-deterministic functions such as RAND(), CURRENT_DATE(), or CURRENT_TIME().
- User-defined functions (UDFs).
- TABLESAMPLE.
- FOR SYSTEM_TIME AS OF.
WITH
clause and common table expressions (CTEs)
Materialized views support WITH
clauses and common table expressions.
Materialized views with WITH
clauses must still follow the pattern and
limitations of materialized views without WITH
clauses.
Examples
The following example shows a materialized view using a WITH clause.
WITH tmp AS ( SELECT TIMESTAMP_TRUNC(ts, HOUR) as ts_hour, * FROM dataset.table ) SELECT ts_hour, COUNT(*) as cnt FROM tmp GROUP BY ts_hour
The following example shows a view using a WITH
clause that is not supported
because it does not fit the pattern above (there are two GROUP BY
clauses).
WITH tmp AS ( SELECT city, COUNT(*) as population FROM dataset.table GROUP BY 1 ) SELECT population, COUNT(*) as cnt GROUP BY 1
Partitioned materialized views
Materialized views on partitioned tables can be partitioned. Partitioning a materialized view is similar to partitioning a normal table, in that it provides benefit when queries often access a subset of the partitions. In addition, partitioning a materialized view can improve the view's behavior when data in the base table or tables is modified or deleted. For more information, see Partition alignment.
If the base table is partitioned, then you can partition a materialized view on the same partitioning column. For time-based partitions, the granularity must match (hourly, daily, monthly, or yearly). For integer-range partitions, the range specification must exactly match. You cannot partition a materialized view over a non-partitioned base table.
If the base table is partitioned by ingestion time, then a materialized view can
group by the _PARTITIONDATE
column of the base table, and also partition by it.
If you don't explicitly specify partitioning when you create the materialized
view, then the materialized view is unpartitioned.
If the base table is partitioned and its materialized view has significant size, then the materialized view should be partitioned as well. As a guideline, a materialized view has significant size if it is about the same size as, or larger than, one partition of the base table.
Partition expiration
Partition expiration can't be set on materialized views. A materialized view implicitly inherits the partition expiration time from the base table. Materialized view partitions are aligned with the base table partitions, so they expire synchronously.
Example 1
In this example, the base table is partitioned on the transaction_time
column
with daily partitions. The materialized view is partitioned on the same column
and clustered on the employee_id
column.
CREATE TABLE project-id.my_dataset.my_base_table( employee_id INT64, transaction_time TIMESTAMP) PARTITION BY DATE(transaction_time) OPTIONS (partition_expiration_days = 2); CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table PARTITION BY DATE(transaction_time) CLUSTER BY employee_id AS SELECT employee_id, transaction_time, COUNT(1) AS cnt FROM my_dataset.my_base_table GROUP BY 1, 2;
Example 2
In this example, the base table is partitioned by ingestion time with daily
partitions. The materialized view selects the ingestion time as a column named
date
. The materialized view is grouped by the date
column and partitioned by
the same column.
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table PARTITION BY date CLUSTER BY employee_id AS SELECT employee_id, _PARTITIONDATE as date, COUNT(1) AS count FROM my_dataset.my_base_table GROUP BY 1, 2;
Example 3
In this example, the base table is partitioned on a TIMESTAMP
column named
transaction_time
, with daily partitions. The materialized view defines a
column named transaction_hour
, using the TIMESTAMP_TRUNC
function to truncate the value to the nearest hour. The materialized view is
grouped by transaction_hour
and also partitioned by it.
Note the following:
The truncation function that is applied to the partitioning column must be at least as granular as the partitioning of the base table. For example, if the base table uses daily partitions, the truncation function cannot use
MONTH
orYEAR
granularity.In the materialized view's partition specification, the granularity has to match the base table.
CREATE TABLE project-id.my_dataset.my_base_table( employee_id INT64, transaction_time TIMESTAMP) PARTITION BY DATE(transaction_time); CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table PARTITION BY DATE(transaction_hour) AS SELECT employee_id, TIMESTAMP_TRUNC(transaction_time, HOUR) as transaction_hour, COUNT(1) AS cnt FROM my_dataset.my_base_table GROUP BY 1, 2;
Cluster materialized views
You can cluster materialized views by their output columns, subject to the BigQuery clustered table limitations. Aggregate output columns cannot be used as clustering columns. Adding clustering columns to materialized views can improve the performance of queries that include filters on those columns.
Considerations when creating materialized views
Which materialized views to create
When creating a materialized view, ensure your materialized view definition reflects query patterns against the base tables. Because there is a maximum of 20 materialized views per table, you should not create a materialized view for every permutation of a query. Instead, create materialized views to serve a broader set of queries.
For example, consider a query on a table where users often filter by columns user_id or department. You can group by these columns and optionally cluster by them, instead of adding filters like user_id = 123 into the materialized view.
As another example, users often use date filters, either by specific date (where order_date = current_date()) or date range (where order_date between '2019-10-01' and '2019-10-31'). As a good practice, add a date range filter in the materialized view that covers expected date ranges in the query:
CREATE MATERIALIZED VIEW ... WHERE date > '2019-01-01' GROUP BY date
Joins
The following recommendations apply to materialized views with JOINs.
Put the most frequently changing table first
Ensure that the largest or most frequently changing table is the first/leftmost table referenced in the view query. Materialized views with joins support incremental queries and refresh when the first or left-most table in the query is appended, but changes to other tables fully invalidate the view cache. In star or snowflake schemas the first or leftmost table should generally be the fact table.
Avoid joining on clustering keys
Materialized views with joins work best in cases where the data is heavily aggregated or the original join query is expensive. For selective queries, BigQuery is often already able to perform the join efficiently and no materialized view is needed. For example consider the following materialized view definitions.
CREATE MATERIALIZED VIEW dataset.mv CLUSTER BY s_marked_id AS SELECT s_market_id, s_country, SUM(ss_net_paid) AS sum_sales, COUNT(*) AS cnt_sales FROM dataset.store_sales INNER JOIN dataset.store ON ss_store_sk = s_store_sk GROUP BY 1, 2;
If store_sales is clustered on ss_store_sk and you often run queries like:
SELECT SUM(ss_net_paid) FROM dataset.store_sales INNER JOIN dataset.store ON ss_store_sk = s_store_sk WHERE s_country = 'Germany'
then the materialized view might not be as efficient as the original query. For best results, experiment with a representative set of queries, with and without the materialized view.