Create materialized views
This document describes how to create materialized views in BigQuery. Before you read this document, familiarize yourself with Introduction to materialized views.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required permissions
To create materialized views, you need the bigquery.tables.create
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to create a materialized view:
bigquery.dataEditor
bigquery.dataOwner
bigquery.admin
For more information about BigQuery Identity and Access Management (IAM), see Access control with IAM.
Create materialized views
To create a materialized view, select one of the following options:
SQL
Use the
CREATE MATERIALIZED VIEW
statement.
The following example creates a materialized view for the number of clicks
for each product ID:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE MATERIALIZED VIEW PROJECT_ID.DATASET.MATERIALIZED_VIEW_NAME AS ( QUERY_EXPRESSION );
Replace the following:
PROJECT_ID
: the name of your project in which you want to create the materialized view—for example,myproject
.DATASET
: the name of the BigQuery dataset that you want to create the materialized view in—for example,mydataset
. If you are creating a materialized view over an Amazon Simple Storage Service (Amazon S3) BigLake table (preview), make sure the dataset is in a supported region.MATERIALIZED_VIEW_NAME
: the name of the materialized view that you want to create—for example,my_mv
.QUERY_EXPRESSION
: the GoogleSQL query expression that defines the materialized view—for example,SELECT product_id, SUM(clicks) AS sum_clicks FROM mydataset.my_source_table
.
Click
Run.
For more information about how to run queries, see Run an interactive query.
Example
The following example creates a materialized view for the number of clicks for each product ID:
CREATE MATERIALIZED VIEW myproject.mydataset.my_mv_table AS ( SELECT product_id, SUM(clicks) AS sum_clicks FROM myproject.mydataset.my_base_table GROUP BY product_id );
Terraform
Use the
google_bigquery_table
resource.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
The following example creates a view named my_materialized_view
:
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
- Launch Cloud Shell.
-
Set the default Google Cloud project where you want to apply your Terraform configurations.
You only need to run this command once per project, and you can run it in any directory.
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
Environment variables are overridden if you set explicit values in the Terraform configuration file.
Prepare the directory
Each Terraform configuration file must have its own directory (also called a root module).
-
In Cloud Shell, create a directory and a new
file within that directory. The filename must have the
.tf
extension—for examplemain.tf
. In this tutorial, the file is referred to asmain.tf
.mkdir DIRECTORY && cd DIRECTORY && touch main.tf
-
If you are following a tutorial, you can copy the sample code in each section or step.
Copy the sample code into the newly created
main.tf
.Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
- Review and modify the sample parameters to apply to your environment.
- Save your changes.
-
Initialize Terraform. You only need to do this once per directory.
terraform init
Optionally, to use the latest Google provider version, include the
-upgrade
option:terraform init -upgrade
Apply the changes
-
Review the configuration and verify that the resources that Terraform is going to create or
update match your expectations:
terraform plan
Make corrections to the configuration as necessary.
-
Apply the Terraform configuration by running the following command and entering
yes
at the prompt:terraform apply
Wait until Terraform displays the "Apply complete!" message.
- Open your Google Cloud project to view the results. In the Google Cloud console, navigate to your resources in the UI to make sure that Terraform has created or updated them.
API
Call the tables.insert
method
and pass in a
Table
resource
with a defined materializedView
field:
{ "kind": "bigquery#table", "tableReference": { "projectId": "PROJECT_ID", "datasetId": "DATASET", "tableId": "MATERIALIZED_VIEW_NAME" }, "materializedView": { "query": "QUERY_EXPRESSION" } }
Replace the following:
PROJECT_ID
: the name of your project in which you want to create the materialized view—for example,myproject
.DATASET
: the name of the BigQuery dataset that you want to create the materialized view in—for example,mydataset
. If you are creating a materialized view over an Amazon Simple Storage Service (Amazon S3) BigLake table (preview), make sure the dataset is in a supported region.MATERIALIZED_VIEW_NAME
: the name of the materialized view that you want to create—for example,my_mv
.QUERY_EXPRESSION
: the GoogleSQL query expression that defines the materialized view—for example,SELECT product_id, SUM(clicks) AS sum_clicks FROM mydataset.my_source_table
.
Example
The following example creates a materialized view for the number of clicks for each product ID:
{ "kind": "bigquery#table", "tableReference": { "projectId": "myproject", "datasetId": "mydataset", "tableId": "my_mv" }, "materializedView": { "query": "select product_id,sum(clicks) as sum_clicks from myproject.mydataset.my_source_table group by 1" } }
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
After the materialized view is successfully created, it appears in the Explorer panel of BigQuery in the Google Cloud console. The following example shows a materialized view schema:
Unless you disable automatic refresh, BigQuery starts an asynchronous full refresh for the materialized view. The query finishes quickly, but the initial refresh might continue to run.
Access control
You can grant access to a materialized view at the dataset level, the view level, or the column level. You can also set access at a higher level in the IAM resource hierarchy.
Querying a materialized view requires access to the view as well as its base tables. To share a materialized view, you can grant permissions to the base tables or configure a materialized view as an authorized view. For more information, see Authorized views.
To control access to views in BigQuery, see Authorized views.
Materialized views query support
Materialized views use a restricted SQL syntax. Queries must use the following pattern:
[ WITH cte [, …]] SELECT [{ ALL | DISTINCT }] expression [ [ AS ] alias ] [, ...] FROM from_item [, ...] [ WHERE bool_expression ] [ GROUP BY expression [, ...] ] from_item: { table_name [ as_alias ] | { join_operation | ( join_operation ) } | field_path | unnest_operator | cte_name [ as_alias ] } as_alias: [ AS ] alias
Query limitations
Materialized views have the following limitations.
Aggregate requirements
Aggregates in the materialized view query must be outputs. Computing, filtering,
or joining based on an aggregated value is not supported. For example, creating
a view from the following query is not supported because it produces a value
computed from an aggregate, COUNT(*) / 10 as cnt
.
SELECT TIMESTAMP_TRUNC(ts, HOUR) AS ts_hour, COUNT(*) / 10 AS cnt FROM mydataset.mytable GROUP BY ts_hour;
Only the following aggregation functions are currently supported:
ANY_VALUE
(but not overSTRUCT
)APPROX_COUNT_DISTINCT
ARRAY_AGG
(but not overARRAY
orSTRUCT
)AVG
BIT_AND
BIT_OR
BIT_XOR
COUNT
COUNTIF
HLL_COUNT.INIT
LOGICAL_AND
LOGICAL_OR
MAX
MIN
MAX_BY
(but not overSTRUCT
)MIN_BY
(but not overSTRUCT
)SUM
Unsupported SQL features
The following SQL features are not supported in materialized views:
UNION ALL
. (Support in ) PreviewLEFT OUTER JOIN
(Support in ) PreviewRIGHT/FULL OUTER JOIN
.- Self-joins, also known as using a
JOIN
on the same table more than once. - Window functions.
ARRAY
subqueries.- Non-deterministic functions such as
RAND()
,CURRENT_DATE()
,SESSION_USER()
, orCURRENT_TIME()
. - User-defined functions (UDFs).
TABLESAMPLE
.FOR SYSTEM_TIME AS OF
.
LEFT OUTER JOIN
and UNION ALL
support
To request feedback or support for this feature, send an email to bq-mv-help @google.com.
Incremental materialized views support LEFT OUTER JOIN
and UNION ALL
.
Materialized views with LEFT OUTER JOIN
and UNION ALL
statements share the
limitations of other incremental materialized views. In addition, smart
tuning is not supported for
materialized views with union all or left outer join.
Examples
The following example creates an aggregate incremental materialized view with
a LEFT JOIN
. This view is incrementally updated when data appends to the left
table.
CREATE MATERIALIZED VIEW dataset.mv AS ( SELECT s_store_sk, s_country, s_zip, SUM(ss_net_paid) AS sum_sales, FROM dataset.store_sales LEFT JOIN dataset.store ON ss_store_sk = s_store_sk GROUP BY 1, 2, 3 );
The following example creates an aggregate incremental materialized view with
a UNION ALL
. This view is incrementally updated when data appends to either or
both tables. For more information about incremental updates, see
Incremental Updates.
CREATE MATERIALIZED VIEW dataset.mv PARTITION BY DATE(ts_hour) AS ( SELECT SELECT TIMESTAMP_TRUNC(ts, HOUR) AS ts_hour, SUM(sales) sum_sales FROM (SELECT ts, sales from dataset.table1 UNION ALL SELECT ts, sales from dataset.table2) GROUP BY 1 );
Access control restrictions
- If a user's query of a materialized view includes base table columns
that they cannot access due to column-level security, then the query fails
with the message
Access Denied
. - If a user queries a materialized view but doesn't have full access to all rows in the materialized views' base tables, then BigQuery runs the query against the base tables instead of reading materialized view data. This ensures the query respects all access control constraints. This limitation also applies when querying tables with data-masked columns.
WITH
clause and common table expressions (CTEs)
Materialized views support WITH
clauses and common table expressions.
Materialized views with WITH
clauses must still follow the pattern and
limitations of materialized views without WITH
clauses.
Examples
The following example shows a materialized view using a WITH
clause:
WITH tmp AS ( SELECT TIMESTAMP_TRUNC(ts, HOUR) AS ts_hour, * FROM mydataset.mytable ) SELECT ts_hour, COUNT(*) AS cnt FROM tmp GROUP BY ts_hour;
The following example shows a materialized view using a WITH
clause that is
not supported because it contains two GROUP BY
clauses:
WITH tmp AS ( SELECT city, COUNT(*) AS population FROM mydataset.mytable GROUP BY city ) SELECT population, COUNT(*) AS cnt GROUP BY population;
Materialized views over BigLake tables
To create materialized views over BigLake
tables, the
BigLake table must have metadata caching
enabled over
Cloud Storage data and the materialized view must have a
max_staleness
option value greater than the base table.
Materialized views over BigLake tables support the same set of
queries as other
materialized views.
Example
Creation of a simple aggregate view using a BigLake base table:
CREATE MATERIALIZED VIEW sample_dataset.sample_mv OPTIONS (max_staleness=INTERVAL "0:30:0" HOUR TO SECOND) AS SELECT COUNT(*) cnt FROM dataset.biglake_base_table;
For details about the limitations of materialized views over BigLake tables, see materialized views over BigLake tables.
Materialized views over Apache Iceberg tables
To request feedback or support for this feature, send an email to bq-mv-help@google.com.
You can reference large Iceberg tables in materialized views instead of migrating that data to BigQuery-managed storage.
Create a materialized view over an Iceberg table
To create a materialized view over an Iceberg, follow these steps:
Obtain an Iceberg table using one of the following methods:
- Create an Iceberg table with the JSON metadata file.
- Create an Iceberg table using the BigLake Metastore.
- Discover in AWS Glue federated datasets.
Example
CREATE EXTERNAL TABLE mydataset.myicebergtable WITH CONNECTION `myproject.us.myconnection` OPTIONS ( format = 'ICEBERG', uris = ["gs://mybucket/mydata/mytable/metadata/iceberg.metadata.json"] )
Reference your Iceberg table with the following partition-specifications:
"partition-specs" : [ { "spec-id" : 0, "fields" : [ { "name" : "birth_month", "transform" : "month", "source-id" : 3, "field-id" : 1000 } ]
Create a partition-aligned materialized view:
CREATE MATERIALIZED VIEW mydataset.myicebergmv PARTITION BY DATE_TRUNC(birth_month, MONTH) AS SELECT * FROM mydataset.myicebergtable;
Limitations
In addition to the limitations of standard Iceberg tables, materialized views over Iceberg tables have the following limitations:
- You can create a materialized view that is partition aligned with the base
table. However, the materialized view only supports time-based partition transformation,
for example,
YEAR
,MONTH
,DAY
, andHOUR
. - The granularity of the materialized view's partition cannot be finer than the
granularity of the base table's partition. For example, if you partition the
base table yearly using the
birth_date
column, creating a materialized view withPARTITION BY DATE_TRUNC(birth_date, MONTH)
doesn't work. - Any schema change invalidates the materialized view.
- Partition evolutions is supported. However, changing the partitioning columns of a base table without recreating the materialized view might result in full invalidation that cannot be fixed by refresh.
- There must be at least one snapshot in the base table.
- The Iceberg table must be a BigLake table, for example, an authorized external table.
- If VPC Service Controls is enabled, service accounts of the authorized external table must be added to your ingress rules, otherwise, VPC Service Controls blocks automatic background refresh for the materialized view.
The metadata.json
file of your Iceberg table must have
the following specifications. Without these specifications, your queries scan
the base table, failing to use the materialized result.
In table metadata:
current-snapshot-id
current-schema-id
snapshots
snapshot-log
In snapshots:
parent-snapshot-id
(if available)schema-id
operation
(in thesummary
field)
Partitioning (for the partitioned materialized view)
Partitioned materialized views
Materialized views on partitioned tables can be partitioned. Partitioning a materialized view is similar to partitioning a normal table, in that it provides benefit when queries often access a subset of the partitions. In addition, partitioning a materialized view can improve the view's behavior when data in the base table or tables is modified or deleted. For more information, see Partition alignment.
If the base table is partitioned, then you can partition a materialized view on the same partitioning column. For time-based partitions, the granularity must match (hourly, daily, monthly, or yearly). For integer-range partitions, the range specification must exactly match. You cannot partition a materialized view over a non-partitioned base table.
If the base table is partitioned by ingestion time, then a materialized view can
group by the _PARTITIONDATE
column of the base table, and also partition by it.
If you don't explicitly specify partitioning when you create the materialized
view, then the materialized view is unpartitioned.
If the base table is partitioned, consider partitioning your materialized view as well to reduce refresh job maintenance cost and query cost.
Partition expiration
Partition expiration can't be set on materialized views. A materialized view implicitly inherits the partition expiration time from the base table. Materialized view partitions are aligned with the base table partitions, so they expire synchronously.
Example 1
In this example, the base table is partitioned on the transaction_time
column
with daily partitions. The materialized view is partitioned on the same column
and clustered on the employee_id
column.
CREATE TABLE my_project.my_dataset.my_base_table( employee_id INT64, transaction_time TIMESTAMP) PARTITION BY DATE(transaction_time) OPTIONS (partition_expiration_days = 2); CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table PARTITION BY DATE(transaction_time) CLUSTER BY employee_id AS ( SELECT employee_id, transaction_time, COUNT(employee_id) AS cnt FROM my_dataset.my_base_table GROUP BY employee_id, transaction_time );
Example 2
In this example, the base table is partitioned by ingestion time with daily
partitions. The materialized view selects the ingestion time as a column named
date
. The materialized view is grouped by the date
column and partitioned by
the same column.
CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table PARTITION BY date CLUSTER BY employee_id AS ( SELECT employee_id, _PARTITIONDATE AS date, COUNT(1) AS count FROM my_dataset.my_base_table GROUP BY employee_id, date );
Example 3
In this example, the base table is partitioned on a TIMESTAMP
column named
transaction_time
, with daily partitions. The materialized view defines a
column named transaction_hour
, using the TIMESTAMP_TRUNC
function to truncate the value to the nearest hour. The materialized view is
grouped by transaction_hour
and also partitioned by it.
Note the following:
The truncation function that is applied to the partitioning column must be at least as granular as the partitioning of the base table. For example, if the base table uses daily partitions, the truncation function cannot use
MONTH
orYEAR
granularity.In the materialized view's partition specification, the granularity has to match the base table.
CREATE TABLE my_project.my_dataset.my_base_table ( employee_id INT64, transaction_time TIMESTAMP) PARTITION BY DATE(transaction_time); CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table PARTITION BY DATE(transaction_hour) AS ( SELECT employee_id, TIMESTAMP_TRUNC(transaction_time, HOUR) AS transaction_hour, COUNT(employee_id) AS cnt FROM my_dataset.my_base_table GROUP BY employee_id, transaction_hour );
Cluster materialized views
You can cluster materialized views by their output columns, subject to the BigQuery clustered table limitations. Aggregate output columns cannot be used as clustering columns. Adding clustering columns to materialized views can improve the performance of queries that include filters on those columns.
Reference logical views
To request feedback or support for this feature, send email to bq-mv-help@google.com.
Materialized view queries can reference logical views but are subject to the following limitations:
- Materialized view limitations apply.
- If the logical view changes, then the materialized view becomes invalid and must be fully refreshed.
- Smart tuning is not supported.
Considerations when creating materialized views
Which materialized views to create
When creating a materialized view, ensure your materialized view definition reflects query patterns against the base tables. Because there is a maximum of 20 materialized views per table, you should not create a materialized view for every permutation of a query. Instead, create materialized views to serve a broader set of queries.
For example, consider a query on a table where users often filter by the columns
user_id
or department
. You can group by these columns and optionally cluster
by them, instead of adding filters like user_id = 123
into the materialized
view.
As another example, users often use date filters, either by specific date, such
as WHERE order_date = CURRENT_DATE()
, or date range, such as WHERE order_date
BETWEEN '2019-10-01' AND '2019-10-31'
. Add a date range filter in
the materialized view that covers expected date ranges in the query:
CREATE MATERIALIZED VIEW ... ... WHERE date > '2019-01-01' GROUP BY date
Joins
The following recommendations apply to materialized views with JOINs.
Put the most frequently changing table first
Ensure that the largest or most frequently changing table is the first/leftmost table referenced in the view query. Materialized views with joins support incremental queries and refresh when the first or left-most table in the query is appended, but changes to other tables fully invalidate the view cache. In star or snowflake schemas the first or leftmost table should generally be the fact table.
Avoid joining on clustering keys
Materialized views with joins work best in cases where the data is heavily aggregated or the original join query is expensive. For selective queries, BigQuery is often already able to perform the join efficiently and no materialized view is needed. For example consider the following materialized view definitions.
CREATE MATERIALIZED VIEW dataset.mv CLUSTER BY s_market_id AS ( SELECT s_market_id, s_country, SUM(ss_net_paid) AS sum_sales, COUNT(*) AS cnt_sales FROM dataset.store_sales INNER JOIN dataset.store ON ss_store_sk = s_store_sk GROUP BY s_market_id, s_country );
Suppose store_sales
is clustered on ss_store_sk
and you often run queries
like the following:
SELECT SUM(ss_net_paid) FROM dataset.store_sales INNER JOIN dataset.store ON ss_store_sk = s_store_sk WHERE s_country = 'Germany';
The materialized view might not be as efficient as the original query. For best results, experiment with a representative set of queries, with and without the materialized view.
Use materialized views with max_staleness
option
The max_staleness
materialized view option helps you achieve consistently high
query performance with controlled costs when processing large, frequently
changing datasets. With the max_staleness
parameter, you can reduce cost and
latency on your queries by setting an interval of time where data staleness of
query results is acceptable. This behavior can be useful for dashboards and
reports for which fully up-to-date query results aren't essential.
Data staleness
When you query a materialized view with the max_staleness
option set,
BigQuery returns the result based on the max_staleness
value
and the time at which the last refresh occurred.
If the last refresh occurred within the max_staleness
interval, then
BigQuery returns data directly from the materialized view
without reading the base tables. For example, this applies if your
max_staleness
interval is 4 hours, and the last refresh occurred 2 hours ago.
If the last refresh occurred outside the max_staleness
interval, then
BigQuery reads the data from the materialized view, combines it
with changes to the base table since the last refresh, and returns the combined
result. This combined result might still be stale, up to your max_staleness
interval. For example, this applies if your max_staleness
interval is 4 hours,
and the last refresh occurred 7 hours ago.
Create with max_staleness
option
Select one of the following options:
SQL
To create a materialized view with the max_staleness
option, add an
OPTIONS
clause to the DDL statement when you create the materialized view:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE MATERIALIZED VIEW
project-id.my_dataset.my_mv_table
OPTIONS (enable_refresh = true, refresh_interval_minutes = 60, max_staleness = INTERVAL "4:0:0" HOUR TO SECOND) AS SELECT employee_id, DATE(transaction_time), COUNT(1) AS count FROMmy_dataset.my_base_table
GROUP BY 1, 2;Replace the following:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
Click
Run.
For more information about how to run queries, see Run an interactive query.
API
Call the tables.insert
method with a defined materializedView
resource as part of your API
request. The materializedView
resource contains a query
field. For
example:
{ "kind": "bigquery#table", "tableReference": { "projectId": "project-id", "datasetId": "my_dataset", "tableId": "my_mv_table" }, "materializedView": { "query": "select product_id,sum(clicks) as sum_clicks from project-id.my_dataset.my_base_table group by 1" } "maxStaleness": "4:0:0" }
Replace the following:
- project-id is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
product_id
is a column from the base table.clicks
is a column from the base table.sum_clicks
is a column in the materialized view that you are creating.
Apply max_staleness
option
You can apply this parameter to existing materialized views by using the ALTER
MATERIALIZED VIEW
statement. For example:
ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table SET OPTIONS (enable_refresh = true, refresh_interval_minutes = 120, max_staleness = INTERVAL "8:0:0" HOUR TO SECOND);
Query with max_staleness
You can query materialized views with the max_staleness
option as you would
query any other materialized view, logical view, or table.
For example:
SELECT * FROM project-id.my_dataset.my_mv_table
This query returns data from the last refresh if the data is not older than the
max_staleness
parameter. If the materialized view has not been refreshed
within the max_staleness
interval, BigQuery merges the results of the
latest available refresh with the base table changes to return results within
the max_staleness
interval.
Data streaming and max_staleness
results
If you stream data into the base tables of a materialized view with the
max_staleness
option, then the query of the materialized view might exclude
records that were streamed into its tables before the beginning of the staleness
interval. As a result, a materialized view that includes data from multiple
tables and max_staleness
option might not represent a point-in-time snapshot
of those tables.
Smart tuning and the max_staleness
option
Smart tuning automatically rewrites queries to use materialized views whenever
possible regardless of the max_staleness
option, even if the query does not
reference a materialized view. The max_staleness
option on a materialized view
does not affect the results of the rewritten query. The max_staleness
option
only affects queries that directly query the materialized view.
Manage staleness and refresh frequency
You should set max_staleness
based on your requirements. To avoid reading
data from base tables, configure the refresh interval so that the refresh takes
place within the staleness interval. You can account for the average refresh
runtime plus a margin for growth.
For example, if one hour is required to refresh your materialized view and you want a one-hour buffer for growth, then you should set the refresh interval to two hours. This configuration ensures that the refresh occurs within your report's four-hour maximum for staleness.
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table OPTIONS (enable_refresh = true, refresh_interval_minutes = 120, max_staleness = INTERVAL "4:0:0" HOUR TO SECOND) AS SELECT employee_id, DATE(transaction_time), COUNT(1) AS cnt FROM my_dataset.my_base_table GROUP BY 1, 2;
Non-incremental materialized views
Non-incremental materialized views support most SQL queries, including OUTER
JOIN
, UNION
, and HAVING
clauses, and analytic functions. To determine
whether a materialized view was used in your query, check the cost estimates
by using a dry run.
In scenarios where
data staleness is acceptable, for example for batch data processing or
reporting, non-incremental materialized views can improve query performance and
reduce cost. By using the max_staleness
option, you can build arbitrary,
complex materialized views that are automatically maintained and have built-in
staleness guarantees.
Use non-incremental materialized views
You can create non-incremental materialized views by using the
allow_non_incremental_definition
option. This option must be accompanied by
the max_staleness
option. To ensure a periodic refresh of the materialized
view, you should also configure a refresh
policy.
Without a refresh policy, you must manually refresh the materialized view.
The materialized view always represents the state of the base tables within the
max_staleness
interval. If the last refresh is too stale and doesn't represent
the base tables within the max_staleness
interval, then the query reads the
base tables. To learn more about possible performance implications, see Data
staleness.
Create with allow_non_incremental_definition
To create a materialized view with the allow_non_incremental_definition
option, follow these steps. After you create the materialized view, you cannot
modify the allow_non_incremental_definition
option. For example, you cannot
change the value true
to false
, or remove the
allow_non_incremental_definition
option from the materialized view.
SQL
Add an OPTIONS
clause to the DDL statement when you create the
materialized view:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table OPTIONS ( enable_refresh = true, refresh_interval_minutes = 60, max_staleness = INTERVAL "4" HOUR, allow_non_incremental_definition = true) AS
SELECT
s_store_sk, SUM(ss_net_paid) AS sum_sales, APPROX_QUANTILES(ss_net_paid, 2)[safe_offset(1)] median FROM my_project.my_dataset.store LEFT OUTER JOIN my_project.my_dataset.store_sales ON ss_store_sk = s_store_sk GROUP BY s_store_sk HAVING median < 40 OR median is NULL ;Replace the following:
- my_project is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_dataset.store and my_dataset.store_sales are the IDs of the tables in your dataset that serve as the base tables for your materialized view.
Click
Run.
For more information about how to run queries, see Run an interactive query.
API
Call the tables.insert
method with a defined materializedView
resource as part of your API
request. The materializedView
resource contains a query
field. For
example:
{ "kind": "bigquery#table", "tableReference": { "projectId": "my_project", "datasetId": "my_dataset", "tableId": "my_mv_table" }, "materializedView": { "query": "`SELECT` s_store_sk, SUM(ss_net_paid) AS sum_sales, APPROX_QUANTILES(ss_net_paid, 2)[safe_offset(1)] median FROM my_project.my_dataset.store LEFT OUTER JOIN my_project.my_dataset.store_sales ON ss_store_sk = s_store_sk GROUP BY s_store_sk HAVING median < 40 OR median is NULL`", "allowNonIncrementalDefinition": true } "maxStaleness": "4:0:0" }
Replace the following:
- my_project is your project ID.
- my_dataset is the ID of a dataset in your project.
- my_mv_table is the ID of the materialized view that you're creating.
- my_dataset.store and my_dataset.store_sales are the IDs of the tables in your dataset that serve as the base tables for your materialized view.
Query with allow_non_incremental_definition
You can query non-incremental materialized views as you would query any other materialized view, logical view, or table.
For example:
SELECT * FROM my_project.my_dataset.my_mv_table
If the data is not older than the max_staleness
parameter, then this query
returns data from the last refresh. For details about the staleness and
freshness of data, see data staleness.
Limitations specific to non-incremental materialized views
The following limitations only apply to materialized views with the
allow_non_incremental_definition
option. With the exception of limitations on
supported query syntax, all materialized view
limitations still apply.
- Smart-tuning is not applied to the materialized views that include the
allow_non_incremental_definition
option. The only way to benefit from materialized views with theallow_non_incremental_definition
option is to query them directly. - Materialized views without the
allow_non_incremental_definition
option can incrementally refresh a subset of their data. Materialized views with theallow_non_incremental_definition
option must be refreshed in their entirety. - Materialized views with max_staleness option validates presence of the column-level security constraints during query execution. See more details about this in column-level access control