Stay organized with collections Save and categorize content based on your preferences.

Create materialized views

This document describes how to create materialized views in BigQuery. Before you read this document, familiarize yourself with Introduction to materialized views.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To create materialized views, you need the bigquery.tables.create IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to create a materialized view:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information about BigQuery Identity and Access Management (IAM), see Access control with IAM.

Create materialized views

You can create BigQuery materialized views through the Google Cloud console, the bq command-line tool, or the BigQuery API.

For purposes of these examples, assume your base table is named my_base_table and has this schema:

Column name Type
product_id integer
clicks integer

Also assume that you want a materialized view that summarizes the number of clicks per product_id. The following steps create a materialized view named my_mv_table that has the following schema.

Column name Type
product_id integer
sum_clicks integer

To create a materialized view, select one of the following options:

SQL

Use the CREATE MATERIALIZED VIEW statement. The following example creates a materialized view for the number of clicks for each product ID:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE MATERIALIZED VIEW  myproject.mydataset.my_mv_table AS (
      SELECT
        product_id,
        SUM(clicks) AS sum_clicks
      FROM
        myproject.mydataset.my_base_table
      GROUP BY
        product_id
    );
    

  3. Click Run.

For more information about how to run queries, see Running interactive queries.

bq

Use the bq query command and supply the DDL statement as the query parameter.

bq query --use_legacy_sql=false '
CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
AS SELECT product_id, SUM(clicks) AS sum_clicks
FROM project-id.my_dataset.my_base_table
GROUP BY 1'

where:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're creating.
  • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
  • product_id is a column from the base table.
  • clicks is a column from the base table.
  • sum_clicks is a column in the materialized view that you are creating.

Alternatively, you can use the bq mk command with the --materialized_view argument to create a materialized view. The following arguments work with the --materialized_view argument:

  • --enable_refresh: Specifies whether automatic refresh is enabled.
  • --refresh_interval_ms: Specifies the time, in milliseconds, for the maximum refresh frequency. For more information on the meaning of these parameters, see Refresh materialized views.

API

Call the tables.insert method with a defined materializedView resource as part of your API request. The materializedView resource contains a query field. For example:

{
  "kind": "bigquery#table",
  "tableReference": {
    "projectId": "project-id",
    "datasetId": "my_dataset",
    "tableId": "my_mv_table"
  },
  "materializedView": {
    "query": "select product_id,sum(clicks) as
                sum_clicks from project-id.my_dataset.my_base_table
                group by 1"
  }
}

where:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're creating.
  • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
  • product_id is a column from the base table.
  • clicks is a column from the base table.
  • sum_clicks is a column in the materialized view that you are creating.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.MaterializedViewDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

// Sample to create materialized view
public class CreateMaterializedView {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String materializedViewName = "MY_MATERIALIZED_VIEW_NAME";
    String query =
        String.format(
            "SELECT MAX(TimestampField) AS TimestampField, StringField, "
                + "MAX(BooleanField) AS BooleanField "
                + "FROM %s.%s GROUP BY StringField",
            datasetName, tableName);
    createMaterializedView(datasetName, materializedViewName, query);
  }

  public static void createMaterializedView(
      String datasetName, String materializedViewName, String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, materializedViewName);

      MaterializedViewDefinition materializedViewDefinition =
          MaterializedViewDefinition.newBuilder(query).build();

      bigquery.create(TableInfo.of(tableId, materializedViewDefinition));
      System.out.println("Materialized view created successfully");
    } catch (BigQueryException e) {
      System.out.println("Materialized view was not created. \n" + e.toString());
    }
  }
}

When the materialized view is successfully created, it appears in the Explorer panel of BigQuery in the Google Cloud console. Materialized views are listed on the Explorer panel within the Google Cloud console. As an example, the following shows how a materialized view's schema appears:

Materialized view schema in Google Cloud console

Unless you disable automatic refreshes, BigQuery starts an asynchronous full refresh for the materialized view. The query will finish quickly, but the initial refresh might continue to run.

Access control

You can grant access to a materialized view at the dataset level, the view level, or the column level. You can also set access at a higher level in the IAM resource hierarchy.

Querying a materialized view requires access to the view as well as its base tables. To share a materialized view, you can grant permissions to the base tables or configure a materialized view as an authorized view. For more information, see Authorized views.

To control access to views in BigQuery, see Authorized views.

Materialized views query support

Materialized views use a restricted SQL syntax. Queries must use the following pattern:

[ WITH cte [, …]]
SELECT  [{ ALL | DISTINCT }]
  expression [ [ AS ] alias ] [, ...]
FROM from_item [, ...]
[ WHERE bool_expression ]
[ GROUP BY expression [, ...] ]

from_item:
    {
      table_name [ as_alias ]
      | { join_operation | ( join_operation ) }
      | field_path
      | unnest_operator
      | cte_name [ as_alias ]
    }

as_alias:
    [ AS ] alias

Query limitations

Materialized views have the following limitations.

Aggregate requirements

Aggregates in the materialized view query must be outputs. Computing or filtering based on an aggregated value is not supported. For example, creating a view from the following query is not supported because it produces a value computed from an aggregate, COUNT(*) / 10 as cnt.

SELECT TIMESTAMP_TRUNC(ts, HOUR) AS ts_hour, COUNT(*) / 10 AS cnt
FROM mydataset.mytable
GROUP BY ts_hour;

Only the following aggregation functions are currently supported:

  • ANY_VALUE (but not over STRUCT)
  • APPROX_COUNT_DISTINCT
  • ARRAY_AGG (but not over ARRAY or STRUCT)
  • AVG
  • BIT_AND
  • BIT_OR
  • BIT_XOR
  • COUNT
  • COUNTIF
  • HLL_COUNT.INIT
  • LOGICAL_AND
  • LOGICAL_OR
  • MAX
  • MIN
  • SUM

Unsupported SQL features

The following SQL features are not supported in materialized views:

  • Left/right/full outer joins.
  • Self-joins (joins using the same table more than once).
  • Window functions.
  • ARRAY subqueries.
  • Non-deterministic functions such as RAND(), CURRENT_DATE(), SESSION_USER(), or CURRENT_TIME().
  • User-defined functions (UDFs).
  • TABLESAMPLE.
  • FOR SYSTEM_TIME AS OF.

Access control restrictions

  • If a user's query of a materialized view includes base table columns that they cannot access due to column-level security, then the query fails with the message Access Denied.
  • If a user queries a materialized view but doesn't have full access to all rows in the materialized views' base tables, then BigQuery runs the query against the base tables instead of reading materialized view data. This ensures the query respects all access control constraints. This limitation also applies when querying tables with data-masked columns.

WITH clause and common table expressions (CTEs)

Materialized views support WITH clauses and common table expressions. Materialized views with WITH clauses must still follow the pattern and limitations of materialized views without WITH clauses.

Examples

The following example shows a materialized view using a WITH clause:

WITH tmp AS (
  SELECT TIMESTAMP_TRUNC(ts, HOUR) AS ts_hour, *
  FROM mydataset.mytable
)
SELECT ts_hour, COUNT(*) AS cnt
FROM tmp
GROUP BY ts_hour;

The following example shows a materialized view using a WITH clause that is not supported because it contains two GROUP BY clauses:

WITH tmp AS (
  SELECT city, COUNT(*) AS population
  FROM mydataset.mytable
  GROUP BY city
)
SELECT population, COUNT(*) AS cnt
GROUP BY population;

Materialized views over BigLake tables

To create materialized views over BigLake tables, the BigLake table must have metadata caching enabled over Cloud Storage data and the materialized view must have a max_staleness option value greater than the base table. Materialized views over BigLake tables support the same set of queries as other materialized views.

Example

Creation of a simple aggregate view using a BigLake base table:

CREATE MATERIALIZED VIEW sample_dataset.sample_mv
    OPTIONS (max_staleness=INTERVAL "0:30:0" HOUR TO SECOND)
AS SELECT COUNT(*) cnt
FROM dataset.biglake_base_table;

For details about the limitations of materialized views over BigLake tables, see materialized views over BigLake tables.

Partitioned materialized views

Materialized views on partitioned tables can be partitioned. Partitioning a materialized view is similar to partitioning a normal table, in that it provides benefit when queries often access a subset of the partitions. In addition, partitioning a materialized view can improve the view's behavior when data in the base table or tables is modified or deleted. For more information, see Partition alignment.

If the base table is partitioned, then you can partition a materialized view on the same partitioning column. For time-based partitions, the granularity must match (hourly, daily, monthly, or yearly). For integer-range partitions, the range specification must exactly match. You cannot partition a materialized view over a non-partitioned base table.

If the base table is partitioned by ingestion time, then a materialized view can group by the _PARTITIONDATE column of the base table, and also partition by it. If you don't explicitly specify partitioning when you create the materialized view, then the materialized view is unpartitioned.

If the base table is partitioned, consider partitioning your materialized view as well to reduce refresh job maintenance cost and query cost.

Partition expiration

Partition expiration can't be set on materialized views. A materialized view implicitly inherits the partition expiration time from the base table. Materialized view partitions are aligned with the base table partitions, so they expire synchronously.

Example 1

In this example, the base table is partitioned on the transaction_time column with daily partitions. The materialized view is partitioned on the same column and clustered on the employee_id column.

CREATE TABLE my_project.my_dataset.my_base_table(
  employee_id INT64,
  transaction_time TIMESTAMP)
  PARTITION BY DATE(transaction_time)
  OPTIONS (partition_expiration_days = 2);

CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table
  PARTITION BY DATE(transaction_time)
  CLUSTER BY employee_id
AS (
  SELECT
    employee_id,
    transaction_time,
    COUNT(employee_id) AS cnt
  FROM
    my_dataset.my_base_table
  GROUP BY
    employee_id, transaction_time
);

Example 2

In this example, the base table is partitioned by ingestion time with daily partitions. The materialized view selects the ingestion time as a column named date. The materialized view is grouped by the date column and partitioned by the same column.

CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table
  PARTITION BY date
  CLUSTER BY employee_id
AS (
  SELECT
    employee_id,
    _PARTITIONDATE AS date,
    COUNT(1) AS count
  FROM
    my_dataset.my_base_table
  GROUP BY
    employee_id,
    date
);

Example 3

In this example, the base table is partitioned on a TIMESTAMP column named transaction_time, with daily partitions. The materialized view defines a column named transaction_hour, using the TIMESTAMP_TRUNC function to truncate the value to the nearest hour. The materialized view is grouped by transaction_hour and also partitioned by it.

Note the following:

  • The truncation function that is applied to the partitioning column must be at least as granular as the partitioning of the base table. For example, if the base table uses daily partitions, the truncation function cannot use MONTH or YEAR granularity.

  • In the materialized view's partition specification, the granularity has to match the base table.

CREATE TABLE my_project.my_dataset.my_base_table (
  employee_id INT64,
  transaction_time TIMESTAMP)
  PARTITION BY DATE(transaction_time);

CREATE MATERIALIZED VIEW my_project.my_dataset.my_mv_table
  PARTITION BY DATE(transaction_hour)
AS (
  SELECT
    employee_id,
    TIMESTAMP_TRUNC(transaction_time, HOUR) AS transaction_hour,
    COUNT(employee_id) AS cnt
  FROM
    my_dataset.my_base_table
  GROUP BY
    employee_id,
    transaction_hour
);

Cluster materialized views

You can cluster materialized views by their output columns, subject to the BigQuery clustered table limitations. Aggregate output columns cannot be used as clustering columns. Adding clustering columns to materialized views can improve the performance of queries that include filters on those columns.

Considerations when creating materialized views

Which materialized views to create

When creating a materialized view, ensure your materialized view definition reflects query patterns against the base tables. Because there is a maximum of 20 materialized views per table, you should not create a materialized view for every permutation of a query. Instead, create materialized views to serve a broader set of queries.

For example, consider a query on a table where users often filter by the columns user_id or department. You can group by these columns and optionally cluster by them, instead of adding filters like user_id = 123 into the materialized view.

As another example, users often use date filters, either by specific date, such as WHERE order_date = CURRENT_DATE(), or date range, such as WHERE order_date BETWEEN '2019-10-01' AND '2019-10-31'. Add a date range filter in the materialized view that covers expected date ranges in the query:

CREATE MATERIALIZED VIEW ...
  ...
  WHERE date > '2019-01-01'
  GROUP BY date

Joins

The following recommendations apply to materialized views with JOINs.

Put the most frequently changing table first

Ensure that the largest or most frequently changing table is the first/leftmost table referenced in the view query. Materialized views with joins support incremental queries and refresh when the first or left-most table in the query is appended, but changes to other tables fully invalidate the view cache. In star or snowflake schemas the first or leftmost table should generally be the fact table.

Avoid joining on clustering keys

Materialized views with joins work best in cases where the data is heavily aggregated or the original join query is expensive. For selective queries, BigQuery is often already able to perform the join efficiently and no materialized view is needed. For example consider the following materialized view definitions.

CREATE MATERIALIZED VIEW dataset.mv
  CLUSTER BY s_market_id
AS (
  SELECT
    s_market_id,
    s_country,
    SUM(ss_net_paid) AS sum_sales,
    COUNT(*) AS cnt_sales
  FROM dataset.store_sales
  INNER JOIN dataset.store
    ON ss_store_sk = s_store_sk
  GROUP BY s_market_id, s_country
);

Suppose store_sales is clustered on ss_store_sk and you often run queries like the following:

SELECT
  SUM(ss_net_paid)
FROM dataset.store_sales
INNER JOIN dataset.store
ON ss_store_sk = s_store_sk
WHERE s_country = 'Germany';

The materialized view might not be as efficient as the original query. For best results, experiment with a representative set of queries, with and without the materialized view.

Use materialized views with max_staleness option

The max_staleness materialized view option helps you achieve consistently high performance with controlled costs when processing large, frequently changing datasets. With the max_staleness parameter, you can adjust the freshness of the results to tune query performance. This behavior can be useful for dashboards and reports for which data freshness isn't essential.

Create materialized views with max_staleness option

When you create the materialized view, add an OPTION to the DDL statement:

Console

To create a materialized view in the Google Cloud console with the max_staleness option, add an OPTION to the DDL statement when you create the materialized view:

  1. Go to the BigQuery page.

    Go to BigQuery

  2. Click Compose new query.

  3. Enter your CREATE MATERIALIZED VIEW DDL statement into the Editor text area.

    CREATE MATERIALIZED VIEW  project-id.my_dataset.my_mv_table
    OPTIONS (enable_refresh = true, refresh_interval_minutes = 60,
    max_staleness = INTERVAL "4:0:0" HOUR TO SECOND)
    AS SELECT
      employee_id,
      DATE(transaction_time),
      COUNT(1) AS count
    FROM my_dataset.my_base_table
    GROUP BY 1, 2;
    

    Replace the following:

    • project-id is your project ID.
    • my_dataset is the ID of a dataset in your project.
    • my_mv_table is the ID of the materialized view that you're creating.
    • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
  4. Click Run.

API

Call the tables.insert method with a defined materializedView resource as part of your API request. The materializedView resource contains a query field. For example:

{
  "kind": "bigquery#table",
  "tableReference": {
    "projectId": "project-id",
    "datasetId": "my_dataset",
    "tableId": "my_mv_table"
  },
  "materializedView": {
    "query": "select product_id,sum(clicks) as
                sum_clicks from project-id.my_dataset.my_base_table
                group by 1"
  }
  "maxStaleness": "4:0:0 HOUR TO SECOND"
  }
}

Replace the following:

  • project-id is your project ID.
  • my_dataset is the ID of a dataset in your project.
  • my_mv_table is the ID of the materialized view that you're creating.
  • my_base_table is the ID of a table in your dataset that serves as the base table for your materialized view.
  • product_id is a column from the base table.
  • clicks is a column from the base table.
  • sum_clicks is a column in the materialized view that you are creating.

Apply max_staleness option to materialized views

You can apply this parameter to existing materialized views by using the ALTER MATERIALIZED VIEW statement. For example:

ALTER MATERIALIZED VIEW project-id.my_dataset.my_mv_table
SET OPTIONS (enable_refresh=true, refresh_interval_minutes = 120, max_staleness
= INTERVAL "8:0:0" HOUR TO SECOND)

Query materialized views with max_staleness

You can query materialized views with the max_staleness option as you would query any other materialized view, logical view, or table.

For example:

SELECT * FROM  project-id.my_dataset.my_mv_table

This query returns data from the last refresh if the data is not older than the max_staleness parameter. If the materialized view has not been refreshed within the max_staleness interval, BigQuery merges the results of the latest available refresh with the base table changes to return results within the max_staleness interval.

Data staleness

When you query materialized views with max_staleness, BigQuery returns data consistent with the result of a materialized view query that was run within the max_staleness interval.

The query is run according to the following conditions:

  • If the last refresh is within the max_staleness interval, then BigQuery returns data directly from the materialized view without reading the base tables.

    Last refresh within the staleness interval.

  • If the last refresh is outside the max_staleness interval, then in addition to reading data from the materialized view, BigQuery also reads enough data from the base tables to return combined data that is within the staleness interval.

    Last refresh outside the staleness interval.

Data streaming and max_staleness results

If you stream data into the base table of a materialized view with the max_staleness option, then the query of the materialized view may exclude records that were streamed into the table before the start of the staleness interval.

Smart tuning and the max_staleness option

Smart tuning automatically rewrites queries to use materialized views whenever possible, even if the query does not reference a materialized view. The max_staleness option on a materialized view does not affect the results of the rewritten query. The max_staleness option only affects queries that directly query the materialized view.

Manage staleness and refresh frequency

You should set max_staleness based on your requirements. To avoid reading data from base tables, configure the refresh interval so that the refresh takes place within the staleness interval. You can account for the average refresh runtime plus a margin for growth.

For example, if one hour is required to refresh your materialized view and you want a one-hour buffer for growth, then you should set the refresh interval to two hours. This configuration ensures that the refresh occurs within your report's four-hour maximum for staleness.

CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
OPTIONS (enable_refresh = true, refresh_interval_minutes = 120, max_staleness =
INTERVAL "4:0:0" HOUR TO SECOND)
AS SELECT
  employee_id,
  DATE(transaction_time),
  COUNT(1) AS cnt
FROM my_dataset.my_base_table
GROUP BY 1, 2;

Next steps