cluster_keys

Usage

view: view_name {
  derived_table: {
    cluster_keys: ["customer_city", "customer_state"]
    ...
  }
}
Hierarchy
cluster_keys

- or -

cluster_keys
Default Value
None

Accepts
One or more clustered column names

Special Rules
cluster_keys is supported only on specific dialects

Definition

Clustering a partitioned table sorts the data in a partition based on the values in the clustered columns and organizes the clustered columns in optimally sized storage blocks. Clustering can improve the performance and reduce the cost of queries that filter on or aggregate by the clustered columns.

See the Dialect support for cluster_keys section below for the list of dialects that support cluster_keys.

The cluster_keys parameter works only with tables that are persistent, such as PDTs and aggregate tables. cluster_keys is not supported for derived tables without a persistence strategy.

In addition, the cluster_keys parameter is not supported for derived tables that are defined using create_process or sql_create.

Lastly, Google BigQuery supports clustering on partitioned tables only. The cluster_keys parameter works only with PDTs or aggregate tables that are also partitioned using the partition_keys parameter.

To add a clustered column to a persistent derived table (PDT) or an aggregate table, use the cluster_keys parameter and supply the names of the columns you want clustered in the database table.

Examples

Create a customer_order_facts native derived table on a Google BigQuery database, partitioned on the date column and clustered on the city, age_tier, and gender columns to optimize queries that are filtered or aggregated on those columns:

view: customer_order_facts {
  derived_table: {
    explore_source: order {
      column: customer_id { field: order.customer_id }
      column: date { field: order.order_time }
      column: city { field: users.city}
      column: age_tier { field: users.age_tier }
      column: gender { field: users.gender }
      derived_column: num_orders {
        sql: COUNT(order.customer_id) ;;
      }
    }
    partition_keys: [ "date" ]
    cluster_keys: [ "city", "age_tier", "gender" ]
    datagroup_trigger: daily_datagroup
  }
}

Things to consider

Google BigQuery tables can partition only on date fields

Google BigQuery tables can be partitioned only on a date or timestamp column. If you want to add clustered columns to a PDT or aggregate table that does not include date or time-based data, one way to do that is to add a date column using a SQL statement such as SELECT CURRENT_DATETIME() as now, and then use partition_keys to partition on the new column. You can then use clustering on other columns in your PDT or aggregate table.

Dialect support for cluster_keys

The ability to use cluster_keys depends on the database dialect your Looker connection is using. In the latest release of Looker, the following dialects support cluster_keys:

Dialect Supported?
Actian Avalanche
No
Amazon Athena
No
Amazon Aurora MySQL
No
Amazon Redshift
No
Apache Druid
No
Apache Druid 0.13+
No
Apache Druid 0.18+
No
Apache Hive 2.3+
No
Apache Hive 3.1.2+
No
Apache Spark 3+
No
ClickHouse
No
Cloudera Impala 3.1+
No
Cloudera Impala 3.1+ with Native Driver
No
Cloudera Impala with Native Driver
No
DataVirtuality
No
Databricks
No
Denodo 7
No
Denodo 8
No
Dremio
No
Dremio 11+
No
Exasol
No
Firebolt
No
Google BigQuery Legacy SQL
Yes
Google BigQuery Standard SQL
Yes
Google Cloud PostgreSQL
No
Google Cloud SQL
No
Google Spanner
No
Greenplum
No
HyperSQL
No
IBM Netezza
No
MariaDB
No
Microsoft Azure PostgreSQL
No
Microsoft Azure SQL Database
No
Microsoft Azure Synapse Analytics
No
Microsoft SQL Server 2008+
No
Microsoft SQL Server 2012+
No
Microsoft SQL Server 2016
No
Microsoft SQL Server 2017+
No
MongoBI
No
MySQL
No
MySQL 8.0.12+
No
Oracle
No
Oracle ADWC
No
PostgreSQL 9.5+
No
PostgreSQL pre-9.5
No
PrestoDB
No
PrestoSQL
No
SAP HANA
No
SAP HANA 2+
No
SingleStore
No
SingleStore 7+
No
Snowflake
Yes
Teradata
No
Trino
No
Vector
No
Vertica
No