Usage
view: my_view { derived_table: { distribution: "customer_id" ... } }
Hierarchy
distribution - or - distribution |
Default Value
None
Accepts
The name of a column in a derived table or aggregate table
Special Rules
distribution is supported only for Redshift and Aster databases.
|
Definition
distribution
lets you specify the column from a persistent derived table (PDT) or an aggregate table on which to apply a distribution key to spread the data around a cluster. When two tables are joined by the column specified in the distribution
parameter, the database can find the joined data on the same node to minimize internode I/O. Currently, distribution
works only with Redshift and Aster databases. For other SQL dialects (such as MySQL and Postgres), use indexes
instead.
The
distribution
parameter works only with tables that are persistent, such as PDTs and aggregate tables.distribution
is not supported for derived tables without a persistence strategy.In addition, the
distribution
parameter is not supported for derived tables that are defined usingcreate_process
orsql_create
.
Generally speaking, a distribution key should be applied to the column that will be acting as a foreign key (the column you use to join your table to others). Refer to the documentation for your dialect for best practices.
Examples
Create a customer_order_facts
persistent derived table with a distribution key on customer_id
:
view: customer_order_facts {
derived_table: {
explore_source: order {
column: customer_id { field: order.customer_id }
column: lifetime_orders { field: order.lifetime_orders }
}
datagroup_trigger: order_datagroup
distribution: "customer_id"
}
}
Create a customer_order_facts
derived table based on a SQL query with a distribution key on customer_id
:
view: customer_order_facts {
derived_table: {
sql:
SELECT
customer_id,
COUNT(*) AS lifetime_orders
FROM
order ;;
persist_for: "24 hours"
distribution: "customer_id"
}
}