Derived tables open up a world of advanced analytical possibilities, but they can be daunting to approach, implement, and troubleshoot. This cookbook contains the most popular use cases of derived tables in Looker.
This page contains the following examples:
- Building a table at 3 AM daily
- Appending new data to a large table
- Using SQL window functions
- Creating derived columns for calculated values
- Optimization strategies
- Using PDTs to test optimizations
- UNION two tables
- Taking a sum of a sum (dimensionalizing a measure)
- Rollup tables with aggregate awareness
Derived table resources
These cookbooks assume you have an introductory understanding of LookML and derived tables. You should be comfortable making views and editing the model file. If you want a refresher on any of these topics, check out the following resources:
- Derived tables
- LookML terms and concepts
- Creating native derived tables
- derived_table parameter reference
- Caching queries and rebuilding PDTs with datagroups
Building a table at 3 AM daily
Data in this example comes in at 2 AM every day. The results of a query on this data will be the same whether it's run at 3 AM or at 9 PM. Therefore, it makes sense to build the table once a day and let users pull results from a cache.
Including your datagroup in the model file lets you reuse it with multiple tables and Explores. This datagroup contains a sql_trigger_value parameter that tells Looker when to trigger the datagroup and rebuild the derived table. In this example, the trigger expression computes the number of whole days elapsed since 3 AM UTC, so its value changes, and the table rebuilds, once a day at 3 AM.
For more examples of trigger expressions, see the sql_trigger_value documentation.
## in the model file
datagroup: standard_data_load {
sql_trigger_value: SELECT FLOOR(((TIMESTAMP_DIFF(CURRENT_TIMESTAMP(),'1970-01-01 00:00:00',SECOND)) - 60*60*3)/(60*60*24)) ;;
max_cache_age: "24 hours"
}
explore: orders {
…
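If you want a daily rebuild tied to midnight instead of 3 AM, a simpler trigger expression works. The following is a sketch; the datagroup name is illustrative, and CURRENT_DATE() assumes a dialect such as BigQuery or MySQL:
datagroup: daily_at_midnight {
  # The value changes when the date rolls over, which triggers the datagroup
  sql_trigger_value: SELECT CURRENT_DATE() ;;
  max_cache_age: "24 hours"
}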
Add the datagroup_trigger parameter to the derived_table definition in the view file, and specify the name of the datagroup that you want to use. In this example, the datagroup is standard_data_load.
view: orders {
derived_table: {
indexes: ["id"]
datagroup_trigger: standard_data_load
sql:
SELECT
user_id,
id,
created_at,
status
FROM
demo_db.orders ;;
}
…
}
Appending new data to a large table
An incremental PDT is a persistent derived table that Looker builds by appending fresh data to the table, instead of rebuilding the table in its entirety.
The next example builds on the orders table example to show how the table builds incrementally. New order data comes in every day and can be appended to the existing table when you add an increment_key parameter and an increment_offset parameter.
view: orders {
derived_table: {
indexes: ["id"]
increment_key: "created_at"
increment_offset: 3
datagroup_trigger: standard_data_load
distribution_style: all
sql:
SELECT
user_id,
id,
created_at,
status
FROM
demo_db.orders ;;
}
dimension: id {
primary_key: yes
type: number
sql: ${TABLE}.id ;;
}
…
}
The increment_key value is set to created_at, which is the time increment for which fresh data should be queried and appended to the PDT in this example.
The increment_offset value is set to 3 to specify the number of previous time periods (at the increment key's granularity) that are rebuilt to account for late-arriving data. For example, with a daily increment, each build appends the current day's data and also rebuilds the previous three days.
Using SQL window functions
Some database dialects support window functions, which are useful for creating sequence numbers, primary keys, running totals, cumulative totals, and other multi-row calculations. After the primary query has been executed, any derived_column declarations are executed in a separate pass.
If your database dialect supports window functions, you can use them in your native derived table. Create a derived_column parameter with a sql parameter that contains your window function. When referring to values, you should use the column name as defined in your native derived table.
The following example shows how to create a native derived table that includes the user_id, order_id, and created_time columns. You would then use a derived column with a SQL ROW_NUMBER() window function to calculate a column that contains the sequence number of a customer's order.
view: user_order_sequences {
derived_table: {
explore_source: order_items {
column: user_id {
field: order_items.user_id
}
column: order_id {
field: order_items.order_id
}
column: created_time {
field: order_items.created_time
}
derived_column: user_sequence {
sql: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_time) ;;
}
}
}
dimension: order_id {
hidden: yes
}
dimension: user_sequence {
type: number
}
}
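Because the derived_column declarations run in a separate pass after the primary query, you can think of the generated SQL as a window function applied over the inner query's results. The following is a rough sketch of equivalent SQL, not the exact query Looker generates; the order_items table and column names follow the example above:
-- Outer pass: the derived_column window function
SELECT
  user_id,
  order_id,
  created_time,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_time) AS user_sequence
FROM (
  -- Primary query: the columns declared with column parameters
  SELECT
    order_items.user_id AS user_id,
    order_items.order_id AS order_id,
    order_items.created_time AS created_time
  FROM order_items
) AS primary_query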
Creating derived columns for calculated values
You can add derived_column parameters to specify columns that don't exist in the explore_source parameter's Explore. Each derived_column parameter has a sql parameter that specifies how to construct the value.
Your sql calculation can use any columns that you have specified by using column parameters. Derived columns cannot include aggregate functions, but they can include calculations that can be performed on a single row of the table.
This example creates an average_customer_order column, which is calculated from the lifetime_customer_value and lifetime_number_of_orders columns in the native derived table.
view: user_order_facts {
derived_table: {
explore_source: order_items {
column: user_id {
field: users.id
}
column: lifetime_number_of_orders {
field: order_items.count
}
column: lifetime_customer_value {
field: order_items.total_profit
}
derived_column: average_customer_order {
sql: lifetime_customer_value / lifetime_number_of_orders ;;
}
}
}
dimension: user_id {
hidden: yes
}
dimension: lifetime_number_of_orders {
type: number
}
dimension: lifetime_customer_value {
type: number
}
dimension: average_customer_order {
type: number
}
}
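If lifetime_number_of_orders could ever be zero in your data, the division would raise a divide-by-zero error in most dialects. A hedged variant of the derived column guards the division with NULLIF, which is widely supported:
derived_column: average_customer_order {
  # NULLIF returns NULL when the denominator is 0, so the division yields NULL instead of an error
  sql: lifetime_customer_value / NULLIF(lifetime_number_of_orders, 0) ;;
}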
Optimization strategies
Because PDTs are stored in your database, you should optimize your PDTs by using the following strategies, as supported by your dialect:
- Indexes (for example, on MySQL or PostgreSQL)
- Sort keys and distribution (on Redshift)
- Partitioning and cluster keys (on BigQuery and other dialects)
For example, to add persistence, you could set the PDT to rebuild when the datagroup orders_datagroup triggers, and then you could add indexes on both customer_id and first_order, as shown next:
view: customer_order_summary {
derived_table: {
explore_source: orders {
...
}
datagroup_trigger: orders_datagroup
indexes: ["customer_id", "first_order"]
}
}
If you don't add an index (or an equivalent for your dialect), Looker will warn you that you should do so to improve query performance.
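The indexes parameter applies to dialects such as MySQL and PostgreSQL. On Redshift, for example, you could use sortkeys and distribution instead. The following is a sketch that assumes the same customer_order_summary view:
view: customer_order_summary {
  derived_table: {
    explore_source: orders {
      ...
    }
    datagroup_trigger: orders_datagroup
    # Redshift equivalents of indexes
    sortkeys: ["first_order"]
    distribution: "customer_id"
  }
}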
Using PDTs to test optimizations
You can use PDTs to test different indexing, distributions, and other optimization options without needing a large amount of support from your DBA or ETL developers.
Consider a case where you have a table but want to test different indexes. Your initial LookML for the view may look like the following:
view: customer {
sql_table_name: warehouse.customer ;;
}
To test optimization strategies, you can use the indexes parameter to add indexes to the LookML, as shown next:
view: customer {
# sql_table_name: warehouse.customer
derived_table: {
sql: SELECT * FROM warehouse.customer ;;
persist_for: "8 hours"
indexes: ["customer_id", "customer_name", "salesperson_id"]
}
}
Query the view once to generate the PDT. Then run your test queries and compare your results. If your results are favorable, you can ask your DBA or ETL team to add the indexes to the original table.
UNION two tables
You can execute a SQL UNION or UNION ALL operator in a SQL-based derived table if your SQL dialect supports it. The UNION and UNION ALL operators combine the result sets of two queries.
This example shows what a SQL-based derived table looks like with a UNION:
view: first_and_second_quarter_sales {
derived_table: {
sql:
SELECT *
FROM sales_records_first_quarter
UNION
SELECT *
FROM sales_records_second_quarter ;;
}
}
The UNION statement in the sql parameter produces a derived table that combines the results of both queries.
The difference between UNION and UNION ALL is that UNION ALL does not remove duplicate rows. There are performance considerations to keep in mind when using UNION versus UNION ALL, as the database server must do additional work to remove the duplicate rows.
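If the two quarterly tables can never contain identical rows, UNION ALL returns the same results while skipping the deduplication work. The following sketch also tags each row with its source quarter; it assumes that both tables have identical columns:
view: first_and_second_quarter_sales {
  derived_table: {
    sql:
      SELECT *, 'Q1' AS quarter
      FROM sales_records_first_quarter
      UNION ALL
      SELECT *, 'Q2' AS quarter
      FROM sales_records_second_quarter ;;
  }
}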
Taking a sum of a sum (dimensionalizing a measure)
As a general rule in SQL — and, by extension, Looker — you cannot group a query by the results of an aggregate function (represented in Looker as measures). You can only group by unaggregated fields (represented in Looker as dimensions).
To group by an aggregate (to take the sum of a sum, for example), you need to "dimensionalize" a measure. One way to do this is by using a derived table, which effectively creates a subquery of the aggregate.
Starting with an Explore, Looker can generate LookML for all or most of your derived table. Just create an Explore and select all the fields that you want to include in your derived table. Then, to generate the native (or SQL-based) derived table LookML, follow these steps:
1. Click the Explore's gear menu and select Get LookML.
2. To see the LookML for creating a native derived table for the Explore, click the Derived Table tab.
3. Copy the LookML.
Now that you have copied the generated LookML, paste it into a view file by following these steps:
1. In Development Mode, navigate to your project files.
2. Click the + at the top of the project file list in the Looker IDE, and select Create View. Alternatively, to create the file inside a folder, click the folder's menu and select Create View.
3. Set the view name to something meaningful.
4. Optionally, change column names, specify derived columns, and add filters.
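For example, the following sketch dimensionalizes the order_items.count measure as an order_count column so that an outer measure can average it per user. The view and field names are illustrative and follow the earlier order_items examples:
view: user_order_counts {
  derived_table: {
    explore_source: order_items {
      column: user_id {
        field: order_items.user_id
      }
      column: order_count {
        field: order_items.count
      }
    }
  }
  dimension: user_id {
    hidden: yes
  }
  # The inner measure is now a plain column, so it can be aggregated again
  dimension: order_count {
    type: number
  }
  measure: average_order_count {
    type: average
    sql: ${order_count} ;;
  }
}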
Rollup tables with aggregate awareness
In Looker, you may often encounter very large datasets or tables that, in order to be performant, require aggregation tables or rollups.
With Looker's aggregate awareness, you can pre-construct aggregate tables at various levels of granularity, dimensionality, and aggregation, and you can inform Looker of how to use them within existing Explores. Queries will then use these rollup tables where Looker deems appropriate, without any user input. This cuts down query size, reduces wait times, and enhances the user experience.
The following shows a very simple implementation in a Looker model to demonstrate how lightweight aggregate awareness can be. Given a hypothetical flights table in the database with a row for every flight that is recorded through the FAA, you can model this table in Looker with its own view and Explore. Following is the LookML for an aggregate table that you can define for the Explore:
explore: flights {
aggregate_table: flights_by_week_and_carrier {
query: {
dimensions: [carrier, depart_week]
measures: [cancelled_count, count]
}
materialization: {
sql_trigger_value: SELECT CURRENT_DATE() ;;
}
}
}
With this aggregate table, a user can query the flights Explore, and Looker will automatically use the aggregate table to answer queries. For a more detailed walkthrough of aggregate awareness, visit the Aggregate awareness tutorial.
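For the aggregate table above to resolve, the carrier, depart_week, cancelled_count, and count fields must be defined on the flights view. A minimal sketch of such a view follows; the underlying column names are assumptions:
view: flights {
  sql_table_name: flights ;;
  dimension: carrier {
    type: string
    sql: ${TABLE}.carrier ;;
  }
  dimension_group: depart {
    type: time
    timeframes: [date, week, month]
    sql: ${TABLE}.departure_time ;;
  }
  measure: count {
    type: count
  }
  measure: cancelled_count {
    type: sum
    # Assumes a 0/1 cancelled flag column
    sql: ${TABLE}.cancelled ;;
  }
}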