Incremental PDTs

In Looker, persistent derived tables (PDTs) are written to the scratch schema of your database. Looker persists and rebuilds a PDT based on its persistence strategy. When a PDT is triggered to rebuild, by default Looker rebuilds the entire table.

An incremental PDT is a PDT that Looker builds by appending fresh data to the table instead of rebuilding the table in its entirety:

A large table with the three bottom rows highlighted to show a small number of new rows being added to the table.

If your dialect supports incremental PDTs, you can make the following types of PDTs into incremental PDTs:

The first time you run a query on an incremental PDT, Looker builds the entire PDT to get the initial data. If the table is large, the initial build may take a significant amount of time, as would building any large table. Once the initial table is built, subsequent builds will be incremental and will take less time, if the incremental PDT is set up strategically.

Note the following for incremental PDTs:

  • Incremental PDTs are supported only for PDTs that use a trigger-based persistence strategy (datagroup_trigger, sql_trigger_value, or interval_trigger). Incremental PDTs are not supported for PDTs that use the persist_for persistence strategy.
  • For SQL-based PDTs, the table query must be defined using the sql parameter to be used as an incremental PDT. SQL-based PDTs that are defined with the sql_create parameter or the create_process parameter cannot be incrementally built. As you can see in Example 1 on this page, Looker uses an INSERT or a MERGE command to create the increments for an incremental PDT. The derived table cannot be defined using custom Data Definition Language (DDL) statements, since Looker wouldn't be able to determine which DDL statements would be required to create an accurate increment.
  • The incremental PDT's source table must be optimized for time-based queries. Specifically, the time-based column that is used for the increment key must have an optimization strategy, such as partitioning, sortkeys, indexes, or whatever optimization strategy is supported for your dialect. Source table optimization is strongly recommended because each time the incremental table is updated, Looker queries the source table to determine the latest values of the time-based column that is used for the increment key. If the source table is not optimized for these queries, Looker's query for the latest values may be slow and expensive.

Defining an incremental PDT

You can use the following parameters to make a PDT into an incremental PDT:

  • increment_key (required to make the PDT an incremental PDT): Defines the time period for which new records should be queried.
  • {% incrementcondition %} Liquid filter (required to make a SQL-based PDT an incremental PDT; not applicable to LookML-based PDTs): Connects the increment key to the database time column that the increment key is based on. See the increment_key documentation page for more information.
  • increment_offset (optional): An integer that defines the number of previous time periods (at the increment key's granularity) that are rebuilt for each incremental build. The increment_offset parameter is useful in the case of late-arriving data, where previous time periods may have new data that wasn't included when the corresponding increment was originally built and appended to the PDT.

See the increment_key parameter documentation page for examples that show how to create incremental PDTs from persistent native derived tables, persistent SQL-based derived tables, and aggregate tables.

Here is a simple example of a view file that defines an incremental LookML-based PDT:

view: flights_lookml_incremental_pdt {
  derived_table: {
    indexes: ["id"]
    increment_key: "departure_date"
    increment_offset: 3
    datagroup_trigger: flights_default_datagroup
    distribution_style: all
    explore_source: flights {
      column: id {}
      column: carrier {}
      column: departure_date {}
    }
  }

  dimension: id {
    type: number
  }
  dimension: carrier {
    type: string
  }
   dimension: departure_date {
    type: date
  }
}

This table will build in its entirety the first time a query is run on it. After that, the PDT will be rebuilt in increments of one day (increment_key: departure_date), going back three days (increment_offset: 3).

The increment key is based on the departure_date dimension, which is actually the date timeframe from the departure dimension group. (See the dimension_group parameter documentation page for an overview of how dimension groups work.) The dimension group and timeframe are both defined in the flights view, which is the explore_source for this PDT. Here is how the departure dimension group is defined in the flights view file:

...
  dimension_group: departure {
    type: time
    timeframes: [
      raw,
      date,
      week,
      month,
      year
    ]
    sql: ${TABLE}.dep_time ;;
  }
...

Interaction of increment parameters and persistence strategy

A PDT's increment_key and increment_offset settings are independent of the PDT's persistence strategy:

  • The incremental PDT's persistence strategy determines only when the PDT increments. The PDT builder doesn't modify the incremental PDT unless the table's persistence strategy is triggered, or unless the PDT is manually triggered with the Rebuild Derived Tables & Run option in an Explore.
  • When the PDT increments, the PDT builder will determine when the latest data was previously added to the table, in terms of the most current time increment (the time period that is defined by the increment_key parameter). Based on that, the PDT builder will truncate the data to the beginning of the most recent time increment in the table, then build the latest increment from there.
  • If the PDT has an increment_offset parameter, the PDT builder will also rebuild the number of previous time periods specified in the increment_offset parameter. The previous time periods go back starting from the beginning of the most current time increment (the time period that is defined by the increment_key parameter).

The following example scenarios illustrate how incremental PDTs are updated, by showing the interaction of increment_key, increment_offset, and persistence strategy.

Example 1

This example uses a PDT with these properties:

  • Increment key: date
  • Increment offset: 3
  • Persistence strategy: triggered once a month on the first day of the month

Here is how this table will be updated:

  • A monthly persistence strategy means that the table is automatically built once a month. This means that on June 1st, for example, the last row in the table will have been added on May 1st.
  • Because this PDT has an increment key based on date, the PDT builder will truncate May 1st back to the beginning of the day and rebuild the data for May 1st and up to the current day, June 1st.
  • Additionally, this PDT has an increment offset of 3. So the PDT builder also rebuilds the data from the previous three time periods (days) before May 1st. The result is that data is rebuilt for April 28th, 29th, 30th, and up to the present day of June 1st.

In SQL terms, here is the command that the PDT builder will run on June 1st to determine the rows from the existing PDT that should be rebuilt:

## Example SQL for BigQuery:
SELECT FORMAT_TIMESTAMP('%F %T',TIMESTAMP_ADD(MAX(pdt_name),INTERVAL -3 DAY))

## Example SQL for other dialects:
SELECT CAST(DATE_ADD(MAX(pdt_name),INTERVAL -3 DAY) AS CHAR)

And here is the SQL command that the PDT builder will run on June 1st to build the latest increment:

## Example SQL for BigQuery:

MERGE INTO [pdt_name] USING (SELECT [columns]
   WHERE created_at >= TIMESTAMP('4/28/21 12:00:00 AM'))
   AS tmp_name ON FALSE
WHEN NOT MATCHED BY SOURCE AND created_date >= TIMESTAMP('4/28/21 12:00:00 AM')
   THEN DELETE
WHEN NOT MATCHED THEN INSERT [columns]

## Example SQL for other dialects:

START TRANSACTION;
DELETE FROM [pdt_name]
   WHERE created_date >= TIMESTAMP('4/28/21 12:00:00 AM');
INSERT INTO [pdt_name]
   SELECT [columns]
   FROM [source_table]
   WHERE created_at >= TIMESTAMP('4/28/21 12:00:00 AM');
COMMIT;

Example 2

This example uses a PDT with these properties:

  • Persistence strategy: triggered once a day
  • Increment key: month
  • Increment offset: 0

Here is how this table will be updated on June 1st:

  • The daily persistence strategy means that the table is automatically built once a day. On June 1st, the last row in the table will have been added on May 31st.
  • Because the increment key is based on the month, the PDT builder will truncate from May 31st back to the beginning of the month and rebuild the data for all of May and up to the current day, including June 1st.
  • Because this PDT has no increment offset, no previous time periods are rebuilt.

Here is how this table will be updated on June 2nd:

  • On June 2nd, the last row on the table will have been added on June 1st.
  • Because the PDT builder will truncate back to the beginning of the month of June, then rebuild the data starting with June 1st and up to the current day, the data is rebuilt for only June 1st and June 2nd.
  • Because this PDT has no increment offset, no previous time periods are rebuilt.

Example 3

This example uses a PDT with these properties:

  • Increment key: month
  • Increment offset: 3
  • Persistence strategy: triggered once a day

This scenario illustrates a poor setup for an incremental PDT, since it's a daily triggering PDT with a three-month offset. This means that at least three months of data will be rebuilt every day, which would be a very inefficient use of an incremental PDT. However, it is an interesting scenario to examine as a way of understanding of how incremental PDTs work.

Here is how this table will be updated on June 1st:

  • The daily persistence strategy means that the table is automatically built once a day. On June 1st, for example, the last row in the table will have been added on May 31st.
  • Because the increment key is based on the month, the PDT builder will truncate from May 31st back to the beginning of the month and rebuild the data for all of May and up to the current day, including June 1st.
  • Additionally, this PDT has an increment offset of 3. This means that the PDT builder also rebuilds the data from the previous three time periods (months) before May. The result is that data is rebuilt from February, March, April, and up to the current day, June 1st.

Here is how this table will be updated on June 2nd:

  • On June 2nd, the last row in the table will have been added on June 1st.
  • The PDT builder will truncate the month back to June 1st and rebuild the data for the month of June, including June 2nd.
  • In addition, because of the increment offset, the PDT builder will rebuild the data from the previous three months before June. The result is that data is rebuilt from March, April, May, and up to the current day, June 2nd.

Testing an incremental PDT in Development Mode

Before deploying a new incremental PDT to your production environment, you can test the PDT to be sure it builds and increments. To test an incremental PDT in Development Mode:

  1. Create an Explore for the PDT:

    • In an associated model file, use the include parameter to include the PDT's view file in the model file.
    • In the same model file, use the explore parameter to create an Explore for the incremental PDT's view.
     include: "/views/e_faa_pdt.view"
     explore: e_faa_pdt {}
    
  2. Open the Explore for the PDT. To do this, select the See file actions button and then select an Explore name.

  1. In the Explore, select some dimensions or measures and click Run. Looker will then build the entire PDT. If this is the first query you have run on the incremental PDT, the PDT builder will build the entire PDT to get the initial data. If the table is large, the initial build may take a significant amount of time, as would building any large table.

  2. You can verify that the initial PDT was built in the following ways:

    • If you have the see_logs permission, you can verify that the table was built by looking in the PDT Event Log. If you don't see the PDT create events in the PDT Event Log, check the status information at the top of the PDT Event Log Explore. If it says "from cache," you can select Clear Cache & Refresh to get more recent information.
    • Otherwise, you can look at the comments in the SQL tab of the Explore's Data bar. The SQL tab shows the query and the actions that will be taken when you run the query in the Explore. For example, if the comments in the SQL tab say -- generate derived table e_incremental_pdt, that is the action that will be taken when you click Run.
  3. Once you create the initial build of the PDT, prompt an incremental build of the PDT by using the Rebuild Derived Tables & Run option from the Explore.

  4. You can use the same methods as before to verify that the PDT builds incrementally:

    • If you have the see_logs permission, you can use the PDT Event Log to see create increment complete events for the incremental PDT. If you don't see this event in the PDT Event Log and the query status says "from cache," select Clear Cache & Refresh to get more recent information.
    • Look at the comments in the SQL tab of the Explore's Data bar. In this case, the comments will indicate that the PDT was incremented. For example: -- increment persistent derived table e_incremental_pdt to generation 2
  5. Once you've verified the PDT is built and incrementing correctly, if you don't want to keep the dedicated Explore for the PDT, you can remove or comment out the PDT's explore and include parameters from your model file.

After the PDT is built in Development Mode, the same table will be used for production once you deploy your changes, unless you make further changes to the table's definition. See the Persisted tables in Development Mode section of the Derived tables in Looker documentation page for more information.

Supported database dialects for incremental PDTs

For Looker to support incremental PDTs in your Looker project, your database dialect must support Data Definition Language (DDL) commands that enable deleting and inserting rows.

The following table shows which dialects support incremental PDTs in the latest release of Looker (for Databricks, Incremental PDTs are supported only on Databricks version 12.1 and higher):

Dialect Supported?
Actian Avalanche
No
Amazon Athena
No
Amazon Aurora MySQL
No
Amazon Redshift
Yes
Apache Druid
No
Apache Druid 0.13+
No
Apache Druid 0.18+
No
Apache Hive 2.3+
No
Apache Hive 3.1.2+
No
Apache Spark 3+
No
ClickHouse
No
Cloudera Impala 3.1+
No
Cloudera Impala 3.1+ with Native Driver
No
Cloudera Impala with Native Driver
No
DataVirtuality
No
Databricks
Yes
Denodo 7
No
Denodo 8
No
Dremio
No
Dremio 11+
No
Exasol
No
Firebolt
No
Google BigQuery Legacy SQL
No
Google BigQuery Standard SQL
Yes
Google Cloud PostgreSQL
Yes
Google Cloud SQL
No
Google Spanner
No
Greenplum
Yes
HyperSQL
No
IBM Netezza
No
MariaDB
No
Microsoft Azure PostgreSQL
Yes
Microsoft Azure SQL Database
No
Microsoft Azure Synapse Analytics
Yes
Microsoft SQL Server 2008+
No
Microsoft SQL Server 2012+
No
Microsoft SQL Server 2016
No
Microsoft SQL Server 2017+
No
MongoBI
No
MySQL
Yes
MySQL 8.0.12+
Yes
Oracle
No
Oracle ADWC
No
PostgreSQL 9.5+
Yes
PostgreSQL pre-9.5
Yes
PrestoDB
No
PrestoSQL
No
SAP HANA
No
SAP HANA 2+
No
SingleStore
No
SingleStore 7+
No
Snowflake
Yes
Teradata
No
Trino
No
Vector
No
Vertica
Yes