Usage
view: view_name { measure: field_name { allow_approximate_optimization: yes } }
Hierarchy
allow_approximate_optimization |
Possible Field Types
Measure
Default Value
no
Accepts
A Boolean (yes or no)
|
Definition
For dialects that support HyperLogLog sketches, Looker can leverage the HyperLogLog algorithm to approximate distinct counts for aggregate tables.
The allow_approximate_optimization: yes
statement enables Looker to store HyperLogLog sketches in aggregate tables, which means that Looker can use approximations for distinct counts for aggregate awareness.
See the Dialect support for distinct counts with aggregate awareness section on this page for the list of dialects that support distinct counts for aggregate tables using HyperLogLog sketches.
In general, distinct counts can't be supported with aggregate awareness because you can't get accurate data if you try to aggregate distinct counts. For example, if you are counting the distinct users on a website, there may be a user who came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect.
One workaround for this is to create an aggregate table that exactly matches an Explore query, as described on the Aggregate awareness documentation page. When the Explore query and an aggregate table query are the same, distinct count measures do provide accurate data, so they can be used for aggregate awareness.
The other option is to use approximations for distinct counts. The HyperLogLog algorithm is known to have about a 2% potential error. The allow_approximate_optimization
parameter requires your Looker developers to acknowledge that it's okay to use approximate data for the measure so that the measure may be calculated approximately from aggregate tables.
With aggregate awareness, there are two cases where distinct counts come into play:
- The first case is with measures of
type: count_distinct
. - The second case is with measures of
type: count
that are actually being rendered by Looker ascount_distinct
measure types. As discussed on the Aggregate awareness documentation page, Looker renderscount
measures ascount_distinct
to avoid fanout miscalculations in Explores that join multiple database tables.
In both of these cases, if your dialect supports HyperLogLog sketches, you can add the allow_approximate_optimization: yes
statement to measures to enable approximate values. You can then include these measures in aggregate tables.
Even for measures defined with
allow_approximate_optimization: yes
, Looker will return exact data when possible. For example, if the dimensions in an Explore query are a perfect match of the dimensions in an aggregate table, Looker can provide exact data for distinct counts, without having to approximate. In this case, you will see in the Explore's SQL tab that distinct count measures are being used for aggregate awareness without employing the HyperLogLog algorithm.
Example
The apx_unique_count
measure shown in this example is set for allow_approximate_optimization: yes
, which means that the measure can be used in an aggregate_table
.
measure: apx_unique_count {
type: count_distinct
allow_approximate_optimization: yes # default value is no
sql: ${id} ;;
}
Dialect support for distinct counts with aggregate awareness
Looker can use distinct counts for aggregate awareness with database dialects that support HyperLogLog sketches. In the latest release of Looker, the following SQL dialects are supported for distinct counts with aggregate awareness:
Dialect | Supported? |
---|---|
Actian Avalanche | No |
Amazon Athena | Yes |
Amazon Aurora MySQL | No |
Amazon Redshift | Yes |
Apache Druid | No |
Apache Druid 0.13+ | No |
Apache Druid 0.18+ | No |
Apache Hive 2.3+ | No |
Apache Hive 3.1.2+ | No |
Apache Spark 3+ | No |
ClickHouse | No |
Cloudera Impala 3.1+ | No |
Cloudera Impala 3.1+ with Native Driver | No |
Cloudera Impala with Native Driver | No |
DataVirtuality | No |
Databricks | No |
Denodo 7 | No |
Denodo 8 | No |
Dremio | No |
Dremio 11+ | No |
Exasol | No |
Firebolt | No |
Google BigQuery Legacy SQL | No |
Google BigQuery Standard SQL | Yes |
Google Cloud PostgreSQL | No |
Google Cloud SQL | No |
Google Spanner | No |
Greenplum | No |
HyperSQL | No |
IBM Netezza | No |
MariaDB | No |
Microsoft Azure PostgreSQL | No |
Microsoft Azure SQL Database | No |
Microsoft Azure Synapse Analytics | No |
Microsoft SQL Server 2008+ | No |
Microsoft SQL Server 2012+ | No |
Microsoft SQL Server 2016 | No |
Microsoft SQL Server 2017+ | No |
MongoBI | No |
MySQL | No |
MySQL 8.0.12+ | No |
Oracle | No |
Oracle ADWC | No |
PostgreSQL 9.5+ | No |
PostgreSQL pre-9.5 | No |
PrestoDB | Yes |
PrestoSQL | Yes |
SAP HANA 2+ | No |
SingleStore | No |
SingleStore 7+ | No |
Snowflake | Yes |
Teradata | No |
Trino | Yes |
Vector | No |
Vertica | No |
Check your SQL dialect's documentation to understand the speed and accuracy tradeoffs of this method.