Looker connects to Apache Spark through a JDBC connection to the Spark Thrift Server.
Connecting Looker to Apache Spark
Configure a database connection via the Looker interface. From the Admin section, select Connections, and then click Add Connection. See the Connecting Looker to your database documentation page for more information.
Fill out the page as follows:
- Name: The name of the connection. This is how the connection will be referred to in the LookML model.
- Dialect: Select Apache Spark 1.5+, Apache Spark 2+, or Apache Spark 3+.
- Host:Port: The Thrift server host and port (10000 by default).
- Database: The default schema/database that will be modeled. Looker assumes this database for any table that does not specify one.
- Username: The user that Looker will authenticate as.
- Password: The optional password for the Looker user.
- Persistent Derived Tables: Check this if you will be using PDTs with Looker.
- Temp Database: A temporary schema/database for storing PDTs. It must be created beforehand, with a statement such as `CREATE SCHEMA looker_scratch;`.
- Additional Params: Add any additional Hive JDBC parameters here, such as:
  - `;spark.sql.inMemoryColumnarStorage.compressed=true`
  - `;auth=noSasl`
- SSL: Leave this unchecked.
- Database Time Zone: The time zone of data stored in Spark. Usually it can be left blank or set to UTC.
- Query Time Zone: The time zone in which Looker displays queried data.
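To see how the Host:Port, Database, and Additional Params fields relate to one another, the sketch below assembles a Hive-style JDBC URL from them. It is an illustration only. The helper name `build_spark_jdbc_url` and the example host are hypothetical, and the exact URL that Looker builds internally is an assumption based on the standard `jdbc:hive2://` format used by the Hive JDBC driver.

```python
def build_spark_jdbc_url(host, port=10000, database="default", additional_params=""):
    """Assemble a Hive-style JDBC URL (hypothetical helper, not part of Looker)."""
    # Remove whitespace and line breaks, then make sure the parameter string
    # starts with ';', matching the ';key=value' style of the Additional Params field.
    params = "".join(additional_params.split())
    if params and not params.startswith(";"):
        params = ";" + params
    return f"jdbc:hive2://{host}:{port}/{database}{params}"


# Example with a made-up host name and the default Thrift port (10000).
url = build_spark_jdbc_url(
    "spark-thrift.example.com",
    database="analytics",
    additional_params=";auth=noSasl",
)
print(url)
```

Seeing the fields combined this way can help when comparing the Looker settings against a URL that already works in another JDBC client.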
Click Test These Settings to test the connection and make sure that it is configured correctly. If you see Can Connect, then press Add Connection. This runs the rest of the connection tests to verify that the service account was set up correctly and with the proper roles.
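If the test cannot connect, one quick check is whether the Thrift server port accepts TCP connections at all. The following is a minimal sketch, assuming you can run Python on a machine with the same network access as your Looker instance. The function name `thrift_port_reachable` is hypothetical, and the check says nothing about authentication or the Spark dialect. It only confirms that something is listening on the host and port.

```python
import socket


def thrift_port_reachable(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the 'with' block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call (the host name is a placeholder; replace it with your own):
# thrift_port_reachable("spark-thrift.example.com", 10000)
```

A result of `False` usually points to a firewall rule, a wrong host or port, or a Thrift server that is not running, rather than to a problem with the Looker connection settings.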
For more information about connection settings, see the Connecting Looker to your database documentation page.
Feature support
For Looker to support some features, your database dialect must also support them.
Apache Spark 1.5+
Apache Spark 1.5+ supports the following features as of Looker 23.4:
Feature | Supported? |
---|---|
Support Level | Integration |
Symmetric Aggregates | Yes |
Derived Tables | Yes |
Persistent SQL Derived Tables | Yes |
Persistent Native Derived Tables | Yes |
Stable Views | Yes |
Query Killing | Yes |
Pivots | Yes |
Timezones | Yes |
SSL | Yes |
Subtotals | Yes |
JDBC Additional Params | Yes |
Case Sensitive | Yes |
Location Type | Yes |
List Type | Yes |
Percentile | Yes |
Distinct Percentile | No |
SQL Runner Show Processes | No |
SQL Runner Describe Table | Yes |
SQL Runner Show Indexes | Yes |
SQL Runner Select 10 | Yes |
SQL Runner Count | Yes |
SQL Explain | Yes |
Oauth Credentials | No |
Context Comments | Yes |
Connection Pooling | No |
HLL Sketches | No |
Aggregate Awareness | Yes |
Incremental PDTs | No |
Milliseconds | Yes |
Microseconds | Yes |
Materialized Views | No |
Approximate Count Distinct | No |
Apache Spark 2.0
Apache Spark 2.0 supports the following features as of Looker 23.4:
Feature | Supported? |
---|---|
Support Level | Supported |
Symmetric Aggregates | Yes |
Derived Tables | Yes |
Persistent SQL Derived Tables | Yes |
Persistent Native Derived Tables | Yes |
Stable Views | Yes |
Query Killing | Yes |
Pivots | Yes |
Timezones | Yes |
SSL | Yes |
Subtotals | Yes |
JDBC Additional Params | Yes |
Case Sensitive | Yes |
Location Type | Yes |
List Type | Yes |
Percentile | Yes |
Distinct Percentile | No |
SQL Runner Show Processes | No |
SQL Runner Describe Table | Yes |
SQL Runner Show Indexes | No |
SQL Runner Select 10 | Yes |
SQL Runner Count | Yes |
SQL Explain | Yes |
Oauth Credentials | No |
Context Comments | Yes |
Connection Pooling | No |
HLL Sketches | No |
Aggregate Awareness | Yes |
Incremental PDTs | No |
Milliseconds | Yes |
Microseconds | Yes |
Materialized Views | No |
Approximate Count Distinct | No |
Apache Spark 3+
Apache Spark 3+ supports the following features as of Looker 23.4:
Feature | Supported? |
---|---|
Support Level | Supported |
Symmetric Aggregates | Yes |
Derived Tables | Yes |
Persistent SQL Derived Tables | Yes |
Persistent Native Derived Tables | Yes |
Stable Views | Yes |
Query Killing | Yes |
Pivots | Yes |
Timezones | Yes |
SSL | Yes |
Subtotals | Yes |
JDBC Additional Params | Yes |
Case Sensitive | Yes |
Location Type | Yes |
List Type | Yes |
Percentile | Yes |
Distinct Percentile | No |
SQL Runner Show Processes | No |
SQL Runner Describe Table | Yes |
SQL Runner Show Indexes | No |
SQL Runner Select 10 | Yes |
SQL Runner Count | Yes |
SQL Explain | Yes |
Oauth Credentials | No |
Context Comments | Yes |
Connection Pooling | No |
HLL Sketches | No |
Aggregate Awareness | Yes |
Incremental PDTs | No |
Milliseconds | Yes |
Microseconds | Yes |
Materialized Views | No |
Approximate Count Distinct | No |
Next steps
After you have created the connection, set authentication options.