Cloudera Impala 3.1+ and Cloudera Impala with Native Driver

Dialects that use these instructions

Looker connects to the following Impala databases:

Cloudera Impala 3.1+
Cloudera Impala 3.1+ with Native Driver
Cloudera Impala with Native Driver

Encrypting network traffic

It is a best practice to encrypt network traffic between the Looker application and your database. Consider one of the options described on the Enabling secure database access documentation page.

Configuring Looker to connect to Cloudera Impala

Looker connects to databases through a JDBC connection. For Impala databases, Looker connects by default to the server that is running the impalad daemon on port 21050. For more information, see the Configuring Impala to work with JDBC section of the documentation on the Cloudera website.

In the Admin section of Looker, select Connections, and then click Add Connection.

The Looker connection configuration depends on the security that is being used:

A cluster not using Kerberos or user authentication
A cluster requiring LDAP authentication
A cluster secured with Kerberos, but not using Apache Sentry

Connecting to a cluster without Kerberos or user authentication

To configure a connection that isn't using Kerberos or user authentication, follow these steps:

On the Connection Settings page, leave the Username and Password fields blank. (The * next to the field names implies that these fields are required, but they are not.)
In the Additional JDBC parameters field, enter ;auth=noSasl.

Verifying the connection string

To verify the JDBC connection string in the log files, in the Looker Admin panel, click Log in the left menu. Then filter the log on a term such as jdbc or noSasl. The log line may look something like this:

jdbc connect using: jdbc:hive2://<HOSTNAME>/<DATABASE_NAME>;auth=noSasl

For more information about configuring Impala databases to work with JDBC, see the documentation on the external Cloudera website.

Connecting to a cluster requiring LDAP authentication

For a cluster that requires LDAP authentication, including a cluster with Apache Sentry and Kerberos, on the Connection Settings page, enter a Username and Password with access to the schemas Looker will access.

Connecting to a cluster secured with Kerberos, but not using Apache Sentry

The Looker analyst team may need to assist in configuring this connection correctly.

Usually, Kerberos authentication with Cloudera environments is handled through Apache Sentry. See the Cloudera documentation for more details.

If you want to configure Looker to connect directly to Impala databases using Kerberos authentication, follow the steps on this page.

Setting up the Kerberos client configuration

First, you need to ensure the installation of several pieces of software and the presence of several files on the Looker machine.

Kerberos client

Verify that the Kerberos client is installed on the Looker machine by trying to run kinit. If the Kerberos client is not installed, install the Kerberos client's binaries.

For example, on Redhat/CentOS, this would be:

sudo yum install krb5-workstation krb5-libs krb5-auth-dialog

Java 8

Java 8 must be installed on the Looker machine and in the PATH and JAVA_HOME of the Looker user. If necessary, install it locally in the looker directory.

Java Cryptography Extension

Download and install the Java Cryptography Extension (JCE) for Java 8 from the Oracle website.
- Locate the jre/lib/security directory for the Java installation.
- Remove the following JAR files from this directory: local_policy.jar and US_export_policy.jar.
- Replace these two files with the JAR files included in the JCE Unlimited Strength Jurisdiction Policy Files download.
It may be possible to use versions of Java prior to Java 8 with the JCE installed, but this is not recommended.
Update JAVA_HOME and PATH in ~looker/.bash_profile to point to the correct installation of Java and source ~/.bash_profile or log out and in again.
Verify the Java version with java -version.
Verify the JAVA_HOME environment variable with echo $JAVA_HOME.

`gss-jaas.conf`

Create a gss-jaas.conf file in the looker directory with these contents:

com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true
    doNotPrompt=true;
};

If necessary for testing, debug=true can be added to this file like this:

com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true
    doNotPrompt=true
    debug=true;
};

`krb5.conf`

The server that is running Looker should also have a valid krb5.conf file. By default, this file is in /etc/krb5.conf. If it is in another location, that must be indicated in the environment (KRB5_CONFIG in the shell environment).

You may need to copy this from another Kerberos client machine.

`lookerstart.cfg`

Point to the gss-jaas.conf and krb5.conf files by making a file in the looker directory (the same directory that contains the looker startup script) called lookerstart.cfg that contains the following lines:

  JAVAARGS="-Djava.security.auth.login.config=/path/to/gss-jaas.conf -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=/etc/krb5.conf"
  LOOKERARGS=""

If the krb5.conf file is not at /etc/krb5.conf then it will also be necessary to add this variable:

  -Djava.security.krb5.conf=/path/to/krb5.conf

For debugging, add these variables:

  -Dsun.security.jgss.debug=true -Dsun.security.krb5.debug=true

Then restart Looker with ./looker restart.

Authenticating with Kerberos

User authentication

If krb5.conf is not in /etc/, then use the environment variable KRB5_CONFIG to indicate its location.
Run the command klist to make sure there is a valid ticket in the Kerberos ticket cache.
If there is no ticket, run kinit username@REALM or kinit username to create the ticket.
The account that is used with Looker will likely be headless, so you can get a keytab file from Kerberos to store the credential for long-term use. Use a command like kinit -k -t looker_user.keytab username@REALM to get the Kerberos ticket.

Automatically renewing the ticket

Set up a cron job that runs every so often to keep an active ticket in the Kerberos ticket cache. How often this should run depends on the configuration of the cluster. klist should give an indication of how soon tickets expire.

Creating the Looker connection to your database

In the Admin section of Looker, select Connections, and then click Add Connection.

Fill out the connection details as follows (see the Connecting Looker to your database documentation page for more information):

Name: The name of the connection. This is how the connection will be referred to in the LookML model.
Dialect: Cloudera Impala 3.1+, Cloudera Impala 3.1+ with Native Driver, or Cloudera Impala with Native Driver.

Note: If you are on a Looker (Google Cloud core) instance and you don't see your dialect listed in the Dialect drop-down menu, see the Looker (Google Cloud core) documentation to verify that the dialect is supported for Looker (Google Cloud core) instances.
Host: Hostname.
Port: Database port (21050 by default).
Database: The default schema/database that will be modeled. When no database is specified for a table, this will be assumed.
Username: Leave this blank.
Password: Leave this blank.
Enable PDTs: Use this toggle to enable persistent derived tables. When PDTs are enabled, the Connection window reveals additional PDT settings and the PDT Overrides section.
Temp Database: A temporary schema/database for storing PDTs. This must be created beforehand.
Additional JDBC parameters: Additional parameters for the JDBC string. Indicate the Kerberos principal here, for example ;principal=impala/impala.company.com@REALM. Three-part principals are standard. The first (impala) is generally the name of the service, and the last (REALM) is generally the realm.
SSL: Check to use SSL connections. If your SSL certificate is not issued by a widely recognized Certificate Authority and you are using a custom certificate, you will need to:
- Copy the certificate file to the Looker server. This is only available for customer-hosted Looker deployments.
- Add the following parameters to the Additional JDBC parameters field:

  sslTrustStore=/path/to/your/trust_store.jks;trustStorePassword=yourpassword

See the Cloudera documentation for more details about how to form the correct JDBC strings for Impala databases.

Database Time Zone: The time zone of data stored in your database. Usually this can be left blank or set to UTC.

It is a best practice to have the server name (impala.company.com in this example) be the canonical name for the server and its IP address's reverse DNS lookup results in that name. However, the server name should be whatever is listed in the Kerberos domain controller:

  nslookup servername  # get canonical server name and IP address

  nslookup ipaddress  # get the canonical name back

Sometimes the server name is set to be the hostname, and not the fully qualified domain name. In this case it may be necessary to modify the /etc/hosts and /etc/nsswitch.conf files to make sure that reverse lookups resolve as intended.

Test the connection to make sure that it is configured correctly.

Debugging

Cloudera's documentation about debugging Impala authentication issues.
When you add debugging to the configuration, the extra debugging information will wind up in looker/logs/looker.log.

Resources

Permissions for PDTs

The user connecting to the scratch schema for persistent derived tables (PDTs) must have read/write permissions.

Feature support

For Looker to support some features, your database dialect must also support them.

Cloudera Impala with Native Driver

Cloudera Impala with Native Driver supports the following features as of Looker 25.12:

Feature	Supported?
Support level	Supported
Looker (Google Cloud core)	No
Symmetric aggregates	No
Derived tables	Yes
Persistent SQL derived tables	Yes
Persistent native derived tables	Yes
Stable views	Yes
Query killing	Yes
SQL-based pivots	Yes
Timezones	Yes
SSL	Yes
Subtotals	No
JDBC additional params	Yes
Case sensitive	Yes
Location type	Yes
List type	No
Percentile	No
Distinct percentile	No
SQL Runner Show Processes	No
SQL Runner Describe Table	Yes
SQL Runner Show Indexes	No
SQL Runner Select 10	Yes
SQL Runner Count	Yes
SQL Explain	Yes
OAuth 2.0 credentials	No
Context comments	Yes
Connection pooling	No
HLL sketches	No
Aggregate awareness	Yes
Incremental PDTs	No
Milliseconds	Yes
Microseconds	Yes
Materialized views	No
Period-over-period measures	No
Approximate count distinct	Yes