How Kerberos works with Dataproc Metastore

This page describes how Dataproc Metastore supports the Kerberos protocol.

Kerberos is a network authentication protocol designed to provide strong authentication for client and server applications by using secret-key cryptography. It's commonly used throughout the Hadoop ecosystem for authentication.

You can configure Kerberos on the following Dataproc Metastore services:

  • Services that use the Thrift endpoint protocol.

  • Services that use the gRPC endpoint protocol.

The process for configuring Kerberos is different for each type of service.

Required Kerberos assets

The following sections provide general information about the Kerberos assets that you need in order to configure Kerberos for a Dataproc Metastore service.

Kerberos KDC

Dataproc Metastore requires a Kerberos key distribution center (KDC). You can use the local KDC of a Dataproc cluster, or you can create and host your own.
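For example, if you choose to use a Dataproc cluster's local KDC, you can create a Kerberos-enabled cluster through the Dataproc API. The following is a minimal sketch using the google-cloud-dataproc Python client; the project, region, cluster name, and the Cloud Storage and Cloud KMS URIs are placeholder assumptions, not values tied to your environment.

```python
from google.cloud import dataproc_v1

# Placeholder values -- substitute your own project, region, and resources.
project_id = "my-project"          # assumption
region = "us-central1"             # assumption
cluster_name = "kerberos-cluster"  # assumption

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Enabling Kerberos provisions an on-cluster (local) KDC that a
# Dataproc Metastore service can then be configured against.
cluster = {
    "cluster_name": cluster_name,
    "config": {
        "security_config": {
            "kerberos_config": {
                "enable_kerberos": True,
                # KMS-encrypted root principal password (assumed URIs).
                "root_principal_password_uri": "gs://my-bucket/kerberos-root-password.encrypted",
                "kms_key_uri": (
                    "projects/my-project/locations/global/"
                    "keyRings/my-keyring/cryptoKeys/my-key"
                ),
            }
        }
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # Block until the cluster (and its local KDC) is ready.
```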

Kerberos principal

A Kerberos principal is a unique identity to which Kerberos can assign tickets. When you configure Kerberos for a Dataproc Metastore service, you generate your service principal using a Dataproc cluster.
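On a Kerberos-enabled Dataproc cluster, a service principal is typically created with the standard MIT Kerberos admin tooling. The following is a hedged sketch run on the cluster node that hosts the KDC; the principal name hive/metastore.example.com@EXAMPLE.COM is a hypothetical placeholder.

```python
import subprocess

# Hypothetical service principal for the Hive metastore; the primary,
# instance, and realm below are placeholder assumptions.
principal = "hive/metastore.example.com@EXAMPLE.COM"

# kadmin.local talks directly to the local KDC database, so this must be
# run with root privileges on the node that hosts the KDC.
subprocess.run(
    ["kadmin.local", "-q", f"addprinc -randkey {principal}"],
    check=True,
)
```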

Keytab file

A keytab file contains pairs of Kerberos principals and encrypted keys, which are used to authenticate a service principal with a Kerberos KDC.

When you configure Kerberos for a Dataproc Metastore service, you generate your keytab file using a Dataproc cluster.

  • The generated keytab file contains the name and location of your Hive metastore service principal.

  • The generated keytab file is automatically stored as a secret in Secret Manager.

    The Secret Manager secret that you provide must be pinned to a specific secret version. You need to specify the secret version that you want to use; Dataproc Metastore does not automatically pick the latest version (see the sketch after this list).
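As an illustration of that pinning requirement, the following sketch uses the google-cloud-secret-manager Python client to store a keytab as a secret version and then read back that exact version by number. The project ID, secret ID, keytab filename, and version number are assumptions; the secret itself is assumed to already exist.

```python
from google.cloud import secretmanager

# Placeholder names -- substitute your own project and secret.
project_id = "my-project"  # assumption
secret_id = "hive-keytab"  # assumption; the secret must already exist

client = secretmanager.SecretManagerServiceClient()

# Store the keytab bytes as a new secret version.
with open("hive.keytab", "rb") as f:  # assumed local keytab file
    keytab_bytes = f.read()

version = client.add_secret_version(
    request={
        "parent": f"projects/{project_id}/secrets/{secret_id}",
        "payload": {"data": keytab_bytes},
    }
)
print(f"Created version: {version.name}")

# Read back a specific, pinned version. Dataproc Metastore expects an
# explicit version number here -- it does not resolve "latest" for you.
pinned_name = f"projects/{project_id}/secrets/{secret_id}/versions/1"
response = client.access_secret_version(request={"name": pinned_name})
keytab = response.payload.data
```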

krb5.conf file

A valid krb5.conf file contains Kerberos configuration information, such as the KDC IP, port, and realm name.

When you configure Kerberos for a Dataproc Metastore service, you generate your krb5.conf file using a Dataproc cluster.

  • When configuring the krb5.conf file, specify a KDC IP address that is accessible from your peered network. Don't specify the KDC FQDN.
  • If you are using the Thrift endpoint, you must store the krb5.conf file in a Cloud Storage bucket. You can use an existing bucket or create a new one, as shown in the sketch after this list.
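Putting both points together, the following is a minimal sketch that writes a krb5.conf referencing the KDC by IP address and uploads it to a Cloud Storage bucket using the google-cloud-storage Python client. The realm, KDC address, and bucket name are placeholder assumptions.

```python
from google.cloud import storage

# Minimal krb5.conf. The KDC is referenced by an IP address reachable
# from the peered network, not by its FQDN. The realm and address below
# are placeholder assumptions.
krb5_conf = """\
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = 10.128.0.5:88
    admin_server = 10.128.0.5
  }
"""

with open("krb5.conf", "w") as f:
    f.write(krb5_conf)

# Upload to a Cloud Storage bucket (assumed name) so that a
# Thrift-endpoint Dataproc Metastore service can reference it.
client = storage.Client()
bucket = client.bucket("my-kerberos-config-bucket")  # assumption
blob = bucket.blob("krb5.conf")
blob.upload_from_filename("krb5.conf")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```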

What's next