Configure Kerberos for Dataproc Metastore gRPC endpoints

This page explains how to configure Kerberos for your Dataproc Metastore service that uses the gRPC endpoint protocol. If your Dataproc Metastore service uses the Thrift endpoint protocol, see Configure Kerberos for Thrift endpoints.

Before you begin

  • Understand the basics of Kerberos.

    In these instructions, you use a Dataproc cluster to create the following Kerberos assets:

    • A Keytab file.
    • A krb5.conf file.
    • A Kerberos principal.

    For more information about how these Kerberos assets work with a Dataproc Metastore service, see About Kerberos.

  • Create and host your own Kerberos KDC or learn how to use the local KDC of a Dataproc cluster.

  • Create a Cloud Storage bucket or get access to an existing one. You must store your krb5.conf file in this bucket.
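If you need to create the bucket, a minimal sketch with the gcloud storage commands follows. BUCKET_NAME and LOCATION are placeholder values, not names from this guide.

```shell
# Create a Cloud Storage bucket to hold your krb5.conf file.
# BUCKET_NAME and LOCATION are placeholders; choose your own values.
gcloud storage buckets create gs://BUCKET_NAME --location=LOCATION

# Upload a krb5.conf file to the bucket.
gcloud storage cp krb5.conf gs://BUCKET_NAME/krb5.conf
```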

Required roles

To get the permission that you need to create a Dataproc Metastore service configured with Kerberos, ask your administrator to grant you the appropriate IAM roles on your project, based on the principle of least privilege. The role that you're granted must contain the metastore.services.create permission, which is required to create a Dataproc Metastore service configured with Kerberos. You might also be able to get this permission with custom roles or other predefined roles.

For more information about granting roles, see Manage access to projects, folders, and organizations.

For more information about specific Dataproc Metastore roles and permissions, see Manage access with IAM. For more information, see Dataproc Metastore IAM and access control.

Configure Kerberos for Dataproc Metastore

The following instructions show you how to configure Kerberos for a Dataproc Metastore service that uses the gRPC endpoint.

First, you create a Dataproc Metastore service that uses the gRPC endpoint. Then, you create a Dataproc cluster configured with Kerberos and connect the cluster to your service.

Create a Dataproc Metastore service with the gRPC endpoint

To create a Dataproc Metastore that uses the gRPC endpoint, run the following gcloud metastore services create command:


gcloud metastore services create SERVICE \
     --instance-size=medium \
     --endpoint-protocol=grpc

Replace:

  • SERVICE: the name of your Dataproc Metastore service.
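To confirm that the service was created with the gRPC endpoint, you can describe it. The following is a sketch, not part of the guide: LOCATION is a placeholder for the service's region, and the hiveMetastoreConfig.endpointProtocol field path is an assumption based on the Dataproc Metastore service resource layout.

```shell
# Describe the service and print its endpoint protocol.
# LOCATION is a placeholder; the field path is assumed from the
# service resource layout and may differ for your service tier.
gcloud metastore services describe SERVICE \
    --location=LOCATION \
    --format="value(hiveMetastoreConfig.endpointProtocol)"
```

If the command prints GRPC, the service is using the gRPC endpoint protocol.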

Create a Dataproc cluster and connect to your service

To create a Dataproc cluster configured with Kerberos, run the following gcloud dataproc clusters create command.

In this command, the --enable-kerberos option creates the Kerberos keytab file, the krb5.conf file, and the principal. Dataproc creates all of these assets using its default names and settings.


gcloud dataproc clusters create CLUSTER_NAME \
    --project PROJECT_ID \
    --region REGION \
    --image-version 2.0-debian10 \
    --dataproc-metastore DATAPROC_METASTORE_NAME \
    --enable-kerberos \
    --scopes 'https://www.googleapis.com/auth/cloud-platform'

Replace:

  • CLUSTER_NAME: the name of your Dataproc cluster.
  • PROJECT_ID: the ID of your Google Cloud project.
  • REGION: the Google Cloud region that you want to create your Dataproc cluster in.
  • DATAPROC_METASTORE_NAME: the name of the Dataproc Metastore service that you're attaching to the cluster, in the following format: projects/<my_project>/locations/<location>/services/<service_id>.
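After the cluster is created, you can check that Kerberos is enabled on it. This is a sketch for verification only; the config.securityConfig.kerberosConfig.enableKerberos field path is an assumption based on the Dataproc cluster resource layout.

```shell
# Describe the cluster and print whether Kerberos is enabled.
# The field path is assumed from the Dataproc cluster resource layout.
gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION \
    --format="value(config.securityConfig.kerberosConfig.enableKerberos)"
```

A value of True indicates that the cluster was created with Kerberos enabled.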

Configure Dataproc before submitting jobs

To run your Dataproc jobs, you must add the hive user to the allowed.system.users property in the Hadoop container-executor.cfg file. This lets users run queries that access data, such as a select * from query.

The following instructions show you how to SSH into your primary Dataproc cluster that's associated with your Dataproc Metastore service and update the container-executor.cfg file.

  1. In the Google Cloud console, go to the VM instances page.
  2. In the list of virtual machine instances, click SSH in the row of the Dataproc primary node (your-cluster-name-m).

    A browser window opens in your home directory on the node.

  3. In the SSH session, open the Hadoop container-executor.cfg file.

    sudo vim /etc/hadoop/conf/container-executor.cfg
    

    Add the following line to the file. You must make this change on every Dataproc node.

    allowed.system.users=hive
    
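If you prefer not to edit the file interactively, the same change can be made with a one-line append. This is a sketch equivalent to the vim edit in the step above; run it on every Dataproc node.

```shell
# Append the allowed.system.users entry to the Hadoop
# container-executor config without opening an editor.
echo 'allowed.system.users=hive' | sudo tee -a /etc/hadoop/conf/container-executor.cfg
```

Before appending, you may want to check whether the property is already present to avoid duplicate entries.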

Get a Kerberos ticket

The following instructions show you how to generate a Kerberos ticket.

  1. In the Dataproc cluster SSH session, generate a Kerberos ticket and connect to your Dataproc Metastore service.

    This command uses the default keytab name generated by your Dataproc cluster.

    sudo klist -kte /etc/security/keytab/hive.service.keytab
    sudo kinit -kt /etc/security/keytab/hive.service.keytab hive/_HOST@${realm}
    sudo klist # gets the ticket information.
    

    The _HOST value is shown when you list the keytab file using the klist -kte command. The value contains the hostname of the primary node.
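The ${realm} variable in the kinit command above isn't set by the commands as shown. One way to set it, as a sketch: the commands below assume you run them on the primary node and that the cluster's krb5.conf contains the standard default_realm = REALM line that Dataproc writes under [libdefaults].

```shell
# Read the default realm from the cluster's krb5.conf.
# Assumes the standard "default_realm = REALM" line under [libdefaults].
realm=$(awk '/default_realm/ {print $3}' /etc/krb5.conf)
echo "Using realm: ${realm}"

# Obtain the ticket, substituting the primary node's fully qualified
# hostname for _HOST (assumes this runs on the primary node).
sudo kinit -kt /etc/security/keytab/hive.service.keytab hive/"$(hostname -f)"@"${realm}"
```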

(Optional) Add a new principal

  1. To add a new principal, run the following command.

    sudo kadmin.local -q "addprinc -randkey PRINCIPAL"
    sudo kadmin.local -q "ktadd -k /etc/security/keytab/hive.service.keytab PRINCIPAL"
    
  2. Get the Kerberos ticket.

    sudo klist -kte /etc/security/keytab/hive.service.keytab
    sudo kinit -kt /etc/security/keytab/hive.service.keytab PRINCIPAL
    sudo klist
    sudo hive
    

What's next