About Dataproc Metastore endpoint protocols

When you create a Dataproc Metastore service, you must choose one of the following endpoint protocols:

  • The Apache Thrift protocol
  • The gRPC protocol

The endpoint protocol determines how your Hive Metastore clients access the metadata stored in your Dataproc Metastore service. Your choice can also affect which features you can integrate and use with the service.

This page explains the conceptual differences between the endpoint protocols.

Apache Thrift

The Apache Thrift protocol is the legacy default option that is preselected when you create a Dataproc Metastore service.

If you require Kerberos in your implementation, you should use this option. If you don't require Kerberos, consider using the gRPC protocol, which provides access to additional features.

If you use a Thrift endpoint, you can choose the port number that the Thrift interface uses. By default, port number 9083 is used.
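As a rough illustration of how the port choice shows up on the client side, the following Python sketch builds a thrift:// URI of the kind that Hive clients read from the hive.metastore.uris property. The function name, the placeholder host, and the generic port-range check are assumptions made for this example; they aren't part of Dataproc Metastore, and the service might enforce a narrower port range.

```python
# Default Thrift port described above.
DEFAULT_THRIFT_PORT = 9083


def thrift_metastore_uri(host: str, port: int = DEFAULT_THRIFT_PORT) -> str:
    """Return a thrift:// URI suitable for Hive's hive.metastore.uris property.

    Hypothetical helper for illustration only.
    """
    # Generic TCP port check (assumption); the service may allow fewer ports.
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"thrift://{host}:{port}"


# "metastore.example.internal" is a placeholder, not a real endpoint.
print(thrift_metastore_uri("metastore.example.internal"))
print(thrift_metastore_uri("metastore.example.internal", port=9090))
```

In practice, you would copy the endpoint URI that the service reports rather than assemble it by hand; the sketch only shows where the default or custom port appears.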

After choosing the Thrift protocol

After you create a Dataproc Metastore service that uses the Thrift endpoint protocol, you can connect to it from a Dataproc cluster or a self-managed cluster. Your cluster then uses Dataproc Metastore as its Hive metastore.

gRPC

The gRPC protocol is the modern, portable, high-performance option that you must explicitly select when you create a Dataproc Metastore service.

If you choose the gRPC protocol, you can't switch the service to Thrift later. If you want to move from gRPC to Thrift, you must create a new Dataproc Metastore service.

If you use a gRPC endpoint, you can't choose the port number that the gRPC interface uses. Instead, port number 443 is automatically assigned to your interface.
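The port and protocol-change rules for the two endpoint types can be summarized in a small local model. This Python sketch is not the Dataproc Metastore API: the enum, the function names, and the error messages are hypothetical and exist only to encode the constraints described on this page.

```python
from enum import Enum
from typing import Optional


class EndpointProtocol(Enum):
    """Local stand-in for the two endpoint protocols (hypothetical type)."""
    THRIFT = "THRIFT"
    GRPC = "GRPC"


THRIFT_DEFAULT_PORT = 9083  # used when no Thrift port is chosen
GRPC_PORT = 443             # assigned automatically for gRPC


def resolve_port(protocol: EndpointProtocol,
                 requested_port: Optional[int] = None) -> int:
    """Return the port a service would use under the rules on this page."""
    if protocol is EndpointProtocol.GRPC:
        # gRPC endpoints don't accept a custom port.
        if requested_port not in (None, GRPC_PORT):
            raise ValueError("gRPC endpoints always use port 443")
        return GRPC_PORT
    # Thrift endpoints use the requested port, or 9083 by default.
    return THRIFT_DEFAULT_PORT if requested_port is None else requested_port


def check_protocol_change(current: EndpointProtocol,
                          target: EndpointProtocol) -> None:
    """Raise if the requested change is gRPC to Thrift, which isn't allowed."""
    if current is EndpointProtocol.GRPC and target is EndpointProtocol.THRIFT:
        raise ValueError(
            "can't change gRPC to Thrift; create a new service instead")


print(resolve_port(EndpointProtocol.THRIFT))        # default Thrift port
print(resolve_port(EndpointProtocol.THRIFT, 9090))  # custom Thrift port
print(resolve_port(EndpointProtocol.GRPC))          # fixed gRPC port
```

A check like this could run in a provisioning script before any request is sent, so that an unsupported port or protocol change fails early.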

After choosing the gRPC protocol

After you create a Dataproc Metastore service that uses the gRPC endpoint protocol, you must grant additional IAM roles. After you grant these roles, you can connect to the service from a Dataproc cluster. Your cluster then uses Dataproc Metastore as its Hive metastore.

What's next