Dataproc Druid Component

You can install additional components when you create a Dataproc cluster using the Optional Components feature. This page describes the Druid component.

Apache Druid is an open source distributed OLAP data store. The Druid component installs Druid services on the Dataproc cluster master nodes (Coordinator, Broker, and Overlord) and worker nodes (Historical, Realtime, and MiddleManager). The Druid component uses Zookeeper for coordination.

Install the component

Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later. The Druid component requires the Zookeeper component to be installed as well (as shown in the gcloud command example below).

See Supported Dataproc versions for the component version included in each Dataproc image release.

gcloud command

To create a Dataproc cluster that includes the Druid component, use the gcloud beta dataproc clusters create cluster-name command with the --optional-components flag.

gcloud beta dataproc clusters create cluster-name \
    --optional-components=DRUID,ZOOKEEPER \
    --region=region \
    ... other flags

REST API

The Druid component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
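
As a rough sketch, the request body for such a clusters.create call might look like the output of the helper below. The helper name druid_cluster_body is hypothetical, and the field layout (optionalComponents under config.softwareConfig, with the DRUID and ZOOKEEPER values) is an assumption to check against the SoftwareConfig.Component reference for the API version you use.

```shell
# Hypothetical helper: prints a minimal clusters.create request body
# that enables the Druid and Zookeeper optional components.
# $1 is the cluster name (a placeholder, as in the gcloud example).
druid_cluster_body() {
  cat <<EOF
{
  "clusterName": "$1",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["DRUID", "ZOOKEEPER"]
    }
  }
}
EOF
}

# Print the body for a cluster named "cluster-name".
druid_cluster_body cluster-name
```

The printed JSON is meant to be sent as the payload of the clusters.create request; any other cluster settings would be added alongside softwareConfig.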

Console

  1. Enable the component.
    • In the Cloud Console, open the Dataproc Create a cluster page. Click "Advanced options" at the bottom of the page to view the Optional Components section.

    • Click "Select component" to open the Optional components selection panel. Select "Druid" and other optional components to install on your cluster.

Accessing Druid

On a Dataproc cluster created with the Druid component, the Druid Overlord, Coordinator, and Broker services run on the cluster's master node and listen on the following ports:

Service      Port
Overlord     8092
Coordinator  8081
Broker       8082

The Druid MiddleManager and Historical services run on the cluster's worker nodes and listen on the following ports:

Service        Port
MiddleManager  8091
Historical     8083
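
One way to confirm that a service is listening on its port is to request its health endpoint from the node that hosts it. The sketch below only builds the URL. The helper name druid_health_url is hypothetical, the ports are copied from the tables above, and it assumes each service exposes Druid's standard /status/health endpoint.

```shell
# Hypothetical helper: maps a Druid service name to the port listed in the
# tables above and prints the URL of its health endpoint.
# $1 is the host, $2 is the service name in lowercase.
druid_health_url() {
  host="$1"
  case "$2" in
    overlord)      port=8092 ;;
    coordinator)   port=8081 ;;
    broker)        port=8082 ;;
    middlemanager) port=8091 ;;
    historical)    port=8083 ;;
    *) echo "unknown service: $2" >&2; return 1 ;;
  esac
  echo "http://${host}:${port}/status/health"
}

# Example: build the Coordinator health URL for use on the master node.
# To query it there:  curl "$(druid_health_url localhost coordinator)"
druid_health_url localhost coordinator
```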

Currently, Druid is not integrated with Dataproc Component Gateway. To access the Druid Coordinator or Overlord Web UIs, create an SSH tunnel to the port for the service on the master node.
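
A tunnel of this kind could be opened with gcloud compute ssh, as sketched below. The helper name druid_tunnel_cmd is hypothetical, and the sketch assumes the master VM is named cluster-name-m and that zone is the Compute Engine zone of the cluster. The helper prints the command rather than running it, so you can review it first.

```shell
# Hypothetical helper: prints a gcloud command that forwards a local port to
# the same port on the cluster's master node.
# Assumptions: the master VM is named "<cluster-name>-m", and $2 is the
# Compute Engine zone of the cluster.
druid_tunnel_cmd() {
  cluster="$1"; zone="$2"; port="$3"
  echo "gcloud compute ssh ${cluster}-m --zone=${zone} -- -N -L ${port}:localhost:${port}"
}

# Print the tunnel command for the Coordinator web UI (port 8081).
druid_tunnel_cmd cluster-name zone 8081
```

While the printed command is running, opening http://localhost:8081 in a local browser should reach the Coordinator UI. Use the Overlord port from the table above to reach the Overlord UI instead.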