Dataproc components

When you create a Dataproc cluster, standard Apache Hadoop ecosystem components are installed automatically (see Dataproc Version List). You can also install additional components, called "optional components", at cluster creation time. Adding optional components is similar to adding components through initialization actions, but has the following advantages:

  • Faster cluster startup times
  • Tested compatibility with specific Dataproc versions
  • Use of a cluster parameter instead of an initialization action script

Available optional components

COMPONENT_NAME is the value to use in gcloud commands and API requests.

Optional component   COMPONENT_NAME   Image version                                    Release stage
Docker               DOCKER           1.5 and later                                    GA
Flink                FLINK            1.5 and later                                    GA
HBase                HBASE            1.5 and later (not available in 2.1 and later)   Beta
Hive WebHCat         HIVE_WEBHCAT     1.3 and later                                    GA
Hudi                 HUDI             1.5 and later                                    GA
Jupyter Notebook     JUPYTER          1.3 and later                                    GA
Presto               PRESTO           1.3 and later (not available in 2.1 and later)   GA
Ranger               RANGER           1.3 and later                                    GA
Solr                 SOLR             1.3 and later                                    GA
Trino                TRINO            2.1 and later                                    GA
Zeppelin Notebook    ZEPPELIN         1.3 and later                                    GA
Zookeeper            ZOOKEEPER        1.0 and later                                    GA

Adding optional components

gcloud command

To create a Dataproc cluster and install one or more optional components on it, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag. To install several components, separate their names with commas.

gcloud dataproc clusters create cluster-name \
  --optional-components=COMPONENT-NAME(s) \
  ... other flags
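
As a concrete illustration of the command above, the following sketch assembles a create command with two optional components and prints it instead of running it. The cluster name example-cluster, the region us-central1, and the helper join_components are assumptions made for this example and do not come from the Dataproc documentation.

```shell
# Join component names with commas, the list format that
# --optional-components accepts. Runs in a subshell so that the
# IFS change does not leak into the calling shell.
join_components() (
  IFS=,
  echo "$*"
)

# Hypothetical values; replace them with your own.
CLUSTER_NAME="example-cluster"
REGION="us-central1"
COMPONENTS="$(join_components JUPYTER ZEPPELIN)"

# Dry run: print the command for review.
# Remove the leading "echo" to actually create the cluster.
echo gcloud dataproc clusters create "$CLUSTER_NAME" \
  --region="$REGION" \
  --optional-components="$COMPONENTS"
```

Printing the command first makes it easy to check the component names against the table of available components and the image version you plan to use.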

REST API

Optional components can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
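
A minimal sketch of such a request is shown below. It builds the JSON body and the request URL and prints both without sending anything. The project ID, region, and cluster name are hypothetical placeholders, and the commented curl call assumes that the gcloud CLI is installed and authenticated.

```shell
# Hypothetical values; replace them with your own.
PROJECT_ID="example-project"
REGION="us-central1"
CLUSTER_NAME="example-cluster"

# Request body for clusters.create. Optional components are listed in
# config.softwareConfig.optionalComponents by their COMPONENT_NAME values.
REQUEST_BODY=$(cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["JUPYTER", "ZEPPELIN"]
    }
  }
}
EOF
)

URL="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Dry run: print the request for review. One way to send it (an assumption,
# not the only option) is:
#   curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "$REQUEST_BODY" "$URL"
echo "POST $URL"
echo "$REQUEST_BODY"
```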

Console

In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected. Under Optional components in the Components section, select one or more components to install on your cluster.