When you create a cluster, standard Apache Hadoop ecosystem components are automatically installed on it (see Dataproc Version List). You can also install additional components, called "optional components", when you create the cluster. Adding optional components is similar to adding components through initialization actions, but has the following advantages:
- Faster cluster startup times
- Tested compatibility with specific Dataproc versions
- Use of a cluster parameter instead of an initialization action script
Available optional components
Optional component | COMPONENT_NAME in gcloud commands and API requests | Image Version | Release Stage |
---|---|---|---|
Docker | DOCKER | 1.5 and later | GA |
Flink | FLINK | 1.5 and later | GA |
HBase | HBASE | 1.5 and later (not available in 2.1 and later) | Beta |
Hive WebHCat | HIVE_WEBHCAT | 1.3 and later | GA |
Hudi | HUDI | 1.5 and later | GA |
Jupyter Notebook | JUPYTER | 1.3 and later | GA |
Presto | PRESTO | 1.3 and later (not available in 2.1 and later) | GA |
Ranger | RANGER | 1.3 and later | GA |
Solr | SOLR | 1.3 and later | GA |
Trino | TRINO | 2.1 and later | GA |
Zeppelin Notebook | ZEPPELIN | 1.3 and later | GA |
Zookeeper | ZOOKEEPER | 1.0 and later | GA |
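The version ranges in the table can be encoded as data to check a component list before you create a cluster. The sketch below is illustrative only: the `AVAILABILITY` mapping and the `is_available` helper are hypothetical names, the ranges are copied from the table above, and image versions are assumed to be plain `major.minor` strings such as `2.1`.

```python
# Hypothetical helper: encodes the table above as (first_version, removed_in)
# pairs. removed_in is None when the table lists no upper bound.
AVAILABILITY = {
    "DOCKER": ("1.5", None),
    "FLINK": ("1.5", None),
    "HBASE": ("1.5", "2.1"),
    "HIVE_WEBHCAT": ("1.3", None),
    "HUDI": ("1.5", None),
    "JUPYTER": ("1.3", None),
    "PRESTO": ("1.3", "2.1"),
    "RANGER": ("1.3", None),
    "SOLR": ("1.3", None),
    "TRINO": ("2.1", None),
    "ZEPPELIN": ("1.3", None),
    "ZOOKEEPER": ("1.0", None),
}


def _parse(version):
    """Turn a 'major.minor' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:2])


def is_available(component, image_version):
    """Return True if the table lists the component for this image version."""
    first, removed_in = AVAILABILITY[component.upper()]
    version = _parse(image_version)
    if version < _parse(first):
        return False
    # "Not available in X and later" means the version must be below X.
    return removed_in is None or version < _parse(removed_in)


if __name__ == "__main__":
    print(is_available("TRINO", "2.1"))
    print(is_available("PRESTO", "2.1"))
```

Comparing versions as integer tuples rather than strings keeps a version such as `1.10` from sorting before `1.5`.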
Adding optional components
gcloud command
To create a Dataproc cluster and install one or more optional components on the cluster, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
gcloud dataproc clusters create cluster-name \
    --optional-components=COMPONENT-NAME(s) \
    ... other flags
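For instance, a command that installs two components can be assembled from a script. This is a minimal sketch rather than an official recipe: the cluster name, region, and component choices are placeholder values, `build_create_command` is a hypothetical helper, and running the result assumes the `gcloud` CLI is installed and authenticated. Multiple components are passed as one comma-separated flag value.

```python
import shlex


def build_create_command(cluster_name, region, components):
    """Assemble the gcloud argv for a cluster with optional components."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Component names are joined into one comma-separated flag value.
        "--optional-components=" + ",".join(components),
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    command = build_create_command(
        "example-cluster", "us-central1", ["JUPYTER", "ZEPPELIN"]
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(command))
    # To actually create the cluster (requires an authenticated gcloud CLI):
    # import subprocess; subprocess.run(command, check=True)
```

Building the command as an argument list, not a single string, avoids shell quoting problems if a value ever contains spaces.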
REST API
Optional components can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
Console
In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected. Under Optional components in the Components section, select one or more components to install on your cluster.
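For the REST API route, the components go in the `optionalComponents` list of the cluster's `softwareConfig`. The sketch below only builds the request URL and JSON body for a `clusters.create` call and prints them. The project ID, region, and cluster name are placeholders, `build_create_request` is a hypothetical helper, and sending the request with an OAuth 2.0 access token is left out.

```python
import json


def build_create_request(project_id, region, cluster_name, components):
    """Return (url, body) for a Dataproc v1 clusters.create request."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "softwareConfig": {
                # Values are Component enum names, e.g. "JUPYTER".
                "optionalComponents": list(components),
            }
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder values for illustration only.
    url, body = build_create_request(
        "example-project", "us-central1", "example-cluster",
        ["JUPYTER", "ZEPPELIN"],
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```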