Single node clusters

Single node clusters are Cloud Dataproc clusters with only one node. This single node acts as both the master and the worker for your Cloud Dataproc cluster. Although a single node cluster has only one node, most Cloud Dataproc concepts and features still apply, except those listed below.

There are a number of situations where single node Cloud Dataproc clusters can be useful, including:

  • Trying out new versions of Spark and Hadoop or other open source components
  • Building proof-of-concept (PoC) demonstrations
  • Lightweight data science
  • Small-scale non-critical data processing
  • Education related to the Spark and Hadoop ecosystem

Single node cluster semantics

The following semantics apply to single node Cloud Dataproc clusters:

  • Single node clusters are configured the same as multi node Cloud Dataproc clusters, and include services such as HDFS and YARN.
  • For initialization actions, the single node reports its role as a master node.
  • Single node clusters show 0 workers since the single node acts as both master and worker.
  • Single node clusters are given hostnames that follow the pattern clustername-m. You can use this hostname to SSH into or connect to a web UI on the node.
  • Single node clusters cannot be upgraded to multi node clusters. Once created, single node clusters are restricted to one node. Similarly, multi node clusters cannot be scaled down to single node clusters.

Limitations

  • Single node clusters are not recommended for large-scale parallel data processing. If you exceed the resources on a single node cluster, a multi node Cloud Dataproc cluster is recommended.
  • n1-standard-1 machine types have limited resources and are not recommended for YARN applications.
  • Single node clusters do not support high availability mode, since there is only one node in the cluster.
  • Single node clusters cannot use preemptible VMs.

Creating a single node cluster

gcloud command

You can create a single node Cloud Dataproc cluster using the gcloud command-line tool. To create a single node cluster, pass the --single-node flag to the gcloud dataproc clusters create command.

gcloud dataproc clusters create cluster-name --single-node

REST API

You can create a single node cluster through the Cloud Dataproc REST API using a clusters.create request. When making this request, you must:

  1. Add the property dataproc:dataproc.allow.zero.workers="true" to the SoftwareConfig of the cluster request.
  2. Don't submit values for workerConfig and secondaryWorkerConfig (see ClusterConfig).
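Putting these two requirements together, the body of a clusters.create request for a single node cluster can be sketched as below. This is an illustrative sketch, not an official client library: the project ID and cluster name are placeholder values, and only the fields relevant to single node creation are shown.

```python
# Illustrative sketch of a clusters.create request body for a single node
# cluster. Project ID and cluster name are placeholders, not real values.
import json


def single_node_cluster_request(project_id, cluster_name):
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # Requirement 1: add the property to the SoftwareConfig.
            "softwareConfig": {
                "properties": {
                    "dataproc:dataproc.allow.zero.workers": "true"
                }
            }
            # Requirement 2: workerConfig and secondaryWorkerConfig are
            # deliberately omitted from the ClusterConfig.
        },
    }


body = single_node_cluster_request("my-project", "my-single-node-cluster")
print(json.dumps(body, indent=2))
```

Note that the property value is the string "true", not a JSON boolean, since SoftwareConfig properties are string key-value pairs.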

Console

You can create a single node cluster by selecting "Single Node (1 master, 0 workers)" from the Cluster mode selector on the Cloud Dataproc Create a cluster page.
