High Availability Mode

When creating a Google Cloud Dataproc cluster, you can put the cluster into Hadoop High Availability (HA) mode by specifying the number of master instances in the cluster. The number of masters can only be specified at cluster creation time.

Currently, Cloud Dataproc supports two master configurations:

  • 1 master (default, non-HA)
  • 3 masters (Hadoop HA)

Differences between default and Hadoop High Availability mode

In the rare case of an unexpected Google Compute Engine failure, Cloud Dataproc instances will experience a machine reboot. The default single-master configuration for Cloud Dataproc is designed to recover and continue processing new work in such cases, but in-flight jobs will necessarily fail and need to be retried, and HDFS will be inaccessible until the single NameNode fully recovers on reboot.

In HA mode, HDFS High Availability and YARN High Availability are configured to allow uninterrupted YARN and HDFS operations despite any single-node failures/reboots.

Note that the driver/main program of any job you run still represents a potential single point of failure if correctness depends on the driver program running successfully. Jobs submitted through the Cloud Dataproc Jobs API are not considered "high availability," and are still terminated on failure of the master node that runs their driver programs. For an individual job to be resilient to single-node failures on an HA Cloud Dataproc cluster, the job must either 1) run without a synchronous driver program, or 2) run the driver program inside a YARN container and be written to handle driver-program restarts. See Launching Spark on YARN for an example of how restartable driver programs can run inside YARN containers for fault tolerance.
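As a sketch of option 2, Spark's YARN cluster deploy mode runs the driver inside the YARN ApplicationMaster container, and the spark.yarn.maxAppAttempts property controls how many times YARN will restart it after a failure. The class name and jar path below are placeholders:

```
# Sketch: run the Spark driver inside a YARN container so YARN can
# restart it on failure. Class name and jar path are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --class com.example.MyJob \
  my-job.jar
```

The application itself must still be written so that a restarted attempt can resume or safely redo its work (for example, via Spark checkpointing).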

Instance Names

The default master is named cluster-name-m; HA masters are named cluster-name-m-0, cluster-name-m-1, cluster-name-m-2.
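The naming scheme above can be sketched as a small (hypothetical) helper that prints the master instance names for a given cluster name and master count:

```shell
# Hypothetical helper: print the master instance names Dataproc assigns
# for a cluster, given the cluster name and master count (1 or 3).
master_names() {
  local cluster="$1" num="$2"
  if [ "$num" -eq 1 ]; then
    # Default (non-HA): a single master named <cluster>-m
    echo "${cluster}-m"
  else
    # HA: masters named <cluster>-m-0 .. <cluster>-m-(num-1)
    for i in $(seq 0 $((num - 1))); do
      echo "${cluster}-m-${i}"
    done
  fi
}

master_names my-cluster 3
```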

Apache ZooKeeper

In an HA Cloud Dataproc cluster, all masters participate in a ZooKeeper cluster, which enables automatic failover for other Hadoop services.
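For a quick liveness check, ZooKeeper answers the standard "ruok" four-letter command on its client port (2181 by default); the host name below follows the HA naming scheme described above and would be run from a node inside the cluster:

```
# Ask a master's ZooKeeper server if it is serving requests;
# a healthy server replies "imok".
echo ruok | nc cluster-name-m-0 2181
```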


HDFS High Availability

In a standard Cloud Dataproc cluster:

  • cluster-name-m runs:
    • NameNode
    • Secondary NameNode

In a High Availability Cloud Dataproc cluster:

  • cluster-name-m-0 and cluster-name-m-1 run:
    • NameNode
    • ZKFailoverController
  • All masters run JournalNode
  • There is no Secondary NameNode
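To see which NameNode is currently active, you can use the standard HDFS HA admin tool from a cluster node. The service ID below is a placeholder; the real IDs are listed in hdfs-site.xml under dfs.ha.namenodes.&lt;nameservice&gt;:

```
# Check a NameNode's HA state (run on a cluster node).
# "nn0" is a placeholder service ID; prints "active" or "standby".
hdfs haadmin -getServiceState nn0
```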

Please see the HDFS High Availability documentation for additional details on components.


YARN High Availability

In a standard Cloud Dataproc cluster, cluster-name-m runs ResourceManager.

In a High Availability Cloud Dataproc cluster, all masters run ResourceManager.

Please see the YARN High Availability documentation for additional details on components.
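Similarly, the YARN HA admin tool reports each ResourceManager's state. The ID below is a placeholder; the real IDs are listed in yarn-site.xml under yarn.resourcemanager.ha.rm-ids:

```
# Check a ResourceManager's HA state (run on a cluster node).
# "rm1" is a placeholder ID; prints "active" or "standby".
yarn rmadmin -getServiceState rm1
```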

Creating a High Availability cluster

gcloud command

To create an HA cluster with gcloud dataproc clusters create, run the following command:
gcloud dataproc clusters create cluster-name --num-masters 3


REST API

To create an HA cluster, use the clusters.create API, setting masterConfig.numInstances to 3.
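As a sketch, the request can be issued with curl against the Dataproc v1 endpoint; PROJECT and REGION are placeholders, and the command assumes gcloud credentials are available for the access token:

```
# Sketch: create an HA cluster via the clusters.create REST API.
# PROJECT and REGION are placeholders.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters" \
  -d '{
    "clusterName": "cluster-name",
    "config": {
      "masterConfig": { "numInstances": 3 }
    }
  }'
```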


Console

To create an HA cluster, select "High Availability (3 masters, N workers)" from the Cluster mode selector on the Cloud Dataproc Create a cluster page.
