Cluster properties

The open source components installed on Cloud Dataproc clusters contain many configuration files. For example, Apache Spark and Apache Hadoop have several XML and plain text configuration files. From time to time, you may need to update or add to these configuration files. You can use the --properties flag of the gcloud dataproc clusters create command in the Cloud SDK to modify many common configuration files when creating a cluster.

How the properties flag works

To make updating files and properties easy, the gcloud dataproc clusters create --properties flag uses a special format to specify the configuration file and the property and value within that file to be updated.

Formatting

The --properties flag requires a string of text in the following format:

file_prefix:property=value

The --properties flag can only modify a specific set of commonly used configuration files. Each file_prefix maps to a predefined configuration file, as shown in the following table.

file_prefix          File                     Purpose of file
capacity-scheduler   capacity-scheduler.xml   Hadoop YARN Capacity Scheduler configuration
core                 core-site.xml            Hadoop general configuration
distcp               distcp-default.xml       Hadoop Distributed Copy configuration
hadoop-env           hadoop-env.sh            Hadoop-specific environment variables
hdfs                 hdfs-site.xml            Hadoop HDFS configuration
hive                 hive-site.xml            Hive configuration
mapred               mapred-site.xml          Hadoop MapReduce configuration
mapred-env           mapred-env.sh            Hadoop MapReduce-specific environment variables
pig                  pig.properties           Pig configuration
spark                spark-defaults.conf      Spark configuration
spark-env            spark-env.sh             Spark-specific environment variables
yarn                 yarn-site.xml            Hadoop YARN configuration
yarn-env             yarn-env.sh              Hadoop YARN-specific environment variables
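
For example, the hdfs prefix targets hdfs-site.xml, so a flag like the one below writes the setting into that file. The property dfs.replication and the value 2 are shown only as an illustrative sketch of the format, not as a recommended setting:
--properties 'hdfs:dfs.replication=2'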

Important notes

  • Some properties are reserved and cannot be overridden because they impact the functionality of the Cloud Dataproc cluster. If you try to change a reserved property, you will receive an error message when creating your cluster.
  • You can specify multiple changes by separating each with a comma.
  • The --properties flag cannot modify configuration files not shown above.
  • Changing properties when creating clusters in the Google Cloud Platform Console is currently not supported.
  • Changes to properties will be applied before the daemons on your cluster start.
  • If the specified property exists, it will be updated. If the specified property does not exist, it will be added to the configuration file.

Cloud Dataproc service properties

These are additional properties specific to Cloud Dataproc that are not included in the files listed above. These properties can be used to further configure the functionality of your Cloud Dataproc cluster.

Property Values Function
dataproc:dataproc.logging.stackdriver.enable true or false Enables (true) or disables (false) logging to Stackdriver.
dataproc:dataproc.monitoring.stackdriver.enable true or false Enables (true) or disables (false) the Stackdriver Monitoring Agent.
dataproc:dataproc.localssd.mount.enable true or false Whether to mount local SSDs as Hadoop/Spark temp directories and HDFS data directories (default: true).
dataproc:dataproc.allow.zero.workers true or false Set this SoftwareConfig property to true in a Cloud Dataproc clusters.create API request to create a Single node cluster: the default number of workers changes from 2 to 0, and worker components are placed on the master host. A Single node cluster can also be created from the GCP Console or with the gcloud command-line tool by setting the number of workers to 0.
dataproc:dataproc.conscrypt.provider.enable true or false Enables (true) or disables (false) Conscrypt as the primary Java security provider. Note: Conscrypt is enabled by default in Dataproc 1.2 and higher, but disabled in 1.0/1.1.
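
These service properties use the same file_prefix:property=value format, with dataproc as the prefix. For example, assuming your version of the gcloud command-line tool accepts service properties through --properties, a flag like the following sketch would request a Single node cluster by setting dataproc.allow.zero.workers:
--properties 'dataproc:dataproc.allow.zero.workers=true'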

Examples

gcloud command

To change the spark.master setting in the spark-defaults.conf file, add the following --properties flag when creating a new cluster on the command line:
--properties 'spark:spark.master=spark://example.com'
You can change several properties at once, in one or more configuration files, by using a comma separator. Each property must be specified in the full file_prefix:property=value format. For example, to change the spark.master setting in the spark-defaults.conf file and the dfs.hosts setting in the hdfs-site.xml file, you can use the following flag when creating a cluster:
--properties 'spark:spark.master=spark://example.com,hdfs:dfs.hosts=/foo/bar/baz'
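
The flag is passed as part of a full gcloud dataproc clusters create command. In the following sketch, example-cluster is a placeholder cluster name:
gcloud dataproc clusters create example-cluster \
    --properties 'spark:spark.master=spark://example.com'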

REST API

To set spark.executor.memory to 10gb, insert the following in the body of your cluster create JSON request:
"properties": {
  "spark:spark.executor.memory": "10gb"
}
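
In a clusters.create request, the properties map sits inside the cluster's SoftwareConfig. The following sketch shows the surrounding request body; my-project and example-cluster are placeholder names:
{
  "projectId": "my-project",
  "clusterName": "example-cluster",
  "config": {
    "softwareConfig": {
      "properties": {
        "spark:spark.executor.memory": "10gb"
      }
    }
  }
}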

Console

Currently, adding cluster properties from the Cloud Dataproc Create a cluster GCP Console page is not supported.