Cluster properties

The open source components installed on Google Cloud Dataproc clusters contain many configuration files. For example, Spark and Hadoop have several XML and plain text configuration files. From time to time, you may need to update or add to these configuration files. You can use the --properties flag of the gcloud dataproc clusters create command in the Google Cloud SDK to modify many common configuration files when creating a cluster.

How the properties option works

To make updating files and properties easy, the --properties flag uses a special format to specify the configuration file and the property and value within the file that should be updated.

Formatting

The --properties flag requires a string of text that uses the following format:

file_prefix:property=value

The --properties flag can only modify a specific set of commonly used configuration files. The file_prefix maps to one of the predefined configuration files shown in the table below; an example follows the table.

file_prefix         File                    Purpose of file
core                core-site.xml           Hadoop general configuration
hdfs                hdfs-site.xml           Hadoop HDFS configuration
mapred              mapred-site.xml         Hadoop MapReduce configuration
distcp              distcp-default.xml      Hadoop Distributed Copy configuration
yarn                yarn-site.xml           Hadoop YARN configuration
capacity-scheduler  capacity-scheduler.xml  Hadoop YARN Capacity Scheduler configuration
hive                hive-site.xml           Hive configuration
pig                 pig.properties          Pig configuration
spark               spark-defaults.conf     Spark configuration
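
For example, the following string uses the spark prefix from the table to target a Spark property (the property and value shown here are illustrative, not recommendations):

spark:spark.executor.memory=4g

Passed to --properties, this string updates (or adds) the spark.executor.memory property in spark-defaults.conf.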

Important notes

  • Some properties are reserved and cannot be overridden because doing so would impact the functionality of the Cloud Dataproc cluster. If you try to change a reserved property, you will receive an error message when creating your cluster.
  • You can specify multiple changes by separating each with a comma.
  • The --properties flag cannot modify configuration files not shown above.
  • Changing properties when creating clusters in the Google Cloud Platform Console is currently not supported.
  • Property changes are applied before the daemons on your cluster start.
  • If the specified property already exists, it will be updated. If the specified property does not exist, it will be added to the configuration file.

Cloud Dataproc service properties

These are additional properties specific to Cloud Dataproc, not included in the files listed above. These properties can be used to further configure the functionality of your Cloud Dataproc cluster.

Property                                         Values         Function
dataproc:dataproc.logging.stackdriver.enable     true or false  Enables (true) or disables (false) logging to Google Stackdriver.
dataproc:dataproc.monitoring.stackdriver.enable  true or false  Enables (true) or disables (false) the Google Stackdriver Monitoring Agent.
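
Service properties are passed through the same --properties flag as file-based properties. For example, to enable Stackdriver logging at cluster creation (a sketch; the cluster name example-cluster is illustrative):

gcloud dataproc clusters create example-cluster \
    --properties 'dataproc:dataproc.logging.stackdriver.enable=true'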

Examples

gcloud command

To change the spark.master setting in the spark-defaults.conf file, add the following --properties flag when creating a new cluster on the command line:
--properties 'spark:spark.master=spark://example.com'
You can change several properties at once, in one or more configuration files, by using a comma separator. Each property must be specified in the full file_prefix:property=value format. For example, to change the spark.master setting in the spark-defaults.conf file and the dfs.hosts setting in the hdfs-site.xml file, you can use the following flag when creating a cluster:
--properties 'spark:spark.master=spark://example.com,hdfs:dfs.hosts=/foo/bar/baz'
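
Putting this together, a complete create command might look like the following sketch (the cluster name example-cluster is illustrative):

gcloud dataproc clusters create example-cluster \
    --properties 'spark:spark.master=spark://example.com,hdfs:dfs.hosts=/foo/bar/baz'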

REST API

To set spark.executor.memory to 10gb, insert the following in the body of your cluster create JSON request:
"properties": {
  "spark:spark.executor.memory": "10gb"
}
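
For context, the properties map belongs in the softwareConfig section of the cluster config in a clusters.create request. The following is a minimal sketch of the full request, assuming the v1 API, an illustrative project ID (my-project), and the global region:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
      "clusterName": "example-cluster",
      "config": {
        "softwareConfig": {
          "properties": {
            "spark:spark.executor.memory": "10gb"
          }
        }
      }
    }' \
    https://dataproc.googleapis.com/v1/projects/my-project/regions/global/clusters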

Console

Currently, adding cluster properties on the Cloud Dataproc Create a cluster page of the Cloud Platform Console is not supported.
