Apache Hadoop YARN, HDFS, Spark, and related properties
The open source components installed on Dataproc clusters contain many
configuration files. For example, Apache Spark and Apache Hadoop have several XML
and plain text configuration files. You can use the
--properties flag of the
gcloud dataproc clusters create
command to modify many common configuration files when creating a cluster.
Formatting
The gcloud dataproc clusters create --properties flag accepts the following
string format:
file_prefix1:property1=value1,file_prefix2:property2=value2,...
- The file_prefix maps to a predefined configuration file as shown in the table below, and the property maps to a property within the file.
- The default delimiter used to separate multiple cluster properties is the comma (,). However, if a comma is included in a property value, you must change the delimiter by specifying a "^delimiter^" at the beginning of the property list (see gcloud topic escaping for more information).
- Example using a "#" delimiter:
--properties ^#^file_prefix1:property1=part1,part2#file_prefix2:property2=value2
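The delimiter rule above can be sketched as a small helper that builds the value passed to --properties. This is a minimal illustration, not part of the gcloud CLI: the function name `format_properties_flag` and the candidate delimiter list are assumptions made for the example.

```python
def format_properties_flag(properties, candidates=("#", ";", "|")):
    """Build a --properties value from a dict of 'prefix:property' -> value.

    Hypothetical helper: uses the default comma delimiter unless a pair
    contains a comma, in which case it switches to the ^delimiter^ form.
    """
    pairs = [f"{key}={value}" for key, value in properties.items()]
    # No comma inside any pair: the default comma delimiter is safe.
    if not any("," in pair for pair in pairs):
        return ",".join(pairs)
    # Otherwise pick the first candidate delimiter that no pair contains.
    for delim in candidates:
        if not any(delim in pair for pair in pairs):
            return f"^{delim}^" + delim.join(pairs)
    raise ValueError("no unused delimiter among the candidates")


print(format_properties_flag({
    "file_prefix1:property1": "part1,part2",
    "file_prefix2:property2": "value2",
}))
```

When the result is pasted into a shell command, quote it so that characters such as "#" or ";" are not interpreted by the shell.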
Examples
gcloud command
To change the spark.master setting in the
spark-defaults.conf file, add the following
gcloud dataproc clusters create --properties flag:
--properties 'spark:spark.master=spark://example.com'
You can change several properties at once, in one or more configuration files,
by using a comma separator. Each property must be specified in the full
file_prefix:property=value format. For example, to change the
spark.master setting in the spark-defaults.conf file
and the dfs.hosts setting in the hdfs-site.xml file,
use the following --properties flag when creating a cluster:
--properties 'spark:spark.master=spark://example.com,hdfs:dfs.hosts=/foo/bar/baz'
REST API
To set spark.executor.memory to 10g, insert the following properties
setting in the SoftwareConfig section of your clusters.create request:
"properties": {
  "spark:spark.executor.memory": "10g"
}
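As a rough sketch of how such a fragment could be assembled in a script, the following converts prefix:property=value strings into the properties map. The function names are hypothetical, and only the softwareConfig.properties nesting is taken from this page; the rest of the request body is left out.

```python
import json


def parse_pairs(pairs):
    """Turn 'prefix:property=value' strings into a properties dict."""
    props = {}
    for pair in pairs:
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = pair.partition("=")
        if not sep or ":" not in key:
            raise ValueError(f"expected prefix:property=value, got {pair!r}")
        props[key] = value
    return props


def software_config_fragment(pairs):
    """Return the softwareConfig fragment for a clusters.create body."""
    return {"softwareConfig": {"properties": parse_pairs(pairs)}}


print(json.dumps(software_config_fragment(["spark:spark.executor.memory=10g"])))
```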
An easy way to see how to construct the JSON body of a
Dataproc API clusters REST request is to initiate the
equivalent gcloud command using the --log-http flag.
Here is a sample gcloud dataproc clusters create command, which sets cluster
properties with the --properties spark:spark.executor.memory=10g flag.
The stdout log shows the resulting REST request body (the properties
  snippet is shown below):
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --properties=spark:spark.executor.memory=10g \
    --log-http \
    other args ...
Output:
...
== body start ==
{"clusterName": "my-cluster", "config": {"gceClusterConfig": ...
"masterConfig": {... "softwareConfig": {"properties": {"spark:spark.executor.memory": "10g"}},
...
== body end ==
...
Make sure to cancel the command after the JSON body appears in the output if you do not want the command to take effect.
Console
To change the spark.master setting in the
  spark-defaults.conf file:
- In the Google Cloud console, open the Dataproc Create a cluster page. Click the Customize cluster panel, then scroll to the Cluster properties section.
- Click + ADD PROPERTIES. Select spark in the Prefix list, then add "spark.master" in the Key field and the setting in the Value field.
Cluster vs. job properties
The Apache Hadoop YARN, HDFS, Spark, and other file-prefixed properties are applied at the cluster level when you create a cluster. These properties cannot be applied to a cluster after cluster creation. However, many of these properties can also be applied to specific jobs. When applying a property to a job, the file prefix is not used.
The following example sets Spark executor memory to 4g for a Spark job
(spark: prefix omitted).
gcloud dataproc jobs submit spark \
    --region=REGION \
    --properties=spark.executor.memory=4g \
    other args ...
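The prefix rule can be illustrated with a short helper that converts a cluster-level property name into its job-level form. This is only a sketch: `to_job_property` is a hypothetical name, and the helper does not check whether a given property is actually accepted at the job level.

```python
def to_job_property(cluster_property):
    """Drop the file prefix from a 'prefix:property' name.

    Hypothetical helper: names without a prefix are returned unchanged.
    """
    prefix, sep, name = cluster_property.partition(":")
    return name if sep else cluster_property


print(to_job_property("spark:spark.executor.memory"))
```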
Job properties can be submitted in a file using the
gcloud dataproc jobs submit job-type --properties-file
flag (see, for example, the
--properties-file
description for a Hadoop job submission).
gcloud dataproc jobs submit JOB_TYPE \
    --region=REGION \
    --properties-file=PROPERTIES_FILE \
    other args ...
The PROPERTIES_FILE is a set of
line-delimited key=value pairs. The property to be set is the key,
and the value to set the property to is the value. See the
java.util.Properties
class for a detailed description of the properties file format.
The following is an example of a properties file that can be
passed to the --properties-file flag when submitting a Dataproc job.
dataproc:conda.env.config.uri=gs://some-bucket/environment.yaml
spark:spark.history.fs.logDirectory=gs://some-bucket
spark:spark.eventLog.dir=gs://some-bucket
capacity-scheduler:yarn.scheduler.capacity.root.adhoc.capacity=5
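To check a properties file before submitting it, a simplified reader along the following lines may help. It is a sketch under stated assumptions, not an implementation of the java.util.Properties format: it splits each line on the first "=" only, skips blank lines and lines starting with "#" or "!", and ignores escapes and line continuations. The function name is hypothetical.

```python
def parse_properties_text(text):
    """Parse line-delimited key=value pairs into a dict (simplified)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        props[key.strip()] = value.strip()
    return props


sample = """\
# event log location
spark:spark.eventLog.dir=gs://some-bucket

capacity-scheduler:yarn.scheduler.capacity.root.adhoc.capacity=5
"""
for key, value in parse_properties_text(sample).items():
    print(key, "->", value)
```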
File-prefixed properties table
| File prefix | File | File purpose | 
|---|---|---|
| capacity-scheduler | capacity-scheduler.xml | Hadoop YARN Capacity Scheduler configuration | 
| core | core-site.xml | Hadoop general configuration | 
| distcp | distcp-default.xml | Hadoop Distributed Copy configuration | 
| flink | flink-conf.yaml | Flink configuration | 
| flink-log4j | log4j.properties | Log4j settings file | 
| hadoop-env | hadoop-env.sh | Hadoop specific environment variables | 
| hadoop-log4j | log4j.properties | Log4j settings file | 
| hbase | hbase-site.xml | HBase configuration | 
| hbase-log4j | log4j.properties | Log4j settings file | 
| hdfs | hdfs-site.xml | Hadoop HDFS configuration | 
| hive | hive-site.xml | Hive configuration | 
| hive-log4j2 | hive-log4j2.properties | Log4j settings file | 
| hudi | hudi-default.conf | Hudi configuration | 
| mapred | mapred-site.xml | Hadoop MapReduce configuration | 
| mapred-env | mapred-env.sh | Hadoop MapReduce specific environment variables | 
| pig | pig.properties | Pig configuration | 
| pig-log4j | log4j.properties | Log4j settings file | 
| presto | config.properties | Presto configuration | 
| presto-jvm | jvm.config | Presto specific JVM configuration | 
| spark | spark-defaults.conf | Spark configuration | 
| spark-env | spark-env.sh | Spark specific environment variables | 
| spark-log4j | log4j.properties | Log4j settings file | 
| tez | tez-site.xml | Tez configuration | 
| webcat-log4j | webhcat-log4j2.properties | Log4j settings file | 
| yarn | yarn-site.xml | Hadoop YARN configuration | 
| yarn-env | yarn-env.sh | Hadoop YARN specific environment variables | 
| zeppelin | zeppelin-site.xml | Zeppelin configuration | 
| zeppelin-env | zeppelin-env.sh | Zeppelin specific environment variables (Optional Component only) | 
| zeppelin-log4j | log4j.properties | Log4j settings file | 
| zookeeper | zoo.cfg | Zookeeper configuration | 
| zookeeper-log4j | log4j.properties | Log4j settings file | 
Notes
- Some properties are reserved and cannot be overridden because they impact the functionality of the Dataproc cluster. If you try to change a reserved property, you will receive an error message when creating your cluster.
- You can specify multiple changes by separating each with a comma.
- The --properties flag cannot modify configuration files not shown above.
- Changes to properties will be applied before the daemons on your cluster start.
- If the specified property exists, it will be updated. If the specified property does not exist, it will be added to the configuration file.
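Because only the files listed in the table can be modified, a script that generates property lists could check each prefix first. The sketch below copies the prefix-to-file mapping from the file-prefixed properties table above; the function name `config_file_for` is hypothetical, and the check does not cover reserved properties or the Dataproc service prefixes described in the next section.

```python
# Prefix -> configuration file, copied from the file-prefixed properties table.
FILE_PREFIXES = {
    "capacity-scheduler": "capacity-scheduler.xml",
    "core": "core-site.xml",
    "distcp": "distcp-default.xml",
    "flink": "flink-conf.yaml",
    "flink-log4j": "log4j.properties",
    "hadoop-env": "hadoop-env.sh",
    "hadoop-log4j": "log4j.properties",
    "hbase": "hbase-site.xml",
    "hbase-log4j": "log4j.properties",
    "hdfs": "hdfs-site.xml",
    "hive": "hive-site.xml",
    "hive-log4j2": "hive-log4j2.properties",
    "hudi": "hudi-default.conf",
    "mapred": "mapred-site.xml",
    "mapred-env": "mapred-env.sh",
    "pig": "pig.properties",
    "pig-log4j": "log4j.properties",
    "presto": "config.properties",
    "presto-jvm": "jvm.config",
    "spark": "spark-defaults.conf",
    "spark-env": "spark-env.sh",
    "spark-log4j": "log4j.properties",
    "tez": "tez-site.xml",
    "webcat-log4j": "webhcat-log4j2.properties",
    "yarn": "yarn-site.xml",
    "yarn-env": "yarn-env.sh",
    "zeppelin": "zeppelin-site.xml",
    "zeppelin-env": "zeppelin-env.sh",
    "zeppelin-log4j": "log4j.properties",
    "zookeeper": "zoo.cfg",
    "zookeeper-log4j": "log4j.properties",
}


def config_file_for(prefixed_property):
    """Return the configuration file targeted by 'prefix:property'."""
    prefix, sep, _name = prefixed_property.partition(":")
    if not sep or prefix not in FILE_PREFIXES:
        raise ValueError(f"unknown file prefix in {prefixed_property!r}")
    return FILE_PREFIXES[prefix]


print(config_file_for("hdfs:dfs.hosts"))
```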
Dataproc service properties
The properties listed in this section are specific to Dataproc. These properties can be used to further configure the functionality of your Dataproc cluster.
Formatting
The gcloud dataproc clusters create --properties flag accepts the following
string format:
property_prefix1:property1=value1,property_prefix2:property2=value2,...
- The default delimiter used to separate multiple cluster properties is the comma (,). However, if a comma is included in a property value, you must change the delimiter by specifying "^delimiter^" at the beginning of the property list (see gcloud topic escaping for more information).
- Example using a "#" delimiter:
--properties ^#^property_prefix1:property1=part1,part2#property_prefix2:property2=value2
Example:
Create a cluster and set Enhanced Flexibility Mode to Spark primary worker shuffle.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --properties=dataproc:efm.spark.shuffle=primary-worker \
    other args ...
Dataproc service properties table
| Property prefix | Property | Values | Description | 
|---|---|---|---|
| dataproc | agent.process.threads.job.min | number | Dataproc runs user job drivers concurrently in a thread pool. This property controls the minimum number of threads in the thread pool for fast startup even when no jobs are running (default: 10). | 
| dataproc | agent.process.threads.job.max | number | Dataproc runs user job drivers concurrently in a thread pool. This property controls the maximum number of threads in the thread pool, therefore limiting the maximum concurrency of user jobs. Increase this value for higher concurrency (default: 100). | 
| dataproc | am.primary_only | true or false | Set this property to true to prevent application master from running on Dataproc cluster preemptible workers. Note: This feature is only available with Dataproc 1.2 and higher. The default value is false. |
| dataproc | conda.env.config.uri | gs://<path> | Location in Cloud Storage of the Conda environment config file. A new Conda environment will be created and activated based on this file. For more information, see Using Conda related Cluster properties. (default: empty). |
| dataproc | conda.packages | Conda packages | This property takes a list of comma-separated Conda packages with specific versions to be installed in the base Conda environment. For more information, see Using Conda related Cluster properties. (default: empty). |
| dataproc | dataproc.allow.zero.workers | true or false | Set this SoftwareConfig property to true in a Dataproc clusters.create API request to create a Single node cluster, which changes default number of workers from 2 to 0, and places worker components on the master host. A Single node cluster can also be created from the Google Cloud console or with the Google Cloud CLI by setting the number of workers to 0. |
| dataproc | dataproc.alpha.master.nvdimm.size.gb | 1500-6500 | Setting a value creates a Dataproc master with Intel Optane DC Persistent memory. Note: Optane VMs can only be created in us-central1-f zones, only with n1-highmem-96-aep machine type and only under whitelisted projects. |
| dataproc | dataproc.alpha.worker.nvdimm.size.gb | 1500-6500 | Setting a value creates a Dataproc worker with Intel Optane DC Persistent memory. Note: Optane VMs can only be created in us-central1-f zones, only with n1-highmem-96-aep machine type and only under whitelisted projects. |
| dataproc | dataproc.await-new-workers-service-registration | true or false | This property is available in images 2.0.49+. Default value is false. Set this property to true to wait for new primary workers to register service leaders, such as HDFS NameNode and YARN ResourceManager, during cluster creation or cluster scale-up (only HDFS and YARN services are monitored). When set to true, if a new worker fails to register to a service, the worker is assigned a FAILED status. A failed worker is removed if the cluster is scaling up. If the cluster is being created, a failed worker is removed if the gcloud dataproc clusters create --action-on-failed-primary-workers=DELETE flag or the API actionOnFailedPrimaryWorkers=DELETE field was specified as part of the gcloud command or API cluster create request. |
| dataproc | dataproc.beta.secure.multi-tenancy.user.mapping | user-to-service account mappings | This property takes a list of user-to-service account mappings. Mapped users can submit interactive workloads to the cluster with isolated user identities (see Dataproc Service Account Based Secure Multi-tenancy). |
| dataproc | dataproc.cluster.caching.enabled | true or false | When the cluster caching is enabled, the cluster caches Cloud Storage data accessed by Spark jobs, which improves job performance without compromising consistency. (default: false). |
| dataproc | dataproc.cluster-ttl.consider-yarn-activity | true or false | When set to true, Cluster Scheduled Deletion considers both YARN and Dataproc Jobs API activity when calculating cluster idle time. When set to false, Dataproc Jobs API activity only is considered. (default: true). For more information, see Cluster idle time calculation. |
| dataproc | dataproc.conscrypt.provider.enable | true or false | Enables (true) or disables (false) Conscrypt as the primary Java security provider. Note: Conscrypt is enabled by default in Dataproc 1.2 and higher, but disabled in 1.0/1.1. |
| dataproc | dataproc.cooperative.multi-tenancy.user.mapping | user-to-service account mappings | This property takes a list of comma-separated user-to-service account mappings. If a cluster is created with this property set, when a user submits a job, the cluster will attempt to impersonate the corresponding service account when accessing Cloud Storage through the Cloud Storage connector. This feature requires Cloud Storage connector version 2.1.4 or higher. For more information, see Dataproc cooperative multi-tenancy. (default: empty). |
| dataproc | dataproc.control.max.assigned.job.tasks | 100 | This property limits how many tasks can concurrently run on a cluster master node. If the active task count exceeds the task limit, new jobs queue until running jobs complete and resources are freed up to allow new tasks to be scheduled. Note: Setting a default task limit in excess of 100 (the default) is not recommended since this can cause an out-of-memory condition in the master node. |
| dataproc | dataproc:hudi.version | Hudi version | Sets the Hudi version used with the optional Dataproc Hudi component. Note: This version is set by Dataproc to be compatible with the cluster image version. If it is set by the user, cluster creation can fail if the specified version is not compatible with the cluster image. | 
| dataproc | dataproc.lineage.enabled | true | Enables data lineage in a Dataproc cluster for Spark jobs. | 
| dataproc | dataproc.localssd.mount.enable | true or false | Whether to mount local SSDs as Hadoop/Spark temp directories and HDFS data directories (default: true). |
| dataproc | dataproc.logging.extended.enabled | true or false | Enables (true) or disables (false) logs in Cloud Logging for the following: knox, zeppelin, ranger-usersync, jupyter_notebook, jupyter_kernel_gateway, and spark-history-server (default: true). For more information, see Dataproc cluster logs in Logging. |
| dataproc | dataproc.logging.stackdriver.enable | true or false | Enables (true) or disables (false) Cloud Logging (default: true). See Cloud Logging Pricing for associated charges. |
| dataproc | dataproc.logging.stackdriver.job.driver.enable | true or false | Enables (true) or disables (false) Dataproc job driver logs in Cloud Logging. See Dataproc job output and logs (default: false). |
| dataproc | dataproc.logging.stackdriver.job.yarn.container.enable | true or false | Enables (true) or disables (false) YARN container logs in Cloud Logging. See Spark job output options. (default: false). |
| dataproc | dataproc.logging.syslog.enabled | true or false | Enables (true) or disables (false) VM syslogs in Cloud Logging (default: true). |
| dataproc | dataproc.master.custom.init.actions.mode | RUN_BEFORE_SERVICES or RUN_AFTER_SERVICES | For 2.0+ image clusters, when set to RUN_AFTER_SERVICES, initialization actions on the master will run after HDFS and any services that depend on HDFS are initialized. Examples of HDFS-dependent services include: HBase, Hive Server2, Ranger, Solr, and the Spark and MapReduce history servers. (default: RUN_BEFORE_SERVICES). |
| dataproc | dataproc.monitoring.stackdriver.enable | true or false | Enables (true) or disables (false) the Monitoring agent (default: false). This property is deprecated. See Enable custom metric collection to enable the collection of Dataproc OSS metrics in Monitoring. |
| dataproc | dataproc.scheduler.driver-size-mb | number | The average driver memory footprint, which determines the maximum number of concurrent jobs a cluster will run. The default value is 1 GB. A smaller value, such as 256, may be appropriate for Spark jobs. |
| dataproc | dataproc.scheduler.job-submission-rate | number | Jobs are throttled if this rate is exceeded. The default rate is 1.0 QPS. |
| dataproc | dataproc.scheduler.max-concurrent-jobs | number | The maximum number of concurrent jobs. If this value is not set when the cluster is created, the upper limit on concurrent jobs is calculated as max((masterMemoryMb - 3584) / masterMemoryMbPerJob, 5). masterMemoryMb is determined by the master VM's machine type. masterMemoryMbPerJob is 1024 by default, but is configurable at cluster creation with the dataproc:dataproc.scheduler.driver-size-mb cluster property. |
| dataproc | dataproc.scheduler.max-memory-used | number | The maximum fraction of RAM that can be used. If current usage is above this threshold, new jobs cannot be scheduled. The default is 0.9 (90%). If set to 1.0, master-memory-utilization job throttling is disabled. |
| dataproc | dataproc.scheduler.min-free-memory.mb | number | The minimum amount of free memory in megabytes needed by the Dataproc job driver to schedule another job on the cluster. The default is 256 MB. |
| dataproc | dataproc.snap.enabled | true or false | Enables or disables Ubuntu Snap daemon. The default value is true. If set to false, pre-installed Snap packages in the image are not affected, but auto refresh is disabled. Applies to 1.4.71, 1.5.46, 2.0.20 and newer Ubuntu images. |
| dataproc | dataproc.worker.custom.init.actions.mode | RUN_BEFORE_SERVICES | For pre-2.0 image clusters, RUN_BEFORE_SERVICES is not set, but can be set by the user when the cluster is created. For 2.0+ image clusters, RUN_BEFORE_SERVICES is set, and the property cannot be passed to the cluster (it cannot be changed by the user). For information on the effect of this setting, see Important considerations and guidelines—Initialization processing. | 
| dataproc | dataproc.yarn.orphaned-app-termination.enable | true or false | Default value is true. Set to false to prevent Dataproc from terminating "orphaned" YARN apps. Dataproc considers a YARN app to be orphaned if the job driver that submitted the YARN app has exited. Warning: If you use Spark cluster mode (spark.submit.deployMode=cluster) and you set spark.yarn.submit.waitAppCompletion=false, the Spark driver exits without waiting for YARN apps to complete; in this case, set dataproc:dataproc.yarn.orphaned-app-termination.enable=false. Also set this property to false if you submit Hive jobs. |
| dataproc | diagnostic.capture.enabled | true or false | Enables the collection of cluster checkpoint diagnostic data. (default: true). |
| dataproc | diagnostic.capture.access | GOOGLE_DATAPROC_DIAGNOSE | If set to GOOGLE_DATAPROC_DIAGNOSE, cluster checkpoint diagnostic data, which is saved in Cloud Storage, is shared with Dataproc support. (default: not set). | 
| dataproc | efm.spark.shuffle | primary-worker | If set to primary-worker, Spark shuffle data is written to primary workers. See Dataproc Enhanced Flexibility Mode for more information. |
| dataproc | job.history.to-gcs.enabled | true or false | Allows persisting MapReduce and Spark history files to the Dataproc temp bucket (default: true for image versions 1.5+). Users can overwrite the locations of job history file persistence through the following properties: mapreduce.jobhistory.done-dir, mapreduce.jobhistory.intermediate-done-dir, spark.eventLog.dir, and spark.history.fs.logDirectory. See Dataproc Persistent History Server for information on these and other cluster properties associated with the Dataproc job history and event files. |
| dataproc | jobs.file-backed-output.enable | true or false | Configures Dataproc jobs to pipe their output to temporary files in the /var/log/google-dataproc-job directory. Must be set to true to enable job driver logging in Cloud Logging (default: true). |
| dataproc | jupyter.listen.all.interfaces | true or false | To reduce the risk of remote code execution over unsecured notebook server APIs, the default setting for image versions 1.3+ is false, which restricts connections to localhost (127.0.0.1) when Component Gateway is enabled (Component Gateway activation is not required for 2.0+ images). This default setting can be overridden by setting this property to true to allow all connections. |
| dataproc | jupyter.notebook.gcs.dir | gs://<dir-path> | Location in Cloud Storage to save Jupyter notebooks. | 
| dataproc | kerberos.beta.automatic-config.enable | true or false | When set to true, users do not need to specify the Kerberos root principal password with the --kerberos-root-principal-password and --kerberos-kms-key-uri flags (default: false). See Enabling Hadoop Secure Mode via Kerberos for more information. |
| dataproc | kerberos.cross-realm-trust.admin-server | hostname/address | hostname/address of remote admin server (often the same as the KDC server). | 
| dataproc | kerberos.cross-realm-trust.kdc | hostname/address | hostname/address of remote KDC. | 
| dataproc | kerberos.cross-realm-trust.realm | realm name | Realm names can consist of any UPPERCASE ASCII string. Usually, the realm name is the same as your DNS domain name (in UPPERCASE). Example: If machines are named "machine-id.example.west-coast.mycompany.com", the associated realm may be designated as "EXAMPLE.WEST-COAST.MYCOMPANY.COM". | 
| dataproc | kerberos.cross-realm-trust.shared-password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted shared password. | 
| dataproc | kerberos.kdc.db.key.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted file containing the KDC database master key. | 
| dataproc | kerberos.key.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted file that contains the password of the key in the keystore file. | 
| dataproc | kerberos.keystore.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted file containing the keystore password. | 
| dataproc | kerberos.keystore.uri | gs://<dir-path> | Location in Cloud Storage of the keystore file containing the wildcard certificate and the private key used by cluster nodes (see note 1 below the table). |
| dataproc | kerberos.kms.key.uri | KMS key URI | The URI of the KMS key used to decrypt root password, for example projects/project-id/locations/region/keyRings/key-ring/cryptoKeys/key (see Key resource ID). |
| dataproc | kerberos.root.principal.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted password for Kerberos root principal. | 
| dataproc | kerberos.tgt.lifetime.hours | hours | Maximum lifetime of the ticket-granting ticket. |
| dataproc | kerberos.truststore.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted file that contains the password to the truststore file. | 
| dataproc | kerberos.truststore.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted trust store file containing trusted certificates (see note 2 below the table). |
| dataproc | pip.packages | Pip packages | This property takes a list of comma-separated Pip packages with specific versions, to be installed in the base Conda environment. For more information, see Conda related Cluster properties. (default: empty). |
| dataproc | ranger.kms.key.uri | KMS key URI | The URI of the KMS key used to decrypt Ranger admin user password, for example projects/project-id/locations/region/keyRings/key-ring/cryptoKeys/key (see Key resource ID). |
| dataproc | ranger.admin.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted password for Ranger admin user. | 
| dataproc | ranger.db.admin.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted password for Ranger database admin user. | 
| dataproc | ranger.cloud-sql.instance.connection.name | cloud sql instance connection name | The connection name of the Cloud SQL instance, for example project-id:region:name. | 
| dataproc | ranger.cloud-sql.root.password.uri | gs://<dir-path> | Location in Cloud Storage of the KMS-encrypted password for the root user of the Cloud SQL instance. | 
| dataproc | ranger.cloud-sql.use-private-ip | true or false | Whether the communication between cluster instances and the Cloud SQL instance should be over private IP (default value is false). |
| dataproc | solr.gcs.path | gs://<dir-path> | Cloud Storage path to act as Solr home directory. | 
| dataproc | startup.component.service-binding-timeout.hadoop-hdfs-namenode | seconds | The amount of time the Dataproc startup script will wait for the hadoop-hdfs-namenode to bind to ports before deciding that its startup has succeeded. The maximum recognized value is 1800 seconds (30 minutes). | 
| dataproc | startup.component.service-binding-timeout.hive-metastore | seconds | The amount of time the Dataproc startup script will wait for the hive-metastore service to bind to ports before deciding that its startup has succeeded. The maximum recognized value is 1800 seconds (30 minutes). | 
| dataproc | startup.component.service-binding-timeout.hive-server2 | seconds | The amount of time the Dataproc startup script will wait for the hive-server2 to bind to ports before deciding that its startup has succeeded. The maximum recognized value is 1800 seconds (30 minutes). | 
| dataproc | user-attribution.enabled | true or false | Set this property to true to attribute a Dataproc job to the identity of the user who submitted it (default value is false). |
| dataproc | yarn.docker.enable | true or false | Set to true to enable the Dataproc Docker on YARN feature (default value is false). |
| dataproc | yarn.docker.image | docker image | When enabling the Dataproc Docker on YARN feature (dataproc:yarn.docker.enable=true), you can use this optional property to specify your docker image (for example, dataproc:yarn.docker.image=gcr.io/project-id/image:1.0.1). If specified, the image is downloaded and cached in all nodes of the cluster during cluster creation. |
| dataproc | yarn.log-aggregation.enabled | true or false | Allows (true) turning on YARN log aggregation to the cluster's temp bucket. The bucket name is of the following form: dataproc-temp-<REGION>-<PROJECT_NUMBER>-<RANDOM_STRING>. (default: true for image versions 1.5+). Note: The cluster's temp bucket is not deleted when the cluster is deleted. Users can also set the location of aggregated YARN logs by overwriting the yarn.nodemanager.remote-app-log-dir YARN property. |
| knox | gateway.host | ip address | To reduce the risk of remote code execution over unsecured notebook server APIs, the default setting for image versions 1.3+ is 127.0.0.1, which restricts connections to localhost when Component Gateway is enabled. The default setting can be overridden, for example by setting this property to 0.0.0.0 to allow all connections. |
| zeppelin | zeppelin.notebook.gcs.dir | gs://<dir-path> | Location in Cloud Storage to save Zeppelin notebooks. | 
| zeppelin | zeppelin.server.addr | ip address | To reduce the risk of remote code execution over unsecured notebook server APIs, the default setting for image versions 1.3+ is 127.0.0.1, which restricts connections to localhost when Component Gateway is enabled. This default setting can be overridden, for example by setting this property to 0.0.0.0 to allow all connections. |
1. Keystore file: The keystore file contains the SSL certificate. It should be
in Java KeyStore (JKS) format. When copied to VMs, it is renamed to keystore.jks.
The SSL certificate should be a wildcard certificate that applies to each node
in the cluster.
2. Truststore file: The truststore file should
be in Java KeyStore (JKS) format. When copied to VMs, it is renamed to
truststore.jks.
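As a worked example of the default limit described for dataproc.scheduler.max-concurrent-jobs in the table above, the formula max((masterMemoryMb - 3584) / masterMemoryMbPerJob, 5) can be evaluated as follows. This is a sketch only: the function name and the 15360 MB master size are hypothetical, and rounding the quotient down to a whole number of jobs is an assumption, since the table does not state how fractions are handled.

```python
def default_max_concurrent_jobs(master_memory_mb, master_memory_mb_per_job=1024):
    """Evaluate max((masterMemoryMb - 3584) / masterMemoryMbPerJob, 5).

    Assumption: the quotient is rounded down to a whole number of jobs.
    """
    by_memory = (master_memory_mb - 3584) // master_memory_mb_per_job
    return max(by_memory, 5)


# Hypothetical master VM with 15360 MB of memory.
print(default_max_concurrent_jobs(15360))       # default driver size of 1024 MB
print(default_max_concurrent_jobs(15360, 256))  # driver-size-mb set to 256
print(default_max_concurrent_jobs(4096))        # small master: floor of 5 applies
```

Lowering dataproc:dataproc.scheduler.driver-size-mb raises the computed limit, which matches the table's remark that a smaller value may suit Spark jobs.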