- JSON representation
- GceClusterConfig
- PrivateIpv6GoogleAccess
- ReservationAffinity
- Type
- NodeGroupAffinity
- ShieldedInstanceConfig
- ConfidentialInstanceConfig
- SoftwareConfig
- Component
- NodeInitializationAction
- EncryptionConfig
- AutoscalingConfig
- SecurityConfig
- KerberosConfig
- IdentityConfig
- LifecycleConfig
- EndpointConfig
- DataprocMetricConfig
- Metric
- MetricSource
- AuxiliaryNodeGroup
The cluster config.
JSON representation |
---|
{ "configBucket": string, "tempBucket": string, "gceClusterConfig": { object ( |
Fields | |
---|---|
config |
Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a |
temp |
Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a |
gce |
Optional. The shared Compute Engine config settings for all instances in a cluster. |
master |
Optional. The Compute Engine config settings for the cluster's master instance. |
worker |
Optional. The Compute Engine config settings for the cluster's worker instances. |
secondary |
Optional. The Compute Engine config settings for a cluster's secondary worker instances. |
software |
Optional. The config settings for cluster software. |
initialization |
Optional. Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. You can test a node's
|
encryption |
Optional. Encryption settings for the cluster. |
autoscaling |
Optional. Autoscaling config for the policy associated with the cluster. The cluster does not autoscale if this field is unset. |
security |
Optional. Security settings for the cluster. |
lifecycle |
Optional. Lifecycle setting for the cluster. |
endpoint |
Optional. Port/endpoint configuration for this cluster. |
metastore |
Optional. Metastore configuration. |
dataproc |
Optional. The config for Dataproc metrics. |
auxiliary |
Optional. The node group settings. |
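As a rough illustration of how the top-level fields fit together, the following Python sketch assembles a cluster config body as a plain dictionary and leaves out any optional field that was not set. The helper name `build_cluster_config` and the bucket names are hypothetical; only `configBucket`, `tempBucket`, and `gceClusterConfig` are taken from the JSON representation above.

```python
import json


def build_cluster_config(config_bucket=None, temp_bucket=None, gce_cluster_config=None):
    """Assemble a ClusterConfig-shaped dict, omitting unset optional fields."""
    candidate = {
        "configBucket": config_bucket,
        "tempBucket": temp_bucket,
        "gceClusterConfig": gce_cluster_config,
    }
    # Every field shown here is optional, so drop anything left as None.
    return {key: value for key, value in candidate.items() if value is not None}


if __name__ == "__main__":
    # Hypothetical bucket names, used only for illustration.
    body = build_cluster_config(
        config_bucket="example-staging-bucket",
        temp_bucket="example-temp-bucket",
    )
    print(json.dumps(body, indent=2))
```

Omitting a bucket field entirely leaves the choice of staging or temp bucket to the service, as described in the field table.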
GceClusterConfig
Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster.
JSON representation |
---|
{ "zoneUri": string, "networkUri": string, "subnetworkUri": string, "privateIpv6GoogleAccess": enum ( |
Fields | |
---|---|
zone |
Optional. The Compute Engine zone where the Dataproc cluster will be located. If omitted, the service will pick a zone in the cluster's Compute Engine region. On a get request, zone will always be present. A full URL, partial URI, or short name are valid. Examples:
|
network |
Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetworkUri. If neither A full URL, partial URI, or short name are valid. Examples:
|
subnetwork |
Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with networkUri. A full URL, partial URI, or short name are valid. Examples:
|
private |
Optional. The type of IPv6 access for a cluster. |
service |
Optional. The Dataproc service account (also see VM Data Plane identity) used by Dataproc cluster VM instances to access Google Cloud Platform services. If not specified, the Compute Engine default service account is used. |
service |
Optional. The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included:
If no scopes are specified, the following defaults are also provided: |
tags[] |
The Compute Engine network tags to add to all instances (see Tagging instances). |
metadata |
Optional. The Compute Engine metadata entries to add to all instances (see Project and instance metadata). An object containing a list of |
reservation |
Optional. Reservation Affinity for consuming a zonal reservation. |
node |
Optional. Node Group Affinity for sole-tenant clusters. |
shielded |
Optional. Shielded Instance Config for clusters using Compute Engine Shielded VMs. |
confidential |
Optional. Confidential Instance Config for clusters using Confidential VMs. |
internal |
Optional. This setting applies to subnetwork-enabled networks. It is set to When set to
When set to
|
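The field table states that `networkUri` and `subnetworkUri` cannot be specified together. A minimal client-side check for that rule might look like the sketch below; the function name `validate_gce_cluster_config` and the sample values are hypothetical, and the check covers only this one constraint.

```python
def validate_gce_cluster_config(config):
    """Return a list of problems found in a GceClusterConfig-shaped dict."""
    problems = []
    # networkUri and subnetworkUri are mutually exclusive per the field table.
    if config.get("networkUri") and config.get("subnetworkUri"):
        problems.append("networkUri and subnetworkUri cannot both be specified")
    return problems


if __name__ == "__main__":
    # Hypothetical short names for a network and a subnetwork.
    print(validate_gce_cluster_config({"networkUri": "default", "subnetworkUri": "sub0"}))
    print(validate_gce_cluster_config({"subnetworkUri": "sub0"}))
```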
PrivateIpv6GoogleAccess
PrivateIpv6GoogleAccess controls whether and how Dataproc cluster nodes can communicate with Google Services through gRPC over IPv6. These values are directly mapped to corresponding values in the Compute Engine Instance fields.
Enums | |
---|---|
PRIVATE_IPV6_GOOGLE_ACCESS_UNSPECIFIED |
If unspecified, Compute Engine default behavior will apply, which is the same as INHERIT_FROM_SUBNETWORK . |
INHERIT_FROM_SUBNETWORK |
Private access to and from Google Services configuration inherited from the subnetwork configuration. This is the default Compute Engine behavior. |
OUTBOUND |
Enables outbound private IPv6 access to Google Services from the Dataproc cluster. |
BIDIRECTIONAL |
Enables bidirectional private IPv6 access between Google Services and the Dataproc cluster. |
ReservationAffinity
Reservation Affinity for consuming a zonal reservation.
JSON representation |
---|
{
"consumeReservationType": enum ( |
Fields | |
---|---|
consumeReservationType |
Optional. Type of reservation to consume. |
key |
Optional. Corresponds to the label key of reservation resource. |
values[] |
Optional. Corresponds to the label values of reservation resource. |
Type
Indicates whether to consume capacity from a reservation or not.
Enums | |
---|---|
TYPE_UNSPECIFIED |
|
NO_RESERVATION |
Do not consume from any allocated capacity. |
ANY_RESERVATION |
Consume any reservation available. |
SPECIFIC_RESERVATION |
Must consume from a specific reservation. The key and values fields must be set to identify the reservation. |
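The `SPECIFIC_RESERVATION` type depends on the `key` and `values` fields being set. The sketch below shows one way to check that dependency before sending a request; the function name and the sample label key and value are hypothetical.

```python
def validate_reservation_affinity(affinity):
    """Return a list of problems found in a ReservationAffinity-shaped dict."""
    problems = []
    if affinity.get("consumeReservationType") == "SPECIFIC_RESERVATION":
        # A specific reservation is identified by its label key and values.
        if not affinity.get("key"):
            problems.append("key is required for SPECIFIC_RESERVATION")
        if not affinity.get("values"):
            problems.append("values is required for SPECIFIC_RESERVATION")
    return problems


if __name__ == "__main__":
    # Hypothetical label key and reservation name.
    ok = {
        "consumeReservationType": "SPECIFIC_RESERVATION",
        "key": "example-label-key",
        "values": ["example-reservation"],
    }
    print(validate_reservation_affinity(ok))
    print(validate_reservation_affinity({"consumeReservationType": "SPECIFIC_RESERVATION"}))
```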
NodeGroupAffinity
Node Group Affinity for clusters using sole-tenant node groups. The Dataproc NodeGroupAffinity resource is not related to the Dataproc NodeGroup resource.
JSON representation |
---|
{ "nodeGroupUri": string } |
Fields | |
---|---|
nodeGroupUri |
Required. The URI of a sole-tenant node group resource that the cluster will be created on. A full URL, partial URI, or node group name are valid. Examples:
|
ShieldedInstanceConfig
Shielded Instance Config for clusters using Compute Engine Shielded VMs.
JSON representation |
---|
{ "enableSecureBoot": boolean, "enableVtpm": boolean, "enableIntegrityMonitoring": boolean } |
Fields | |
---|---|
enableSecureBoot |
Optional. Defines whether instances have Secure Boot enabled. |
enableVtpm |
Optional. Defines whether instances have the vTPM enabled. |
enableIntegrityMonitoring |
Optional. Defines whether instances have integrity monitoring enabled. |
ConfidentialInstanceConfig
Confidential Instance Config for clusters using Confidential VMs.
JSON representation |
---|
{ "enableConfidentialCompute": boolean } |
Fields | |
---|---|
enableConfidentialCompute |
Optional. Defines whether the instance should have confidential compute enabled. |
SoftwareConfig
Specifies the selection and config of software inside the cluster.
JSON representation |
---|
{
"imageVersion": string,
"properties": {
string: string,
...
},
"optionalComponents": [
enum ( |
Fields | |
---|---|
imageVersion |
Optional. The version of software inside the cluster. It must be one of the supported Dataproc Versions, such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version. If unspecified, it defaults to the latest Debian version. |
properties |
Optional. The properties to set on daemon config files. Property keys are specified in
For more information, see Cluster properties. An object containing a list of |
optionalComponents |
Optional. The set of components to activate on the cluster. |
Component
Cluster components that can be activated.
Enums | |
---|---|
COMPONENT_UNSPECIFIED |
Unspecified component. Specifying this will cause Cluster creation to fail. |
ANACONDA |
The Anaconda component is no longer supported or applicable to supported Dataproc on Compute Engine image versions. It cannot be activated on clusters created with supported Dataproc on Compute Engine image versions. |
DOCKER |
Docker |
DRUID |
The Druid query engine. (alpha) |
FLINK |
Flink |
HBASE |
HBase. (beta) |
HIVE_WEBHCAT |
The Hive Web HCatalog (the REST service for accessing HCatalog). |
HUDI |
Hudi. |
JUPYTER |
The Jupyter Notebook. |
PRESTO |
The Presto query engine. |
RANGER |
The Ranger service. |
SOLR |
The Solr service. |
ZEPPELIN |
The Zeppelin notebook. |
ZOOKEEPER |
The Zookeeper service. |
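Two entries in the enum table cannot usefully be requested: `COMPONENT_UNSPECIFIED` causes cluster creation to fail, and `ANACONDA` cannot be activated on supported image versions. The sketch below flags both, plus any name not in the table, in a `SoftwareConfig`-shaped dictionary. The function name is hypothetical, and the set of known names simply mirrors the table above.

```python
# Component names copied from the enum table above.
KNOWN_COMPONENTS = {
    "ANACONDA", "DOCKER", "DRUID", "FLINK", "HBASE", "HIVE_WEBHCAT", "HUDI",
    "JUPYTER", "PRESTO", "RANGER", "SOLR", "ZEPPELIN", "ZOOKEEPER",
}


def check_optional_components(software_config):
    """Return a list of problems with the optionalComponents entries."""
    problems = []
    for name in software_config.get("optionalComponents", []):
        if name == "COMPONENT_UNSPECIFIED":
            problems.append("COMPONENT_UNSPECIFIED causes cluster creation to fail")
        elif name == "ANACONDA":
            problems.append("ANACONDA cannot be activated on supported image versions")
        elif name not in KNOWN_COMPONENTS:
            problems.append(f"unknown component: {name}")
    return problems


if __name__ == "__main__":
    config = {"imageVersion": "1.2", "optionalComponents": ["JUPYTER", "ANACONDA"]}
    print(check_optional_components(config))
```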
NodeInitializationAction
Specifies an executable to run on a fully configured node and a timeout period for executable completion.
JSON representation |
---|
{ "executableFile": string, "executionTimeout": string } |
Fields | |
---|---|
executableFile |
Required. Cloud Storage URI of executable file. |
executionTimeout |
Optional. Amount of time the executable has to complete. Default is 10 minutes (see JSON representation of Duration). Cluster creation fails with an explanatory error message (the name of the executable that caused the error and the exceeded timeout period) if the executable has not completed by the end of the timeout period. |
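The timeout is a JSON Duration, which is written as a number of seconds followed by `s`, so the 10-minute default corresponds to `"600s"`. The sketch below builds one initialization action entry under that convention. The helper names and the script path are hypothetical, and writing the default out explicitly is a choice of this sketch rather than a requirement.

```python
def duration_json(seconds):
    """Format whole seconds as a JSON Duration string such as "600s"."""
    if seconds < 0:
        raise ValueError("duration must not be negative")
    return f"{int(seconds)}s"


def init_action(executable_file, timeout_seconds=600):
    """Build a NodeInitializationAction-shaped dict (default timeout: 10 minutes)."""
    if not executable_file.startswith("gs://"):
        raise ValueError("executableFile must be a Cloud Storage URI")
    return {
        "executableFile": executable_file,
        "executionTimeout": duration_json(timeout_seconds),
    }


if __name__ == "__main__":
    # Hypothetical script location, used only for illustration.
    print(init_action("gs://example-bucket/scripts/setup.sh"))
    print(init_action("gs://example-bucket/scripts/setup.sh", timeout_seconds=1200))
```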
EncryptionConfig
Encryption settings for the cluster.
JSON representation |
---|
{ "gcePdKmsKeyName": string, "kmsKey": string } |
Fields | |
---|---|
gcePdKmsKeyName |
Optional. The Cloud KMS key resource name to use for persistent disk encryption for all instances in the cluster. See Use CMEK with cluster data for more information. |
kmsKey |
Optional. The Cloud KMS key resource name to use for cluster persistent disk and job argument encryption. See Use CMEK with cluster data for more information. When this key resource name is provided, the following job arguments of the following job types submitted to the cluster are encrypted using CMEK:
|
AutoscalingConfig
Autoscaling Policy config associated with the cluster.
JSON representation |
---|
{ "policyUri": string } |
Fields | |
---|---|
policyUri |
Optional. The autoscaling policy used by the cluster. Only resource names including projectid and location (region) are valid. Examples:
Note that the policy must be in the same project and Dataproc region. |
SecurityConfig
Security related configuration, including encryption, Kerberos, etc.
JSON representation |
---|
{ "kerberosConfig": { object ( |
Fields | |
---|---|
kerberos |
Optional. Kerberos related configuration. |
identity |
Optional. Identity related configuration, including service account based secure multi-tenancy user mappings. |
KerberosConfig
Specifies Kerberos related configuration.
JSON representation |
---|
{ "enableKerberos": boolean, "rootPrincipalPasswordUri": string, "kmsKeyUri": string, "keystoreUri": string, "truststoreUri": string, "keystorePasswordUri": string, "keyPasswordUri": string, "truststorePasswordUri": string, "crossRealmTrustRealm": string, "crossRealmTrustKdc": string, "crossRealmTrustAdminServer": string, "crossRealmTrustSharedPasswordUri": string, "kdcDbKeyUri": string, "tgtLifetimeHours": integer, "realm": string } |
Fields | |
---|---|
enableKerberos |
Optional. Flag to indicate whether to Kerberize the cluster (default: false). Set this field to true to enable Kerberos on a cluster. |
rootPrincipalPasswordUri |
Optional. The Cloud Storage URI of a KMS encrypted file containing the root principal password. |
kmsKeyUri |
Optional. The URI of the KMS key used to encrypt sensitive files. |
keystoreUri |
Optional. The Cloud Storage URI of the keystore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate. |
truststoreUri |
Optional. The Cloud Storage URI of the truststore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate. |
keystorePasswordUri |
Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided keystore. For the self-signed certificate, this password is generated by Dataproc. |
keyPasswordUri |
Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided key. For the self-signed certificate, this password is generated by Dataproc. |
truststorePasswordUri |
Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided truststore. For the self-signed certificate, this password is generated by Dataproc. |
crossRealmTrustRealm |
Optional. The remote realm the Dataproc on-cluster KDC will trust, should the user enable cross realm trust. |
crossRealmTrustKdc |
Optional. The KDC (IP or hostname) for the remote trusted realm in a cross realm trust relationship. |
crossRealmTrustAdminServer |
Optional. The admin server (IP or hostname) for the remote trusted realm in a cross realm trust relationship. |
crossRealmTrustSharedPasswordUri |
Optional. The Cloud Storage URI of a KMS encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross realm trust relationship. |
kdcDbKeyUri |
Optional. The Cloud Storage URI of a KMS encrypted file containing the master key of the KDC database. |
tgtLifetimeHours |
Optional. The lifetime of the ticket granting ticket, in hours. If not specified, or if the user specifies 0, the default value of 10 is used. |
realm |
Optional. The name of the on-cluster Kerberos realm. If not specified, the uppercased domain of hostnames will be the realm. |
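The defaulting rule for the ticket granting ticket lifetime (unset or 0 falls back to 10 hours) can be mirrored on the client side when reasoning about a config. This is a small sketch of that rule only; the function name is hypothetical and the service applies the actual default.

```python
DEFAULT_TGT_LIFETIME_HOURS = 10


def effective_tgt_lifetime_hours(kerberos_config):
    """Return the TGT lifetime that applies to a KerberosConfig-shaped dict."""
    hours = kerberos_config.get("tgtLifetimeHours")
    # Both a missing value and an explicit 0 fall back to the default.
    return hours if hours else DEFAULT_TGT_LIFETIME_HOURS


if __name__ == "__main__":
    print(effective_tgt_lifetime_hours({"enableKerberos": True}))
    print(effective_tgt_lifetime_hours({"enableKerberos": True, "tgtLifetimeHours": 24}))
```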
IdentityConfig
Identity related configuration, including service account based secure multi-tenancy user mappings.
JSON representation |
---|
{ "userServiceAccountMapping": { string: string, ... } } |
Fields | |
---|---|
userServiceAccountMapping |
Required. Map of user to service account. An object containing a list of |
LifecycleConfig
Specifies the cluster auto-delete schedule configuration.
JSON representation |
---|
{ "idleDeleteTtl": string, "idleStartTime": string, // Union field |
Fields | |
---|---|
idle |
Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be deleted. Minimum value is 5 minutes; maximum value is 14 days (see JSON representation of Duration). |
idle |
Output only. The time when cluster became idle (most recent job finished) and became eligible for deletion due to idleness (see JSON representation of Timestamp). A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
Union field ttl . Either the exact time the cluster should be deleted at or the cluster maximum age. ttl can be only one of the following: |
|
auto |
Optional. The time when cluster will be auto-deleted (see JSON representation of Timestamp). A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
auto |
Optional. The lifetime duration of cluster. The cluster will be auto-deleted at the end of this period. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration). |
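The duration limits and the one-of rule for the `ttl` union lend themselves to a simple pre-flight check. The sketch below assumes the two union members are named `autoDeleteTime` and `autoDeleteTtl`; those names are truncated in the field table above, so treat them as assumptions. It also assumes durations are given as JSON Duration strings such as `"1800s"`. The function names are hypothetical.

```python
from datetime import timedelta

# Limits taken from the field descriptions above.
IDLE_DELETE_TTL_RANGE = (timedelta(minutes=5), timedelta(days=14))
AUTO_DELETE_TTL_RANGE = (timedelta(minutes=10), timedelta(days=14))


def parse_duration(value):
    """Parse a JSON Duration string such as "1800s" into a timedelta."""
    if not value.endswith("s"):
        raise ValueError(f"not a JSON Duration: {value!r}")
    return timedelta(seconds=float(value[:-1]))


def validate_lifecycle_config(config):
    """Return a list of problems found in a LifecycleConfig-shaped dict."""
    problems = []
    # Assumed union member names; only one of them may be set.
    if "autoDeleteTime" in config and "autoDeleteTtl" in config:
        problems.append("set only one of autoDeleteTime or autoDeleteTtl")
    limits = (
        ("idleDeleteTtl", IDLE_DELETE_TTL_RANGE),
        ("autoDeleteTtl", AUTO_DELETE_TTL_RANGE),
    )
    for field, (low, high) in limits:
        if field in config:
            duration = parse_duration(config[field])
            if not low <= duration <= high:
                problems.append(f"{field} must be between {low} and {high}")
    return problems


if __name__ == "__main__":
    print(validate_lifecycle_config({"idleDeleteTtl": "1800s"}))
    print(validate_lifecycle_config({"idleDeleteTtl": "60s"}))
```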
EndpointConfig
Endpoint config for this cluster.
JSON representation |
---|
{ "httpPorts": { string: string, ... }, "enableHttpPortAccess": boolean } |
Fields | |
---|---|
httpPorts |
Output only. The map of port descriptions to URLs. Will only be populated if enableHttpPortAccess is true. An object containing a list of |
enableHttpPortAccess |
Optional. If true, enables HTTP access to specific ports on the cluster from external sources. Defaults to false. |
DataprocMetricConfig
Dataproc metric config.
JSON representation |
---|
{
"metrics": [
{
object ( |
Fields | |
---|---|
metrics[] |
Required. Metrics sources to enable. |
Metric
A Dataproc custom metric.
JSON representation |
---|
{
"metricSource": enum ( |
Fields | |
---|---|
metric |
Required. A standard set of metrics is collected unless |
metric |
Optional. Specify one or more Custom metrics to collect for the metric source (for the Provide metrics in the following format:
Use camelcase as appropriate. Examples:
Notes:
|
MetricSource
A source for the collection of Dataproc custom metrics (see Custom metrics).
Enums | |
---|---|
METRIC_SOURCE_UNSPECIFIED |
Required unspecified metric source. |
MONITORING_AGENT_DEFAULTS |
Monitoring agent metrics. If this source is enabled, Dataproc enables the monitoring agent in Compute Engine, and collects monitoring agent metrics, which are published with an agent.googleapis.com prefix. |
HDFS |
HDFS metric source. |
SPARK |
Spark metric source. |
YARN |
YARN metric source. |
SPARK_HISTORY_SERVER |
Spark History Server metric source. |
HIVESERVER2 |
Hiveserver2 metric source. |
HIVEMETASTORE |
Hive Metastore metric source. |
FLINK |
Flink metric source. |
AuxiliaryNodeGroup
Node group identification and configuration information.
JSON representation |
---|
{
"nodeGroup": {
object ( |
Fields | |
---|---|
node |
Required. Node group configuration. |
node |
Optional. A node group ID. Generated if not specified. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). It cannot begin or end with an underscore or hyphen, and it must be 3 to 33 characters long. |
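The node group ID rules translate directly into a regular expression: one alphanumeric character, then 1 to 31 characters that may also be underscores or hyphens, then one more alphanumeric character, for a total of 3 to 33. The sketch below is one way to express that; the function name is hypothetical and the service remains the authority on what it accepts.

```python
import re

# First and last characters must be alphanumeric; total length is 3 to 33.
_NODE_GROUP_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{1,31}[A-Za-z0-9]")


def is_valid_node_group_id(node_group_id):
    """Check a candidate ID against the character and length rules above."""
    return _NODE_GROUP_ID.fullmatch(node_group_id) is not None


if __name__ == "__main__":
    for candidate in ("driver-pool_1", "ab", "-bad-start", "bad-end_"):
        print(candidate, is_valid_node_group_id(candidate))
```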