REST Resource: projects.regions.clusters

Resource: Cluster
- JSON representation
VirtualClusterConfig
- JSON representation
KubernetesClusterConfig
- JSON representation
GkeClusterConfig
- JSON representation
NamespacedGkeDeploymentTarget
- JSON representation
GkeNodePoolTarget
- JSON representation
Role
GkeNodePoolConfig
- JSON representation
GkeNodeConfig
- JSON representation
GkeNodePoolAcceleratorConfig
- JSON representation
GkeNodePoolAutoscalingConfig
- JSON representation
KubernetesSoftwareConfig
- JSON representation
AuxiliaryServicesConfig
- JSON representation
SparkHistoryServerConfig
- JSON representation
ClusterStatus
- JSON representation
State
Substate
ClusterMetrics
- JSON representation
Methods

Resource: Cluster

Describes the identifying information, config, and status of a Dataproc cluster.

JSON representation

{
  "projectId": string,
  "clusterName": string,
  "config": {
    object (ClusterConfig)
  },
  "virtualClusterConfig": {
    object (VirtualClusterConfig)
  },
  "labels": {
    string: string,
    ...
  },
  "status": {
    object (ClusterStatus)
  },
  "statusHistory": [
    {
      object (ClusterStatus)
    }
  ],
  "clusterUuid": string,
  "metrics": {
    object (ClusterMetrics)
  }
}

Fields
`projectId`	`string` Required. The Google Cloud Platform project ID that the cluster belongs to.
`clusterName`	`string` Required. The cluster name, which must be unique within a project. The name must start with a lowercase letter, and can contain up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen. The name of a deleted cluster can be reused.
`config`	`object (ClusterConfig)` Optional. The cluster config for a cluster of Compute Engine instances. Note that Dataproc may set default values, and values may change when clusters are updated. Exactly one of `config` or `virtualClusterConfig` must be specified.
`virtualClusterConfig`	`object (VirtualClusterConfig)` Optional. The virtual cluster config is used when creating a Dataproc cluster that does not directly control the underlying compute resources, for example, when creating a Dataproc-on-GKE cluster. Dataproc may set default values, and values may change when clusters are updated. Exactly one of `config` or `virtualClusterConfig` must be specified.
`labels`	`map (key: string, value: string)` Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035. No more than 32 labels can be associated with a cluster. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
`status`	`object (ClusterStatus)` Output only. Cluster status.
`statusHistory[]`	`object (ClusterStatus)` Output only. The previous cluster status.
`clusterUuid`	`string` Output only. A cluster UUID (universally unique identifier). Dataproc generates this value when it creates the cluster.
`metrics`	`object (ClusterMetrics)` Output only. Contains cluster daemon metrics such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only. It may be changed before final release.
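
The constraints on `clusterName`, `labels`, and the `config` / `virtualClusterConfig` pair can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `check_cluster_body` is a hypothetical helper, it assumes the 51-character limit applies to the whole name including the first letter, and it checks only label counts and lengths, not RFC 1035 conformance.

```python
import re

# Assumption: the 51-character limit covers the whole name, first letter included.
# One lowercase letter, then optionally up to 49 more characters and a final
# letter or digit, so the name can never end with a hyphen.
_CLUSTER_NAME = re.compile(r"[a-z](?:[a-z0-9-]{0,49}[a-z0-9])?")


def check_cluster_body(body):
    """Return a list of problems found in a Cluster request body (a plain dict)."""
    problems = []
    if not _CLUSTER_NAME.fullmatch(body.get("clusterName", "")):
        problems.append("clusterName violates the naming rules")
    # Exactly one of `config` or `virtualClusterConfig` must be specified.
    if ("config" in body) == ("virtualClusterConfig" in body):
        problems.append("specify exactly one of config or virtualClusterConfig")
    labels = body.get("labels", {})
    if len(labels) > 32:
        problems.append("more than 32 labels")
    for key, value in labels.items():
        # Keys need 1 to 63 characters; values may be empty or up to 63 characters.
        if not 1 <= len(key) <= 63 or len(value) > 63:
            problems.append(f"label {key!r} has an invalid length")
    return problems
```

An empty list means that none of these particular checks failed; the service may still reject the request for other reasons.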

VirtualClusterConfig

The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster.

JSON representation

{
  "stagingBucket": string,
  "auxiliaryServicesConfig": {
    object (AuxiliaryServicesConfig)
  },

  // Union field infrastructure_config can be only one of the following:
  "kubernetesClusterConfig": {
    object (KubernetesClusterConfig)
  }
  // End of list of possible types for union field infrastructure_config.
}

Fields
`stagingBucket`	`string` Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets). This field requires a Cloud Storage bucket name, not a `gs://...` URI to a Cloud Storage bucket.
`auxiliaryServicesConfig`	`object (AuxiliaryServicesConfig)` Optional. Configuration of auxiliary services used by this cluster.
Union field `infrastructure_config`. `infrastructure_config` can be only one of the following:
`kubernetesClusterConfig`	`object (KubernetesClusterConfig)` Required. The configuration for running the Dataproc cluster on Kubernetes.
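
Because `stagingBucket` takes a bucket name rather than a `gs://...` URI, callers that hold a URI need to reduce it to the bare name first. A small sketch of that conversion follows; `staging_bucket_name` and the bucket name in the usage example are hypothetical.

```python
def staging_bucket_name(value):
    """Return a bare bucket name, accepting either a name or a gs:// URI."""
    prefix = "gs://"
    if value.startswith(prefix):
        value = value[len(prefix):]
    # Drop any object path that followed the bucket in a URI.
    bucket = value.split("/", 1)[0]
    if not bucket:
        raise ValueError("empty bucket name")
    return bucket


# Usage: build the stagingBucket part of a VirtualClusterConfig body.
virtual_cluster_config = {
    "stagingBucket": staging_bucket_name("gs://example-staging-bucket/some/path"),
}
```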

KubernetesClusterConfig

The configuration for running the Dataproc cluster on Kubernetes.

JSON representation

{
  "kubernetesNamespace": string,
  "kubernetesSoftwareConfig": {
    object (KubernetesSoftwareConfig)
  },

  // Union field config can be only one of the following:
  "gkeClusterConfig": {
    object (GkeClusterConfig)
  }
  // End of list of possible types for union field config.
}

Fields
`kubernetesNamespace`	`string` Optional. A namespace within the Kubernetes cluster to deploy into. If this namespace does not exist, it is created. If it exists, Dataproc verifies that another Dataproc VirtualCluster is not installed into it. If not specified, the name of the Dataproc Cluster is used.
`kubernetesSoftwareConfig`	`object (KubernetesSoftwareConfig)` Optional. The software configuration for this Dataproc cluster running on Kubernetes.
Union field `config`. `config` can be only one of the following:
`gkeClusterConfig`	`object (GkeClusterConfig)` Required. The configuration for running the Dataproc cluster on GKE.

GkeClusterConfig

The cluster's GKE config.

JSON representation
{
  "namespacedGkeDeploymentTarget": {
    object (NamespacedGkeDeploymentTarget)
  },
  "gkeClusterTarget": string,
  "nodePoolTarget": [
    {
      object (GkeNodePoolTarget)
    }
  ]
}

Fields
`namespacedGkeDeploymentTarget` (deprecated)	`object (NamespacedGkeDeploymentTarget)` Optional. Deprecated. Use `gkeClusterTarget`. Used only for the deprecated beta. A target for the deployment.
`gkeClusterTarget`	`string` Optional. A target GKE cluster to deploy to. It must be in the same project and region as the Dataproc cluster (the GKE cluster can be zonal or regional). Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'
`nodePoolTarget[]`	`object (GkeNodePoolTarget)` Optional. GKE node pools where workloads will be scheduled. At least one node pool must be assigned the `DEFAULT` `GkeNodePoolTarget.Role`. If a `GkeNodePoolTarget` is not specified, Dataproc constructs a `DEFAULT` `GkeNodePoolTarget`. Each role can be given to only one `GkeNodePoolTarget`. All node pools must have the same location settings.
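
The `gkeClusterTarget` format can be built and taken apart with a pair of small helpers. This is an illustrative sketch only; the function names and the project, location, and cluster values in the usage example are hypothetical.

```python
import re

# Mirrors the documented format:
# projects/{project}/locations/{location}/clusters/{cluster_id}
_GKE_TARGET = re.compile(r"projects/([^/]+)/locations/([^/]+)/clusters/([^/]+)")


def gke_cluster_target(project, location, cluster_id):
    """Build a gkeClusterTarget string from its three components."""
    return f"projects/{project}/locations/{location}/clusters/{cluster_id}"


def parse_gke_cluster_target(target):
    """Split a gkeClusterTarget string into its components."""
    match = _GKE_TARGET.fullmatch(target)
    if match is None:
        raise ValueError(f"not a GKE cluster target: {target!r}")
    project, location, cluster_id = match.groups()
    return {"project": project, "location": location, "cluster_id": cluster_id}


# Usage: round-trip a target for a hypothetical zonal GKE cluster.
target = gke_cluster_target("example-project", "us-central1-a", "example-gke")
parts = parse_gke_cluster_target(target)
```

Comparing `parts["project"]` with the Dataproc cluster's project is one way to catch a mismatched target early, since the two must be in the same project and region.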

NamespacedGkeDeploymentTarget

Deprecated. Used only for the deprecated beta. A full, namespace-isolated deployment target for an existing GKE cluster.

JSON representation
{
  "targetGkeCluster": string,
  "clusterNamespace": string
}

Fields
`targetGkeCluster`	`string` Optional. The target GKE cluster to deploy to. Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'
`clusterNamespace`	`string` Optional. A namespace within the GKE cluster to deploy into.

GkeNodePoolTarget

GKE node pools that Dataproc workloads run on.

JSON representation
{
  "nodePool": string,
  "roles": [
    enum (Role)
  ],
  "nodePoolConfig": {
    object (GkeNodePoolConfig)
  }
}

Fields
`nodePool`	`string` Required. The target GKE node pool. Format: 'projects/{project}/locations/{location}/clusters/{cluster}/nodePools/{nodePool}'
`roles[]`	`enum (Role)` Required. The roles associated with the GKE node pool.
`nodePoolConfig`	`object (GkeNodePoolConfig)` Input only. The configuration for the GKE node pool. If specified, Dataproc attempts to create a node pool with the specified shape. If one with the same name already exists, it is verified against all specified fields. If a field differs, the virtual cluster creation will fail. If omitted, any node pool with the specified name is used. If a node pool with the specified name does not exist, Dataproc creates a node pool with default values. This is an input only field. It will not be returned by the API.

Role

Role specifies the tasks that will run on the node pool. Roles can be specific to workloads. Exactly one GkeNodePoolTarget within the virtual cluster must have the DEFAULT role, which is used to run all workloads that are not associated with a node pool.

Enums
`ROLE_UNSPECIFIED`	Role is unspecified.
`DEFAULT`	At least one node pool must have the `DEFAULT` role. Work assigned to a role that is not associated with a node pool is assigned to the node pool with the `DEFAULT` role. For example, work assigned to the `CONTROLLER` role will be assigned to the node pool with the `DEFAULT` role if no node pool has the `CONTROLLER` role.
`CONTROLLER`	Run work associated with the Dataproc control plane (for example, controllers and webhooks). Very low resource requirements.
`SPARK_DRIVER`	Run work associated with a Spark driver of a job.
`SPARK_EXECUTOR`	Run work associated with a Spark executor of a job.
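
The fallback rule for the `DEFAULT` role can be expressed as a short lookup. The sketch below only illustrates the rule as described here and is not how Dataproc schedules work internally; `node_pool_for_role` and the node pool names are hypothetical.

```python
def node_pool_for_role(role, node_pool_targets):
    """Return the nodePool that would receive work assigned to `role`.

    `node_pool_targets` is a list of GkeNodePoolTarget-shaped dicts.
    """
    default_pool = None
    for target in node_pool_targets:
        roles = target.get("roles", [])
        if role in roles:
            # A node pool explicitly holds this role.
            return target["nodePool"]
        if "DEFAULT" in roles:
            default_pool = target["nodePool"]
    if default_pool is None:
        raise ValueError("no node pool has the DEFAULT role")
    # No pool holds the role, so the work falls back to the DEFAULT pool.
    return default_pool


# Usage with two hypothetical node pools.
targets = [
    {"nodePool": "pool-default", "roles": ["DEFAULT"]},
    {"nodePool": "pool-executors", "roles": ["SPARK_EXECUTOR"]},
]
controller_pool = node_pool_for_role("CONTROLLER", targets)
executor_pool = node_pool_for_role("SPARK_EXECUTOR", targets)
```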

GkeNodePoolConfig

The configuration of a GKE node pool used by a Dataproc-on-GKE cluster.

JSON representation
{
  "config": {
    object (GkeNodeConfig)
  },
  "locations": [
    string
  ],
  "autoscaling": {
    object (GkeNodePoolAutoscalingConfig)
  }
}

Fields
`config`	`object (GkeNodeConfig)` Optional. The node pool configuration.
`locations[]`	`string` Optional. The list of Compute Engine zones where node pool nodes associated with a Dataproc on GKE virtual cluster will be located. Note: All node pools associated with a virtual cluster must be located in the same region as the virtual cluster, and they must be located in the same zone within that region. If a location is not specified during node pool creation, Dataproc on GKE will choose the zone.
`autoscaling`	`object (GkeNodePoolAutoscalingConfig)` Optional. The autoscaler configuration for this node pool. The autoscaler is enabled only when a valid configuration is present.

GkeNodeConfig

Parameters that describe cluster nodes.

JSON representation
{
  "machineType": string,
  "localSsdCount": integer,
  "preemptible": boolean,
  "accelerators": [
    {
      object (GkeNodePoolAcceleratorConfig)
    }
  ],
  "minCpuPlatform": string,
  "spot": boolean
}

Fields
`machineType`	`string` Optional. The name of a Compute Engine machine type.
`localSsdCount`	`integer` Optional. The number of local SSD disks to attach to the node, which is limited by the maximum number of disks allowable per zone (see Adding Local SSDs).
`preemptible`	`boolean` Optional. Whether the nodes are created as legacy preemptible VM instances. Also see `Spot` VMs, preemptible VM instances without a maximum lifetime. Legacy and Spot preemptible nodes cannot be used in a node pool with the `CONTROLLER` role or in the DEFAULT node pool if the CONTROLLER role is not assigned (the DEFAULT node pool will assume the CONTROLLER role).
`accelerators[]`	`object (GkeNodePoolAcceleratorConfig)` Optional. A list of hardware accelerators to attach to each node.
`minCpuPlatform`	`string` Optional. Minimum CPU platform to be used by this instance. The instance may be scheduled on the specified or a newer CPU platform. Specify the friendly names of CPU platforms, such as "Intel Haswell" or "Intel Sandy Bridge".
`spot`	`boolean` Optional. Whether the nodes are created as Spot VM instances. Spot VMs are the latest update to legacy `preemptible VMs`. Spot VMs do not have a maximum lifetime. Legacy and Spot preemptible nodes cannot be used in a node pool with the `CONTROLLER` role or in the DEFAULT node pool if the CONTROLLER role is not assigned (the DEFAULT node pool will assume the CONTROLLER role).
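
The restriction on `preemptible` and `spot` nodes depends on which node pool ends up running `CONTROLLER` work. A minimal client-side check might look like the following; `preemptible_conflicts` and the node pool names are hypothetical, and the input is assumed to be a list of `GkeNodePoolTarget`-shaped dicts.

```python
def preemptible_conflicts(node_pool_targets):
    """Return the node pools that use preemptible or Spot VMs where not allowed."""
    controller_assigned = any(
        "CONTROLLER" in target.get("roles", []) for target in node_pool_targets
    )
    conflicts = []
    for target in node_pool_targets:
        roles = target.get("roles", [])
        node_config = target.get("nodePoolConfig", {}).get("config", {})
        uses_preemption = bool(
            node_config.get("preemptible") or node_config.get("spot")
        )
        # The DEFAULT pool assumes the CONTROLLER role when no pool holds it.
        runs_controller = "CONTROLLER" in roles or (
            "DEFAULT" in roles and not controller_assigned
        )
        if uses_preemption and runs_controller:
            conflicts.append(target["nodePool"])
    return conflicts


# Usage: a Spot DEFAULT pool is a conflict only when no CONTROLLER pool exists.
spot_default = {
    "nodePool": "pool-default",
    "roles": ["DEFAULT"],
    "nodePoolConfig": {"config": {"spot": True}},
}
controller = {"nodePool": "pool-controller", "roles": ["CONTROLLER"]}
without_controller = preemptible_conflicts([spot_default])
with_controller = preemptible_conflicts([spot_default, controller])
```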

GkeNodePoolAcceleratorConfig

A GkeNodePoolAcceleratorConfig represents a hardware accelerator request for a node pool.

JSON representation
{
  "acceleratorCount": string,
  "acceleratorType": string,
  "gpuPartitionSize": string
}

Fields
`acceleratorCount`	`string (int64 format)` The number of accelerator cards exposed to an instance.
`acceleratorType`	`string` The accelerator type resource name (see GPUs on Compute Engine).
`gpuPartitionSize`	`string` Size of partitions to create on the GPU. Valid values are described in the NVIDIA MIG user guide.

GkeNodePoolAutoscalingConfig

GkeNodePoolAutoscalingConfig contains information the cluster autoscaler needs to adjust the size of the node pool to the current cluster usage.

JSON representation
{
  "minNodeCount": integer,
  "maxNodeCount": integer
}

Fields
`minNodeCount`	`integer` The minimum number of nodes in the node pool. Must be >= 0 and <= maxNodeCount.
`maxNodeCount`	`integer` The maximum number of nodes in the node pool. Must be >= minNodeCount, and must be > 0. Note: Quota must be sufficient to scale up the cluster.
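
The two bounds combine into a single condition: `0 <= minNodeCount <= maxNodeCount` with `maxNodeCount > 0`. A sketch of that check follows; `is_valid_autoscaling` is a hypothetical helper, and treating a missing count as 0 is an assumption made for the example.

```python
def is_valid_autoscaling(config):
    """Check a GkeNodePoolAutoscalingConfig-shaped dict against the stated bounds."""
    # Assumption: a count that is absent from the dict is treated as 0.
    min_nodes = config.get("minNodeCount", 0)
    max_nodes = config.get("maxNodeCount", 0)
    return 0 <= min_nodes <= max_nodes and max_nodes > 0
```

For instance, `{"minNodeCount": 0, "maxNodeCount": 5}` passes, while `{"minNodeCount": 3, "maxNodeCount": 2}` and an empty dict do not. Quota is not checked here.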

KubernetesSoftwareConfig

The software configuration for this Dataproc cluster running on Kubernetes.

JSON representation
{
  "componentVersion": {
    string: string,
    ...
  },
  "properties": {
    string: string,
    ...
  }
}

Fields
`componentVersion`	`map (key: string, value: string)` The components that should be installed in this Dataproc cluster. The key must be a string from the KubernetesComponent enumeration. The value is the version of the software to be installed. At least one entry must be specified. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
`properties`	`map (key: string, value: string)` The properties to set on daemon config files. Property keys are specified in `prefix:property` format, for example `spark:spark.kubernetes.container.image`. The following are supported prefixes and their mappings: `spark`: `spark-defaults.conf`. For more information, see Cluster properties. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
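
A `prefix:property` key splits at the first colon, and the prefix selects the config file. The sketch below illustrates that mapping for the one prefix listed here; `split_property_key` is a hypothetical helper, not an API call.

```python
# The only prefix-to-file mapping listed for KubernetesSoftwareConfig.
SUPPORTED_PREFIXES = {"spark": "spark-defaults.conf"}


def split_property_key(key):
    """Split a `prefix:property` key into (config file, property name)."""
    prefix, separator, name = key.partition(":")
    if not separator or not name:
        raise ValueError(f"expected prefix:property, got {key!r}")
    if prefix not in SUPPORTED_PREFIXES:
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return SUPPORTED_PREFIXES[prefix], name


# Usage with the example key from the field description.
config_file, property_name = split_property_key(
    "spark:spark.kubernetes.container.image"
)
```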

AuxiliaryServicesConfig

Auxiliary services configuration for a Cluster.

JSON representation
{
  "metastoreConfig": {
    object (MetastoreConfig)
  },
  "sparkHistoryServerConfig": {
    object (SparkHistoryServerConfig)
  }
}

Fields
`metastoreConfig`	`object (MetastoreConfig)` Optional. The Hive Metastore configuration for this workload.
`sparkHistoryServerConfig`	`object (SparkHistoryServerConfig)` Optional. The Spark History Server configuration for the workload.

SparkHistoryServerConfig

Spark History Server configuration for the workload.

JSON representation
{
  "dataprocCluster": string
}

Fields
`dataprocCluster`	`string` Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload. Example: `projects/[projectId]/regions/[region]/clusters/[clusterName]`

ClusterStatus

The status of a cluster and its instances.

JSON representation
{
  "state": enum (State),
  "detail": string,
  "stateStartTime": string,
  "substate": enum (Substate)
}

Fields
`state`	`enum (State)` Output only. The cluster's state.
`detail`	`string` Optional. Output only. Details of the cluster's state.
`stateStartTime`	`string (Timestamp format)` Output only. Time when this state was entered (see JSON representation of Timestamp). Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`substate`	`enum (Substate)` Output only. Additional state information that includes status reported by the agent.
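
Since `stateStartTime` may carry up to nine fractional digits, which is more precision than Python's `datetime` stores, a client may want to parse it explicitly. The following is one possible sketch using only the standard library; `parse_state_start_time` is a hypothetical helper, and it truncates the fraction to microseconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Date and time, optional fraction of 1 to 9 digits, then "Z" or a numeric offset.
_TIMESTAMP = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})"
)


def parse_state_start_time(value):
    """Parse an RFC 3339 timestamp string into a timezone-aware datetime."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    seconds, fraction, offset = match.groups()
    moment = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    # datetime stores microseconds, so digits beyond the sixth are dropped.
    micros = int((fraction or "").ljust(6, "0")[:6])
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        hours, minutes = int(offset[1:3]), int(offset[4:6])
        tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return moment.replace(microsecond=micros, tzinfo=tz)
```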

State

The cluster state.

Enums
`UNKNOWN`	The cluster state is unknown.
`CREATING`	The cluster is being created and set up. It is not ready for use.
`RUNNING`	The cluster is currently running and healthy. It is ready for use. Note: The cluster state changes from "creating" to "running" status after the master node(s), first two primary worker nodes (and the last primary worker node if primary workers > 2) are running.
`ERROR`	The cluster encountered an error. It is not ready for use.
`ERROR_DUE_TO_UPDATE`	The cluster has encountered an error while being updated. Jobs can be submitted to the cluster, but the cluster cannot be updated.
`DELETING`	The cluster is being deleted. It cannot be used.
`UPDATING`	The cluster is being updated. It continues to accept and process jobs.
`STOPPING`	The cluster is being stopped. It cannot be used.
`STOPPED`	The cluster is currently stopped. It is not ready for use.
`STARTING`	The cluster is being started. It is not ready for use.
`SCHEDULED`	Cluster creation is currently waiting for resources to be available. Once all resources are available, it will transition to CREATING and then RUNNING.
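
Client code often needs to know whether a cluster in a given state can take jobs, or whether it is still moving toward another state. The grouping below is a sketch derived from the descriptions in this table, not an official classification: the set names and helper functions are hypothetical, and treating `SCHEDULED`, `CREATING`, `UPDATING`, `STARTING`, `STOPPING`, and `DELETING` as transitional is an interpretation.

```python
# States whose descriptions say the cluster is ready for use or accepts jobs.
ACCEPTS_JOBS = frozenset({"RUNNING", "UPDATING", "ERROR_DUE_TO_UPDATE"})

# Assumption: states described as "being ..." or "waiting" are transitional.
TRANSITIONAL = frozenset(
    {"SCHEDULED", "CREATING", "UPDATING", "STARTING", "STOPPING", "DELETING"}
)


def accepts_jobs(status):
    """Return True if a ClusterStatus-shaped dict describes a job-ready cluster."""
    return status.get("state", "UNKNOWN") in ACCEPTS_JOBS


def is_transitional(status):
    """Return True if the cluster is expected to move to another state."""
    return status.get("state", "UNKNOWN") in TRANSITIONAL
```

A polling loop could, for example, keep waiting while `is_transitional(status)` is true and then branch on `accepts_jobs(status)`.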

Substate

The cluster substate.

Enums
`UNSPECIFIED`	The cluster substate is unknown.
`UNHEALTHY`	The cluster is known to be in an unhealthy state (for example, critical daemons are not running or HDFS capacity is exhausted). Applies to RUNNING state.
`STALE_STATUS`	The agent-reported status is out of date (may occur if Dataproc loses communication with Agent). Applies to RUNNING state.

ClusterMetrics

Contains cluster daemon metrics, such as HDFS and YARN stats.

Beta Feature: This report is available for testing purposes only. It may be changed before final release.

JSON representation
{
  "hdfsMetrics": {
    string: string,
    ...
  },
  "yarnMetrics": {
    string: string,
    ...
  }
}

Fields
`hdfsMetrics`	`map (key: string, value: string (int64 format))` The HDFS metrics. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
`yarnMetrics`	`map (key: string, value: string (int64 format))` YARN metrics. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.
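
Because the metric values use the int64 format, they arrive as JSON strings and need converting before any arithmetic. A minimal sketch of that conversion follows; `metrics_as_ints` is a hypothetical helper, and the metric keys in the usage example are made up for illustration rather than taken from real Dataproc output.

```python
def metrics_as_ints(metrics):
    """Convert an hdfsMetrics or yarnMetrics map from string values to ints."""
    # Python ints are arbitrary precision, so the full int64 range is safe.
    return {key: int(value) for key, value in metrics.items()}


# Usage with made-up metric names.
cluster_metrics = {
    "yarnMetrics": {"example-metric-a": "4", "example-metric-b": "9223372036854775807"},
}
yarn = metrics_as_ints(cluster_metrics.get("yarnMetrics", {}))
hdfs = metrics_as_ints(cluster_metrics.get("hdfsMetrics", {}))
```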

Methods
`create`	Creates a cluster in a project.
`delete`	Deletes a cluster in a project.
`diagnose`	Gets cluster diagnostic information.
`get`	Gets the resource representation for a cluster in a project.
`getIamPolicy`	Gets the access control policy for a resource.
`list`	Lists all regions/{region}/clusters in a project alphabetically.
`patch`	Updates a cluster in a project.
`setIamPolicy`	Sets the access control policy on the specified resource.
`start`	Starts a cluster in a project.
`stop`	Stops a cluster in a project.
`testIamPermissions`	Returns permissions that a caller has on the specified resource.
