Create a Cluster

You can create a Cloud Dataproc cluster with the Cloud SDK gcloud command-line tool, with a Cloud Dataproc API clusters.create request, or from the Google Cloud Platform Console.

Creating a Cloud Dataproc cluster

gcloud command

To create a Cloud Dataproc cluster on the command line, use the Cloud SDK gcloud dataproc clusters create command.
gcloud dataproc clusters create cluster-name
This command creates a cluster that uses the default Cloud Dataproc settings for the master and worker virtual machine instances, disk sizes and types, network type, the region and zone where the cluster is deployed, and other cluster settings. See the gcloud dataproc clusters create command reference for the command-line flags that customize these settings.
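For illustration, the sketch below assembles a customized `gcloud dataproc clusters create` command in Python, using the same machine type, worker count, and boot disk size as the REST request example. The helper `build_create_command` and its parameters are hypothetical; the flags themselves are standard gcloud flags, but confirm them with `gcloud dataproc clusters create --help` for your Cloud SDK version. Any setting left as `None` is omitted so that Cloud Dataproc applies its default.

```python
import shlex
from typing import List, Optional


def build_create_command(
    cluster_name: str,
    region: str,
    zone: Optional[str] = None,
    num_workers: Optional[int] = None,
    machine_type: Optional[str] = None,
    boot_disk_size_gb: Optional[int] = None,
) -> List[str]:
    """Return the argv list for `gcloud dataproc clusters create` (hypothetical helper).

    Settings left as None are not passed, so the service defaults apply.
    """
    cmd = ["gcloud", "dataproc", "clusters", "create", cluster_name, f"--region={region}"]
    if zone is not None:
        cmd.append(f"--zone={zone}")
    if num_workers is not None:
        cmd.append(f"--num-workers={num_workers}")
    if machine_type is not None:
        # Use the same machine type for the master and the worker nodes.
        cmd.append(f"--master-machine-type={machine_type}")
        cmd.append(f"--worker-machine-type={machine_type}")
    if boot_disk_size_gb is not None:
        # Boot disk sizes are given with a unit suffix, e.g. 500GB.
        cmd.append(f"--master-boot-disk-size={boot_disk_size_gb}GB")
        cmd.append(f"--worker-boot-disk-size={boot_disk_size_gb}GB")
    return cmd


if __name__ == "__main__":
    argv = build_create_command(
        "cluster-1", "us-central1", zone="us-central1-b",
        num_workers=2, machine_type="n1-standard-4", boot_disk_size_gb=500,
    )
    # Print the command for inspection; to run it, use
    # `import subprocess; subprocess.run(argv, check=True)`.
    print(shlex.join(argv))
```

Building an argument list instead of a single string avoids shell quoting problems if the command is later passed to `subprocess.run`.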

REST API

Use the Cloud Dataproc clusters.create API to create a cluster. Here is a simple POST request to create a cluster:
POST /v1/projects/my-project/regions/global/clusters/
{
  "projectId": "my-project",
  "clusterName": "cluster-1",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-b"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    }
  }
}
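A minimal Python sketch of the same request follows, assuming you already have an OAuth 2.0 access token (for example, from `gcloud auth print-access-token`). The helpers `build_create_request` and `send_create_request` are hypothetical, and the sketch leaves out `configBucket` and `subnetworkUri`, assuming the service defaults are acceptable. Only the standard library is used.

```python
import json
import urllib.request

API_ROOT = "https://dataproc.googleapis.com/v1"


def build_create_request(project_id, region, cluster_name, zone,
                         num_workers=2, machine_type="n1-standard-4",
                         boot_disk_size_gb=500):
    """Return (url, body) for a clusters.create call shaped like the example above."""
    disk = {"bootDiskSizeGb": boot_disk_size_gb, "numLocalSsds": 0}
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {"zoneUri": zone},
            "masterConfig": {
                "numInstances": 1,
                "machineTypeUri": machine_type,
                "diskConfig": dict(disk),
            },
            "workerConfig": {
                "numInstances": num_workers,
                "machineTypeUri": machine_type,
                "diskConfig": dict(disk),
            },
        },
    }
    url = f"{API_ROOT}/projects/{project_id}/regions/{region}/clusters"
    return url, body


def send_create_request(url, body, access_token):
    """POST the request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # clusters.create returns a long-running Operation, not the finished cluster.
        return json.load(resp)
```

Separating request construction from sending keeps the body easy to inspect or log before any cluster is created.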

Console

Open the Cloud Dataproc Create a cluster page in the GCP Console.

The Create a cluster page opens with the default fields automatically filled in for a new "cluster-1" cluster. You can expand the Preemptible workers, bucket, network, version, initialization, & access options panel to specify one or more preemptible worker nodes, a staging bucket, a network, a Cloud Dataproc image version, initialization actions, and project-level access for your cluster. Providing these values is optional.

If you do not provide settings for these options, the cluster is created with no preemptible worker nodes, an auto-created staging bucket (see Auto-created staging bucket), the default network, and the latest released Cloud Dataproc image version.

When all fields on the page are filled in correctly, click Create to create the cluster. The cluster name appears on the Clusters page, and its status changes to "Running" once the cluster has been created.
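To wait for the same "Running" status from a script, you could poll the cluster resource until its `status.state` field reports `RUNNING`. In this sketch, `wait_until_running` is a hypothetical helper, and `get_cluster` is an assumed callable that returns the cluster resource as a dict (for example, parsed from a clusters.get response or from `gcloud dataproc clusters describe --format=json` output).

```python
import time
from typing import Callable, Dict


def wait_until_running(get_cluster: Callable[[], Dict],
                       timeout_s: float = 900.0,
                       poll_s: float = 15.0) -> Dict:
    """Poll until status.state is RUNNING; raise on ERROR or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        cluster = get_cluster()
        state = cluster.get("status", {}).get("state")
        if state == "RUNNING":
            return cluster
        if state == "ERROR":
            raise RuntimeError(f"cluster entered ERROR state: {cluster.get('status')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still in state {state!r} after {timeout_s}s")
        # Any other state (for example CREATING): wait and check again.
        time.sleep(poll_s)
```

Passing the fetch function in keeps the polling logic independent of whether the status comes from the REST API or from gcloud.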

Click the cluster name to open the cluster details page. The page opens on the Overview tab with the CPU utilization graph selected; you can also display network and disk graphs for the cluster.
You can examine jobs, instances, and the configuration settings for your cluster from the other tabs. For example, you can use the VM Instances tab to SSH into the master node of your cluster. You can click Edit in the Configurations tab to edit settings for your cluster, for example to scale your cluster up or down by changing the number of standard or preemptible worker nodes.
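Outside the Console, the same resize can be expressed as a clusters.patch request whose `updateMask` query parameter names the fields being changed. The sketch below only builds such a request; `build_scale_request` is a hypothetical helper, and the request would be sent with HTTP PATCH and the same authorization header as a create request. From the command line, `gcloud dataproc clusters update` with `--num-workers` serves the same purpose.

```python
from urllib.parse import urlencode

API_ROOT = "https://dataproc.googleapis.com/v1"


def build_scale_request(project_id, region, cluster_name,
                        num_workers=None, num_preemptible_workers=None):
    """Return (url, body) for a clusters.patch call that resizes a cluster."""
    masks, config = [], {}
    if num_workers is not None:
        # Standard (primary) worker nodes.
        masks.append("config.worker_config.num_instances")
        config["workerConfig"] = {"numInstances": num_workers}
    if num_preemptible_workers is not None:
        # Preemptible workers live in the secondary worker group.
        masks.append("config.secondary_worker_config.num_instances")
        config["secondaryWorkerConfig"] = {"numInstances": num_preemptible_workers}
    if not masks:
        raise ValueError("specify num_workers and/or num_preemptible_workers")
    query = urlencode({"updateMask": ",".join(masks)})
    url = (f"{API_ROOT}/projects/{project_id}/regions/{region}"
           f"/clusters/{cluster_name}?{query}")
    return url, {"config": config}
```

Listing only the changed fields in `updateMask` leaves every other cluster setting untouched.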

Auto-created staging bucket

When you create a cluster, Cloud Dataproc either creates a Cloud Storage staging bucket in your project or reuses a bucket it created for a previous cluster creation request. Cloud Dataproc uses a separate bucket for each geographical region, as determined by the Compute Engine zone of the cluster, so a Cloud Dataproc-created staging bucket is shared among clusters in the same region. Staging buckets hold miscellaneous configuration and control files that your cluster needs, and they also receive the output of the Cloud SDK gcloud dataproc clusters diagnose command.

To find the name of the staging bucket that Cloud Dataproc created, run the gcloud dataproc clusters describe command. The bucket associated with your cluster appears in the output next to configurationBucket:

gcloud dataproc clusters describe <cluster-name>
clusterName: your-cluster-name
clusterUuid: daa40b3f-5ff5-4e89-9bf1-bcbfec ...
    configurationBucket: dataproc-edc9d85f-12f9-4905-...
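If a script needs the bucket name, a small parser over the describe output is enough. The function `staging_bucket_from_describe` below is hypothetical; as an assumption, it accepts both `configurationBucket` (the key shown above) and `configBucket` (the field name used in the REST request), in case your SDK version prints the latter.

```python
import re
from typing import Optional

# Match an indented or top-level "configurationBucket:" or "configBucket:" line
# and capture the value without surrounding whitespace.
_BUCKET_LINE = re.compile(r"^\s*(?:configurationBucket|configBucket):\s*(\S.*?)\s*$")


def staging_bucket_from_describe(output: str) -> Optional[str]:
    """Return the staging bucket name from describe output, or None if absent."""
    for line in output.splitlines():
        match = _BUCKET_LINE.match(line)
        if match:
            return match.group(1)
    return None
```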
