Create a cluster

You can create a Cloud Dataproc cluster in three ways: by sending a Cloud Dataproc API clusters.create HTTP or programmatic request, by running the Cloud SDK gcloud command-line tool in a local terminal window or in Cloud Shell, or from the Google Cloud Platform Console opened in a local browser.

You can specify the global region or a specific region for your cluster. The global region is a special multi-region endpoint that can deploy instances into any user-specified Compute Engine zone. You can also specify a distinct region, such as us-east1 or europe-west1, to isolate the resources (including VM instances and Cloud Storage) and metadata storage locations used by Cloud Dataproc within that region. See Regional endpoints to learn more about the difference between global and regional endpoints, and see Available regions & zones for information on selecting a region. You can also run the gcloud compute regions list command to list the available regions.
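
As a rough illustration of how the region choice appears in an API request, the sketch below builds the clusters collection path for either kind of endpoint. It follows the path pattern of the REST example on this page and assumes that the global endpoint is addressed with the literal region name "global". The helper name clusters_path and the project ID are hypothetical.

```python
def clusters_path(project_id: str, region: str = "global") -> str:
    """Return the REST path of the clusters collection for a region.

    Assumption: the global endpoint uses the literal region name "global".
    """
    if not project_id or not region:
        raise ValueError("project_id and region must be non-empty")
    return f"/v1/projects/{project_id}/regions/{region}/clusters"


# Hypothetical project ID, for illustration only.
print(clusters_path("my-project"))                  # global endpoint
print(clusters_path("my-project", "europe-west1"))  # regional endpoint
```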

Compute Engine Virtual Machine instances (VMs) in a Cloud Dataproc cluster, consisting of master and worker VMs, require full internal IP networking access to each other. The default network available (and normally used) to create a cluster helps ensure this access. If you want to create your own network for your Cloud Dataproc cluster, see Cloud Dataproc Cluster Network Configuration.

Creating a Cloud Dataproc cluster

gcloud command

To create a Cloud Dataproc cluster on the command line, run the Cloud SDK gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell.
gcloud dataproc clusters create cluster-name
The above command creates a cluster with default Cloud Dataproc service settings for your master and worker virtual machine instances, disk sizes and types, network type, region and zone where your cluster is deployed, and other cluster settings. See the gcloud dataproc clusters create command for information on using command line flags to customize cluster settings.
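
To make the idea of overriding defaults with flags more concrete, here is a minimal sketch that assembles a gcloud dataproc clusters create command line from a few optional settings. The helper build_create_command is hypothetical, and only a handful of flags (--region, --zone, --num-workers, --master-machine-type, --worker-machine-type) are covered; check the gcloud reference for the flags your SDK version supports. The sketch only prints the command and does not run it.

```python
import shlex


def build_create_command(cluster_name, region=None, zone=None, num_workers=None,
                         master_machine_type=None, worker_machine_type=None):
    """Build the argv list for `gcloud dataproc clusters create`.

    Only settings that were explicitly provided become flags; everything
    else is left to the Cloud Dataproc service defaults.
    """
    argv = ["gcloud", "dataproc", "clusters", "create", cluster_name]
    optional_flags = [
        ("--region", region),
        ("--zone", zone),
        ("--num-workers", num_workers),
        ("--master-machine-type", master_machine_type),
        ("--worker-machine-type", worker_machine_type),
    ]
    for flag, value in optional_flags:
        if value is not None:
            argv.append(f"{flag}={value}")
    return argv


# Example values are illustrative only.
command = build_create_command(
    "cluster-1",
    region="us-central1",
    zone="us-central1-b",
    num_workers=2,
    worker_machine_type="n1-standard-4",
)
# Print a shell-safe rendering of the command instead of executing it.
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later passed to a process launcher.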

Create a cluster with a YAML file (Beta)

  1. Run the following gcloud command to export the configuration of an existing Cloud Dataproc cluster into a YAML file.
    gcloud beta dataproc clusters export my-existing-cluster --destination cluster.yaml
    
  2. Create a new cluster by importing the YAML file configuration.
    gcloud beta dataproc clusters import my-new-cluster --source cluster.yaml
    

Note: During the export operation, cluster-specific fields (such as cluster name), output-only fields, and automatically applied labels are filtered. These fields are disallowed in the imported YAML file used to create a cluster.

REST API

Use the Cloud Dataproc clusters.create API to create a cluster. Here is a simple POST request to create a cluster:
POST /v1/projects/my-project/regions/us-central1/clusters/
{
  "projectId": "my-project",
  "clusterName": "cluster-1",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "default",
      "zoneUri": "us-central1-b"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    }
  }
}
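
For a programmatic request, the same body can be assembled in code. The following is a minimal sketch using only the Python standard library, not an official client: it prepares the POST request shown above but does not send it. The helper names instance_group and build_create_request are hypothetical, the service root https://dataproc.googleapis.com is assumed, and "ACCESS_TOKEN" is a placeholder for a valid OAuth 2.0 access token that you would need to obtain separately.

```python
import json
import urllib.request

API_ROOT = "https://dataproc.googleapis.com"  # assumed service endpoint


def instance_group(num_instances, machine_type="n1-standard-4", boot_disk_gb=500):
    """Return a masterConfig/workerConfig-style dictionary."""
    return {
        "numInstances": num_instances,
        "machineTypeUri": machine_type,
        "diskConfig": {"bootDiskSizeGb": boot_disk_gb, "numLocalSsds": 0},
    }


def build_create_request(project_id, region, cluster_name, zone, access_token):
    """Prepare (but do not send) a clusters.create POST request."""
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "configBucket": "",  # empty, as in the example request above
            "gceClusterConfig": {"subnetworkUri": "default", "zoneUri": zone},
            "masterConfig": instance_group(1),
            "workerConfig": instance_group(2),
        },
    }
    url = f"{API_ROOT}/v1/projects/{project_id}/regions/{region}/clusters"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "ACCESS_TOKEN" is a placeholder; a real call needs a valid token.
request = build_create_request(
    "my-project", "us-central1", "cluster-1", "us-central1-b", "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))
# Passing `request` to urllib.request.urlopen would send it; omitted on purpose.
```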

Console

Open the Cloud Dataproc Create a cluster page in the GCP Console in your browser.

The Create a cluster page opens with the default fields automatically filled in for a new "cluster-1" cluster. You can expand the Advanced options panel to specify one or more preemptible worker nodes, a staging bucket, network, Cloud Dataproc image version, initialization actions, and project-level access for your cluster. Providing these values is optional.

If you do not provide settings for these options, the cluster is created with no preemptible worker nodes, an auto-created staging bucket, the default network, and the latest released Cloud Dataproc image version.

Once you are satisfied that all fields on the page are filled in correctly, click Create to create the cluster. The cluster name appears in the Clusters page, and its status is updated to "Running" after the cluster is created.

Click the cluster name to open the cluster details page. This page opens with the Overview tab and the CPU utilization graph selected. You can also choose to display network and disk graphs for the cluster.

You can examine jobs, instances, and the configuration settings for your cluster from the other tabs. For example, you can use the VM Instances tab to SSH into the master node of your cluster. You can also click Edit in the Configurations tab to change cluster settings, for example to scale your cluster up or down by changing the number of standard or preemptible worker nodes.