Cloud Dataproc Cluster Network Configuration

Overview

The Google Compute Engine Virtual Machine instances in a Google Cloud Dataproc cluster, consisting of master and worker VMs, require full internal IP networking access to each other.

Legacy networks use a default firewall rule with a source IP range of 10.0.0.0/8 to allow intra-cluster communication, while newer subnetwork-enabled default networks use a slightly more restrictive source IP range of 10.128.0.0/9 due to having constrained IP ranges per regional subnetwork.

In addition to using the source IP ranges just noted, the typical (default-allow-internal) firewall rule in a Cloud Dataproc cluster's Google Compute Engine network opens the udp:0-65535;tcp:0-65535;icmp ports.
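
As a rough illustration of how the two source ranges relate, the following Python sketch uses the standard-library ipaddress module to confirm that 10.128.0.0/9 is contained in 10.0.0.0/8 and covers half of its addresses. The helper name is_allowed and the sample address are made up for this example; they are not part of any Cloud Dataproc tooling.

```python
import ipaddress

# Source ranges quoted in the overview above.
LEGACY_RANGE = ipaddress.ip_network("10.0.0.0/8")
SUBNET_MODE_RANGE = ipaddress.ip_network("10.128.0.0/9")


def is_allowed(address: str, source_range: ipaddress.IPv4Network) -> bool:
    """Return True if the address falls inside the rule's source range."""
    return ipaddress.ip_address(address) in source_range


if __name__ == "__main__":
    # The /9 range is contained in the /8 range.
    print(SUBNET_MODE_RANGE.subnet_of(LEGACY_RANGE))
    # Hypothetical address: matched by the legacy range only.
    print(is_allowed("10.1.2.3", LEGACY_RANGE))
    print(is_allowed("10.1.2.3", SUBNET_MODE_RANGE))
```

An address such as 10.1.2.3 matches the legacy source range but not the narrower subnetwork-enabled one, which is why a VM outside 10.128.0.0/9 would not be covered by the newer default rule.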

Cloud Dataproc cluster default network configuration

When you create a Cloud Dataproc cluster, you can accept the default network for the cluster.

Default network

Here's a Google Cloud Platform Console snapshot that shows the default network selected from the Cloud Dataproc Create a cluster page.

After the cluster is created, the GCP Console VM instances→Network details page shows the default firewall rules for the instances in the cluster, which include the default-allow-internal firewall rule that opens the udp:0-65535;tcp:0-65535;icmp ports.

Note that although legacy-network firewall rules specify a 10.0.0.0/8 IP address range, newer subnetwork-enabled networks provide more constrained regional IP address ranges (the default-allow-internal rule, shown below, specifies a 10.128.0.0/9 IP address range).

Create a VPC network

You can specify your own VPC (Virtual Private Cloud) network when you create a Cloud Dataproc cluster. To do this, you must first create a VPC network with firewall rules. Then, when you create the cluster, you associate your network with the cluster.

Creating a VPC network

You can create a VPC network from the GCP Console or with the gcloud compute networks create command. You can create an auto mode VPC network or a custom mode VPC network (called "auto" and "custom" networks, respectively, below). An auto network is automatically configured with subnets in each Compute Engine region. Custom networks are not automatically configured with subnets; you must create one or more subnets in one or more Compute Engine regions when you create the custom network. For more information, see Types of VPC Networks.

Let's look at the options available when you create an auto and custom network from the GCP Console.

Auto

The GCP Console screenshot, below, shows the GCP Console fields that are populated for the Automatic creation of subnetworks (an auto mode VPC network). You must select one or more firewall rules. The network-name-allow-internal rule, which opens the udp:0-65535;tcp:0-65535;icmp ports, should be selected to enable full internal IP networking access among VM instances in the network. You can also select the network-name-allow-ssh rule to open standard SSH port 22 to allow SSH connections to the network.

Custom

If you choose Custom subnetworks when creating a network (a custom mode VPC network), you must specify the region and private IP address range for each subnetwork. To enable full internal access among VMs in the network, you can specify an IP address range of 10.0.0.0/8 (or a more restrictive range if appropriate, such as 10.128.0.0/16).
Note that you provide firewall rules for custom subnetworks after the network is created. Again, to enable full network access among VMs in your network, select or create a firewall rule that opens the udp:0-65535;tcp:0-65535;icmp ports (as shown in the GCP Console screenshot below).
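
To make the udp:0-65535;tcp:0-65535;icmp notation concrete, here is a minimal Python sketch that splits such a string into one entry per protocol. The function name parse_allowed is hypothetical. The IPProtocol and ports keys are chosen to resemble the allowed entries of a Compute Engine firewall resource, which is an assumption of this example rather than something stated on this page.

```python
from typing import Dict, List


def parse_allowed(spec: str) -> List[Dict[str, object]]:
    """Split a 'proto:ports;proto:ports;proto' string into per-protocol entries."""
    entries: List[Dict[str, object]] = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        # 'tcp:0-65535' -> protocol 'tcp', ports '0-65535'; 'icmp' has no ports.
        protocol, _, ports = part.partition(":")
        entry: Dict[str, object] = {"IPProtocol": protocol}
        if ports:
            entry["ports"] = ports.split(",")
        entries.append(entry)
    return entries


if __name__ == "__main__":
    for rule in parse_allowed("udp:0-65535;tcp:0-65535;icmp"):
        print(rule)
```

The icmp entry carries no port list because ICMP has no ports; the udp and tcp entries each cover the full 0-65535 range.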

Creating a cluster that uses your VPC network

gcloud command

You can use the Cloud SDK gcloud dataproc clusters create command with the --network or --subnet flag to create a cluster that will use an auto or custom subnetwork.

Using the --network flag
You can use the --network flag to create a cluster that will use a subnetwork with the same name as the network in the region where the cluster will be created.

gcloud beta dataproc clusters create my-cluster \
  --network network-name \
  other args

For example, since auto networks are created with subnets in each region with the same name as the auto network, you can pass the auto network name to the --network flag (--network auto-net-name) to create a cluster that will use the auto subnetwork in the cluster's region.

Using the --subnet flag
You can use the --subnet flag to create a cluster that will use an auto or custom subnetwork in the region where the cluster will be created. You must pass the --subnet flag the full resource path of the subnet your cluster will use.

gcloud beta dataproc clusters create cluster-name \
  --subnet projects/project-id/regions/region/subnetworks/subnetwork-name \
  other args
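
If you assemble the command from a script, a small helper can build the subnet resource path and the argument list. The sketch below is illustrative only: the function names and the sample project, region, and subnetwork values are hypothetical. Note that Compute Engine resource paths use the plural regions segment.

```python
from typing import List


def subnet_path(project_id: str, region: str, subnetwork: str) -> str:
    """Build the full resource path expected by the --subnet flag."""
    return f"projects/{project_id}/regions/{region}/subnetworks/{subnetwork}"


def create_cluster_argv(cluster_name: str, subnet: str) -> List[str]:
    """Return the gcloud argument list; nothing is executed here."""
    return [
        "gcloud", "beta", "dataproc", "clusters", "create", cluster_name,
        "--subnet", subnet,
    ]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    path = subnet_path("my-project-id", "us-central1", "custom-subnet-1")
    print(" ".join(create_cluster_argv("my-cluster", path)))
```

Returning a list rather than a single string lets you pass the result to subprocess.run without shell quoting concerns.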

REST API

You can specify either the networkUri or subnetworkUri GceClusterConfig field as part of a clusters.create request.

Example

POST /v1/projects/my-project-id/regions/global/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "custom-subnet-1",
      "zoneUri": "us-central1-b"
    },
    ...
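
The request body above can also be assembled programmatically. The following Python sketch builds only the fields shown in the example and serializes them with the standard json module. The function name is hypothetical, and the check that rejects a request naming both a network and a subnetwork is an assumption based on the "either ... or" wording above.

```python
import json
from typing import Any, Dict, Optional


def cluster_request_body(
    project_id: str,
    cluster_name: str,
    zone: str,
    subnetwork: Optional[str] = None,
    network: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a clusters.create body with networkUri or subnetworkUri set."""
    # Assumption: the two fields are alternatives, so reject both at once.
    if subnetwork is not None and network is not None:
        raise ValueError("specify subnetwork or network, not both")
    gce_config: Dict[str, Any] = {"zoneUri": zone}
    if subnetwork is not None:
        gce_config["subnetworkUri"] = subnetwork
    elif network is not None:
        gce_config["networkUri"] = network
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {"gceClusterConfig": gce_config},
    }


if __name__ == "__main__":
    body = cluster_request_body(
        "my-project-id", "example-cluster", "us-central1-b",
        subnetwork="custom-subnet-1",
    )
    print(json.dumps(body, indent=2))
```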

Console

After creating a VPC network with firewall rules that allow VMs full access over the network's private IP address range, you can create a cluster from the GCP Console→Create cluster page, then select your network from the Network selector (expand the Preemptible workers, bucket, network, version, initialization, & access options heading to access the selector). After you choose the network, the Subnetwork selector displays the subnetwork(s) available in the region you have selected for the creation of the cluster. If a subnetwork is not available in the region, "No subnetworks in this region" is displayed in the Subnetwork selector.

Below is a screenshot that shows the Network and Subnetwork selectors on the Cloud Dataproc Create a cluster GCP Console page. As shown, a custom subnetwork in a custom network has been selected.

Creating a cluster that uses a VPC network in another project

A Cloud Dataproc cluster can use a Shared VPC network that is owned by a VPC "host project." To use the shared network, the project that will include your Cloud Dataproc cluster (the "service project") must be given necessary IAM permissions in the host project. Here's how to set these permissions in the host project using the GCP Console:

  1. If you haven't already done so, set up the project in which you plan to create your Cloud Dataproc cluster as a "service project" linked to the "host project" containing the VPC network.

  2. Navigate to the Settings tab of the IAM & admin page.

  3. Use the project dropdown at the top of the page, and select the project the Cloud Dataproc cluster will be created in (the "service project").

  4. Take note of the project number, which you will use in the next steps.

  5. Navigate to the IAM tab of the IAM & admin page.

  6. Use the project dropdown at the top of the page, and select the project that contains the Shared VPC network (the "host project").

  7. Click ADD at the top to add a service account.

  8. In the members box, insert #########@cloudservices.gserviceaccount.com, where ######### is the project number from step 4.

  9. In the roles dropdown, select a role with the compute.subnetworks.get and/or compute.networks.get permission (according to whether your cluster will share a subnetwork and/or network). Compute Network User is a default role with these permissions.

  10. Click ADD at the top to add another service account.

  11. In the members box, insert service-#########@dataproc-accounts.iam.gserviceaccount.com, where ######### is the project number from step 4.

  12. In the roles dropdown, select the role you selected in step 9, above.

  13. Click ADD to add the service account.

  14. Follow the instructions in Creating a cluster that uses your VPC network, using the --subnet and/or --network flags and passing the full (sub)network resource name.
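
The two service account names used in the steps above follow a fixed pattern based on the service project's number. As a small illustration, this Python sketch derives both names from a project number. The function name and the sample project number are hypothetical.

```python
from typing import List


def shared_vpc_members(project_number: str) -> List[str]:
    """Return the two service accounts to add in the host project."""
    if not project_number.isdigit():
        raise ValueError("project number must contain digits only")
    return [
        f"{project_number}@cloudservices.gserviceaccount.com",
        f"service-{project_number}@dataproc-accounts.iam.gserviceaccount.com",
    ]


if __name__ == "__main__":
    # Hypothetical project number, for illustration only.
    for member in shared_vpc_members("123456789012"):
        print(member)
```

Both accounts need the same role in the host project, so generating them together makes it harder to grant the role to one and forget the other.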

For more information on the topic, see the Shared VPC Overview and the Google Cloud Identity and Access Management Documentation.

Create a Cloud Dataproc cluster with internal IP addresses only

You can create a Cloud Dataproc cluster that is isolated from the public Internet and whose VM instances communicate over a private IP subnetwork (the VM instances will not have public IP addresses). To do this, the subnetwork of the cluster must have Private Google Access enabled to allow cluster nodes to access Google APIs and services, such as Cloud Storage, from internal IPs.

gcloud command

You can create a Cloud Dataproc cluster with internal IP addresses only by using the gcloud dataproc clusters create command with the --no-address flag.

Using the --no-address and --network flags
Use the --no-address flag with the --network flag to create a cluster that will use a subnetwork with the same name as the network in the region where the cluster will be created.

gcloud beta dataproc clusters create my-cluster \
  --no-address \
  --network network-name \
  other args

For example, since auto networks are created with subnets in each region with the same name as the auto network, you can pass the auto network name to the --network flag (--network auto-net-name) to create a cluster that will use the auto subnetwork in the cluster's region.

Using the --no-address and --subnet flags
Use the --no-address flag with the --subnet flag to create a cluster that will use an auto or custom subnetwork in the region where the cluster will be created. You must pass the --subnet flag the full resource path of the subnet your cluster will use.

gcloud beta dataproc clusters create cluster-name \
  --no-address \
  --subnet projects/project-id/regions/region/subnetworks/subnetwork-name \
  other args

REST API

You can set the GceClusterConfig internalIpOnly field to "true" as part of a clusters.create request to enable internal IP addresses only.

Example

POST /v1beta2/projects/my-project-id/regions/global/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "custom-subnet-1",
      "zoneUri": "us-central1-b",
      "internalIpOnly": true
    },
    ...

Console

You can create a Cloud Dataproc cluster with Private Google Access enabled from the Cloud Dataproc Create a cluster GCP Console page. Expand the Preemptible workers, bucket, network, version, initialization, & access options link at the bottom of the page, and then click Internal IP only to enable this feature for your cluster.