Cloud Dataproc Cluster Network Configuration

Overview

The Compute Engine Virtual Machine instances in a Cloud Dataproc cluster, consisting of master and worker VMs, require full internal IP networking access to each other.

Legacy networks use a default firewall rule with a source IP range of 10.0.0.0/8 to allow intra-cluster communication, while newer subnetwork-enabled default networks use a slightly more restrictive source IP range of 10.128.0.0/9 due to having constrained IP ranges per regional subnetwork.
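
To make the relationship between the two ranges concrete, here is a minimal sketch that uses Python's standard ipaddress module to confirm that 10.128.0.0/9 sits inside 10.0.0.0/8 and to compare their sizes. The variable names are illustrative only.

```python
import ipaddress

# Source ranges named in the paragraph above.
legacy_range = ipaddress.ip_network("10.0.0.0/8")
subnet_mode_range = ipaddress.ip_network("10.128.0.0/9")

# The subnetwork-enabled range is fully contained in the legacy range.
is_contained = subnet_mode_range.subnet_of(legacy_range)

# A /9 holds half as many addresses as a /8.
legacy_size = legacy_range.num_addresses
subnet_mode_size = subnet_mode_range.num_addresses

print(is_contained, legacy_size, subnet_mode_size)
```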

In addition to using the source IP ranges just noted, the typical (and default-allow-internal) firewall rule in a Cloud Dataproc cluster's Compute Engine network opens udp:0-65535;tcp:0-65535;icmp ports.
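
The udp:0-65535;tcp:0-65535;icmp notation packs three protocol entries into one string. As a rough illustration, the hypothetical helper below (parse_allow_spec is not part of any Google library) splits such a string into entries shaped like the allowed list of a Compute Engine firewall resource, with an IPProtocol key and an optional ports list.

```python
def parse_allow_spec(spec):
    """Split a 'proto:ports;proto:ports;proto' string into per-protocol entries."""
    allowed = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        # 'icmp' has no port part, so ports is an empty string for it.
        protocol, _, ports = item.partition(":")
        entry = {"IPProtocol": protocol}
        if ports:
            entry["ports"] = [ports]
        allowed.append(entry)
    return allowed


print(parse_allow_spec("udp:0-65535;tcp:0-65535;icmp"))
```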

Specify a source IP range for subnet firewall rules

If you create a subnet firewall rule that allows TCP traffic without using the --source-ranges or --source-tags flag to specify the source IP addresses or source instances that are allowed to connect to the subnet, the rule defaults to a source IP range of 0.0.0.0/0. For example:

gcloud compute firewall-rules create my-subnet-firewall-rule --allow tcp

This default opens your subnet to all IP addresses, which is a critical security vulnerability.
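
The default described above can be modeled in a few lines. This is only a sketch of the rule as stated in this section, not of the gcloud implementation; effective_source_ranges and is_open_to_all are hypothetical helper names.

```python
import ipaddress

ANY_IPV4 = ipaddress.ip_network("0.0.0.0/0")


def effective_source_ranges(source_ranges=None, source_tags=None):
    """Return the source ranges a rule ends up with under the default described above."""
    if source_ranges:
        return [ipaddress.ip_network(r) for r in source_ranges]
    if source_tags:
        return []  # sources are selected by instance tag, not by IP range
    return [ANY_IPV4]  # neither flag given: the rule is open to every address


def is_open_to_all(source_ranges=None, source_tags=None):
    """True when the rule would accept traffic from any IPv4 address."""
    return ANY_IPV4 in effective_source_ranges(source_ranges, source_tags)


print(is_open_to_all())                   # no flags given
print(is_open_to_all(["10.128.0.0/9"]))   # explicit internal range
```

A check like this could run in a review script before a rule is created, so that a missing source flag is caught early.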

How to set the source IP range

You can set the source IP range when you create a subnet firewall rule from the Google Cloud Platform Console or using the gcloud command-line tool.

Console

Use the GCP Console Create A firewall rule page to create a firewall rule with a specified source IP range.

gcloud command

Use the gcloud compute firewall-rules create command to create a firewall rule with a specified source IP range.
gcloud compute firewall-rules create "tcp-rule" --allow tcp:80 \
    --source-ranges="10.0.0.0/22,10.0.0.0/14" \
    --description="Narrowing TCP traffic"
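
When you list several source ranges, it can help to check whether any of them is already covered by another. The sketch below applies Python's standard ipaddress.collapse_addresses function to the two ranges from the command above; because 10.0.0.0/22 lies inside 10.0.0.0/14, only the wider range remains.

```python
import ipaddress

# The two ranges passed to --source-ranges in the command above.
source_ranges = ["10.0.0.0/22", "10.0.0.0/14"]

networks = [ipaddress.ip_network(r) for r in source_ranges]

# collapse_addresses drops ranges that are contained in another range
# and merges adjacent ranges where possible.
collapsed = [str(n) for n in ipaddress.collapse_addresses(networks)]

print(collapsed)
```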

Cloud Dataproc cluster default network configuration

When you create a Cloud Dataproc cluster, you can accept the default network for the cluster.

Default network

Here's a Google Cloud Platform Console snapshot that shows the default network selected from the Cloud Dataproc Create a cluster page.

After the cluster is created, the GCP Console VM instances→Network details page shows the default firewall rules for the instances in the cluster, which include the default-allow-internal firewall rule that opens the udp:0-65535;tcp:0-65535;icmp ports. Note that although legacy-network firewall rules specify a 10.0.0.0/8 IP address range, newer subnetwork-enabled networks provide more constrained regional IP address ranges (the default-allow-internal rule, shown below, specifies a 10.128.0.0/9 IP address range).

Create a VPC network

You can specify your own Virtual Private Cloud (VPC) network when you create a Cloud Dataproc cluster. To do this, you must first create a VPC network with firewall rules. Then, when you create the cluster, you associate your network with the cluster.

Creating a VPC network

You can create a VPC network from the GCP Console or with the gcloud compute networks create command. You can create an auto mode VPC network or a custom mode VPC network (called "auto" and "custom" networks, respectively, below). An auto network is automatically configured with subnets in each Compute Engine region. Custom networks are not automatically configured with subnets; you must create one or more subnets in one or more Compute Engine regions when you create the custom network. For more information, see Types of VPC Networks.

Let's look at the options available when you create an auto and custom network from the GCP Console.

Auto

The GCP Console screenshot, below, shows the GCP Console fields that are populated for the Automatic creation of subnetworks (an auto mode VPC network). You must select one or more firewall rules. The network-name-allow-internal rule, which opens udp:0-65535;tcp:0-65535;icmp ports, should be selected to enable full internal IP networking access among VM instances in the network. You can also select the network-name-allow-ssh rule to open standard SSH port 22 to allow SSH connections to the network.

Custom

If you choose Custom subnetworks when creating a network (a custom mode VPC network), you must specify the region and private IP address range for each subnetwork. To enable full internal access among VMs in the network, you can specify an IP address range of 10.0.0.0/8 (or a more restrictive range if appropriate, such as 10.128.0.0/16).
Note that you provide firewall rules for custom subnetworks after the network is created. Again, to enable full network access among VMs in your network, select or create a firewall rule that opens the udp:0-65535;tcp:0-65535;icmp ports (as shown in the GCP Console screenshot below).
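
Before relying on an internal-access firewall rule, you may want to confirm that every custom subnetwork range falls inside the rule's source range. The following sketch does that with the standard ipaddress module. The region-to-range mapping is made-up sample data, and uncovered_subnets is a hypothetical helper.

```python
import ipaddress


def uncovered_subnets(subnet_ranges, firewall_source_range):
    """Return the subnets whose IP range is not inside the firewall source range."""
    source = ipaddress.ip_network(firewall_source_range)
    return {
        name: cidr
        for name, cidr in subnet_ranges.items()
        if not ipaddress.ip_network(cidr).subnet_of(source)
    }


# Hypothetical custom subnetworks, keyed by region.
subnets = {
    "us-central1": "10.128.0.0/20",
    "europe-west1": "192.168.0.0/20",
}

# Subnets listed here would not be reached by a rule sourced from 10.0.0.0/8.
print(uncovered_subnets(subnets, "10.0.0.0/8"))
```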

Creating a cluster that uses your VPC network

gcloud command

You can use the Cloud SDK gcloud dataproc clusters create command with the --network or --subnet flag to create a cluster that will use an auto or custom subnetwork.

Using the --network flag
You can use the --network flag to create a cluster that will use a subnetwork with the same name as the network in the region where the cluster will be created.

gcloud beta dataproc clusters create my-cluster \
    --network network-name \
    ... other args ...

For example, since auto networks are created with subnets in each region with the same name as the auto network, you can pass the auto network name to the --network flag (--network auto-net-name) to create a cluster that will use the auto subnetwork in the cluster's region.

Using the --subnet flag
You can use the --subnet flag to create a cluster that will use an auto or custom subnetwork in the region where the cluster will be created. You must pass the --subnet flag the full resource path of the subnet your cluster will use.

gcloud beta dataproc clusters create cluster-name \
    --subnet projects/project-id/regions/region/subnetworks/subnetwork-name \
    ... other args ...
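
If you assemble the subnet path in a script, a small helper avoids typos in the path segments. This is a sketch only: subnet_resource_path is a hypothetical name, and the sample values are placeholders. It follows the Compute Engine resource path form projects/PROJECT/regions/REGION/subnetworks/NAME.

```python
def subnet_resource_path(project_id, region, subnetwork_name):
    """Build the full resource path of a subnetwork for the --subnet flag."""
    for value in (project_id, region, subnetwork_name):
        # Each segment must be a plain name, not a nested path.
        if not value or "/" in value:
            raise ValueError(f"invalid path segment: {value!r}")
    return f"projects/{project_id}/regions/{region}/subnetworks/{subnetwork_name}"


print(subnet_resource_path("my-project-id", "us-central1", "custom-subnet-1"))
```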

REST API

You can specify either the networkUri or subnetworkUri GceClusterConfig field as part of a clusters.create request.

Example

POST /v1/projects/my-project-id/regions/global/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "custom-subnet-1",
      "zoneUri": "us-central1-b"
    },
    ...
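
As a rough sketch, the request body can also be built programmatically. The helper below only constructs and prints the JSON; it does not call the API. build_create_request is a hypothetical name, and treating networkUri and subnetworkUri as mutually exclusive is an assumption based on the "either ... or" wording above.

```python
import json


def build_create_request(project_id, cluster_name, zone_uri,
                         network_uri=None, subnetwork_uri=None):
    """Build a clusters.create body that sets at most one network field."""
    # Assumption: only one of the two fields should be set per request.
    if network_uri and subnetwork_uri:
        raise ValueError("Specify networkUri or subnetworkUri, not both")
    gce_cluster_config = {"zoneUri": zone_uri}
    if network_uri:
        gce_cluster_config["networkUri"] = network_uri
    if subnetwork_uri:
        gce_cluster_config["subnetworkUri"] = subnetwork_uri
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {"gceClusterConfig": gce_cluster_config},
    }


body = build_create_request(
    "my-project-id", "example-cluster", "us-central1-b",
    subnetwork_uri="custom-subnet-1",
)
print(json.dumps(body, indent=2))
```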

Console

After creating a VPC network with firewall rules that allow VMs full access over the network's private IP address range, you can create a cluster from the GCP Console→Create cluster page, then select your network from the Network selector (expand the Preemptible workers, bucket, network, version, initialization, & access options heading to access the selector). After you choose the network, the Subnetwork selector displays the subnetwork(s) available in the region you have selected for the creation of the cluster. If a subnetwork is not available in the region, "No subnetworks in this region" is displayed in the Subnetwork selector.

Below is a screenshot that shows the Network and Subnetwork selectors on the Cloud Dataproc Create a cluster GCP Console page. As shown, a custom subnetwork in a custom network has been selected.

Creating a cluster that uses a VPC network in another project

A Cloud Dataproc cluster can use a Shared VPC network by participating as a service project. With Shared VPC, the Shared VPC network is defined in a different project, which is called the host project. The host project is made available for use by IAM members in attached service projects. See Shared VPC Overview for background information.

You will create your Cloud Dataproc cluster in a project. In the Shared VPC scenario, this project will be a service project. You will need to reference the project number of this project. Here's one way to find the project number:

  1. Navigate to the IAM & admin page Settings tab.

  2. From the project drop-down list at the top of the page, select the project you will use to create the Cloud Dataproc cluster.

  3. Note the project number.

An IAM member who is a Shared VPC Admin must perform the following steps. See directions for setting up Shared VPC for background information.

  1. Make sure that the Shared VPC host project has been enabled.

  2. Attach the Cloud Dataproc project to the host project.

  3. Configure either or both of the following service accounts to have the Network User role for the host project. Cloud Dataproc will attempt to use the first service account, falling back to the Google APIs service account if required.

  4. Navigate to the IAM tab of the IAM & admin page.

  5. Use the project drop-down list at the top of the page to select the host project.

  6. Click ADD. Repeat these steps to add both service accounts:

    1. Add the service account to the Members field.

    2. From the Roles menu, select Compute Engine > Compute Network User.

    3. Click Add.
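
The console steps above amount to adding a member to a role binding in the host project's IAM policy. The sketch below shows that change on an in-memory policy dictionary only; it does not call any Google API. roles/compute.networkUser is the role ID for Compute Network User, while grant_network_user and the service account email are hypothetical placeholders, not the actual accounts used by Cloud Dataproc.

```python
NETWORK_USER_ROLE = "roles/compute.networkUser"


def grant_network_user(policy, service_account_email):
    """Add a service account to the Network User binding of an IAM policy dict."""
    member = f"serviceAccount:{service_account_email}"
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == NETWORK_USER_ROLE:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)  # avoid duplicate entries
            return policy
    # No binding for the role yet: create one.
    bindings.append({"role": NETWORK_USER_ROLE, "members": [member]})
    return policy


# Placeholder email for illustration only.
policy = grant_network_user({"bindings": []}, "example-sa@example.iam.gserviceaccount.com")
print(policy)
```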

Once both service accounts have the Network User role for the host project, follow the instructions for creating a cluster using the command line, using the --subnet and/or --network flags and passing the full network or subnet name.

Create a Cloud Dataproc cluster with internal IP addresses only

You can create a Cloud Dataproc cluster that is isolated from the public internet and whose VM instances communicate over a private IP subnetwork (the VM instances will not have public IP addresses). To do this, the subnetwork of the cluster must have Private Google Access enabled to allow cluster nodes to access Google APIs and services, such as Cloud Storage, from internal IPs.
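
A script that creates internal-IP-only clusters could verify this prerequisite first. The sketch below inspects a dictionary shaped like a Compute Engine subnetwork resource, whose privateIpGoogleAccess field reports whether Private Google Access is enabled. The sample data and the check_internal_ip_only_ready helper are hypothetical, and fetching the subnetwork description is left out.

```python
def check_internal_ip_only_ready(subnetwork):
    """Raise if the subnetwork does not have Private Google Access enabled."""
    if not subnetwork.get("privateIpGoogleAccess", False):
        name = subnetwork.get("name", "<unnamed>")
        raise ValueError(
            f"Subnetwork {name} needs Private Google Access for an "
            "internal-IP-only cluster"
        )
    return True


# Hypothetical subnetwork description.
subnetwork = {"name": "custom-subnet-1", "privateIpGoogleAccess": True}
print(check_internal_ip_only_ready(subnetwork))
```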

gcloud command

You can create a Cloud Dataproc cluster with internal IP addresses only by using the gcloud dataproc clusters create command with the --no-address flag.

Using the --no-address and --network flags
Use the --no-address flag with the --network flag to create a cluster that will use a subnetwork with the same name as the network in the region where the cluster will be created.

gcloud beta dataproc clusters create my-cluster \
    --no-address \
    --network network-name \
    ... other args ...

For example, since auto networks are created with subnets in each region with the same name as the auto network, you can pass the auto network name to the --network flag (--network auto-net-name) to create a cluster that will use the auto subnetwork in the cluster's region.

Using the --no-address and --subnet flags
Use the --no-address flag with the --subnet flag to create a cluster that will use an auto or custom subnetwork in the region where the cluster will be created. You must pass the --subnet flag the full resource path of the subnet your cluster will use.

gcloud beta dataproc clusters create cluster-name \
    --no-address \
    --subnet projects/project-id/regions/region/subnetworks/subnetwork-name \
    ... other args ...
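
If you generate these commands from a script, building the argument list in code keeps the flag combination consistent. The sketch below only assembles and prints the command; it does not run gcloud. internal_ip_cluster_command is a hypothetical helper, and requiring exactly one of the two network flags is an assumption made for this example.

```python
import shlex


def internal_ip_cluster_command(cluster_name, subnet_path=None, network_name=None):
    """Return the argv list for an internal-IP-only cluster create command."""
    # Assumption for this sketch: exactly one of the two flags is supplied.
    if bool(subnet_path) == bool(network_name):
        raise ValueError("Pass exactly one of subnet_path or network_name")
    argv = ["gcloud", "beta", "dataproc", "clusters", "create",
            cluster_name, "--no-address"]
    if subnet_path:
        argv += ["--subnet", subnet_path]
    else:
        argv += ["--network", network_name]
    return argv


argv = internal_ip_cluster_command("my-cluster", network_name="auto-net-name")
print(shlex.join(argv))
```

The returned list could be handed to subprocess.run in an environment where the Cloud SDK is installed.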

REST API

You can set the GceClusterConfig internalIpOnly field to true as part of a clusters.create request to enable internal IP addresses only.

Example

POST /v1beta2/projects/my-project-id/regions/global/clusters/
{
  "projectId": "my-project-id",
  "clusterName": "example-cluster",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "subnetworkUri": "custom-subnet-1",
      "zoneUri": "us-central1-b",
      "internalIpOnly": true
    },
    ...
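
Note that the example request carries internalIpOnly as a JSON boolean rather than a quoted string. As a small illustration, a Python True value serializes to the bare true shown above when the gceClusterConfig fragment is encoded with the standard json module.

```python
import json

# The gceClusterConfig fragment from the example request above.
gce_cluster_config = {
    "subnetworkUri": "custom-subnet-1",
    "zoneUri": "us-central1-b",
    "internalIpOnly": True,  # Python bool, not the string "true"
}

encoded = json.dumps(gce_cluster_config)
print(encoded)
```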

Console

You can create a Cloud Dataproc cluster with Private Google Access enabled from the Cloud Dataproc Create a cluster GCP Console page. Expand the Preemptible workers, bucket, network, version, initialization, & access options link at the bottom of the page, and then click Internal IP only to enable this feature for your cluster.