Dataproc Serverless for Spark network configuration

This document describes the network configuration requirements for Dataproc Serverless for Spark.

Virtual Private Cloud subnetwork requirements

The Virtual Private Cloud subnetwork that is used to execute Dataproc Serverless for Spark workloads or interactive sessions must meet the requirements set out in the following subsections.
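
As an illustration of where the subnetwork comes into play, the following sketch submits a batch workload and names the subnetwork with the --subnet flag. REGION and SUBNET_NAME are placeholders, and the SparkPi class and example jar path are assumptions used only to make the command complete. Substitute your own workload.

```shell
# Sketch: submit a Spark batch workload on a specific subnetwork.
# REGION and SUBNET_NAME are placeholders; the SparkPi job is only an example.
gcloud dataproc batches submit spark \
    --region=REGION \
    --subnet=SUBNET_NAME \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```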

Private Google Access requirement

The VPC subnet for the region selected for the Dataproc Serverless batch workload or interactive session must have Private Google Access enabled.
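
A minimal sketch of how this requirement might be met and checked with the Google Cloud CLI follows. SUBNET_NAME and REGION are placeholders for your own subnet and its region.

```shell
# Enable Private Google Access on an existing subnet.
gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-private-ip-google-access

# Check whether Private Google Access is enabled on the subnet.
gcloud compute networks subnets describe SUBNET_NAME \
    --region=REGION \
    --format="get(privateIpGoogleAccess)"
```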

External network access: If your workload requires external network or internet access, you can set up Cloud NAT, which allows outbound traffic from VMs that use internal IP addresses on your VPC network.
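
One possible Cloud NAT setup is sketched below, assuming a new Cloud Router is acceptable and that NAT should cover every subnet range in the region. ROUTER_NAME and NAT_NAME are hypothetical names chosen for this example. NETWORK_NAME and REGION are placeholders for your VPC network and the workload region.

```shell
# Create a Cloud Router in the workload region (ROUTER_NAME is a hypothetical name).
gcloud compute routers create ROUTER_NAME \
    --network=NETWORK_NAME \
    --region=REGION

# Add a NAT configuration to the router (NAT_NAME is a hypothetical name).
# This variant auto-allocates external IPs and applies NAT to all subnet ranges.
gcloud compute routers nats create NAT_NAME \
    --router=ROUTER_NAME \
    --region=REGION \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
```

Applying NAT to all subnet ranges is the simplest choice. A narrower configuration that lists specific subnets may suit networks shared with other workloads.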

Open subnet connectivity requirement

The VPC subnet for the region selected for the Dataproc Serverless batch workload or interactive session must allow internal subnet communication on all ports between VM instances.

The following Google Cloud CLI command creates a firewall rule on the VPC network that allows internal ingress communication among VMs in the specified subnet ranges, using all protocols on all ports:

gcloud compute firewall-rules create allow-internal-ingress \
    --network=NETWORK_NAME \
    --source-ranges=SUBNET_RANGES \
    --destination-ranges=SUBNET_RANGES \
    --direction=ingress \
    --action=allow \
    --rules=all

Notes:

  • SUBNET_RANGES: See Allow internal ingress connections between VMs.

  • A project's default VPC network meets the open subnet connectivity requirement if it has the default-allow-internal firewall rule, which allows ingress communication on all ports (tcp:0-65535, udp:0-65535, and icmp). However, this rule also allows ingress by any VM instance on the network.
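
As a hedged sketch, the commands below show one way to look up a value for SUBNET_RANGES and to inspect the rule after it is created. SUBNET_NAME and REGION are placeholders. The rule name allow-internal-ingress is the one used in the command above.

```shell
# Print the primary CIDR range of the subnet, which can serve as SUBNET_RANGES.
gcloud compute networks subnets describe SUBNET_NAME \
    --region=REGION \
    --format="get(ipCidrRange)"

# Inspect the firewall rule to confirm its direction, ranges, and allowed protocols.
gcloud compute firewall-rules describe allow-internal-ingress
```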

Dataproc Serverless and VPC-SC networks

With VPC Service Controls, network administrators can define a security perimeter around resources of Google-managed services to control communication to and between those services.

Using VPC-SC networks with Dataproc Serverless calls for specific strategies.

For more information, see VPC Service Controls—Dataproc Serverless for Spark.