The VPC subnetwork that is used to execute Dataproc Serverless for Spark workloads must meet the following requirements:
Open subnet connectivity: The subnet must allow communication between addresses in the subnet on all ports. The following gcloud command creates a firewall rule on the network that allows ingress traffic on all protocols and ports, restricted to sources and destinations within the subnet ranges:
gcloud compute firewall-rules create allow-internal-ingress \
    --network=network-name \
    --source-ranges=SUBNET_RANGES \
    --destination-ranges=SUBNET_RANGES \
    --direction=ingress \
    --action=allow \
    --rules=all
- SUBNET_RANGES: The IP ranges of the subnet. See "Allow internal ingress connections between VMs."
The default VPC network in a project, with its default-allow-internal firewall rule that allows ingress on all ports (tcp:0-65535, udp:0-65535, and icmp), meets the open-subnet-connectivity requirement. However, this rule also allows ingress from any VM instance on the network.
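For a network other than the default one, the SUBNET_RANGES placeholder in the firewall command can be filled from the subnet's own primary IPv4 range. The following is a minimal sketch under stated assumptions: my-network, my-subnet, and us-central1 are hypothetical placeholders, the example range 10.0.0.0/24 is used only in dry-run mode, and the run helper is a hypothetical convenience that prints each gcloud command unless DRY_RUN=0 is set. Subnets with secondary ranges would need those ranges added as well.

```shell
#!/usr/bin/env bash
# Sketch: create the internal-ingress firewall rule scoped to one subnet's range.
# NETWORK_NAME, SUBNET_NAME, and REGION default to hypothetical placeholders.
set -euo pipefail

NETWORK_NAME="${NETWORK_NAME:-my-network}"
SUBNET_NAME="${SUBNET_NAME:-my-subnet}"
REGION="${REGION:-us-central1}"

# Hypothetical helper: with DRY_RUN=1 (the default) print commands instead of running them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Look up the subnet's primary IPv4 range. In dry-run mode, use an example
# range so the script works without gcloud credentials.
if [ "${DRY_RUN:-1}" = "1" ]; then
  SUBNET_RANGES="10.0.0.0/24"   # hypothetical example range
else
  SUBNET_RANGES="$(gcloud compute networks subnets describe "$SUBNET_NAME" \
    --region="$REGION" --format="value(ipCidrRange)")"
fi

# Allow ingress on all protocols and ports, only between addresses in the subnet.
run gcloud compute firewall-rules create allow-internal-ingress \
  --network="$NETWORK_NAME" \
  --source-ranges="$SUBNET_RANGES" \
  --destination-ranges="$SUBNET_RANGES" \
  --direction=ingress \
  --action=allow \
  --rules=all
```

Scoping both source and destination to the subnet range avoids the broader exposure of the default-allow-internal rule described above.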
Private Google Access: The subnet must have Private Google Access enabled.
External network access: If your workload requires external network or internet access, you can set up Cloud NAT to allow outbound traffic from VMs that have only internal IP addresses on your VPC network.
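The Private Google Access requirement and the optional Cloud NAT setup can both be applied with gcloud. This is a minimal sketch, not a complete setup: all resource names and the region are hypothetical placeholders, ENABLE_NAT is a hypothetical switch for the optional NAT step, and the run helper prints commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: enable Private Google Access on the subnet and, optionally, add Cloud NAT.
# All names and the region below are hypothetical placeholders.
set -euo pipefail

NETWORK_NAME="${NETWORK_NAME:-my-network}"
SUBNET_NAME="${SUBNET_NAME:-my-subnet}"
REGION="${REGION:-us-central1}"
ROUTER_NAME="${ROUTER_NAME:-dataproc-nat-router}"
NAT_NAME="${NAT_NAME:-dataproc-nat}"
ENABLE_NAT="${ENABLE_NAT:-0}"   # set to 1 only if the workload needs internet access

# Hypothetical helper: with DRY_RUN=1 (the default) print commands instead of running them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Required: let VMs with only internal IPs reach Google APIs and services.
run gcloud compute networks subnets update "$SUBNET_NAME" \
  --region="$REGION" \
  --enable-private-ip-google-access

# Print the subnet's privateIpGoogleAccess field to confirm the setting.
run gcloud compute networks subnets describe "$SUBNET_NAME" \
  --region="$REGION" \
  --format="get(privateIpGoogleAccess)"

# Optional: Cloud NAT is configured on a Cloud Router in the subnet's network and region.
if [ "$ENABLE_NAT" = "1" ]; then
  run gcloud compute routers create "$ROUTER_NAME" \
    --network="$NETWORK_NAME" \
    --region="$REGION"
  # Translate all IP ranges of all subnets in this region, using auto-allocated NAT IPs.
  run gcloud compute routers nats create "$NAT_NAME" \
    --router="$ROUTER_NAME" \
    --region="$REGION" \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
fi
```

Keeping NAT behind a switch reflects that external access is only needed when the workload itself calls out to the internet.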
Dataproc Serverless and VPC-SC networks
With VPC Service Controls, administrators can define a security perimeter around resources of Google-managed services to control communication to and between those services.
Note the following limitations and strategies when using VPC-SC networks with Dataproc Serverless:
To install dependencies outside the VPC-SC perimeter, create a custom container image that pre-installs the dependencies, then submit a Spark batch workload that uses your custom container image.
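The submission step might look like the following sketch. It assumes a custom container image with the dependencies pre-installed has already been built and pushed to a registry that workloads inside the perimeter can reach; the image path, bucket, subnet, and region are hypothetical, as is the run dry-run helper. The --container-image flag of gcloud dataproc batches submit selects the custom image for the batch.

```shell
#!/usr/bin/env bash
# Sketch: submit a PySpark batch that runs on a pre-built custom container image.
# IMAGE, MAIN_PY, SUBNET_NAME, and REGION default to hypothetical placeholders.
set -euo pipefail

REGION="${REGION:-us-central1}"
SUBNET_NAME="${SUBNET_NAME:-my-subnet}"
IMAGE="${IMAGE:-us-central1-docker.pkg.dev/my-project/my-repo/spark-deps:1.0}"
MAIN_PY="${MAIN_PY:-gs://my-bucket/main.py}"

# Hypothetical helper: with DRY_RUN=1 (the default) print commands instead of running them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Because the dependencies are baked into the image, the workload does not
# need to download packages from outside the perimeter at run time.
run gcloud dataproc batches submit pyspark "$MAIN_PY" \
  --region="$REGION" \
  --subnet="$SUBNET_NAME" \
  --container-image="$IMAGE"
```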