Problem
When running a Dataflow job with Dataflow Shuffle or Streaming Engine disabled, you receive one of the following error messages:
Rpc to <worker-harness>:12345 completed with error UNAVAILABLE: failed to connect to all addresses
java.util.concurrent.ExecutionException: java.io.IOException: DEADLINE_EXCEEDED: (g)RPC timed out when <source-worker-harness> talking to <destination-worker-harness>:12346. Server unresponsive (ping error: Deadline Exceeded)
Environment
- Dataflow Shuffle or Streaming Engine is disabled
- Pipeline running with more than one worker
Solution
- You must add an ingress firewall rule that allows network traffic on TCP ports 12345-12346, with the following details:
- INGRESS_FIREWALL_RULE_NAME: any unique name. For example: allow-ingress-dataflow
- NETWORK: <network containing subnetwork for the Dataflow job>
- DIRECTION: ingress
$ gcloud compute firewall-rules create INGRESS_FIREWALL_RULE_NAME \
    --network NETWORK \
    --action allow \
    --direction DIRECTION \
    --target-tags dataflow \
    --source-tags dataflow \
    --priority 0 \
    --rules tcp:12345-12346
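After creating the rule, one way to confirm that it targets the correct network, tags, and ports is to describe it (the --format projection here is just one convenient view of the rule):
$ gcloud compute firewall-rules describe INGRESS_FIREWALL_RULE_NAME \
    --format="yaml(name, network, direction, allowed, targetTags, sourceTags)"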
- If the default allow-egress rule has been overridden or blocked, also add an egress rule that allows network traffic on TCP ports 12345-12346, with the following details:
- EGRESS_FIREWALL_RULE_NAME: any unique name. For example: allow-egress-dataflow
- NETWORK: <network containing subnetwork for the Dataflow job>
- DIRECTION: egress
- CIDR_RANGE: <IP range of the subnetwork used by the Dataflow job>
$ gcloud compute firewall-rules create EGRESS_FIREWALL_RULE_NAME \
    --network NETWORK \
    --action allow \
    --direction DIRECTION \
    --target-tags dataflow \
    --destination-ranges CIDR_RANGE \
    --priority 0 \
    --rules tcp:12345-12346
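Once both rules are in place, listing all firewall rules on the job's network is a quick sanity check (the filter expression is illustrative):
$ gcloud compute firewall-rules list --filter="network:NETWORK"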
Cause
When Dataflow Shuffle or Streaming Engine is disabled, Dataflow workers store intermediate data locally. Operations such as GroupByKey then shuffle that intermediate data between workers, and this traffic flows over TCP ports 12345 and 12346. If the appropriate firewall rules are not present, the job becomes stuck or fails.
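For reference, a batch job typically ends up in this configuration when it is opted out of Dataflow Shuffle, for example with the shuffle_mode=appliance experiment; a sketch of such a launch, where my_pipeline.py, PROJECT_ID, and REGION are placeholders:
$ python my_pipeline.py \
    --runner DataflowRunner \
    --project PROJECT_ID \
    --region REGION \
    --experiments shuffle_mode=appliance \
    --num_workers 2
With more than one worker, the GroupByKey shuffle then flows directly between worker VMs on the ports above, which is why the firewall rules in the Solution section are required.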