Configuring internet access and firewall rules

This page explains how to provide routes and define your Google Cloud firewall rules for the network associated with your Dataflow jobs.

Internet access for Dataflow

Dataflow worker virtual machines (VMs) need to be able to reach Google Cloud APIs and services. To meet this requirement, you can either configure worker VMs with external IP addresses or use Private Google Access.

With Private Google Access, VMs that have only internal IP addresses can reach the external IP addresses used by Google Cloud APIs and services. For the routing and firewall rule requirements and the configuration steps, read Configuring Private Google Access.
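As a rough sketch, assume your workers run in a subnet SUBNET in region REGION (both placeholders). You could enable Private Google Access on that subnet and then launch an Apache Beam Python pipeline whose workers receive only internal IP addresses. The gcloud flag --enable-private-ip-google-access and the Beam option --no_use_public_ips exist. The run helper, the function names, my_pipeline.py, and the default values are hypothetical. Commands are printed rather than executed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Minimal sketch: prints each command; set DRY_RUN=0 to actually execute it.
set -euo pipefail

run() {
  # Echo the command for review, and run it only when DRY_RUN=0.
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then "$@"; fi
}

# Hypothetical helper: enable Private Google Access on the workers' subnet.
enable_private_access() {
  local subnet="$1" region="$2"
  run gcloud compute networks subnets update "$subnet" \
    --region="$region" \
    --enable-private-ip-google-access
}

# Hypothetical helper: launch a Python pipeline whose workers get no external IPs.
# my_pipeline.py stands in for your own pipeline; other required options
# (such as --project and --temp_location) are omitted here.
launch_internal_only() {
  local subnet="$1" region="$2"
  run python my_pipeline.py \
    --runner=DataflowRunner \
    --region="$region" \
    --subnetwork="regions/$region/subnetworks/$subnet" \
    --no_use_public_ips
}

enable_private_access "${SUBNET:-my-subnet}" "${REGION:-us-central1}"
launch_internal_only "${SUBNET:-my-subnet}" "${REGION:-us-central1}"
```

For a Java pipeline, the equivalent worker option is --usePublicIps=false.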

Jobs that access APIs and services outside of Google Cloud require internet access. For example, Python SDK jobs need access to the Python Package Index (PyPI). In this case, you must either configure worker VMs with external IP addresses or use a Network Address Translation solution, such as Cloud NAT. Read Managing Python Pipeline Dependencies on the Apache Beam website for more details.
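If you choose Cloud NAT, the following minimal sketch shows the setup. Cloud NAT is configured per region on a Cloud Router, so you would repeat it for each region where workers run. The router and NAT names derived from the network name are hypothetical. The gcloud commands and flags are standard Cloud NAT setup. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Minimal sketch: prints each command; set DRY_RUN=0 to actually execute it.
set -euo pipefail

run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then "$@"; fi
}

# Hypothetical helper: give internal-only workers in one region outbound
# internet access (for example, to PyPI) through Cloud NAT.
create_nat() {
  local network="$1" region="$2"
  local router="${network}-nat-router"   # hypothetical naming scheme
  run gcloud compute routers create "$router" \
    --network="$network" \
    --region="$region"
  # Allocate NAT IP addresses automatically and cover every subnet range in the region.
  run gcloud compute routers nats create "${network}-nat" \
    --router="$router" \
    --region="$region" \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
}

create_nat "${NETWORK:-my-network}" "${REGION:-us-central1}"
```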

Domain Name System (DNS) limitations

Custom BIND servers are not supported with Dataflow. To customize DNS resolution when you use Dataflow with VPC Service Controls, use Cloud DNS private zones instead of custom BIND servers. To use your own on-premises DNS resolution, consider using a Cloud DNS forwarding method.
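As a hedged sketch of both options, the following script defines two helpers. One creates a Cloud DNS private zone that only your Dataflow network can resolve. The other creates a forwarding zone that sends queries for a domain to on-premises name servers. The zone names, the internal.example.com. domain, the descriptions, and the helper names are hypothetical. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Minimal sketch: prints each command; set DRY_RUN=0 to actually execute it.
set -euo pipefail

run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then "$@"; fi
}

# Private zone whose records are visible only to the given VPC network.
create_private_zone() {
  local zone="$1" dns_name="$2" network="$3"
  run gcloud dns managed-zones create "$zone" \
    --dns-name="$dns_name" \
    --description=dataflow-private-zone \
    --visibility=private \
    --networks="$network"
}

# Forwarding zone that sends queries for dns_name to on-premises name
# servers, given as a comma-separated list of IP addresses.
create_forwarding_zone() {
  local zone="$1" dns_name="$2" network="$3" targets="$4"
  run gcloud dns managed-zones create "$zone" \
    --dns-name="$dns_name" \
    --description=forward-to-on-prem-dns \
    --visibility=private \
    --networks="$network" \
    --forwarding-targets="$targets"
}

create_private_zone internal-zone internal.example.com. "${NETWORK:-my-network}"
```

Records for a private zone are added separately, for example with the gcloud dns record-sets command group.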

Firewall rules

Firewall rules let you allow or deny traffic to and from your VMs. This page assumes familiarity with how Google Cloud firewall rules work as described on the Firewall Rules Overview and Using Firewall Rules pages, including the implied firewall rules.

Firewall rules required by Dataflow

Dataflow requires that worker VMs communicate with one another using specific TCP ports within the VPC network that you specify in your pipeline options. You need to configure firewall rules in your VPC network to allow this type of communication.

Some VPC networks, like the automatically created default network, include a default-allow-internal rule that meets the firewall requirement for Dataflow.

You can create a more specific firewall rule for Dataflow because all worker VMs have a network tag with the value dataflow. A project owner, editor, or security admin can use the following gcloud command to create an ingress allow rule that permits traffic on TCP ports 12345 and 12346 from VMs with the network tag dataflow to other VMs with the same tag:

gcloud compute firewall-rules create FIREWALL_RULE_NAME \
    --network NETWORK \
    --action allow \
    --direction ingress \
    --target-tags dataflow \
    --source-tags dataflow \
    --priority 0 \
    --rules tcp:12345-12346

Replace the following:

  • FIREWALL_RULE_NAME: a name for the firewall rule
  • NETWORK: the name of the network that your worker VMs use

For further guidance about firewall rules, see Using firewall rules. To find the specific TCP ports that Dataflow uses, you can view the project container manifest, which explicitly specifies the ports in order to map host ports into the container. You can also view network configuration and activity by opening an SSH session on one of your workers and running iproute2 tools such as ip and ss. Read the iproute2 page for more information.
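For example, after you connect to a worker with SSH, you could check whether anything listens on the ports from the firewall rule example on this page. The following sketch filters the output of ss -tln. The function name listening_on is hypothetical. The column layout it assumes (a header line followed by one socket per line) is standard ss output, but it may vary between iproute2 versions.

```shell
#!/usr/bin/env bash
# Reads `ss -tln` output on stdin and prints each local address:port whose
# port matches one of the given ports (an ERE alternation such as "12345|12346").
listening_on() {
  local ports="$1"
  awk -v re=":(${ports})\$" '
    NR > 1 {                      # skip the header line
      for (i = 1; i <= NF; i++) {
        if ($i ~ re) { print $i; break }
      }
    }'
}

# Usage on a worker VM, after connecting with SSH:
#   ss -tln | listening_on "12345|12346"   # listeners on the example ports
#   ip -brief address                      # interfaces and their IP addresses
```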

Specifying network tags

Network tags are text attributes that you can attach to Compute Engine virtual machine (VM) instances. Tags let you make firewall rules and routes applicable to specific VM instances. Dataflow supports adding network tags to all worker VMs that execute a particular Dataflow job.

Enabling network tags

You can set tags only when you create your job. After a Dataflow job is created, you cannot update the tags of the running batch or streaming pipeline.

Specify the following parameter:


Replace TAG-NAME with the names of your tags. If you add more than one tag, separate the tags with semicolons (;), using the following format: TAG-NAME-1;TAG-NAME-2;TAG-NAME-3;...


Even if you do not use this parameter, Dataflow always adds the network tag dataflow to every worker VM it creates.


The following limits apply to network tags:

  • Maximum number of tags per VM: 64. All tags for a VM must be unique. Because Dataflow always adds the dataflow tag, you can assign up to 63 additional tags per VM.
  • Maximum number of characters for each tag: 63.
  • Acceptable characters for a tag: lowercase letters, numbers, and dashes. Additionally, each tag must start with a lowercase letter and must end with either a number or a lowercase letter.
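To catch tag mistakes before you launch a job, you could check candidate tags against the limits above and join them with semicolons into the TAG-NAME-1;TAG-NAME-2 form. The following bash sketch does that. join_network_tags is a hypothetical helper. Rejecting an explicit dataflow tag as a duplicate is an assumption based on Dataflow always adding that tag.

```shell
#!/usr/bin/env bash
# Sketch: validate network tags against the documented limits and join them with ";".
set -euo pipefail
export LC_ALL=C   # make [a-z] match only ASCII lowercase letters

# Prints the joined tag list on success; prints an error to stderr and returns 1 otherwise.
join_network_tags() {
  local -A seen=()
  local tag joined=""
  local re='^[a-z]([-a-z0-9]*[a-z0-9])?$'
  # Dataflow always adds "dataflow", so at most 63 more tags fit in the 64-tag limit.
  if (( $# > 63 )); then
    echo "too many tags: $# (maximum 63 besides dataflow)" >&2
    return 1
  fi
  for tag in "$@"; do
    if (( ${#tag} > 63 )); then
      echo "tag longer than 63 characters: $tag" >&2
      return 1
    fi
    # Lowercase letters, digits, and dashes; starts with a lowercase letter and
    # ends with a lowercase letter or digit.
    if [[ ! "$tag" =~ $re ]]; then
      echo "invalid tag: $tag" >&2
      return 1
    fi
    # Tags must be unique, and "dataflow" is already present on every worker.
    if [[ "$tag" == dataflow || -n "${seen[$tag]:-}" ]]; then
      echo "duplicate tag: $tag" >&2
      return 1
    fi
    seen[$tag]=1
    joined+="${joined:+;}$tag"
  done
  echo "$joined"
}

join_network_tags etl-workers allow-egress   # prints etl-workers;allow-egress
```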

SSH access to worker VMs

Dataflow does not require SSH; however, SSH is useful for troubleshooting.

If your worker VM has an external IP address, you can connect to it either through the Cloud Console or by using the gcloud command-line tool. To connect using SSH, you must have a firewall rule that allows incoming connections on TCP port 22 from at least the IP address of the system where you run gcloud, or of the system running the web browser that you use to access the Cloud Console.
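The following minimal sketch covers that setup. It assumes you know your workstation's public address (203.0.113.10/32 is a documentation placeholder) and that the rule should apply only to Dataflow workers, through the dataflow network tag. The rule name allow-ssh-dataflow and the helper functions are hypothetical. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Minimal sketch: prints each command; set DRY_RUN=0 to actually execute it.
set -euo pipefail

run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then "$@"; fi
}

# Hypothetical helper: allow SSH (TCP 22) to Dataflow workers from one source range.
allow_ssh_to_workers() {
  local rule="$1" network="$2" source_range="$3"
  run gcloud compute firewall-rules create "$rule" \
    --network="$network" \
    --direction=ingress \
    --action=allow \
    --target-tags=dataflow \
    --source-ranges="$source_range" \
    --rules=tcp:22
}

# Hypothetical helper: open an SSH session to one worker VM.
ssh_to_worker() {
  local worker="$1" zone="$2"
  run gcloud compute ssh "$worker" --zone="$zone"
}

allow_ssh_to_workers allow-ssh-dataflow "${NETWORK:-my-network}" "${SOURCE_RANGE:-203.0.113.10/32}"
```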

If you need to connect to a worker VM that only has an internal IP address, see Connecting to instances that do not have external IP addresses.
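One option described on that page is TCP forwarding through Identity-Aware Proxy (IAP). Assuming you use IAP, a sketch could first allow TCP port 22 to VMs tagged dataflow from 35.235.240.0/20, the range that IAP uses for TCP forwarding. It would then tunnel gcloud compute ssh through IAP. Your account also needs permission to use IAP tunnels, for example the IAP-secured Tunnel User role. The rule name and helper names are hypothetical. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Minimal sketch: prints each command; set DRY_RUN=0 to actually execute it.
set -euo pipefail

run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then "$@"; fi
}

# IAP TCP forwarding connects to VMs from this Google-owned range.
readonly IAP_RANGE="35.235.240.0/20"

# One-time setup (hypothetical rule name): let IAP reach port 22 on Dataflow workers.
allow_iap_ssh() {
  local network="$1"
  run gcloud compute firewall-rules create allow-iap-ssh-dataflow \
    --network="$network" \
    --direction=ingress \
    --action=allow \
    --target-tags=dataflow \
    --source-ranges="$IAP_RANGE" \
    --rules=tcp:22
}

# Tunnel the SSH session through IAP instead of using an external IP address.
ssh_via_iap() {
  local worker="$1" zone="$2"
  run gcloud compute ssh "$worker" --zone="$zone" --tunnel-through-iap
}

allow_iap_ssh "${NETWORK:-my-network}"
```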