Specify a network and subnetwork

This document explains how to specify a network, a subnetwork, or both when you run Dataflow jobs.

This document assumes that you know how to create Google Cloud networks and subnetworks, and that you are familiar with the network terms discussed in the next section.

Google Cloud network terminology

  • VPC network. A VPC network, sometimes called a network, provides connectivity for resources in a project.

    To learn more about VPC, see VPC network overview.

  • Shared VPC network. A Shared VPC network is one that exists in a separate project, called a host project, within your organization. If a Shared VPC Admin has defined you as a Service Project Admin, you have permission to use at least some of the subnetworks in networks of the host project.

    To learn more about Shared VPC, see Shared VPC overview.

  • VPC Service Controls. Dataflow VPC Service Controls help secure your pipeline's resources and services.

    To learn more about VPC Service Controls, see VPC Service Controls overview. To learn about the limitations when using Dataflow with VPC Service Controls, see supported products and limitations.

Network and subnetwork for a Dataflow job

When you create a Dataflow job, you can specify a network, a subnetwork, or both.

Consider the following guidelines:

  • If you are unsure about which parameter to use, specify only the subnetwork parameter. The network parameter is then implicitly specified for you.

  • If you omit both the subnetwork and network parameters, Google Cloud assumes you intend to use an auto mode VPC network named default.

  • If you omit both the subnetwork and network parameters and you do not have a network named default in your project, you receive an error.

Guidelines for specifying a network parameter

  • You can select an auto mode network in your project with the network parameter.

  • You can specify a network using only its name and not the complete URL.

  • You can also use the network parameter to select a Shared VPC network only if both of the following conditions are true:

    • The Shared VPC network that you select is an auto mode network.

    • You are a Service Project Admin with project-level permissions to the whole Shared VPC host project. This means that a Shared VPC Admin has granted you the Compute Network User role for the whole host project, so you are able to use all of its networks and subnetworks.

  • For all other cases, you must specify a subnetwork.
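The guidelines in this section can be read as a small decision rule. The following Python sketch encodes them for illustration only. It is not part of Dataflow or of any Google Cloud library, and the function names and boolean inputs are hypothetical.

```python
def can_use_network_parameter(auto_mode: bool,
                              shared_vpc: bool = False,
                              project_level_access: bool = False) -> bool:
    """Return True if the network parameter alone is sufficient.

    Hypothetical helper that mirrors the guidelines above:
    - an auto mode network in your own project qualifies;
    - a Shared VPC network qualifies only if it is an auto mode network
      and you hold the Compute Network User role for the whole host project;
    - in all other cases, a subnetwork must be specified.
    """
    if not auto_mode:
        return False
    if shared_vpc:
        return project_level_access
    return True


def parameter_to_specify(auto_mode: bool,
                         shared_vpc: bool = False,
                         project_level_access: bool = False) -> str:
    """Name the parameter that the guidelines call for."""
    if can_use_network_parameter(auto_mode, shared_vpc, project_level_access):
        return "network"
    return "subnetwork"


# A custom mode network always needs the subnetwork parameter.
print(parameter_to_specify(auto_mode=False))
# An auto mode Shared VPC network with only subnet-level permissions does too.
print(parameter_to_specify(auto_mode=True, shared_vpc=True))
```

Even when the helper returns "network", specifying only the subnetwork remains a valid choice, because the network is then implied.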

Guidelines for specifying a subnetwork parameter

  • If you specify a subnetwork, Dataflow chooses the network for you. Therefore, when specifying a subnetwork, you can omit the network parameter.

  • To select a specific subnetwork in a network, use the subnetwork parameter.

  • Specify a subnetwork using either a complete URL or an abbreviated path.

  • You must select a subnetwork in the same region as the zone where you run your Dataflow workers.

  • You must specify the subnetwork parameter in the following situations:

    • The subnetwork you specify is in a custom mode network.

    • You are a Service Project Admin with subnet-level permissions to a specific subnetwork in a Shared VPC host project.

  • The subnetwork size limits the number of worker instances only through the number of available IP addresses. The size has no impact on Dataflow VPC Service Controls performance.
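To make the last guideline concrete, the following Python sketch estimates how many addresses a subnetwork's primary IPv4 range leaves for instances. It rests on two assumptions that are not stated in this document: that Google Cloud reserves four addresses in each primary IPv4 range, and that each worker consumes one internal address. The function name is hypothetical, and other resources in the same subnetwork reduce the number further.

```python
import ipaddress

# Assumption: four addresses per primary IPv4 range are reserved
# (network, default gateway, second-to-last, and broadcast).
RESERVED_ADDRESSES = 4


def max_instance_addresses(cidr: str) -> int:
    """Upper bound on instance addresses in a subnetwork's primary range."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - RESERVED_ADDRESSES, 0)


print(max_instance_addresses("10.128.0.0/20"))  # 4092
print(max_instance_addresses("10.0.0.0/29"))    # 4
```

A job whose maximum worker count exceeds this bound cannot scale to that count in the subnetwork, however the other parameters are set.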

Guidelines for specifying a subnetwork parameter for Shared VPC

  • When specifying the subnetwork URL for Shared VPC, ensure that HOST_PROJECT_ID is the project in which the VPC is hosted.

  • If the subnetwork is located in a Shared VPC network, you must use the complete URL.

  • Ensure that both the Dataflow service account and the worker service account have the Compute Network User role on the specified subnetwork. If the role is not granted, the following error message is displayed: Error: Message: Required 'compute.subnetworks.get' permission.

Example network and subnetwork specifications

Example of a complete URL that specifies a subnetwork:

https://www.googleapis.com/compute/v1/projects/`HOST_PROJECT_ID`/regions/`REGION_NAME`/subnetworks/`SUBNETWORK_NAME`

Replace the following:

  • HOST_PROJECT_ID: the host project ID
  • REGION_NAME: the regional endpoint of your Dataflow job
  • SUBNETWORK_NAME: the name of your Compute Engine subnetwork

The following is an example URL, where the host project ID is my-cloud-project, the region is us-central1, and the subnetwork name is mysubnetwork:

https://www.googleapis.com/compute/v1/projects/my-cloud-project/regions/us-central1/subnetworks/mysubnetwork

The following is an example of a short form that specifies a subnetwork:

regions/`REGION_NAME`/subnetworks/`SUBNETWORK_NAME`

Replace the following:

  • REGION_NAME: the regional endpoint of your Dataflow job
  • SUBNETWORK_NAME: the name of your Compute Engine subnetwork
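Both forms can be assembled from the same three values. The following Python sketch builds them with plain string formatting. The helper names are hypothetical and are not part of any Dataflow or Beam API. The last helper applies the rule that a subnetwork in a Shared VPC network needs the complete URL.

```python
from typing import Optional

COMPUTE_API_PREFIX = "https://www.googleapis.com/compute/v1"


def subnetwork_short_form(region: str, subnetwork: str) -> str:
    """Abbreviated path: regions/REGION_NAME/subnetworks/SUBNETWORK_NAME."""
    return f"regions/{region}/subnetworks/{subnetwork}"


def subnetwork_url(host_project_id: str, region: str, subnetwork: str) -> str:
    """Complete URL, including the project that hosts the VPC network."""
    short = subnetwork_short_form(region, subnetwork)
    return f"{COMPUTE_API_PREFIX}/projects/{host_project_id}/{short}"


def subnetwork_parameter(region: str, subnetwork: str,
                         host_project_id: Optional[str] = None) -> str:
    """Return the complete URL when a host project is given (Shared VPC),
    and the short form otherwise."""
    if host_project_id:
        return subnetwork_url(host_project_id, region, subnetwork)
    return subnetwork_short_form(region, subnetwork)


print(subnetwork_parameter("us-central1", "mysubnetwork"))
print(subnetwork_parameter("us-central1", "mysubnetwork", "my-cloud-project"))
```

The first call prints the short form and the second prints the complete URL used in the examples on this page.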

Run your pipeline with the network and subnetwork specified

If you are a Service Project Admin who only has permission to use specific subnetworks in a Shared VPC network, you must specify the subnetwork parameter with a subnetwork that you have permission to use.

The following examples show how to run your pipeline from the command line or by using the REST API. Each example specifies a subnetwork. You can also specify the network.

Java

mvn compile exec:java \
    -Dexec.mainClass=com.example.WordCount \
    -Dexec.args="--project=my-cloud-project \
        --stagingLocation=gs://my-wordcount-storage-bucket/staging/ \
        --output=gs://my-wordcount-storage-bucket/output \
        --runner=DataflowRunner \
        --subnetwork=https://www.googleapis.com/compute/v1/projects/my-cloud-project/regions/us-central1/subnetworks/mysubnetwork"

Python

python -m apache_beam.examples.wordcount \
    --project my-cloud-project \
    --runner DataflowRunner \
    --staging_location gs://my-wordcount-storage-bucket/staging \
    --temp_location gs://my-wordcount-storage-bucket/temp \
    --output gs://my-wordcount-storage-bucket/output \
    --subnetwork https://www.googleapis.com/compute/v1/projects/my-cloud-project/regions/us-central1/subnetworks/mysubnetwork

API

If you're running a Dataflow template using the REST API, add network or subnetwork, or both, to the environment object.

POST https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://dataflow-templates/wordcount/template_file
{
    "jobName": "my_job",
    "parameters": {
       "inputFile" : "gs://dataflow-samples/shakespeare/kinglear.txt",
       "output": "gs://my-wordcount-storage-bucket/output"
    },
    "environment": {
       "tempLocation": "gs://my-wordcount-storage-bucket/temp",
       "subnetwork": "https://www.googleapis.com/compute/v1/projects/my-cloud-project/regions/us-central1/subnetworks/mysubnetwork",
       "zone": "us-central1-f"
    }
}
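The request body above pairs a zone with a subnetwork in the same region, as the subnetwork guidelines require. A check of that pairing before launching a job can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that a zone name is its region name followed by a hyphen and a suffix, as in us-central1-f.

```python
import re

# Matches both the complete URL and the short form of a subnetwork.
_SUBNETWORK_PATTERN = re.compile(
    r"(?:^|/)regions/(?P<region>[^/]+)/subnetworks/(?P<name>[^/]+)$"
)


def subnetwork_region(subnetwork: str) -> str:
    """Extract the region from a subnetwork URL or short form."""
    match = _SUBNETWORK_PATTERN.search(subnetwork)
    if match is None:
        raise ValueError(f"Not a subnetwork URL or short form: {subnetwork!r}")
    return match.group("region")


def zone_region(zone: str) -> str:
    """Assumption: a zone is named REGION-SUFFIX, for example us-central1-f."""
    region, _, _suffix = zone.rpartition("-")
    return region


def same_region(subnetwork: str, zone: str) -> bool:
    """True if the subnetwork and the worker zone are in the same region."""
    return subnetwork_region(subnetwork) == zone_region(zone)


environment = {
    "subnetwork": "https://www.googleapis.com/compute/v1/projects/"
                  "my-cloud-project/regions/us-central1/subnetworks/mysubnetwork",
    "zone": "us-central1-f",
}
print(same_region(environment["subnetwork"], environment["zone"]))  # True
```

Running the check on the client side only catches a mismatch early. It does not replace the validation that the service performs.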

Turn off an external IP address

To turn off an external IP address, see Configure internet access and firewall rules.