Configure Flex Templates


This page documents various Dataflow Flex Template configuration options, including:

  • Permissions
  • Dockerfile environment variables
  • Docker images
  • Pipeline options
  • Staging and temp locations

To configure a sample Flex Template, see the Flex Template tutorial.

Understand Flex Template permissions

When you're working with Flex Templates, there are three sets of permissions to be aware of:

  • Permissions to create resources
  • Permissions to build a Flex Template
  • Permissions to run a Flex Template

Permissions to create resources

To develop and run a Flex Template pipeline, you need to create various resources (for example, a staging bucket). For one-time resource creation tasks, you can use the basic Owner role.

Permissions to build a Flex Template

As the developer of a Flex Template, you need to build the template to make it available to users. Building involves uploading a template spec to a Cloud Storage bucket and provisioning a Docker image with the code and dependencies needed to run the pipeline. To build a Flex Template, you need read and write access to Cloud Storage and read and write access to Container Registry. You can grant these permissions by assigning the following roles:

  • Storage Admin (roles/storage.admin)
  • Cloud Build Editor (roles/cloudbuild.builds.editor)

Note: You can also use Artifact Registry to store your images. To learn about Artifact Registry permissions, see Configuring access control.

Permissions to run a Flex Template

When you run a Flex Template, Dataflow creates a job for you. To create the job, the Dataflow service account needs the following role:

  • Dataflow Service Agent (roles/dataflow.serviceAgent)

When you first use Dataflow, the service assigns this role for you, so no action is required to grant it.

By default, the Compute Engine service account is used for launcher VMs and worker VMs. The service account needs the following roles and abilities:

  • Storage Object Admin (roles/storage.objectAdmin)
  • Viewer (roles/viewer)
  • Dataflow Worker (roles/dataflow.worker)
  • Read and write access to the staging bucket
  • Read access to the Flex Template image

To grant read and write access to the staging bucket, you can use the role Storage Object Admin (roles/storage.objectAdmin). For more information, see IAM roles for Cloud Storage.

To grant read access to the Flex Template image, you can use the role Storage Object Viewer (roles/storage.objectViewer). For more information, see Configuring access control.
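
The role grants listed above can be scripted. The following Python sketch only builds the gcloud projects add-iam-policy-binding commands, one per role from the list above, and prints them for review instead of running them. The project ID and service account email are placeholder values, and the helper name grant_commands is hypothetical. The sketch assumes project-level bindings for simplicity.

```python
import shlex

# Roles that the launcher and worker service account needs (from the list above).
WORKER_ROLES = [
    "roles/storage.objectAdmin",
    "roles/viewer",
    "roles/dataflow.worker",
]


def grant_commands(project_id, service_account_email):
    """Return one gcloud command, as an argument list, per required role."""
    member = f"serviceAccount:{service_account_email}"
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        for role in WORKER_ROLES
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own project and service account.
    commands = grant_commands(
        "example-project",
        "123456789012-compute@developer.gserviceaccount.com",
    )
    for command in commands:
        print(shlex.join(command))
```

Printing the commands first makes it possible to review each binding before applying it.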

Set required Dockerfile environment variables

If you want to create your own Dockerfile for a Flex Template job, you must specify the following environment variables:

Java

You must specify FLEX_TEMPLATE_JAVA_MAIN_CLASS and FLEX_TEMPLATE_JAVA_CLASSPATH in your Dockerfile.

| ENV | Description | Required |
| --- | --- | --- |
| FLEX_TEMPLATE_JAVA_MAIN_CLASS | Specifies which Java class to run in order to launch the Flex Template. | YES |
| FLEX_TEMPLATE_JAVA_CLASSPATH | Specifies the location of class files. | YES |
| FLEX_TEMPLATE_JAVA_OPTIONS | Specifies the Java options to be passed while launching the Flex Template. | NO |

Python

You must specify FLEX_TEMPLATE_PYTHON_PY_FILE in your Dockerfile. You can also set the following variables in your Dockerfile: FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE, FLEX_TEMPLATE_PYTHON_PY_OPTIONS, FLEX_TEMPLATE_PYTHON_SETUP_FILE, and FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES.

For example, we set the following environment variables in the Streaming in Python Flex Template tutorial:

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/streaming_beam.py"

| ENV | Description | Required |
| --- | --- | --- |
| FLEX_TEMPLATE_PYTHON_PY_FILE | Specifies which Python file to run to launch the Flex Template. | YES |
| FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE | Specifies the requirements file used to install the dependencies for the launch. If you specify FLEX_TEMPLATE_PYTHON_SETUP_FILE, do not set this variable. | NO |
| FLEX_TEMPLATE_PYTHON_SETUP_FILE | Specifies the setup file used to install the dependencies for the launch. For information on using setup files, read Multiple File Dependencies. | NO |
| FLEX_TEMPLATE_PYTHON_EXTRA_PACKAGES | Specifies the packages that are not available publicly. For information on how to use extra packages, read Local or non-PyPI Dependencies. | NO |
| FLEX_TEMPLATE_PYTHON_PY_OPTIONS | Specifies the Python options to be passed while launching the Flex Template. | NO |
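
The rules in the tables above (which variables are required, and that the requirements file and the setup file are alternatives) can be checked before you build the image. The following Python sketch is one possible pre-build check. The function name check_template_env is hypothetical, and the sketch assumes you have already collected the Dockerfile ENV values into a dictionary.

```python
# Required variables per SDK language, taken from the tables above.
REQUIRED_ENV = {
    "java": ("FLEX_TEMPLATE_JAVA_MAIN_CLASS", "FLEX_TEMPLATE_JAVA_CLASSPATH"),
    "python": ("FLEX_TEMPLATE_PYTHON_PY_FILE",),
}


def check_template_env(language, env):
    """Return a list of problems found in a mapping of ENV names to values."""
    problems = [
        f"{name} is required for {language} templates"
        for name in REQUIRED_ENV[language]
        if not env.get(name)
    ]
    # The requirements file and the setup file must not both be set.
    if env.get("FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE") and env.get(
        "FLEX_TEMPLATE_PYTHON_SETUP_FILE"
    ):
        problems.append(
            "Set FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE or "
            "FLEX_TEMPLATE_PYTHON_SETUP_FILE, not both"
        )
    return problems


if __name__ == "__main__":
    # Values mirror the tutorial example, with WORKDIR=/dataflow/template.
    env = {
        "FLEX_TEMPLATE_PYTHON_PY_FILE": "/dataflow/template/streaming_beam.py",
        "FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE": "/dataflow/template/requirements.txt",
    }
    print(check_template_env("python", env))
```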

Choose a base image

You can use a Google-provided base image to package your template container images using Docker. Choose the most recent tag from the Flex Templates base images. We recommend using a specific image tag instead of latest.

Specify the base image in the following format:

gcr.io/dataflow-templates-base/IMAGE_NAME:TAG

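Replace the following:

  • IMAGE_NAME: a Google-provided base image
  • TAG: a version tag for the base image, chosen from the Flex Templates base images; use a specific tag instead of latest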

Use custom container images

If your pipeline uses a custom container image, we recommend using the custom image as a base image for your Flex Template Docker image. To do so, copy the Flex Template launcher binary from the Google-provided template launcher image onto your custom image. An example Dockerfile:

FROM gcr.io/dataflow-templates-base/IMAGE_NAME:TAG as template_launcher
FROM USER_CUSTOM_IMAGE

COPY --from=template_launcher /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY streaming_beam.py .

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/streaming_beam.py"

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]

Create a new Dockerfile that specifies the Flex Template launcher image as the parent and adds any customizations. For more information, see Best practices for writing Dockerfiles and our guide on using custom containers.

Replace the following:

  • IMAGE_NAME: a Google-provided launcher image. The binaries are built for Debian GNU/Linux operating systems.
  • TAG: a version name for the launcher image, found in the Flex Templates launcher images reference. Avoid using latest and pin to a specific version tag for better stability and troubleshooting.
  • USER_CUSTOM_IMAGE: your custom container image

Use an image from a private registry

You can build a Flex Template image stored in a private Docker registry, provided that the private registry uses HTTPS and has a valid certificate.

To use an image from a private registry, you must specify the path to the image and a username and password for the registry. The username and password must be stored in Secret Manager. You can provide the secret in one of the following formats:

  • projects/{project}/secrets/{secret}/versions/{secret_version}
  • projects/{project}/secrets/{secret}

If you use the second format (that is, if you don't specify the version), Dataflow uses the latest version.
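
To make the two accepted formats concrete, the following Python sketch parses a secret ID and falls back to latest when no version is given, mirroring the behavior described above. The function name parse_secret_id is hypothetical, and the pattern is an assumption that only checks the overall shape of the ID, not the naming rules that Secret Manager enforces.

```python
import re

# Matches projects/{project}/secrets/{secret} with an optional
# /versions/{secret_version} suffix.
_SECRET_ID = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/secrets/(?P<secret>[^/]+)"
    r"(?:/versions/(?P<version>[^/]+))?"
)


def parse_secret_id(secret_id):
    """Split a secret ID into (project, secret, version)."""
    match = _SECRET_ID.fullmatch(secret_id)
    if match is None:
        raise ValueError(f"Unrecognized secret ID format: {secret_id!r}")
    # When the version is omitted, the latest version is used.
    return match["project"], match["secret"], match["version"] or "latest"


if __name__ == "__main__":
    print(parse_secret_id("projects/example-project/secrets/username-secret"))
    print(parse_secret_id("projects/example-project/secrets/password-secret/versions/3"))
```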

If the registry uses a self-signed certificate, you also need to specify the path to the self-signed certificate in Cloud Storage.

The following table describes the gcloud CLI options you can use to configure a private registry.

| Parameter | Description |
| --- | --- |
| image | The address of the registry. For example: gcp.repository.example.com:9082/registry/example/image:latest. |
| image-repository-username-secret-id | The Secret Manager secret ID for the username to authenticate to the private registry. For example: projects/example-project/secrets/username-secret. |
| image-repository-password-secret-id | The Secret Manager secret ID for the password to authenticate to the private registry. For example: projects/example-project/secrets/password-secret/versions/latest. |
| image-repository-cert-path | The full Cloud Storage URL for a self-signed certificate for the private registry. This is only required if the registry uses a self-signed certificate. For example: gs://example-bucket/self-signed.crt. |

Here's an example gcloud command to build a Flex Template using an image in a private registry with a self-signed certificate.

gcloud dataflow flex-template build gs://example-bucket/custom-pipeline-private-repo.json \
  --sdk-language=JAVA \
  --image="gcp.repository.example.com:9082/registry/example/image:latest" \
  --image-repository-username-secret-id="projects/example-project/secrets/username-secret" \
  --image-repository-password-secret-id="projects/example-project/secrets/password-secret/versions/latest" \
  --image-repository-cert-path="gs://example-bucket/self-signed.crt" \
  --metadata-file=metadata.json

To build your own Flex Template, you'll need to replace the example values shown above, and you might need to specify different or additional options. To learn more, see the following resources:

Specify pipeline options

For information about pipeline options that are directly supported by Flex Templates, read Pipeline options.

You can also use any Apache Beam pipeline options indirectly. If you're using a metadata.json file for your Flex Template job, include these pipeline options in the file. This metadata file must follow the format in TemplateMetadata. For an example of a metadata.json file, view the streaming SQL Flex Template sample.

Otherwise, when you launch the Flex Template job, pass these pipeline options using the parameters field.
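
For the metadata file case, the following Python sketch generates a small metadata.json that declares one required and one optional pipeline option. The field names (name, description, parameters, label, helpText, isOptional, regexes) follow the TemplateMetadata format. The template name, parameter names, and regular expressions are made-up examples, not values from the referenced sample.

```python
import json

# Hypothetical template metadata: every value below is illustrative.
metadata = {
    "name": "Example streaming template",
    "description": "Reads from Pub/Sub and writes to BigQuery.",
    "parameters": [
        {
            "name": "input_subscription",
            "label": "Input Pub/Sub subscription",
            "helpText": "Subscription to read from, as projects/<project>/subscriptions/<name>.",
            "regexes": ["^projects/[^/]+/subscriptions/[^/]+$"],
        },
        {
            "name": "output_table",
            "label": "BigQuery output table",
            "helpText": "Table to write to, as <project>:<dataset>.<table>.",
            "isOptional": True,
        },
    ],
}

# Serialize in the form expected for a metadata.json file.
metadata_json = json.dumps(metadata, indent=2)

if __name__ == "__main__":
    with open("metadata.json", "w", encoding="utf-8") as handle:
        handle.write(metadata_json + "\n")
```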

API

Include pipeline options by using the parameters field.
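
As an illustration, the following Python sketch assembles a request body for the flexTemplates.launch method with pipeline options placed in the parameters field. The field names launchParameter, jobName, containerSpecGcsPath, and parameters come from the Dataflow REST API; the helper name launch_request_body and all values are hypothetical. The sketch only builds the JSON body and does not send the request.

```python
import json


def launch_request_body(job_name, template_spec_path, pipeline_options):
    """Build the JSON body for a Flex Template launch request."""
    return {
        "launchParameter": {
            "jobName": job_name,
            "containerSpecGcsPath": template_spec_path,
            # The parameters field maps option names to string values.
            "parameters": {
                name: str(value) for name, value in pipeline_options.items()
            },
        }
    }


if __name__ == "__main__":
    body = launch_request_body(
        "example-job",
        "gs://example-bucket/templates/example-template.json",
        {"input_subscription": "projects/example-project/subscriptions/example", "window_size": 60},
    )
    print(json.dumps(body, indent=2))
```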

gcloud

Include pipeline options by using the parameters flag.

When passing parameters of List or Map type, you might need to define the parameters in a YAML file and pass the file with the --flags-file flag. For an example of this approach, view the "Create a file with parameters..." step in this solution.
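
A flags file for this purpose might be generated as in the following Python sketch. It assumes that gcloud accepts a JSON document as a flags file (JSON is also valid YAML) and that the file maps flag names to values, with --parameters given as a nested dictionary. The helper name flags_file_content, the parameter names, and all values are hypothetical, so verify the resulting file against the gcloud flags-file documentation before relying on it.

```python
import json


def flags_file_content(template_path, region, parameters):
    """Return flags-file text mapping gcloud flag names to values."""
    flags = {
        "--template-file-gcs-location": template_path,
        "--region": region,
        # Keeping parameters in the file avoids escaping commas and
        # other special characters on the command line.
        "--parameters": parameters,
    }
    return json.dumps(flags, indent=2)


if __name__ == "__main__":
    content = flags_file_content(
        "gs://example-bucket/templates/example-template.json",
        "us-central1",
        {"input_topics": "topic-a,topic-b", "labels": '{"team":"data"}'},
    )
    with open("flags.json", "w", encoding="utf-8") as handle:
        handle.write(content + "\n")
```

Under these assumptions, the job would then be started with a command such as gcloud dataflow flex-template run JOB_NAME --flags-file=flags.json.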

When using Flex Templates, you can configure some pipeline options during pipeline initialization, but other pipeline options should not be changed. If the command line arguments required by the Flex Template are overwritten, the job might ignore, override, or discard the pipeline options passed by the template launcher. The job might fail to launch, or a job that doesn't use the Flex Template might launch. For more information, see Failed to read the job file.

During pipeline initialization, do not change the following pipeline options:

Java

  • runner
  • project
  • jobName
  • templateLocation
  • region

Python

  • runner
  • project
  • job_name
  • template_location
  • region

Go

  • runner
  • project
  • job_name
  • template_location
  • region
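
One way to respect the lists above is to filter any programmatic overrides before applying them during pipeline initialization. The following Python sketch shows the idea with plain dictionaries. The helper name split_overrides is hypothetical, and the sketch does not use the Apache Beam API; it only separates the overrides that are safe to apply from those that target a protected option.

```python
# Options that must not be changed during pipeline initialization,
# copied from the lists above.
_SNAKE_CASE = frozenset({"runner", "project", "job_name", "template_location", "region"})
PROTECTED_OPTIONS = {
    "java": frozenset({"runner", "project", "jobName", "templateLocation", "region"}),
    "python": _SNAKE_CASE,
    "go": _SNAKE_CASE,
}


def split_overrides(language, overrides):
    """Return (kept, dropped): safe overrides and the protected names removed."""
    protected = PROTECTED_OPTIONS[language]
    kept = {name: value for name, value in overrides.items() if name not in protected}
    dropped = sorted(name for name in overrides if name in protected)
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = split_overrides(
        "python", {"job_name": "my-job", "streaming": True, "region": "us-east1"}
    )
    print("apply:", kept)
    print("ignored protected options:", dropped)
```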

Understand staging location and temp location

The Google Cloud CLI provides --staging-location and --temp-location options when you run a Flex Template. Similarly, the Dataflow REST API provides stagingLocation and tempLocation fields for FlexTemplateRuntimeEnvironment.

For Flex Templates, the staging location is the Cloud Storage URL that files are written to during the staging step of launching a template. Dataflow reads these staged files to create the template graph. The temp location is the Cloud Storage URL that temporary files are written to during the execution step.
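
The following Python sketch builds the stagingLocation and tempLocation portion of a FlexTemplateRuntimeEnvironment and rejects values that are not Cloud Storage URLs. The helper name runtime_environment and the bucket paths are hypothetical, and the gs:// prefix check is a simplifying assumption, not the validation that Dataflow itself performs.

```python
def runtime_environment(staging_location, temp_location):
    """Return the staging and temp fields of a runtime environment."""
    fields = {
        "stagingLocation": staging_location,
        "tempLocation": temp_location,
    }
    for name, url in fields.items():
        # Both locations are Cloud Storage URLs.
        if not url.startswith("gs://"):
            raise ValueError(f"{name} must be a Cloud Storage URL, got {url!r}")
    return fields


if __name__ == "__main__":
    environment = runtime_environment(
        "gs://example-bucket/staging",
        "gs://example-bucket/temp",
    )
    print(environment)
```

In a launch request, this dictionary would be the value of the environment field; with the gcloud CLI, the same two values map to --staging-location and --temp-location.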

What's next