Cloud Dataflow Templates

Cloud Dataflow templates allow you to stage your pipelines on Cloud Storage and execute them from a variety of environments. You can use one of the Google-provided templates or create your own.

Templates provide you with additional benefits compared to traditional Cloud Dataflow deployment, such as:

  • Pipeline execution does not require you to recompile your code every time.
  • You can execute your pipelines without the development environment and associated dependencies that are common with traditional deployment. This is useful for scheduling recurring batch jobs.
  • Runtime parameters allow you to customize the execution of the pipeline (see the sketch after this list).
  • Non-technical users can execute templates with the Google Cloud Platform Console, the gcloud command-line tool, or the REST API.
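
Runtime parameters are built on the Apache Beam ValueProvider interface. As a minimal sketch, assuming a hypothetical pipeline with a single inputFile option (the class and option names are illustrative, not part of any Google-provided template), a Java pipeline might declare a runtime-settable parameter like this:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.options.ValueProvider;

    public class TemplatePipeline {

      // Options with a runtime parameter: because inputFile is a
      // ValueProvider, its value can be supplied each time the template
      // is executed rather than once when the template is created.
      public interface Options extends PipelineOptions {
        @Description("Cloud Storage path of the file to read from")
        ValueProvider<String> getInputFile();
        void setInputFile(ValueProvider<String> value);
      }

      public static void main(String[] args) {
        Options options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
        Pipeline pipeline = Pipeline.create(options);

        // TextIO accepts a ValueProvider, so the source is resolved at
        // job execution time, not at template creation time.
        pipeline.apply("ReadLines", TextIO.read().from(options.getInputFile()));

        pipeline.run();
      }
    }

Because the parameter is deferred, the same staged template can be executed many times with different input files.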

Traditional vs. templated job execution

Cloud Dataflow templates introduce a new development and execution workflow that differs from the traditional job execution workflow. The template workflow separates the development step from the staging and execution steps.

Traditional Cloud Dataflow jobs

Apache Beam pipeline development and job execution both happen within a development environment.

Typical workflow for traditional Cloud Dataflow jobs:

  1. Developers create a development environment and develop their pipeline. The environment includes the Apache Beam SDK and other dependencies.
  2. Users execute the pipeline from the development environment. The Apache Beam SDK stages files in Cloud Storage, creates a job request file, and submits the file to the Cloud Dataflow service, as sketched below.
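
A minimal Java sketch of this workflow (the project ID and bucket name are placeholders, and the pipeline body is elided):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class RunFromDevEnvironment {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setProject("my-project");               // placeholder project ID
        options.setTempLocation("gs://my-bucket/temp"); // placeholder bucket

        Pipeline pipeline = Pipeline.create(options);
        // ... build the pipeline here ...

        // run() stages the pipeline files in Cloud Storage, creates the
        // job request, and submits it to the Cloud Dataflow service.
        pipeline.run();
      }
    }

Note that every execution follows this path, so the compiled pipeline code and its dependencies must be available each time.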

Templated Cloud Dataflow jobs

If you use Cloud Dataflow templates, staging and execution are separate steps. This separation gives you additional flexibility to decide who can run jobs and where the jobs are run from.

Typical workflow for templated Cloud Dataflow jobs:

  1. Developers create a development environment and develop their pipeline. The environment includes the Apache Beam SDK and other dependencies.
  2. Developers execute the pipeline and create a template. The Apache Beam SDK stages files in Cloud Storage, creates a template file (similar to a job request), and saves the template file in Cloud Storage (see the sketch after this list).
  3. Non-developer users can easily execute jobs with the GCP Console, the gcloud command-line tool, or the REST API, each of which submits a template execution request to the Cloud Dataflow service.
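
As a sketch of step 2, assuming the SDK 2.x for Java, a developer creates a template by setting the templateLocation pipeline option before running the pipeline (the project ID, bucket, and template names below are placeholders):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class CreateTemplate {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setProject("my-project");               // placeholder project ID
        options.setTempLocation("gs://my-bucket/temp"); // placeholder bucket
        // With templateLocation set, run() stages the pipeline and writes
        // the template file to Cloud Storage instead of launching a job.
        options.setTemplateLocation("gs://my-bucket/templates/my-template");

        Pipeline pipeline = Pipeline.create(options);
        // ... build the pipeline here ...
        pipeline.run();
      }
    }

Once the template file is staged, step 3 needs no development environment; for example, a user could launch the staged template with gcloud dataflow jobs run my-job --gcs-location gs://my-bucket/templates/my-template.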

Before you begin

To create your own templates, make sure your Apache Beam SDK version supports template creation.

Java: SDK 2.x

To create templates with the Cloud Dataflow SDK 2.x for Java, you must have version 2.0.0-beta3 or higher.

Python

To create templates with the Cloud Dataflow SDK 2.x for Python, you must have version 2.0.0 or higher.

Java: SDK 1.x

To create templates with the Cloud Dataflow SDK 1.x for Java, you must have version 1.9.0 or higher.

To execute templates with the gcloud command-line tool, you must have Cloud SDK version 138.0.0 or higher.
