Google Cloud Dataflow Templates

Cloud Dataflow templates allow you to stage your pipelines on Google Cloud Storage and execute them from a variety of environments. You can use one of the Google-provided templates or create your own.

Templates provide several benefits over traditional Cloud Dataflow deployment:

  • You can execute your pipeline without recompiling your code each time.
  • You can execute your pipelines without the development environment and associated dependencies that are common with traditional deployment. This is useful for scheduling recurring batch jobs.
  • Runtime parameters allow you to customize the execution.
  • Non-technical users can execute templates with the Google Cloud Platform Console, gcloud command-line tool, or the REST API.
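Runtime parameters work by deferring a value until the template is actually executed. A minimal, library-free sketch of that idea, modeled loosely on the SDK's ValueProvider concept (the class and method names here are illustrative, not the real SDK API):

```python
class DeferredParameter:
    """A pipeline option whose value is unknown when the template
    is staged and is only supplied when the template is executed."""

    def __init__(self, name):
        self.name = name
        self._value = None

    def set(self, value):
        # Called at launch time, when the user submits the
        # template execution request.
        self._value = value

    def get(self):
        # Called by the running pipeline; fails if the parameter
        # was never provided at execution time.
        if self._value is None:
            raise RuntimeError(f"{self.name} was not provided at launch")
        return self._value


# "inputFile" and the bucket path are hypothetical examples.
input_file = DeferredParameter("inputFile")
input_file.set("gs://my-bucket/input.txt")  # supplied at execution time
```

Because the value is resolved at launch rather than at staging, the same staged template can be executed many times with different parameters.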

Traditional vs. templated job execution

Cloud Dataflow templates introduce a new development and execution workflow that differs from the traditional job execution workflow. The template workflow separates the development step from the staging and execution steps.

Traditional Cloud Dataflow jobs

With traditional Cloud Dataflow jobs, pipeline development and job execution both happen within a development environment.

Typical workflow for traditional Cloud Dataflow jobs:

  1. Developers create a development environment and develop their pipeline. The environment includes the Cloud Dataflow SDK and other dependencies.
  2. Users execute the pipeline from the development environment. The Cloud Dataflow SDK stages files in Cloud Storage, creates a job request file, and submits the file to the Cloud Dataflow service.
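Step 2 amounts to executing the pipeline with Cloud Dataflow options set, after which the SDK handles staging and submission itself. A sketch of the option list a developer might pass (the project and bucket names are hypothetical placeholders; the flags are standard Dataflow pipeline options):

```python
def traditional_run_args(project, bucket):
    """Options a developer passes when executing a pipeline directly
    from the development environment. With these set, the SDK stages
    files in Cloud Storage and submits the job request itself."""
    # `project` and `bucket` are hypothetical placeholder values.
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--staging_location=gs://{bucket}/staging",
        f"--temp_location=gs://{bucket}/temp",
    ]


args = traditional_run_args("my-project", "my-bucket")
```

Note that every execution requires this development environment, which is the limitation templates remove.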

Templated Cloud Dataflow jobs

If you use Cloud Dataflow templates, staging and execution are separate steps. This separation gives you additional flexibility to decide who can run jobs and where the jobs are run from.

Typical workflow for templated Cloud Dataflow jobs:

  1. Developers create a development environment and develop their pipeline. The environment includes the Cloud Dataflow SDK and other dependencies.
  2. Developers execute the pipeline and create a template. The Cloud Dataflow SDK stages files in Cloud Storage, creates a template file (similar to a job request), and saves the template file in Cloud Storage.
  3. Non-developer users execute jobs with the Google Cloud Platform Console, the gcloud command-line tool, or the REST API, which submit template execution requests to the Cloud Dataflow service.
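The split between steps 2 and 3 can be sketched as two small helpers: staging adds a template location option, so the SDK writes a template file instead of launching a job, and execution submits a launch request that names that file. The flag spelling below follows the Python SDK (`--template_location`); the bucket, project, job, and parameter names are hypothetical:

```python
def template_stage_args(project, bucket, template_name):
    """Options for step 2: with a template location set, the SDK
    creates a template file in Cloud Storage instead of running
    the job immediately."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--staging_location=gs://{bucket}/staging",
        f"--temp_location=gs://{bucket}/temp",
        f"--template_location=gs://{bucket}/templates/{template_name}",
    ]


def launch_request(job_name, parameters):
    """Body of a template execution request (step 3), of the kind a
    non-developer user or a scheduler could submit via the REST API."""
    return {"jobName": job_name, "parameters": dict(parameters)}


stage_args = template_stage_args("my-project", "my-bucket", "wordcount")
req = launch_request("wordcount-nightly",
                     {"inputFile": "gs://my-bucket/input.txt"})
```

The staging step runs once in the development environment; the launch request needs no SDK or dependencies, which is what makes recurring and non-developer executions practical.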

Before you begin

To create your own templates, make sure your Cloud Dataflow SDK version supports template creation.

Java: SDK 1.x

To create templates with the Cloud Dataflow SDK 1.x for Java, you must have version 1.9.0 or higher.

Java: SDK 2.x

To create templates with the Cloud Dataflow SDK 2.x for Java, you must have version 2.0.0-beta3 or higher.

Python

To create templates with the Cloud Dataflow SDK 2.x for Python, you must have version 2.0.0 or higher.

To execute templates with the gcloud command-line tool, you must have Cloud SDK version 138.0.0 or higher.
