Google Cloud Dataflow Templates

Cloud Dataflow templates allow you to stage your pipelines on Google Cloud Storage and execute them from a variety of environments. Templates provide you with additional benefits compared to traditional Cloud Dataflow deployment, such as:

  • You do not have to recompile your code every time you execute a pipeline.
  • You can execute your pipelines without the development environment and associated dependencies that are common with traditional deployment. This is useful for scheduling regular batch jobs.
  • Runtime parameters allow you to customize the execution.
  • Non-technical users can execute templates with the gcloud command-line tool or the REST API.
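
As a rough illustration of the last two points, the sketch below assembles a gcloud invocation that launches a staged template with runtime parameters. It only builds and prints the command; it does not run it. The job name, bucket path, and the `inputFile` parameter are hypothetical, and the helper `build_template_run_command` is not part of any SDK. The `dataflow jobs run` command with `--gcs-location` and `--parameters` is assumed here; depending on your Cloud SDK release it may live under the `beta` command group, so check `gcloud dataflow jobs run --help` before relying on it.

```python
import shlex


def build_template_run_command(job_name, template_path, parameters=None, use_beta=False):
    """Return the argv list for launching a staged template with gcloud.

    Hypothetical helper: it only assembles arguments and never calls gcloud.
    """
    argv = ["gcloud"]
    if use_beta:
        # Assumption: older Cloud SDK releases expose the command under `beta`.
        argv.append("beta")
    argv += ["dataflow", "jobs", "run", job_name, "--gcs-location", template_path]
    if parameters:
        # Runtime parameters are passed as comma-separated key=value pairs.
        pairs = ",".join(f"{key}={value}" for key, value in parameters.items())
        argv += ["--parameters", pairs]
    return argv


argv = build_template_run_command(
    "nightly-wordcount",                              # hypothetical job name
    "gs://example-bucket/templates/wordcount",        # hypothetical template path
    {"inputFile": "gs://example-bucket/input.txt"},   # hypothetical runtime parameter
)
print(shlex.join(argv))
```

Because the command needs only the Cloud SDK and the template's Cloud Storage path, it can be run from a scheduler or by someone who has never built the pipeline.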

Traditional vs. templated job execution

Cloud Dataflow templates introduce a new development and execution workflow that differs from the traditional job execution workflow. The template workflow separates the development step from the staging and execution steps.

Traditional Cloud Dataflow jobs

In the traditional workflow, pipeline development and job execution both happen within a development environment.

Typical workflow:

  1. Developers create a development environment and develop their pipeline. The environment includes the Cloud Dataflow SDK and other dependencies.
  2. Users execute the pipeline from the development environment. The Cloud Dataflow SDK stages files in Cloud Storage, creates a job request file, and submits the file to the Cloud Dataflow service.

Templated Cloud Dataflow jobs

If you use Cloud Dataflow templates, staging and execution are separate steps. This separation gives you additional flexibility to decide who can run jobs and where the jobs are run from.

Typical workflow:

  1. Developers create a development environment and develop their pipeline. The environment includes the Cloud Dataflow SDK and other dependencies.
  2. Developers execute the pipeline with the TemplatingDataflowPipelineRunner. The Cloud Dataflow SDK stages files in Cloud Storage, creates a template file (similar to a job request), and saves the template file in Cloud Storage.
  3. Other users, including non-developers, execute jobs by submitting template execution requests to the Cloud Dataflow service with the gcloud command-line tool or the REST API.
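
Step 3 over the REST API might look like the following sketch, which builds (but does not send) a template execution request using only the Python standard library. It assumes the `v1b3` `projects.templates` endpoint and the `jobName`, `gcsPath`, and `parameters` request fields; verify both against the current REST reference. The project ID, job name, template path, parameter, and access token shown are placeholders, and `build_template_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_template_request(project_id, job_name, template_path, parameters, access_token):
    """Build an HTTP request asking the service to run a staged template.

    Assumption: the endpoint path and body field names follow the v1b3
    projects.templates API; nothing is sent until urlopen() is called.
    """
    url = f"https://dataflow.googleapis.com/v1b3/projects/{project_id}/templates"
    body = {
        "jobName": job_name,
        "gcsPath": template_path,   # where step 2 saved the template file
        "parameters": parameters,   # runtime parameters for this execution
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_template_request(
    "example-project",                                # placeholder project ID
    "nightly-wordcount",                              # placeholder job name
    "gs://example-bucket/templates/wordcount",        # placeholder template path
    {"inputFile": "gs://example-bucket/input.txt"},   # placeholder parameter
    "ACCESS_TOKEN",                                   # placeholder OAuth 2.0 token
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit it. Note that nothing here depends on the Cloud Dataflow SDK or the pipeline source, which is the point of separating staging from execution.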

Before you begin

Templates require the following SDK versions:

  1. To create templates, you must have Cloud Dataflow Java SDK version 1.9.0 or higher.
  2. To execute templates with the gcloud command-line tool, you must have Cloud SDK version 138.0.0 or higher.
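
If you want to check these minimums in a script, a minimal sketch is shown below. It assumes the version appears as three dot-separated numbers somewhere in the text you pass in (for example, the first line printed by `gcloud version`); `parse_version` and `meets_minimum` are hypothetical helpers, not part of either SDK.

```python
import re

# Minimum versions stated in the requirements above.
MIN_JAVA_SDK = (1, 9, 0)
MIN_CLOUD_SDK = (138, 0, 0)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Return True if the version in version_text is at least the minimum tuple."""
    return parse_version(version_text) >= minimum


print(meets_minimum("Google Cloud SDK 138.0.0", MIN_CLOUD_SDK))  # True
print(meets_minimum("1.8.1", MIN_JAVA_SDK))                      # False
print(meets_minimum("1.10.0", MIN_JAVA_SDK))                     # True
```

Comparing integer tuples rather than raw strings matters for the last case: as strings, "1.10.0" sorts before "1.9.0".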
