Executing Templates

After you create and stage your Google Cloud Dataflow template, execute it with the Google Cloud Platform Console, REST API, or gcloud command-line tool. Cloud Dataflow template jobs are deployable from many environments, including Google App Engine standard environment, Google Cloud Functions, and other constrained environments.

Note: In addition to the template file, templated pipeline execution also relies on files that were staged and referenced at the time of template creation. If the staged files are moved or removed, your pipeline execution will fail.

Using the Cloud Platform Console

You can use the Cloud Platform Console to execute Google-provided and custom Cloud Dataflow templates.

Google-provided templates

To execute a Google-provided template:

  1. Go to the Cloud Dataflow page in the Cloud Platform Console.
  2. Click CREATE JOB FROM TEMPLATE.
  3. Select the Google-provided template that you want to execute from the Cloud Dataflow template drop-down menu.
  4. Enter a job name in the Job Name field. Your job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  5. Enter your parameter values in the provided parameter fields. You should not need the Additional Parameters section when you use a Google-provided template.
  6. Click Run Job.

Custom templates

To execute a custom template:

  1. Go to the Cloud Dataflow page in the Cloud Platform Console.
  2. Click CREATE JOB FROM TEMPLATE.
  3. Select Custom Template from the Cloud Dataflow template drop-down menu.
  4. Enter a job name in the Job Name field. Your job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  5. Enter the Cloud Storage path to your template file in the template Cloud Storage path field.
  6. If your template needs parameters, click Add item in the Additional Parameters section and enter the Name and Value of each parameter. Repeat this step for each needed parameter.
  7. Click Run Job.

Using the REST API

To execute a template with the REST API, send an authorized HTTP POST request to the projects.templates.launch endpoint, including your project ID in the request URL.

See the REST API reference for projects.templates.launch to learn more about the available parameters.

Note: To run a Google-provided template, you must specify a tempLocation where you have write permissions. Set gcsPath and parameters to the template's location and parameters as documented in Google-Provided Templates.
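
The examples that follow show only the request body. As a minimal sketch of how you might send such a request (assuming the body has been saved to a local file, here hypothetically named request.json, and that the Cloud SDK is installed for authentication), you could use curl with an OAuth 2.0 access token:

    # Obtain an access token for the active gcloud account and POST the
    # request body to the templates.launch endpoint.
    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        -d @request.json \
        "https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://[YOUR_BUCKET_NAME]/templates/TemplateName"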

Example 1: Custom template, batch job

This example projects.templates.launch request creates a batch job from a template that reads a text file and writes an output text file. If the request is successful, the response body contains an instance of LaunchTemplateResponse.

You must modify the following values:

  • Replace [YOUR_PROJECT_ID] with your project ID.
  • Replace [JOB_NAME] with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  • Replace [YOUR_BUCKET_NAME] with the name of your Cloud Storage bucket.
  • Set gcsPath to the Cloud Storage location of the template file.
  • Set parameters to your list of key/value pairs.
  • Set tempLocation to a location where you have write permission. This value is required to run Google-provided templates.
    POST https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://[YOUR_BUCKET_NAME]/templates/TemplateName
    {
        "jobName": "[JOB_NAME]",
        "parameters": {
            "inputFile" : "gs://[YOUR_BUCKET_NAME]/input/my_input.txt",
            "outputFile": "gs://[YOUR_BUCKET_NAME]/output/my_output"
        },
        "environment": {
            "tempLocation": "gs://[YOUR_BUCKET_NAME]/temp",
            "zone": "us-central1-f"
        }
    }

Example 2: Custom template, streaming job

This example projects.templates.launch request creates a streaming job from a template that reads from a Pub/Sub topic and writes to a BigQuery table. The BigQuery table must already exist with the appropriate schema. If successful, the response body contains an instance of LaunchTemplateResponse.

You must modify the following values:

  • Replace [YOUR_PROJECT_ID] with your project ID.
  • Replace [JOB_NAME] with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  • Replace [YOUR_BUCKET_NAME] with the name of your Cloud Storage bucket.
  • Replace [YOUR_TOPIC_NAME] with your Cloud Pub/Sub topic name.
  • Replace [YOUR_DATASET] with your BigQuery dataset, and replace [YOUR_TABLE_NAME] with your BigQuery table name.
  • Set gcsPath to the Cloud Storage location of the template file.
  • Set parameters to your list of key/value pairs.
  • Set tempLocation to a location where you have write permission. This value is required to run Google-provided templates.
    POST https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://[YOUR_BUCKET_NAME]/templates/TemplateName
    {
        "jobName": "[JOB_NAME]",
        "parameters": {
            "topic": "projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME]",
            "table": "[YOUR_PROJECT_ID]:[YOUR_DATASET].[YOUR_TABLE_NAME]"
        },
        "environment": {
            "tempLocation": "gs://[YOUR_BUCKET_NAME]/temp",
            "zone": "us-central1-f"
        }
    }

Using the Google API Client Libraries

Consider using the Google API Client Libraries to easily make calls to the Cloud Dataflow REST APIs. This sample script uses the Google API Client Library for Python.

In this example, you must modify the following values:

  • Replace [YOUR_PROJECT_ID] with your project ID.
  • Replace [JOB_NAME] with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  • Replace [YOUR_BUCKET_NAME] with the name of your Cloud Storage bucket.
  • Replace [YOUR_TEMPLATE_NAME] with the name of your template.
  • Set gcsPath to the Cloud Storage location of the template file.
  • Set parameters to your list of key/value pairs.
  • Set tempLocation to a location where you have write permission. This value is required to run Google-provided templates.
    from googleapiclient.discovery import build
    from oauth2client.client import GoogleCredentials

    credentials = GoogleCredentials.get_application_default()
    service = build('dataflow', 'v1b3', credentials=credentials)

    # Set the following variables to your values.
    JOBNAME = '[JOB_NAME]'
    PROJECT = '[YOUR_PROJECT_ID]'
    BUCKET = '[YOUR_BUCKET_NAME]'
    TEMPLATE = '[YOUR_TEMPLATE_NAME]'

    GCSPATH="gs://{bucket}/templates/{template}".format(bucket=BUCKET, template=TEMPLATE),
    BODY = {
        "jobName": "{jobname}".format(jobname=JOBNAME),
        "parameters": {
            "inputFile" : "gs://{bucket}/input/my_input.txt",
            "outputFile": "gs://{bucket}/output/my_output".format(bucket=BUCKET)
         },
         "environment": {
            "tempLocation": "gs://{bucket}/temp".format(bucket=BUCKET),
            "zone": "us-central1-f"
         }
    }

    request = service.projects().templates().launch(projectId=PROJECT, gcsPath=GCSPATH, body=BODY)
    response = request.execute()

    print(response)

Using gcloud

Note: To use the gcloud command-line tool to execute templates, you must have Cloud SDK version 138.0.0 or higher.

To execute a custom template with the gcloud command-line tool, use the gcloud beta dataflow jobs run command.

Note: Executing Google-provided templates with the gcloud command-line tool is not currently supported.

For the following examples, set these values:

  • Replace [JOB_NAME] with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  • Replace [YOUR_BUCKET_NAME] with the name of your Cloud Storage bucket.
  • You must include the --gcs-location flag. Set --gcs-location to the Cloud Storage location of the template file.
  • Set --parameters to the comma-separated list of parameters to pass to the job. Spaces between commas and values are not allowed.

Example 1: Custom template, batch job

This example creates a batch job from a template that reads a text file and writes an output text file.

    gcloud beta dataflow jobs run [JOB_NAME] \
        --gcs-location gs://[YOUR_BUCKET_NAME]/templates/MyTemplate \
        --parameters inputFile=gs://[YOUR_BUCKET_NAME]/input/my_input.txt,outputFile=gs://[YOUR_BUCKET_NAME]/output/my_output

The request returns a response with the following format.

    id: 2016-10-11_17_10_59-1234530157620696789
    projectId: [YOUR_PROJECT_ID]
    type: JOB_TYPE_BATCH

Example 2: Custom template, streaming job

This example creates a streaming job from a template that reads from a Cloud Pub/Sub topic and writes to a BigQuery table. The BigQuery table must already exist with the appropriate schema.

    gcloud beta dataflow jobs run [JOB_NAME] \
        --gcs-location gs://[YOUR_BUCKET_NAME]/templates/MyTemplate \
        --parameters topic=projects/project-identifier/topics/resource-name,table=my_project:my_dataset.my_table_name

The request returns a response with the following format.

    id: 2016-10-11_17_10_59-1234530157620696789
    projectId: [YOUR_PROJECT_ID]
    type: JOB_TYPE_STREAMING

For a complete list of flags for the gcloud beta dataflow jobs run command, see the gcloud tool reference.

Monitoring and Troubleshooting

The Dataflow Monitoring Interface allows you to monitor your Cloud Dataflow jobs. If a job fails, you can find troubleshooting tips, debugging strategies, and a catalog of common errors in the Troubleshooting Your Pipeline guide.
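
If you prefer the command line, you can also check on jobs with the gcloud tool. A minimal sketch, assuming the same Cloud SDK version requirement noted above:

    # List recent Cloud Dataflow jobs in the current project, including
    # each job's ID, type, and current state.
    gcloud beta dataflow jobs list

    # Show the details of a single job. Replace [JOB_ID] with the ID
    # returned when the job was launched.
    gcloud beta dataflow jobs describe [JOB_ID]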
