Google-Provided Templates

Google provides a set of Google Cloud Dataflow templates. The following templates are available:

WordCount

The WordCount template is a batch job that reads text from Google Cloud Storage, tokenizes each line into individual words, and counts how often each word occurs. For more information about WordCount, see WordCount Example Pipeline.
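To make the tokenize-and-count step concrete, here is a minimal local sketch in plain Python. It is not the template's actual implementation: the function name `count_words` is hypothetical, and the tokenization rule (runs of letters and apostrophes) is an assumption chosen for illustration.

```python
import re
from collections import Counter


def count_words(lines):
    """Tokenize each line into words and count how often each word occurs."""
    counts = Counter()
    for line in lines:
        # Assumption: a "word" is a run of letters and apostrophes.
        counts.update(re.findall(r"[A-Za-z']+", line))
    return counts


if __name__ == "__main__":
    sample = ["the king and the fool", "the storm"]
    for word, n in sorted(count_words(sample).items()):
        print(f"{word}: {n}")
```

The template itself runs this kind of logic as a distributed batch job over files in Cloud Storage; the sketch only shows the per-line transformation on an in-memory list.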

Cloud Storage path to template:

    gs://dataflow-templates/wordcount/template_file

Template parameters:

Parameter    Description
inputFile    The Cloud Storage input file path.
output       The Cloud Storage output file path and prefix.

Executing the WordCount template

Note: Executing Google-provided templates with gcloud is not currently supported.

  • Execute from the REST API

    Use this example request as documented in Using the REST API. This request requires authorization.

    Note: To run a Google-provided template, you must specify a tempLocation where you have write permissions.

        POST https://dataflow.googleapis.com/v1b3/projects/{YOUR_PROJECT_ID}/templates
        {
            "jobName": "myjobname",
            "gcsPath": "gs://dataflow-templates/wordcount/template_file",
            "parameters": {
               "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
               "output": "gs://{YOUR_BUCKET_NAME}/output/my_output"
            },
            "environment": {
               "tempLocation": "gs://{YOUR_BUCKET_NAME}",
               "zone": "us-central1-f"
            }
        }
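The request above can also be assembled programmatically. The following sketch builds, but does not send, the same POST request using only the Python standard library. The function name `build_launch_request` and its arguments are hypothetical, and passing the credential as an OAuth 2.0 bearer token in the `Authorization` header is an assumption about how the required authorization is supplied.

```python
import json
import urllib.request


def build_launch_request(project_id, bucket, access_token):
    """Build (without sending) the WordCount template launch request."""
    url = f"https://dataflow.googleapis.com/v1b3/projects/{project_id}/templates"
    body = {
        "jobName": "myjobname",
        "gcsPath": "gs://dataflow-templates/wordcount/template_file",
        "parameters": {
            "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
            "output": f"gs://{bucket}/output/my_output",
        },
        "environment": {
            # tempLocation must be a location where you have write permissions.
            "tempLocation": f"gs://{bucket}",
            "zone": "us-central1-f",
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the credential is an OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job, so do that only with a real project ID, bucket, and token.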

Cloud Pub/Sub to BigQuery

The Cloud Pub/Sub to BigQuery template is a streaming pipeline that reads JSON-formatted messages from a Cloud Pub/Sub topic and writes them to a BigQuery table. You can use the template as a quick way to move Cloud Pub/Sub data to BigQuery.

Requirements for this pipeline:
  • Cloud Pub/Sub messages must be in JSON format.
  • The BigQuery output table must already exist with an appropriate schema.
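Because the pipeline expects JSON input, it can help to check payloads before publishing them. The sketch below is one way a publisher might do that; the function name `is_json_object` is hypothetical, and requiring a JSON object (rather than any JSON value) is an assumption based on each message becoming one table row.

```python
import json


def is_json_object(payload: bytes) -> bool:
    """Return True if the payload is UTF-8 text that parses to a JSON object."""
    try:
        # Assumption: one message maps to one row, so it must be an object.
        return isinstance(json.loads(payload.decode("utf-8")), dict)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


if __name__ == "__main__":
    print(is_json_object(b'{"word": "king", "count": 3}'))
    print(is_json_object(b"not json"))
```

This check says nothing about whether the fields match the BigQuery table schema; that still has to be verified against the table you created.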

Cloud Storage path to template:

    gs://dataflow-templates/pubsub-to-bigquery/template_file

Template parameters:

Parameter    Description
topic        The Cloud Pub/Sub input topic to read from.
table        The BigQuery output table location.

Executing the Cloud Pub/Sub to BigQuery template

Note: Executing Google-provided templates with gcloud is not currently supported.

  • Execute from the REST API

    Use this example request as documented in Using the REST API. This request requires authorization.

    Note: To run a Google-provided template, you must specify a tempLocation where you have write permissions.

        POST https://dataflow.googleapis.com/v1b3/projects/{YOUR_PROJECT_ID}/templates
        {
           "jobName": "myjobname",
           "gcsPath": "gs://dataflow-templates/pubsub-to-bigquery/template_file",
           "parameters": {
               "topic": "projects/project-identifier/topics/resource-name",
               "table": "my_project:my_dataset.my_table_name"
           },
           "environment": {
               "tempLocation": "gs://{YOUR_BUCKET_NAME}",
               "zone": "us-central1-f"
           }
        }
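The `topic` and `table` values in the request follow the shapes `projects/<project>/topics/<name>` and `<project>:<dataset>.<table>`. As a rough sketch, the helpers below check those shapes before a request is sent. The names `parse_topic` and `parse_table` are hypothetical, and the patterns are simplified assumptions inferred from the example values; they do not handle every valid identifier (for instance, project IDs that themselves contain a colon).

```python
import re

# Simplified patterns inferred from the example request; both are assumptions.
TOPIC_RE = re.compile(r"^projects/([^/]+)/topics/([^/]+)$")
TABLE_RE = re.compile(r"^([^:]+):([^.]+)\.([^.]+)$")


def parse_topic(topic):
    """Split 'projects/<project>/topics/<name>' into (project, name)."""
    match = TOPIC_RE.match(topic)
    if match is None:
        raise ValueError(f"unexpected topic format: {topic!r}")
    return match.group(1), match.group(2)


def parse_table(table):
    """Split '<project>:<dataset>.<table>' into (project, dataset, table)."""
    match = TABLE_RE.match(table)
    if match is None:
        raise ValueError(f"unexpected table format: {table!r}")
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    print(parse_topic("projects/project-identifier/topics/resource-name"))
    print(parse_table("my_project:my_dataset.my_table_name"))
```

Failing fast on a malformed value locally is cheaper than discovering it after the streaming job has been submitted.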
