Google-Provided Templates

Google provides a set of Google Cloud Dataflow templates. The following templates are available:

WordCount

The WordCount template is a batch job that reads text from Google Cloud Storage, tokenizes each line of text into individual words, and counts how often each word occurs. For more information about WordCount, see WordCount Example Pipeline.
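
The tokenize-and-count logic described above can be sketched in plain Python. This is only an illustration of the idea, not the template's actual pipeline code: the `count_words` function and the rule that a word is a run of letters or apostrophes are assumptions made for this example.

```python
import re
from collections import Counter


def count_words(lines):
    """Tokenize each line into words and count how often each word occurs."""
    counts = Counter()
    for line in lines:
        # Assumption: a word is a run of letters or apostrophes.
        counts.update(re.findall(r"[A-Za-z']+", line))
    return counts


sample = ["to be or not to be", "that is the question"]
word_counts = count_words(sample)
print(word_counts["to"], word_counts["be"], word_counts["question"])  # prints: 2 2 1
```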

Cloud Storage path to template:

    gs://dataflow-templates/wordcount/template_file

Template parameters:

Parameter    Description
inputFile    The Cloud Storage input file path.
output       The Cloud Storage output file path and prefix.

Executing the WordCount template

Note: Executing Google-provided templates with the gcloud command-line tool is not currently supported.

  • Execute from the Google Cloud Platform Console
  • Execute from the REST API

    Use this example request as documented in Using the REST API. This request requires authorization, and you must specify a tempLocation where you have write permissions. You must replace the following values in this example:

    • Replace [YOUR_PROJECT_ID] with your project ID.
    • Replace [JOB_NAME] with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
    • Replace [YOUR_BUCKET_NAME] with the name of your Cloud Storage bucket.

        POST https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://dataflow-templates/wordcount/template_file
        {
            "jobName": "[JOB_NAME]",
            "parameters": {
               "inputFile" : "gs://dataflow-samples/shakespeare/kinglear.txt",
               "output": "gs://[YOUR_BUCKET_NAME]/output/my_output"
            },
            "environment": {
               "tempLocation": "gs://[YOUR_BUCKET_NAME]/temp",
               "zone": "us-central1-f"
            }
        }
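
Before sending the request, it can help to check the job name against the regular expression quoted above and to assemble the URL and body in one place. The following is a minimal sketch that only builds the request and does not send it or handle authorization. The helper names `is_valid_job_name` and `build_wordcount_request`, and the sample project, job, and bucket names, are hypothetical.

```python
import json
import re

# Pattern quoted from the job name requirement above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if the whole name matches the job name pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


def build_wordcount_request(project_id, job_name, bucket):
    """Build the URL and JSON body for the WordCount launch request."""
    if not is_valid_job_name(job_name):
        raise ValueError("invalid job name: " + repr(job_name))
    url = (
        "https://dataflow.googleapis.com/v1b3/projects/" + project_id
        + "/templates:launch"
        + "?gcsPath=gs://dataflow-templates/wordcount/template_file"
    )
    body = {
        "jobName": job_name,
        "parameters": {
            "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
            "output": "gs://" + bucket + "/output/my_output",
        },
        "environment": {
            "tempLocation": "gs://" + bucket + "/temp",
            "zone": "us-central1-f",
        },
    }
    return url, body


# Hypothetical values, used only to show the shape of the result.
url, body = build_wordcount_request("my-project", "wordcount-demo", "my-bucket")
print(url)
print(json.dumps(body, indent=2))
```

Under this pattern a job name is at most 40 characters long, starts with a lowercase letter, and cannot end with a hyphen.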

Cloud Pub/Sub to BigQuery

The Cloud Pub/Sub to BigQuery template is a streaming pipeline that reads JSON strings from a Cloud Pub/Sub topic and converts them to BigQuery TableRow elements. You can use the template as a quick way to move Cloud Pub/Sub data into BigQuery.

This pipeline has the following requirements:
  • Cloud Pub/Sub messages must be in JSON format.
  • BigQuery tables must already exist with the appropriate schemas for the published TableRow elements.
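
To illustrate the first requirement, the sketch below parses one JSON-encoded message payload into a row-like dictionary, which is roughly the shape a TableRow takes. It is not the template's own conversion code. The `message_to_row` function and the `user` and `score` fields are hypothetical, and it assumes each message is a single JSON object whose keys match the table's column names.

```python
import json


def message_to_row(payload):
    """Parse one JSON-encoded message payload into a row dictionary."""
    row = json.loads(payload)
    if not isinstance(row, dict):
        # Assumption: each message is one JSON object, one key per column.
        raise ValueError("expected a JSON object with one key per table column")
    return row


row = message_to_row(b'{"user": "alice", "score": 42}')
print(row)
```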

Cloud Storage path to template:

    gs://dataflow-templates/pubsub-to-bigquery/template_file

Template parameters:

Parameter    Description
topic        The Cloud Pub/Sub input topic to read from.
table        The BigQuery output table location.

Executing the Cloud Pub/Sub to BigQuery template

Note: Executing Google-provided templates with the gcloud command-line tool is not currently supported.

  • Execute from the Google Cloud Platform Console
  • Execute from the REST API

    Use this example request as documented in Using the REST API. This request requires authorization, and you must specify a tempLocation where you have write permissions. You must replace the following values in this example:

    • Replace [YOUR_PROJECT_ID] with your project ID.
    • Replace [JOB_NAME] with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
    • Replace [YOUR_TOPIC_NAME] with your Cloud Pub/Sub topic name.
    • Replace [YOUR_DATASET] with your BigQuery dataset, and replace [YOUR_TABLE_NAME] with your BigQuery table name.
    • Replace [YOUR_BUCKET_NAME] with the name of your Cloud Storage bucket.

        POST https://dataflow.googleapis.com/v1b3/projects/[YOUR_PROJECT_ID]/templates:launch?gcsPath=gs://dataflow-templates/pubsub-to-bigquery/template_file
        {
           "jobName": "[JOB_NAME]",
           "parameters": {
               "topic": "projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME]",
               "table": "[YOUR_PROJECT_ID]:[YOUR_DATASET].[YOUR_TABLE_NAME]"
           },
           "environment": {
               "tempLocation": "gs://[YOUR_BUCKET_NAME]/temp",
               "zone": "us-central1-f"
           }
        }
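
The `topic` and `table` values in the request above follow fixed formats, so a small helper can assemble them from their parts. This is a sketch only: the function name `pubsub_to_bigquery_parameters` and the sample values are hypothetical, and the formats are copied from the example request.

```python
def pubsub_to_bigquery_parameters(project_id, topic_name, dataset, table_name):
    """Build the "parameters" object for the Cloud Pub/Sub to BigQuery request."""
    return {
        # Topic format from the example: projects/<project>/topics/<topic>
        "topic": f"projects/{project_id}/topics/{topic_name}",
        # Table format from the example: <project>:<dataset>.<table>
        "table": f"{project_id}:{dataset}.{table_name}",
    }


# Hypothetical values, used only to show the resulting strings.
params = pubsub_to_bigquery_parameters("my-project", "my-topic", "my_dataset", "my_table")
print(params["topic"])
print(params["table"])
```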
