Get started with Google-provided templates

Google provides a set of open source Dataflow templates. For general information about templates, see the Overview page. To get started, use the WordCount template. Other Google-provided templates are grouped as follows:

  • Streaming templates: templates for processing data continuously
  • Batch templates: templates for processing data in bulk
  • Utility templates

WordCount

The WordCount template is a batch pipeline that reads text from Cloud Storage, tokenizes the text lines into individual words, and performs a frequency count on each of the words. For more information about WordCount, see WordCount Example Pipeline.
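
To make the tokenize-and-count step concrete, here is a minimal local sketch using standard Unix tools. It is not the template's implementation, which runs as a Dataflow pipeline. The function name `word_count` is hypothetical, and both the letters-only tokenization and the `word: count` output format are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Local approximation of a word frequency count. Illustration only;
# this is not the code that the WordCount template runs.

# word_count (hypothetical helper): reads text on stdin and prints
# one "word: count" line per distinct word, sorted by word.
word_count() {
  # 1. tr: turn every run of non-letter characters into a single newline.
  # 2. sed: drop the empty line that leading punctuation can produce.
  # 3. sort + uniq -c: group identical words and count them.
  # 4. awk: print each result as "word: count".
  LC_ALL=C tr -cs '[:alpha:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 ": " $1 }'
}

# Example: count the words in a single line of text.
printf 'to be, or not to be\n' | word_count
```

For the sample line, this prints `be: 2`, `not: 1`, `or: 1`, and `to: 2`, one per line. The sketch is case-sensitive, so `To` and `to` would be counted separately.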

Template parameters

Parameter    Description
inputFile    The Cloud Storage input file's path.
output       The Cloud Storage output file's path and prefix.
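
Because `output` combines a Cloud Storage location with a file name prefix, it can help to see how one value splits into the two parts. The sketch below uses plain shell parameter expansion. The variable names are hypothetical, and `BUCKET_NAME` is the same placeholder used in the commands later on this page.

```shell
# Example value for the output parameter (BUCKET_NAME is a placeholder).
output="gs://BUCKET_NAME/output/my_output"

# Everything before the last "/" is the Cloud Storage location.
output_dir="${output%/*}"
# Everything after the last "/" is the prefix for the output file names.
output_prefix="${output##*/}"

echo "location: ${output_dir}"
echo "prefix:   ${output_prefix}"
```

With this value, the location is `gs://BUCKET_NAME/output` and the prefix is `my_output`.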

Running the WordCount template

Console

Run using the Google Cloud Console.
  1. Go to the Dataflow page in the Cloud Console.
  2. Click Create job from template.
  3. Select the WordCount template from the Dataflow template drop-down menu.
  4. Enter a job name in the Job Name field.
  5. Enter your parameter values in the provided parameter fields.
  6. Click Run Job.

gcloud

Run using the gcloud command-line tool.

When running this template, you need the Cloud Storage path to the template:

gs://dataflow-templates/latest/Word_Count

Run the following command:

gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates/latest/Word_Count \
    --parameters \
inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,\
output=gs://BUCKET_NAME/output/my_output

Replace the following:

  • JOB_NAME: a job name of your choice
  • BUCKET_NAME: the name of your Cloud Storage bucket
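
When the command is run from a script, assembling the `--parameters` value in one place helps avoid stray spaces between the comma-separated pairs. The following is a sketch under assumptions: the helper names `build_parameters` and `run_word_count` are hypothetical, and only the `gcloud dataflow jobs run` flags shown above are used.

```shell
#!/usr/bin/env bash
TEMPLATE_PATH="gs://dataflow-templates/latest/Word_Count"

# build_parameters (hypothetical helper): joins the two template
# parameters into the comma-separated form that --parameters expects.
build_parameters() {
  local input_file="$1" output="$2"
  printf 'inputFile=%s,output=%s' "$input_file" "$output"
}

# run_word_count (hypothetical helper): launches the template.
# Arguments: job name, Cloud Storage bucket name.
run_word_count() {
  local job_name="$1" bucket_name="$2"
  gcloud dataflow jobs run "$job_name" \
    --gcs-location "$TEMPLATE_PATH" \
    --parameters "$(build_parameters \
      "gs://dataflow-samples/shakespeare/kinglear.txt" \
      "gs://${bucket_name}/output/my_output")"
}

# Usage (requires an installed and authenticated gcloud):
#   run_word_count my-wordcount-job my-bucket
```

Quoting the command substitution keeps the whole parameter list as a single argument, whatever the values contain.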

API

Run using the REST API.

When running this template, you need the Cloud Storage path to the template:

gs://dataflow-templates/latest/Word_Count

To run this template with a REST API request, send an HTTP POST request with your project ID. This request requires authorization.

POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/templates:launch?gcsPath=gs://dataflow-templates/latest/Word_Count
{
    "jobName": "JOB_NAME",
    "parameters": {
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "output": "gs://BUCKET_NAME/output/my_output"
    },
    "environment": { "zone": "us-central1-f" }
}

Replace the following:

  • PROJECT_ID: your project ID
  • JOB_NAME: a job name of your choice
  • BUCKET_NAME: the name of your Cloud Storage bucket
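
One way to send this request from a shell is with curl, using an access token from the gcloud tool for authorization. This is a sketch, not the only supported client. The helper names `launch_url`, `launch_body`, and `launch_word_count` are hypothetical, and the sketch assumes that curl and an authenticated gcloud installation are available.

```shell
#!/usr/bin/env bash
TEMPLATE_PATH="gs://dataflow-templates/latest/Word_Count"

# launch_url (hypothetical helper): builds the templates:launch URL
# for the given project ID.
launch_url() {
  printf 'https://dataflow.googleapis.com/v1b3/projects/%s/templates:launch?gcsPath=%s' \
    "$1" "$TEMPLATE_PATH"
}

# launch_body (hypothetical helper): builds the JSON request body.
# Arguments: job name, Cloud Storage bucket name.
launch_body() {
  cat <<EOF
{
    "jobName": "$1",
    "parameters": {
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "output": "gs://$2/output/my_output"
    },
    "environment": { "zone": "us-central1-f" }
}
EOF
}

# launch_word_count (hypothetical helper): sends the POST request.
# Arguments: project ID, job name, Cloud Storage bucket name.
launch_word_count() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$(launch_body "$2" "$3")" \
    "$(launch_url "$1")"
}

# Usage (requires curl and an authenticated gcloud):
#   launch_word_count my-project my-wordcount-job my-bucket
```

Keeping the URL and body builders separate from the function that sends the request makes it possible to inspect both before anything is submitted.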