Get started with Google-provided templates

Google provides a set of open-source Dataflow templates. For general information about templates, see the Overview page. To get started, run the WordCount template. Other Google-provided templates fall into the following categories:

  • Streaming templates: templates for processing data continuously
  • Batch templates: templates for processing data in bulk
  • Utility templates

WordCount

The WordCount template is a batch pipeline that reads text from Cloud Storage, tokenizes the text lines into individual words, and performs a frequency count on each of the words. For more information about WordCount, see WordCount Example Pipeline.
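The tokenize-and-count logic described above can be illustrated with a small local sketch. This is not the template's actual implementation: the function name `count_words` and the tokenization pattern (runs of ASCII letters and apostrophes, case preserved) are assumptions chosen for illustration, and the input is an in-memory list rather than a Cloud Storage file.

```python
import re
from collections import Counter


def count_words(lines):
    """Split each text line into words and tally how often each word occurs."""
    counts = Counter()
    for line in lines:
        # Assumed tokenization rule: a word is a run of letters or apostrophes.
        counts.update(re.findall(r"[A-Za-z']+", line))
    return counts


# Stand-in for lines read from the input file.
sample_lines = ["the king and the fool", "the storm"]
counts = count_words(sample_lines)

for word, frequency in sorted(counts.items()):
    print(f"{word}: {frequency}")
```

In this sample, "the" is counted three times and every other word once. The template applies the same idea to each line of the input file and writes the per-word counts to the output location.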

Template parameters

Parameter    Description
inputFile    The Cloud Storage input file's path.
output       The Cloud Storage output file's path and prefix.
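Before submitting a job, it can help to check the two parameters locally. The following is a minimal sketch: the helper name `validate_wordcount_parameters` is hypothetical, and the rule that both values must start with `gs://` is an assumption based on the parameters being Cloud Storage paths.

```python
def validate_wordcount_parameters(parameters):
    """Check that both WordCount parameters are present and look like gs:// paths."""
    required = ("inputFile", "output")

    missing = [name for name in required if name not in parameters]
    if missing:
        raise ValueError(f"Missing parameters: {', '.join(missing)}")

    for name in required:
        # Assumption: Cloud Storage paths are written as gs://BUCKET/OBJECT.
        if not parameters[name].startswith("gs://"):
            raise ValueError(f"{name} must be a Cloud Storage path: {parameters[name]}")

    return parameters


example = {
    "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
    "output": "gs://BUCKET_NAME/output/my_output",
}
print(validate_wordcount_parameters(example))
```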

Running the WordCount template

Console

Run using the Google Cloud Console.
  1. Go to the Dataflow Create job from template page.
  2. In the Job name field, enter a unique job name.
  3. Optional: For Regional endpoint, select a value from the drop-down menu. The default regional endpoint is us-central1.

    For a list of regions where you can run a Dataflow job, see Dataflow locations.

  4. From the Dataflow template drop-down menu, select the WordCount template.
  5. In the provided parameter fields, enter your parameter values.
  6. Click Run job.

gcloud

Run using the gcloud command-line tool.

When running this template, you need the Cloud Storage path to the template:

gs://dataflow-templates/latest/Word_Count

Run the following command:

gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates/latest/Word_Count \
    --parameters inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://BUCKET_NAME/output/my_output

Replace the following:

  • JOB_NAME: a job name of your choice
  • BUCKET_NAME: the name of your Cloud Storage bucket
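If you launch the job from a script, the command above can be assembled as an argument list so that the comma-separated `--parameters` value stays a single argument. The sketch below only builds and prints the command and does not run it. The function name `build_wordcount_command` and the sample values `wordcount-demo` and `my-bucket` are hypothetical.

```python
import shlex

TEMPLATE_PATH = "gs://dataflow-templates/latest/Word_Count"


def build_wordcount_command(job_name, bucket_name):
    """Return the gcloud invocation for the WordCount template as an argv list."""
    parameters = {
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "output": f"gs://{bucket_name}/output/my_output",
    }
    # --parameters takes comma-separated key=value pairs with no spaces.
    parameter_arg = ",".join(f"{key}={value}" for key, value in parameters.items())
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--gcs-location", TEMPLATE_PATH,
        "--parameters", parameter_arg,
    ]


command = build_wordcount_command("wordcount-demo", "my-bucket")
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run` would execute it, assuming the gcloud tool is installed and authenticated.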

API

Run using the REST API.

When running this template, you need the Cloud Storage path to the template:

gs://dataflow-templates/latest/Word_Count

To run this template with a REST API request, send an HTTP POST request with your project ID. This request requires authorization.

POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/templates:launch?gcsPath=gs://dataflow-templates/latest/Word_Count
{
    "jobName": "JOB_NAME",
    "parameters": {
       "inputFile" : "gs://dataflow-samples/shakespeare/kinglear.txt",
       "output": "gs://BUCKET_NAME/output/my_output"
    },
    "environment": { "zone": "us-central1-f" }
}

Replace the following:

  • PROJECT_ID: your project ID
  • JOB_NAME: a job name of your choice
  • BUCKET_NAME: the name of your Cloud Storage bucket
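As a rough illustration, the request above can be prepared with the Python standard library. The sketch builds the request object without sending it. The function name `build_launch_request` and the sample argument values are hypothetical, and passing the credential as an OAuth 2.0 bearer token in the `Authorization` header is an assumption about how you authorize the call; obtaining the token is outside the scope of this example.

```python
import json
import urllib.parse
import urllib.request

TEMPLATE_PATH = "gs://dataflow-templates/latest/Word_Count"


def build_launch_request(project_id, job_name, bucket_name, access_token):
    """Build (but do not send) the templates:launch POST request for WordCount."""
    query = urllib.parse.urlencode({"gcsPath": TEMPLATE_PATH})
    url = (
        "https://dataflow.googleapis.com/v1b3/projects/"
        f"{project_id}/templates:launch?{query}"
    )
    body = {
        "jobName": job_name,
        "parameters": {
            "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
            "output": f"gs://{bucket_name}/output/my_output",
        },
        "environment": {"zone": "us-central1-f"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the access token is supplied as a bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_launch_request("my-project", "wordcount-demo", "my-bucket", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To send the request, pass it to `urllib.request.urlopen` with a valid token in place of `ACCESS_TOKEN`.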