Google provides a set of open-source Dataflow templates. For general information about templates, see the Overview page. To get started, use the WordCount template documented in the section below, or browse the other Google-provided templates:
Streaming templates - Templates for processing data continuously:
- Pub/Sub Subscription to BigQuery
- Pub/Sub Topic to BigQuery
- Pub/Sub to Pub/Sub
- Pub/Sub to Splunk
- Pub/Sub to Cloud Storage Avro
- Pub/Sub to Cloud Storage Text
- Cloud Storage Text to BigQuery (Stream)
- Cloud Storage Text to Pub/Sub (Stream)
- Data Masking/Tokenization using Cloud DLP from Cloud Storage to BigQuery (Stream)
- Change Data Capture to BigQuery (Stream)
- Apache Kafka to BigQuery
Batch templates - Templates for processing data in bulk:
- BigQuery to Cloud Storage TFRecords
- Cloud Bigtable to Cloud Storage Avro
- Cloud Bigtable to Cloud Storage SequenceFiles
- Datastore to Cloud Storage Text
- Cloud Spanner to Cloud Storage Avro
- Cloud Spanner to Cloud Storage Text
- Cloud Storage Avro to Cloud Bigtable
- Cloud Storage Avro to Cloud Spanner
- Cloud Storage SequenceFiles to Cloud Bigtable
- Cloud Storage Text to BigQuery
- Cloud Storage Text to Datastore
- Cloud Storage Text to Pub/Sub (Batch)
- Cloud Storage Text to Cloud Spanner
- Java Database Connectivity (JDBC) to BigQuery
- Apache Cassandra to Cloud Bigtable
- Apache Hive to BigQuery
- File Format Conversion
Utility templates:
- Bulk Compress Cloud Storage Files
- Bulk Decompress Cloud Storage Files
- Datastore Bulk Delete
- Streaming Data Generator to Pub/Sub
WordCount
The WordCount template is a batch pipeline that reads text from Cloud Storage, tokenizes the text lines into individual words, and counts the frequency of each word. For more information about WordCount, see WordCount Example Pipeline.
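To illustrate the read-tokenize-count pattern that the template encapsulates, the following is a minimal Apache Beam WordCount sketch in Python. It assumes the Apache Beam SDK (`apache_beam`) is installed; it demonstrates the same logic as the template but is not the template's exact source, and the output path is a placeholder.

```python
import re

import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToText
from apache_beam.options.pipeline_options import PipelineOptions

# A minimal sketch of the read -> tokenize -> count -> write pattern
# that the WordCount template implements. Paths are placeholders.
with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | 'Read' >> ReadFromText('gs://dataflow-samples/shakespeare/kinglear.txt')
        # Split each line into words; non-matching spans are dropped.
        | 'Tokenize' >> beam.FlatMap(lambda line: re.findall(r"[A-Za-z']+", line))
        # Count occurrences of each distinct word.
        | 'Count' >> beam.combiners.Count.PerElement()
        # Format each (word, count) pair as a line of text.
        | 'Format' >> beam.MapTuple(lambda word, count: f'{word}: {count}')
        | 'Write' >> WriteToText('my_output')
    )
```

Running this locally with the DirectRunner produces sharded output files prefixed with `my_output`, mirroring what the template writes to the `output` path in Cloud Storage.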
Template parameters
Parameter | Description
---|---
inputFile | The Cloud Storage input file path.
output | The Cloud Storage output file path and prefix.
Running the WordCount template
Console
Run from the Google Cloud Console
- Go to the Dataflow page in the Cloud Console.
- Click Create job from template.
- Select the WordCount template from the Dataflow template drop-down menu.
- Enter a job name in the Job Name field.
- Enter your parameter values in the provided parameter fields.
- Click Run Job.

GCLOUD
Run from the gcloud command-line tool
Note: To use the gcloud command-line tool to run templates, you must have Cloud SDK version 138.0.0 or higher.
When running this template, you'll need the Cloud Storage path to the template:
gs://dataflow-templates/latest/Word_Count
You must replace the following values in this example:
- Replace JOB_NAME with a job name of your choice.
- Replace YOUR_BUCKET_NAME with the name of your Cloud Storage bucket.
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates/latest/Word_Count \
    --parameters \
    inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,\
    output=gs://YOUR_BUCKET_NAME/output/my_output
API
Run from the REST API
When running this template, you'll need the Cloud Storage path to the template:
gs://dataflow-templates/latest/Word_Count
To run this template with a REST API request, send an HTTP POST request with your project ID. This request requires authorization.
You must replace the following values in this example:
- Replace YOUR_PROJECT_ID with your project ID.
- Replace JOB_NAME with a job name of your choice.
- Replace YOUR_BUCKET_NAME with the name of your Cloud Storage bucket.
POST https://dataflow.googleapis.com/v1b3/projects/YOUR_PROJECT_ID/templates:launch?gcsPath=gs://dataflow-templates/latest/Word_Count
{
    "jobName": "JOB_NAME",
    "parameters": {
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "output": "gs://YOUR_BUCKET_NAME/output/my_output"
    },
    "environment": { "zone": "us-central1-f" }
}
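As an alternative to constructing the HTTP request by hand, you can issue the same `templates.launch` call with the Google API Python client library. The sketch below is a minimal example, assuming `google-api-python-client` is installed and Application Default Credentials are configured; the project ID, job name, and bucket name are placeholders to replace.

```python
from googleapiclient.discovery import build

# Placeholders: substitute your own project, job name, and bucket.
project = 'YOUR_PROJECT_ID'
template_path = 'gs://dataflow-templates/latest/Word_Count'

# Build a client for the Dataflow v1b3 API using Application
# Default Credentials.
dataflow = build('dataflow', 'v1b3')

request = dataflow.projects().templates().launch(
    projectId=project,
    gcsPath=template_path,
    body={
        'jobName': 'JOB_NAME',
        'parameters': {
            'inputFile': 'gs://dataflow-samples/shakespeare/kinglear.txt',
            'output': 'gs://YOUR_BUCKET_NAME/output/my_output',
        },
        'environment': {'zone': 'us-central1-f'},
    },
)
response = request.execute()

# The response includes the created job's metadata, such as its ID.
print(response['job']['id'])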