Get started with Google-provided templates

Google provides a set of open-source Cloud Dataflow templates. For general information about templates, see the Overview page. To get started, use the WordCount template documented in the section below. See other Google-provided templates:

Streaming templates - Templates for processing data continuously:

Batch templates - Templates for processing data in bulk:

Utility templates:

WordCount

The WordCount template is a batch pipeline that reads text from Cloud Storage, tokenizes the text lines into individual words, and performs a frequency count on each of the words. For more information about WordCount, see WordCount Example Pipeline.

Template parameters

Parameter Description
inputFile The Cloud Storage input file path.
output The Cloud Storage output file path and prefix.

Executing the WordCount template

Console

Execute from the Google Cloud Platform Console
  1. Go to the Cloud Dataflow page in the GCP Console.
  2. Go to the Cloud Dataflow page
  3. Click Create job from template.
  4. Cloud Platform Console Create Job From Template Button
  5. Select the WordCount template from the Cloud Dataflow template drop-down menu.
  6. Enter a job name in the Job Name field. Your job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  7. Enter your parameter values in the provided parameter fields.
  8. Click Run Job.

GCLOUD

Execute from the gcloud command-line tool

Note: To use the gcloud command-line tool to execute templates, you must have Cloud SDK version 138.0.0 or higher.

When executing this template, you'll need the Cloud Storage path to the template:

gs://dataflow-templates/latest/Word_Count

You must replace the following values in this example:

  • Replace YOUR_PROJECT_ID with your project ID.
  • Replace JOB_NAME with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  • Replace YOUR_BUCKET_NAME with the name of your Cloud Storage bucket.
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates/latest/Word_Count \
    --parameters \
inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,\
output=gs://YOUR_BUCKET_NAME/output/my_output

API

Execute from the REST API

When executing this template, you'll need the Cloud Storage path to the template:

gs://dataflow-templates/latest/Word_Count

To execute this template with a REST API request, send an HTTP POST request with your project ID. This request requires authorization.

You must replace the following values in this example:

  • Replace YOUR_PROJECT_ID with your project ID.
  • Replace JOB_NAME with a job name of your choice. The job name must match the regular expression [a-z]([-a-z0-9]{0,38}[a-z0-9])? to be valid.
  • Replace YOUR_BUCKET_NAME with the name of your Cloud Storage bucket.
POST https://dataflow.googleapis.com/v1b3/projects/YOUR_PROJECT_ID/templates:launch?gcsPath=gs://dataflow-templates/latest/Word_Count
{
    "jobName": "JOB_NAME",
    "parameters": {
       "inputFile" : "gs://dataflow-samples/shakespeare/kinglear.txt",
       "output": "gs://YOUR_BUCKET_NAME/output/my_output"
    },
    "environment": { "zone": "us-central1-f" }
}
Hai trovato utile questa pagina? Facci sapere cosa ne pensi:

Invia feedback per...

Hai bisogno di assistenza? Visita la nostra pagina di assistenza.