Dataflow API Connector Overview

Workflows connectors define the built-in functions that can be used to access other Google Cloud products within a workflow.

This page provides an overview of the Dataflow API connector. There is no need to import or load connector libraries in a workflow; connectors work out of the box when used in a call step.

Dataflow API

Manages Google Cloud Dataflow projects on Google Cloud Platform. To learn more, see the Dataflow API documentation.

Dataflow connector sample

YAML

# This workflow demonstrates how to use the Cloud Dataflow connector.
# The workflow creates a word count job from a public Dataflow job template
# and uses a Cloud Storage bucket to hold the job's output and temporary files.
# The output objects and the bucket are deleted after the job completes.
# Expected successful output: "SUCCESS"

- init:
    assign:
      - project_id: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
      - location: "us-central1"
      - zone: "us-central1-a"
      - bucket_name: "[fill in a bucket name]"
      - job_name: "[fill in a job name]"
      - input_file: "gs://dataflow-samples/shakespeare/kinglear.txt"
      - output_storage_file_prefix: ${"gs://" + bucket_name + "/counts"}
      - temp_location: ${"gs://" + bucket_name + "/counts/temp"}
      - template_path: "gs://dataflow-templates-us-central1/latest/Word_Count"
- create_bucket:
    call: googleapis.storage.v1.buckets.insert
    args:
      project: ${project_id}
      body:
        name: ${bucket_name}
- create_job:
    call: googleapis.dataflow.v1b3.projects.locations.templates.create
    args:
      projectId: ${project_id}
      location: ${location}
      body:
        jobName: ${job_name}
        parameters:
          inputFile: ${input_file}
          output: ${output_storage_file_prefix}
        environment:
          numWorkers: 1
          maxWorkers: 2
          zone: ${zone}
          tempLocation: ${temp_location}
        gcsPath: ${template_path}
- delete_bucket_object1:
    call: googleapis.storage.v1.objects.delete
    args:
      bucket: ${bucket_name}
      object: ${"counts-00000-of-00003"}
- delete_bucket_object2:
    call: googleapis.storage.v1.objects.delete
    args:
      bucket: ${bucket_name}
      object: ${"counts-00001-of-00003"}
- delete_bucket_object3:
    call: googleapis.storage.v1.objects.delete
    args:
      bucket: ${bucket_name}
      object: ${"counts-00002-of-00003"}
- delete_bucket:
    call: googleapis.storage.v1.buckets.delete
    args:
      bucket: ${bucket_name}
- the_end:
    return: "SUCCESS"

JSON

[
  {
    "init": {
      "assign": [
        {
          "project_id": "${sys.get_env(\"GOOGLE_CLOUD_PROJECT_ID\")}"
        },
        {
          "location": "us-central1"
        },
        {
          "zone": "us-central1-a"
        },
        {
          "bucket_name": "[fill in a bucket name]"
        },
        {
          "job_name": "[fill in a job name]"
        },
        {
          "input_file": "gs://dataflow-samples/shakespeare/kinglear.txt"
        },
        {
          "output_storage_file_prefix": "${\"gs://\" + bucket_name + \"/counts\"}"
        },
        {
          "temp_location": "${\"gs://\" + bucket_name + \"/counts/temp\"}"
        },
        {
          "template_path": "gs://dataflow-templates-us-central1/latest/Word_Count"
        }
      ]
    }
  },
  {
    "create_bucket": {
      "call": "googleapis.storage.v1.buckets.insert",
      "args": {
        "project": "${project_id}",
        "body": {
          "name": "${bucket_name}"
        }
      }
    }
  },
  {
    "create_job": {
      "call": "googleapis.dataflow.v1b3.projects.locations.templates.create",
      "args": {
        "projectId": "${project_id}",
        "location": "${location}",
        "body": {
          "jobName": "${job_name}",
          "parameters": {
            "inputFile": "${input_file}",
            "output": "${output_storage_file_prefix}"
          },
          "environment": {
            "numWorkers": 1,
            "maxWorkers": 2,
            "zone": "${zone}",
            "tempLocation": "${temp_location}"
          },
          "gcsPath": "${template_path}"
        }
      }
    }
  },
  {
    "delete_bucket_object1": {
      "call": "googleapis.storage.v1.objects.delete",
      "args": {
        "bucket": "${bucket_name}",
        "object": "${\"counts-00000-of-00003\"}"
      }
    }
  },
  {
    "delete_bucket_object2": {
      "call": "googleapis.storage.v1.objects.delete",
      "args": {
        "bucket": "${bucket_name}",
        "object": "${\"counts-00001-of-00003\"}"
      }
    }
  },
  {
    "delete_bucket_object3": {
      "call": "googleapis.storage.v1.objects.delete",
      "args": {
        "bucket": "${bucket_name}",
        "object": "${\"counts-00002-of-00003\"}"
      }
    }
  },
  {
    "delete_bucket": {
      "call": "googleapis.storage.v1.buckets.delete",
      "args": {
        "bucket": "${bucket_name}"
      }
    }
  },
  {
    "the_end": {
      "return": "SUCCESS"
    }
  }
]

Module: googleapis.dataflow.v1b3.projects.jobs

Functions
aggregated: List the jobs of a project across all regions.
create: Creates a Cloud Dataflow job. To create a job, we recommend using projects.locations.jobs.create with a regional endpoint. Using projects.jobs.create is not recommended, as your job will always start in us-central1.
get: Gets the state of the specified Cloud Dataflow job. To get the state of a job, we recommend using projects.locations.jobs.get with a regional endpoint. Using projects.jobs.get is not recommended, as you can only get the state of jobs that are running in us-central1.
getMetrics: Request the job status. To request the status of a job, we recommend using projects.locations.jobs.getMetrics with a regional endpoint. Using projects.jobs.getMetrics is not recommended, as you can only request the status of jobs that are running in us-central1.
list: List the jobs of a project. To list the jobs of a project in a region, we recommend using projects.locations.jobs.list with a regional endpoint. To list all jobs across all regions, use projects.jobs.aggregated. Using projects.jobs.list is not recommended, as you can only get the list of jobs that are running in us-central1.
snapshot: Snapshot the state of a streaming job.
update: Updates the state of an existing Cloud Dataflow job. To update the state of an existing job, we recommend using projects.locations.jobs.update with a regional endpoint. Using projects.jobs.update is not recommended, as you can only update the state of jobs that are running in us-central1.
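
For example, the following minimal sketch lists jobs across all regions with aggregated; the project_id variable and the ACTIVE filter value are illustrative assumptions, not part of the sample above.

- list_all_jobs:
    call: googleapis.dataflow.v1b3.projects.jobs.aggregated
    args:
      projectId: ${project_id}
      filter: "ACTIVE"  # illustrative: restrict the listing to active jobs
    result: all_jobs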

Module: googleapis.dataflow.v1b3.projects.jobs.messages

Functions
list: Request the job status. To request the status of a job, we recommend using projects.locations.jobs.messages.list with a regional endpoint. Using projects.jobs.messages.list is not recommended, as you can only request the status of jobs that are running in us-central1.

Module: googleapis.dataflow.v1b3.projects.locations.flexTemplates

Functions
launch: Launch a job with a FlexTemplate.
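
As a rough sketch, a Flex Template is launched by passing a launchParameter object in the request body; the variable names and the containerSpecGcsPath value below are placeholders, not values from the sample above.

- launch_flex_template:
    call: googleapis.dataflow.v1b3.projects.locations.flexTemplates.launch
    args:
      projectId: ${project_id}
      location: ${location}
      body:
        launchParameter:
          jobName: ${job_name}
          containerSpecGcsPath: "gs://[fill in a bucket name]/templates/template.json"  # placeholder path to a Flex Template spec
          parameters:
            inputFile: ${input_file}
    result: flex_launch_response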

Module: googleapis.dataflow.v1b3.projects.locations.jobs

Functions
create: Creates a Cloud Dataflow job. To create a job, we recommend using projects.locations.jobs.create with a regional endpoint. Using projects.jobs.create is not recommended, as your job will always start in us-central1.
get: Gets the state of the specified Cloud Dataflow job. To get the state of a job, we recommend using projects.locations.jobs.get with a regional endpoint. Using projects.jobs.get is not recommended, as you can only get the state of jobs that are running in us-central1.
getExecutionDetails: Request detailed information about the execution status of the job. EXPERIMENTAL. This API is subject to change or removal without notice.
getMetrics: Request the job status. To request the status of a job, we recommend using projects.locations.jobs.getMetrics with a regional endpoint. Using projects.jobs.getMetrics is not recommended, as you can only request the status of jobs that are running in us-central1.
list: List the jobs of a project. To list the jobs of a project in a region, we recommend using projects.locations.jobs.list with a regional endpoint. To list all jobs across all regions, use projects.jobs.aggregated. Using projects.jobs.list is not recommended, as you can only get the list of jobs that are running in us-central1.
snapshot: Snapshot the state of a streaming job.
update: Updates the state of an existing Cloud Dataflow job. To update the state of an existing job, we recommend using projects.locations.jobs.update with a regional endpoint. Using projects.jobs.update is not recommended, as you can only update the state of jobs that are running in us-central1.
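
For instance, a workflow can poll a job's state with get and cancel it with update. In this sketch, job_id is assumed to have been captured from an earlier create or launch response; it is not defined in the sample above.

- get_job_state:
    call: googleapis.dataflow.v1b3.projects.locations.jobs.get
    args:
      projectId: ${project_id}
      location: ${location}
      jobId: ${job_id}
      view: "JOB_VIEW_SUMMARY"
    result: job
- cancel_job:
    call: googleapis.dataflow.v1b3.projects.locations.jobs.update
    args:
      projectId: ${project_id}
      location: ${location}
      jobId: ${job_id}
      body:
        requestedState: "JOB_STATE_CANCELLED"  # request cancellation of the running job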

Module: googleapis.dataflow.v1b3.projects.locations.jobs.messages

Functions
list: Request the job status. To request the status of a job, we recommend using projects.locations.jobs.messages.list with a regional endpoint. Using projects.jobs.messages.list is not recommended, as you can only request the status of jobs that are running in us-central1.
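
A sketch of listing job messages, assuming job_id was captured from a previous step; the minimumImportance filter is optional and shown only for illustration.

- list_job_messages:
    call: googleapis.dataflow.v1b3.projects.locations.jobs.messages.list
    args:
      projectId: ${project_id}
      location: ${location}
      jobId: ${job_id}
      minimumImportance: "JOB_MESSAGE_WARNING"  # illustrative: only warnings and errors
    result: job_messages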

Module: googleapis.dataflow.v1b3.projects.locations.jobs.stages

Functions
getExecutionDetails: Request detailed information about the execution status of a stage of the job. EXPERIMENTAL. This API is subject to change or removal without notice.
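
A sketch of requesting stage-level execution details; stage_id is a placeholder for a stage ID obtained elsewhere (for example, from the job-level getExecutionDetails response), and is not defined in the sample above.

- get_stage_details:
    call: googleapis.dataflow.v1b3.projects.locations.jobs.stages.getExecutionDetails
    args:
      projectId: ${project_id}
      location: ${location}
      jobId: ${job_id}
      stageId: ${stage_id}
    result: stage_details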

Module: googleapis.dataflow.v1b3.projects.locations.templates

Functions
create: Creates a Cloud Dataflow job from a template.
get: Get the template associated with a template.
launch: Launch a template.
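
Besides create (used in the sample above), a template can also be started with launch; in this sketch, gcsPath reuses the public Word_Count template path from the sample and the body fields carry the same placeholder values.

- launch_template:
    call: googleapis.dataflow.v1b3.projects.locations.templates.launch
    args:
      projectId: ${project_id}
      location: ${location}
      gcsPath: ${template_path}
      body:
        jobName: ${job_name}
        parameters:
          inputFile: ${input_file}
          output: ${output_storage_file_prefix}
    result: launch_response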

Module: googleapis.dataflow.v1b3.projects.templates

Functions
create: Creates a Cloud Dataflow job from a template.
get: Get the template associated with a template.
launch: Launch a template.