Cloud Dataflow Access Control Guide

Overview

You can use Cloud Dataflow IAM roles to limit a user's access within a project or organization to Cloud Dataflow-related resources only, instead of granting viewer, editor, or owner access to the entire Cloud Platform project.

This page focuses on how to use Cloud Dataflow's IAM roles. For a detailed description of IAM and its features, see the Google Cloud Identity and Access Management developer's guide.

Every Cloud Dataflow method requires the caller to have the necessary permissions. For a list of the permissions and roles Cloud Dataflow supports, see the following section.

Permissions and Roles

This section summarizes the permissions and roles Cloud Dataflow IAM supports.

Required Permissions

The following table lists the permissions that the caller must have to call each method:

Method                        Required Permission(s)
dataflow.jobs.create          dataflow.jobs.create
dataflow.jobs.cancel          dataflow.jobs.cancel
dataflow.jobs.updateContents  dataflow.jobs.updateContents
dataflow.jobs.list            dataflow.jobs.list
dataflow.jobs.get             dataflow.jobs.get
dataflow.jobs.drain           dataflow.jobs.drain
dataflow.messages.list        dataflow.messages.list
dataflow.metrics.get          dataflow.metrics.get
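
As a rough illustration of the table above, the following Python sketch records the method-to-permission mapping and reports which permissions a caller still lacks for a set of methods. The helper name `missing_permissions` and the sample permission set are hypothetical; they are not part of any Cloud Dataflow API.

```python
# Required permission for each Cloud Dataflow method, copied from the table above.
REQUIRED_PERMISSION = {
    "dataflow.jobs.create": "dataflow.jobs.create",
    "dataflow.jobs.cancel": "dataflow.jobs.cancel",
    "dataflow.jobs.updateContents": "dataflow.jobs.updateContents",
    "dataflow.jobs.list": "dataflow.jobs.list",
    "dataflow.jobs.get": "dataflow.jobs.get",
    "dataflow.jobs.drain": "dataflow.jobs.drain",
    "dataflow.messages.list": "dataflow.messages.list",
    "dataflow.metrics.get": "dataflow.metrics.get",
}


def missing_permissions(methods, granted):
    """Return the sorted permissions needed by `methods` that are not in `granted`."""
    needed = {REQUIRED_PERMISSION[m] for m in methods}
    return sorted(needed - set(granted))


if __name__ == "__main__":
    # Hypothetical caller who can only read jobs.
    granted = {"dataflow.jobs.get", "dataflow.jobs.list"}
    print(missing_permissions(["dataflow.jobs.get", "dataflow.jobs.cancel"], granted))
```

An empty result means the caller holds every permission the listed methods require.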

Note: The Cloud Dataflow Worker role (roles/dataflow.worker) provides the permissions (dataflow.workItems.lease, dataflow.workItems.update, and dataflow.workItems.sendMessage) necessary for a Compute Engine service account to execute work units for a Cloud Dataflow pipeline. It should typically only be assigned to such an account, and only includes the ability to request and update work from the Cloud Dataflow service.

Roles

The following table lists the Cloud Dataflow IAM roles with a corresponding list of all the permissions each role includes. Note that every permission is applicable to a particular resource type.

Role                      Includes permission(s)          For resource types

roles/dataflow.viewer     dataflow.<resource-type>.list   jobs, messages, metrics
                          dataflow.<resource-type>.get

roles/dataflow.developer  All of the above, as well as:   jobs
                          dataflow.jobs.create
                          dataflow.jobs.drain
                          dataflow.jobs.cancel

roles/dataflow.admin      All of the above, as well as:   NA
                          compute.machineTypes.get
                          storage.buckets.get
                          storage.objects.create
                          storage.objects.get
                          storage.objects.list

roles/dataflow.worker     dataflow.jobs.get               NA
(for controller service   dataflow.jobs.list
accounts only)            dataflow.workItems.lease
                          dataflow.workItems.update
                          dataflow.workItems.sendMessage
                          storage.objects.create
                          storage.objects.get
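
The cumulative structure of the roles ("All of the above, as well as") can be sketched in Python as set unions. This is only an illustration: the function `roles_granting` is a hypothetical helper, and the viewer's `dataflow.<resource-type>.list` and `dataflow.<resource-type>.get` pattern is assumed to expand to just the four list and get permissions named in the Required Permissions table.

```python
# Permissions per role, transcribed from the roles table above.
# Assumption: the viewer pattern expands only to the list/get permissions
# that appear in the Required Permissions table.
VIEWER = frozenset({
    "dataflow.jobs.list",
    "dataflow.jobs.get",
    "dataflow.messages.list",
    "dataflow.metrics.get",
})

# "All of the above, as well as": each role extends the previous one.
DEVELOPER = VIEWER | {
    "dataflow.jobs.create",
    "dataflow.jobs.drain",
    "dataflow.jobs.cancel",
}
ADMIN = DEVELOPER | {
    "compute.machineTypes.get",
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.get",
    "storage.objects.list",
}

# The worker role is listed separately and does not build on the others.
WORKER = frozenset({
    "dataflow.jobs.get",
    "dataflow.jobs.list",
    "dataflow.workItems.lease",
    "dataflow.workItems.update",
    "dataflow.workItems.sendMessage",
    "storage.objects.create",
    "storage.objects.get",
})

ROLE_PERMISSIONS = {
    "roles/dataflow.viewer": VIEWER,
    "roles/dataflow.developer": DEVELOPER,
    "roles/dataflow.admin": ADMIN,
    "roles/dataflow.worker": WORKER,
}


def roles_granting(permission):
    """Return the sorted names of the roles that include `permission`."""
    return sorted(
        role for role, perms in ROLE_PERMISSIONS.items() if permission in perms
    )


if __name__ == "__main__":
    print(roles_granting("dataflow.jobs.cancel"))
```

A lookup like this makes it easy to pick the narrowest role that covers a given permission.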

Creating Jobs

To create a job, the caller needs at least the following roles:

  • The dataflow.developer role, to instantiate the job itself.
  • The project Viewer role, to access machine type information and view other settings.
  • The storage.objectAdmin role, to provide permission to stage files on Cloud Storage.

Alternatively, granting dataflow.admin alone serves the same purpose: it includes the minimal set of permissions needed to run and examine jobs.
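
The two alternatives above can be expressed as a small check. The sketch below is illustrative only: `can_create_jobs` is a hypothetical helper, and the full role IDs (`roles/viewer`, `roles/storage.objectAdmin`, and so on) are assumed to correspond to the short names used in this section.

```python
# Each entry is one sufficient combination of roles for creating a job,
# taken from the list and the alternative described above.
MINIMUM_ROLE_SETS = [
    {"roles/dataflow.developer", "roles/viewer", "roles/storage.objectAdmin"},
    {"roles/dataflow.admin"},
]


def can_create_jobs(held_roles):
    """Return True if `held_roles` contains at least one sufficient combination."""
    held = set(held_roles)
    return any(required <= held for required in MINIMUM_ROLE_SETS)


if __name__ == "__main__":
    print(can_create_jobs(["roles/dataflow.admin"]))
    print(can_create_jobs(["roles/dataflow.developer", "roles/viewer"]))
```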

Example Role Assignment

To illustrate the utility of the different Cloud Dataflow roles, consider the following breakdown:

  • The developer interacting with the Cloud Dataflow job will need the dataflow.developer role.
    • They will need the storage.objectAdmin role in order to stage the required files.
    • For debugging and quota checking, they will need the project Viewer role.
    • Absent other role assignments, this will allow the developer to create and cancel Cloud Dataflow jobs, but not interact with the individual VMs or access other Cloud services.
  • The developer who only needs to create and examine jobs will need the dataflow.admin role.
  • The cloudservices account needs the iam.serviceAccountActor and compute.instanceAdmin.v1 roles on the project, in order to start VMs with the Compute Engine service account. Additionally, if native sources such as BigQuery are being used, it will need appropriate access to those systems.
  • The controller service account needs the dataflow.worker role to process data for the Cloud Dataflow service.
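
One way to picture the breakdown above is as the `bindings` section of an IAM policy, where each binding pairs a role with its members. The Python sketch below groups the assignments by role. All member addresses are placeholders, `build_bindings` is a hypothetical helper, and the full role IDs are assumed to match the short names used in the list.

```python
import json
from collections import defaultdict

# (member, role) pairs from the breakdown above.
# All member addresses are placeholders, not real accounts.
ASSIGNMENTS = [
    ("user:developer@example.com", "roles/dataflow.developer"),
    ("user:developer@example.com", "roles/storage.objectAdmin"),
    ("user:developer@example.com", "roles/viewer"),
    ("serviceAccount:cloudservices@example.com", "roles/iam.serviceAccountActor"),
    ("serviceAccount:cloudservices@example.com", "roles/compute.instanceAdmin.v1"),
    ("serviceAccount:controller@example.com", "roles/dataflow.worker"),
]


def build_bindings(assignments):
    """Group (member, role) pairs into a list of role/members bindings."""
    members_by_role = defaultdict(set)
    for member, role in assignments:
        members_by_role[role].add(member)
    return [
        {"role": role, "members": sorted(members)}
        for role, members in sorted(members_by_role.items())
    ]


if __name__ == "__main__":
    # Print the bindings in the JSON shape used by IAM policies.
    print(json.dumps({"bindings": build_bindings(ASSIGNMENTS)}, indent=2))
```

Grouping by role keeps each role in a single binding, so adding another developer only extends an existing members list.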

Assigning Cloud Dataflow Roles

Cloud Dataflow roles can currently be set on organizations and projects only.

To manage roles at the organizational level, see Access Control for Organizations Using IAM.

To set project-level roles, see Access control via the Google Cloud Platform Console.
