Access control with IAM

To limit access for users within a project or organization, you can use Identity and Access Management (IAM) roles for Dataflow. You can control access to Dataflow-related resources, as opposed to granting users the Viewer, Editor, or Owner role to the entire Google Cloud project.

This page focuses on how to use Dataflow's IAM roles. For a detailed description of IAM and its features, see the IAM documentation.

Every Dataflow method requires the caller to have the necessary permissions. For a list of the permissions and roles Dataflow supports, see the following section.

Permissions and roles

This section summarizes the permissions and roles Dataflow IAM supports.

Required permissions

The following table lists the permissions that the caller must have to call each method:

Method	Required Permissions
`dataflow.jobs.create`	`dataflow.jobs.create`
`dataflow.jobs.cancel`	`dataflow.jobs.cancel`
`dataflow.jobs.updateContents`	`dataflow.jobs.updateContents`
`dataflow.jobs.list`	`dataflow.jobs.list`
`dataflow.jobs.get`	`dataflow.jobs.get`
`dataflow.messages.list`	`dataflow.messages.list`
`dataflow.metrics.get`	`dataflow.metrics.get`
`dataflow.jobs.snapshot`	`dataflow.jobs.snapshot`

Roles

The following table lists the Dataflow IAM roles with a corresponding list of Dataflow-related permissions each role includes. Every permission is applicable to a particular resource type. For a list of permissions, see the Roles page in the Google Cloud console.

Role Permissions

Role	Permissions
Dataflow Admin (`roles/dataflow.admin`) Minimal role for creating and managing dataflow jobs.	`cloudbuild.builds.create` `cloudbuild.builds.get` `cloudbuild.builds.list` `cloudbuild.builds.update` `cloudbuild.operations.` `cloudbuild.operations.get` `cloudbuild.operations.list` `compute.machineTypes.get` `compute.projects.get` `compute.regions.list` `compute.zones.list` `dataflow.jobs.` `dataflow.jobs.cancel` `dataflow.jobs.create` `dataflow.jobs.get` `dataflow.jobs.list` `dataflow.jobs.snapshot` `dataflow.jobs.updateContents` `dataflow.messages.list` `dataflow.metrics.get` `dataflow.snapshots.` `dataflow.snapshots.delete` `dataflow.snapshots.get` `dataflow.snapshots.list` `recommender.dataflowDiagnosticsInsights.` `recommender.dataflowDiagnosticsInsights.get` `recommender.dataflowDiagnosticsInsights.list` `recommender.dataflowDiagnosticsInsights.update` `remotebuildexecution.blobs.get` `resourcemanager.projects.get` `resourcemanager.projects.list` `storage.buckets.get` `storage.objects.create` `storage.objects.get` `storage.objects.list`
Dataflow Developer (`roles/dataflow.developer`) Provides the permissions necessary to execute and manipulate Dataflow jobs. Lowest-level resources where you can grant this role: Project	`cloudbuild.builds.create` `cloudbuild.builds.get` `cloudbuild.builds.list` `cloudbuild.builds.update` `cloudbuild.operations.` `cloudbuild.operations.get` `cloudbuild.operations.list` `compute.projects.get` `compute.regions.list` `compute.zones.list` `dataflow.jobs.` `dataflow.jobs.cancel` `dataflow.jobs.create` `dataflow.jobs.get` `dataflow.jobs.list` `dataflow.jobs.snapshot` `dataflow.jobs.updateContents` `dataflow.messages.list` `dataflow.metrics.get` `dataflow.snapshots.` `dataflow.snapshots.delete` `dataflow.snapshots.get` `dataflow.snapshots.list` `recommender.dataflowDiagnosticsInsights.` `recommender.dataflowDiagnosticsInsights.get` `recommender.dataflowDiagnosticsInsights.list` `recommender.dataflowDiagnosticsInsights.update` `remotebuildexecution.blobs.get` `resourcemanager.projects.get` `resourcemanager.projects.list`
Dataflow Viewer (`roles/dataflow.viewer`) Provides read-only access to all Dataflow-related resources. Lowest-level resources where you can grant this role: Project	`dataflow.jobs.get` `dataflow.jobs.list` `dataflow.messages.list` `dataflow.metrics.get` `dataflow.snapshots.get` `dataflow.snapshots.list` `recommender.dataflowDiagnosticsInsights.get` `recommender.dataflowDiagnosticsInsights.list` `resourcemanager.projects.get` `resourcemanager.projects.list`
Dataflow Worker (`roles/dataflow.worker`) Provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline. Lowest-level resources where you can grant this role: Project	`autoscaling.sites.readRecommendations` `autoscaling.sites.writeMetrics` `autoscaling.sites.writeState` `compute.instanceGroupManagers.update` `compute.instances.delete` `compute.instances.setDiskAutoDelete` `dataflow.jobs.get` `dataflow.shuffle.` `dataflow.shuffle.read` `dataflow.shuffle.write` `dataflow.streamingWorkItems.` `dataflow.streamingWorkItems.ImportState` `dataflow.streamingWorkItems.commitWork` `dataflow.streamingWorkItems.getData` `dataflow.streamingWorkItems.getWork` `dataflow.streamingWorkItems.getWorkerMetadata` `dataflow.workItems.*` `dataflow.workItems.lease` `dataflow.workItems.sendMessage` `dataflow.workItems.update` `logging.logEntries.create` `logging.logEntries.route` `monitoring.timeSeries.create` `storage.buckets.get` `storage.objects.create` `storage.objects.get`

Dataflow Admin

(roles/dataflow.admin)

Minimal role for creating and managing dataflow jobs.

cloudbuild.builds.create

cloudbuild.builds.get

cloudbuild.builds.list

cloudbuild.builds.update

cloudbuild.operations.*

cloudbuild.operations.get
cloudbuild.operations.list

compute.machineTypes.get

compute.projects.get

compute.regions.list

compute.zones.list

dataflow.jobs.*

dataflow.jobs.cancel
dataflow.jobs.create
dataflow.jobs.get
dataflow.jobs.list
dataflow.jobs.snapshot
dataflow.jobs.updateContents

dataflow.messages.list

dataflow.metrics.get

dataflow.snapshots.*

dataflow.snapshots.delete
dataflow.snapshots.get
dataflow.snapshots.list

recommender.dataflowDiagnosticsInsights.*

recommender.dataflowDiagnosticsInsights.get
recommender.dataflowDiagnosticsInsights.list
recommender.dataflowDiagnosticsInsights.update

remotebuildexecution.blobs.get

resourcemanager.projects.get

resourcemanager.projects.list

storage.buckets.get

storage.objects.create

storage.objects.get

storage.objects.list

Dataflow Developer

(roles/dataflow.developer)

Provides the permissions necessary to execute and manipulate Dataflow jobs.

Lowest-level resources where you can grant this role:

Project

cloudbuild.builds.create

cloudbuild.builds.get

cloudbuild.builds.list

cloudbuild.builds.update

cloudbuild.operations.*

cloudbuild.operations.get
cloudbuild.operations.list

compute.projects.get

compute.regions.list

compute.zones.list

dataflow.jobs.*

dataflow.jobs.cancel
dataflow.jobs.create
dataflow.jobs.get
dataflow.jobs.list
dataflow.jobs.snapshot
dataflow.jobs.updateContents

dataflow.messages.list

dataflow.metrics.get

dataflow.snapshots.*

dataflow.snapshots.delete
dataflow.snapshots.get
dataflow.snapshots.list

recommender.dataflowDiagnosticsInsights.*

recommender.dataflowDiagnosticsInsights.get
recommender.dataflowDiagnosticsInsights.list
recommender.dataflowDiagnosticsInsights.update

remotebuildexecution.blobs.get

resourcemanager.projects.get

resourcemanager.projects.list

Dataflow Viewer

(roles/dataflow.viewer)

Provides read-only access to all Dataflow-related resources.

Lowest-level resources where you can grant this role:

Project

dataflow.jobs.get

dataflow.jobs.list

dataflow.messages.list

dataflow.metrics.get

dataflow.snapshots.get

dataflow.snapshots.list

recommender.dataflowDiagnosticsInsights.get

recommender.dataflowDiagnosticsInsights.list

resourcemanager.projects.get

resourcemanager.projects.list

Dataflow Worker

(roles/dataflow.worker)

Provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline.

Lowest-level resources where you can grant this role:

Project

autoscaling.sites.readRecommendations

autoscaling.sites.writeMetrics

autoscaling.sites.writeState

compute.instanceGroupManagers.update

compute.instances.delete

compute.instances.setDiskAutoDelete

dataflow.jobs.get

dataflow.shuffle.*

dataflow.shuffle.read
dataflow.shuffle.write

dataflow.streamingWorkItems.*

dataflow.streamingWorkItems.ImportState
dataflow.streamingWorkItems.commitWork
dataflow.streamingWorkItems.getData
dataflow.streamingWorkItems.getWork
dataflow.streamingWorkItems.getWorkerMetadata

dataflow.workItems.*

dataflow.workItems.lease
dataflow.workItems.sendMessage
dataflow.workItems.update

logging.logEntries.create

logging.logEntries.route

monitoring.timeSeries.create

storage.buckets.get

storage.objects.create

storage.objects.get

The Dataflow Worker role (roles/dataflow.worker) provides the permissions necessary for a Compute Engine service account to run work units for an Apache Beam pipeline. The Dataflow Worker role must be assigned to a service account that is able to request and update work from the Dataflow service.

The Dataflow Service Agent role (roles/dataflow.serviceAgent) is used exclusively by the Dataflow service account. It provides the service account access to managed resources in your Google Cloud project to run Dataflow jobs. It is assigned automatically to the service account when you enable the Dataflow API for your project from the APIs page in the Google Cloud console.

Creating jobs

To a create a job, the roles/dataflow.admin role includes the minimal set of permissions required to run and examine jobs.

Alternatively, the following permissions are required:

The roles/dataflow.developer role, to instantiate the job itself.
The roles/compute.viewer role, to access machine type information and view other settings.
The roles/storage.objectAdmin role, to provide permission to stage files on Cloud Storage.

Example role assignment

To illustrate the utility of the different Dataflow roles, consider the following breakdown:

The developer who creates and examines jobs needs the roles/iam.serviceAccountUser role.
For more sophisticated permissions management, the developer interacting with the Dataflow job needs the roles/dataflow.developer role.
- They need the roles/storage.objectAdmin or a related role to stage the required files.
- For debugging and quota checking, they need the project roles/compute.viewer role.
- Absent other role assignments, this role lets the developer create and cancel Dataflow jobs, but not interact with the individual VMs or access other Cloud services.
The worker service account needs the roles/dataflow.worker and the roles/dataflow.admin roles to process data for the Dataflow service.
- To access job data, the worker service account needs other roles such as roles/storage.objectAdmin.
- To write to BigQuery tables, the worker service account needs the roles/bigquery.dataEditor role.
- To read from a Pub/Sub topic or subscription, the worker service account needs the roles/pubsub.editor role.
If you're using a Shared VPC, the Shared VPC subnetwork needs to be shared with the Dataflow service account and needs to have the Compute Network User role assigned on the specified subnet.
- To see if the Shared VPC subnetwork is shared with the Dataflow service account, in the Google Cloud console, go to the Shared VPC page and search for the subnet. In the Shared with column, you can see whether the VPC subnetwork is shared with the Dataflow service account. For more information, see Guidelines for specifying a subnetwork parameter for Shared VPC.
- The host project's Compute Engine service account, the service project's Dataflow worker service account, and the service account used to submit the job need to have the following roles:
  - roles/dataflow.admin
  - roles/dataflow.serviceAgent
  - roles/compute.networkUser
  - roles/storage.objectViewer

Assigning Dataflow roles

Dataflow roles can currently be set on organizations and projects only.

To manage roles at the organizational level, see Access control for organizations using IAM.

To set project-level roles, see Granting, changing, and revoking access to resources.