To limit access for users within a project or organization, you can use Identity and Access Management (IAM) roles for Dataflow. These roles let you control access to Dataflow-related resources, instead of granting users the Viewer, Editor, or Owner role on the entire Google Cloud project.
This page focuses on how to use Dataflow's IAM roles. For a detailed description of IAM and its features, see the IAM documentation.
Every Dataflow method requires the caller to have the necessary permissions. For a list of the permissions and roles Dataflow supports, see the following section.
Permissions and roles
This section summarizes the permissions and roles Dataflow IAM supports.
Required permissions
The following table lists the permissions that the caller must have to call each method:
Method | Required permissions |
---|---|
dataflow.jobs.create | dataflow.jobs.create |
dataflow.jobs.cancel | dataflow.jobs.cancel |
dataflow.jobs.updateContents | dataflow.jobs.updateContents |
dataflow.jobs.list | dataflow.jobs.list |
dataflow.jobs.get | dataflow.jobs.get |
dataflow.messages.list | dataflow.messages.list |
dataflow.metrics.get | dataflow.metrics.get |
dataflow.jobs.snapshot | dataflow.jobs.snapshot |
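The mapping in the table above can be sketched as a simple lookup. This is illustrative only: the dictionary mirrors the table, but the helper and its names (`REQUIRED_PERMISSION`, `can_call`) are hypothetical, not part of any Dataflow API.

```python
# Illustrative sketch of the method-to-permission table above.
# The helper and names are hypothetical; only the mapping comes from the table.
REQUIRED_PERMISSION = {
    "jobs.create": "dataflow.jobs.create",
    "jobs.cancel": "dataflow.jobs.cancel",
    "jobs.updateContents": "dataflow.jobs.updateContents",
    "jobs.list": "dataflow.jobs.list",
    "jobs.get": "dataflow.jobs.get",
    "messages.list": "dataflow.messages.list",
    "metrics.get": "dataflow.metrics.get",
    "jobs.snapshot": "dataflow.jobs.snapshot",
}

def can_call(method: str, granted: set) -> bool:
    """Return True if the caller's granted permissions cover the method."""
    return REQUIRED_PERMISSION[method] in granted

# A read-only caller can get a job but not cancel one.
viewer_like = {"dataflow.jobs.list", "dataflow.jobs.get", "dataflow.metrics.get"}
print(can_call("jobs.get", viewer_like))     # True
print(can_call("jobs.cancel", viewer_like))  # False
```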
Roles
The following table lists the Dataflow IAM roles. Every permission is applicable to a particular resource type. For the full list of permissions each role includes, see the Roles page in the Google Cloud console.
Role | Description |
---|---|
Dataflow Admin (`roles/dataflow.admin`) | Minimal role for creating and managing Dataflow jobs. |
Dataflow Developer (`roles/dataflow.developer`) | Provides the permissions necessary to execute and manipulate Dataflow jobs. Lowest-level resource where you can grant this role: Project. |
Dataflow Viewer (`roles/dataflow.viewer`) | Provides read-only access to all Dataflow-related resources. Lowest-level resource where you can grant this role: Project. |
Dataflow Worker (`roles/dataflow.worker`) | Provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline. Lowest-level resource where you can grant this role: Project. |
The Dataflow Worker role (`roles/dataflow.worker`) provides the permissions necessary for a Compute Engine service account to run work units for an Apache Beam pipeline. The Dataflow Worker role must be assigned to a service account that is able to request and update work from the Dataflow service.
The Dataflow Service Agent role (`roles/dataflow.serviceAgent`) is used exclusively by the Dataflow service account. It provides the service account access to managed resources in your Google Cloud project to run Dataflow jobs. It is assigned automatically to the service account when you enable the Dataflow API for your project from the APIs page in the Google Cloud console.
Creating jobs
To create a job, grant the `roles/dataflow.admin` role; it includes the minimal set of permissions required to run and examine jobs.
Alternatively, the following roles are required:
- The `roles/dataflow.developer` role, to instantiate the job itself.
- The `roles/compute.viewer` role, to access machine type information and view other settings.
- The `roles/storage.objectAdmin` role, to provide permission to stage files on Cloud Storage.
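The alternative grant above amounts to three bindings in the project's IAM policy. Below is a sketch of that policy fragment using the standard IAM policy binding shape (`role` plus `members`); the member address is a placeholder.

```python
# Sketch of the IAM policy bindings produced by granting the three
# alternative roles to one developer. The member address is a placeholder.
member = "user:developer@example.com"

bindings = [
    {"role": role, "members": [member]}
    for role in (
        "roles/dataflow.developer",   # instantiate the job
        "roles/compute.viewer",       # read machine type information
        "roles/storage.objectAdmin",  # stage files on Cloud Storage
    )
]

granted_roles = {b["role"] for b in bindings}
print("roles/dataflow.developer" in granted_roles)  # True
```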
Example role assignment
To illustrate the utility of the different Dataflow roles, consider the following breakdown:
- The developer who creates and examines jobs needs the `roles/dataflow.admin` role.
- For more sophisticated permissions management, the developer interacting with the Dataflow job needs the `roles/dataflow.developer` role.
  - They need the `roles/storage.objectAdmin` role, or a related role, to stage the required files.
  - For debugging and quota checking, they need the project `roles/compute.viewer` role.
  - Absent other role assignments, this role lets the developer create and cancel Dataflow jobs, but not interact with the individual VMs or access other Cloud services.
- The worker service account needs the `roles/dataflow.worker` role to process data for the Dataflow service. To access job data, the service account needs other roles, such as `roles/storage.objectAdmin`.
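The breakdown above can be summarized as a role-assignment map, one entry per principal. This is a sketch only; the principal addresses are placeholders, and the helper (`has_role`) is hypothetical.

```python
# Sketch of the example breakdown as a role-assignment map.
# Principal addresses are placeholders.
assignments = {
    "user:developer@example.com": {
        "roles/dataflow.developer",
        "roles/storage.objectAdmin",  # stage required files
        "roles/compute.viewer",       # debugging and quota checking
    },
    "serviceAccount:worker@example.iam.gserviceaccount.com": {
        "roles/dataflow.worker",
        "roles/storage.objectAdmin",  # access job data
    },
}

def has_role(principal: str, role: str) -> bool:
    """Return True if the principal has been assigned the role."""
    return role in assignments.get(principal, set())

print(has_role("user:developer@example.com", "roles/compute.viewer"))  # True
```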
Assigning Dataflow roles
Dataflow roles can currently be set on organizations and projects only.
To manage roles at the organizational level, see Access control for organizations using IAM.
To set project-level roles, see Granting, changing, and revoking access to resources.
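Project-level grants follow a read-modify-write cycle on the project's IAM policy. The sketch below shows only the in-memory "modify" step on a policy dict in the standard IAM policy shape; fetching and writing the policy (for example, with gcloud or the Resource Manager API) are not shown, and the member address is a placeholder.

```python
# Sketch of the "modify" step of the IAM read-modify-write cycle.
# Only the in-memory edit is shown; reading and writing the policy are not.
def add_binding(policy: dict, role: str, member: str) -> dict:
    """Add member to the binding for role, creating the binding if absent."""
    for binding in policy.setdefault("bindings", []):
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy

policy = {"bindings": []}
add_binding(policy, "roles/dataflow.viewer", "user:analyst@example.com")
print(policy["bindings"][0]["role"])  # roles/dataflow.viewer
```

Adding the same member twice is a no-op, which keeps the policy free of duplicate entries.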