Dataproc IAM roles and permissions

Overview

Identity and Access Management (IAM) lets you control user and group access to project resources. This document focuses on the IAM permissions relevant to Dataproc and the IAM roles that grant those permissions.

Dataproc permissions

Dataproc permissions allow users, including service accounts, to perform actions on Dataproc clusters, jobs, operations, and workflow templates. For example, the dataproc.clusters.create permission allows a user to create Dataproc clusters in a project. You typically don't grant permissions directly; instead, you grant roles, each of which includes one or more permissions.

The following tables list the permissions necessary to call Dataproc APIs (methods). The tables are organized according to the APIs associated with each Dataproc resource (clusters, jobs, operations, and workflow templates).

Permission Scope: The scope of Dataproc permissions listed in the following tables is the containing Google Cloud project (cloud-platform scope). See Service account permissions.

Examples:

  • dataproc.clusters.create permits the creation of Dataproc clusters in the containing project
  • dataproc.jobs.create permits the submission of Dataproc jobs to Dataproc clusters in the containing project
  • dataproc.clusters.list permits the listing of details of Dataproc clusters in the containing project

Clusters methods required permissions

Method Required permissions
projects.regions.clusters.create 1, 2 dataproc.clusters.create
projects.regions.clusters.get dataproc.clusters.get
projects.regions.clusters.list dataproc.clusters.list
projects.regions.clusters.patch 1, 2, 3 dataproc.clusters.update
projects.regions.clusters.delete 1 dataproc.clusters.delete
projects.regions.clusters.start dataproc.clusters.start
projects.regions.clusters.stop dataproc.clusters.stop
projects.regions.clusters.getIamPolicy dataproc.clusters.getIamPolicy
projects.regions.clusters.setIamPolicy dataproc.clusters.setIamPolicy

Notes:

  1. The dataproc.operations.get permission is also required to get status updates from Google Cloud CLI.
  2. The dataproc.clusters.get permission is also required to get the result of the operation from Google Cloud CLI.
  3. The dataproc.autoscalingPolicies.use permission is also required to enable an autoscaling policy on a cluster.
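
The table and notes above can be read as a lookup from a clusters method to the set of permissions a caller needs. The following is a minimal Python sketch of that lookup, not an official client. The names CLUSTER_METHOD_PERMISSIONS and required_cluster_permissions and the via_gcloud and enables_autoscaling flags are hypothetical and exist only for illustration. The permission strings and the note-to-method mapping are copied from the table above.

```python
# Illustrative lookup only; permission strings are copied from the table above.
CLUSTER_METHOD_PERMISSIONS = {
    "create": "dataproc.clusters.create",
    "get": "dataproc.clusters.get",
    "list": "dataproc.clusters.list",
    "patch": "dataproc.clusters.update",
    "delete": "dataproc.clusters.delete",
    "start": "dataproc.clusters.start",
    "stop": "dataproc.clusters.stop",
    "getIamPolicy": "dataproc.clusters.getIamPolicy",
    "setIamPolicy": "dataproc.clusters.setIamPolicy",
}

# Methods that carry each footnote marker in the table.
NOTE_1_METHODS = {"create", "patch", "delete"}  # gcloud status updates
NOTE_2_METHODS = {"create", "patch"}            # gcloud operation result
NOTE_3_METHODS = {"patch"}                      # enabling an autoscaling policy


def required_cluster_permissions(method, via_gcloud=False, enables_autoscaling=False):
    """Return the permissions needed for projects.regions.clusters.<method>."""
    perms = {CLUSTER_METHOD_PERMISSIONS[method]}
    if via_gcloud and method in NOTE_1_METHODS:
        perms.add("dataproc.operations.get")
    if via_gcloud and method in NOTE_2_METHODS:
        perms.add("dataproc.clusters.get")
    if enables_autoscaling and method in NOTE_3_METHODS:
        perms.add("dataproc.autoscalingPolicies.use")
    return perms


# Example: patching a cluster from the CLI while attaching an autoscaling policy.
print(sorted(required_cluster_permissions("patch", via_gcloud=True, enables_autoscaling=True)))
```

Keeping the notes as separate sets mirrors the footnote markers in the table, so the sketch is easy to compare against the table if either changes.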

Jobs methods required permissions

Method Required permissions
projects.regions.jobs.submit 1, 2 dataproc.jobs.create, dataproc.clusters.use
projects.regions.jobs.get dataproc.jobs.get
projects.regions.jobs.list dataproc.jobs.list
projects.regions.jobs.cancel 1 dataproc.jobs.cancel
projects.regions.jobs.patch 1 dataproc.jobs.update
projects.regions.jobs.delete 1 dataproc.jobs.delete
projects.regions.jobs.getIamPolicy dataproc.jobs.getIamPolicy
projects.regions.jobs.setIamPolicy dataproc.jobs.setIamPolicy

Notes:

  1. The Google Cloud CLI also requires dataproc.jobs.get permission for the jobs submit, jobs wait, jobs update, jobs delete, and jobs kill commands.

  2. The gcloud CLI also requires dataproc.clusters.get permission to submit jobs. For an example of setting the permissions necessary for a user to run gcloud dataproc jobs submit on a cluster using Dataproc Granular IAM, see Submitting Jobs with Granular IAM.
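
As a worked example of the two notes above, the following Python sketch combines the permissions for the jobs submit method with the extra permissions the CLI needs, and reports which ones a user still lacks. The constant and function names are hypothetical, and the "granted" set is supplied by the caller; the sketch does not query IAM.

```python
# Permissions the jobs.submit API method requires (from the table above).
JOBS_SUBMIT_API = {"dataproc.jobs.create", "dataproc.clusters.use"}

# Extra permissions the CLI needs to submit a job (notes 1 and 2 above).
JOBS_SUBMIT_CLI_EXTRA = {"dataproc.jobs.get", "dataproc.clusters.get"}


def missing_for_cli_submit(granted):
    """Return a sorted list of permissions still missing for a CLI job submit."""
    needed = JOBS_SUBMIT_API | JOBS_SUBMIT_CLI_EXTRA
    return sorted(needed - set(granted))


# A user who holds only the API-level permissions still lacks the two CLI extras.
print(missing_for_cli_submit({"dataproc.jobs.create", "dataproc.clusters.use"}))
```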

Operations methods required permissions

Method Required permissions
projects.regions.operations.get dataproc.operations.get
projects.regions.operations.list dataproc.operations.list
projects.regions.operations.cancel dataproc.operations.cancel
projects.regions.operations.delete dataproc.operations.delete
projects.regions.operations.getIamPolicy dataproc.operations.getIamPolicy
projects.regions.operations.setIamPolicy dataproc.operations.setIamPolicy

Workflow templates methods required permissions

Method Required permissions
projects.regions.workflowTemplates.instantiate dataproc.workflowTemplates.instantiate
projects.regions.workflowTemplates.instantiateInline dataproc.workflowTemplates.instantiateInline
projects.regions.workflowTemplates.create dataproc.workflowTemplates.create
projects.regions.workflowTemplates.get dataproc.workflowTemplates.get
projects.regions.workflowTemplates.list dataproc.workflowTemplates.list
projects.regions.workflowTemplates.update dataproc.workflowTemplates.update
projects.regions.workflowTemplates.delete dataproc.workflowTemplates.delete
projects.regions.workflowTemplates.getIamPolicy dataproc.workflowTemplates.getIamPolicy
projects.regions.workflowTemplates.setIamPolicy dataproc.workflowTemplates.setIamPolicy

Notes:

  1. Workflow Template permissions are independent of Cluster and Job permissions. A user without create cluster or submit job permissions may create and instantiate a Workflow Template.

  2. The Google Cloud CLI additionally requires dataproc.operations.get permission to poll for workflow completion.

  3. The dataproc.operations.cancel permission is required to cancel a running workflow.

Autoscaling policies methods required permissions

Method Required permissions
projects.regions.autoscalingPolicies.create dataproc.autoscalingPolicies.create
projects.regions.autoscalingPolicies.get dataproc.autoscalingPolicies.get
projects.regions.autoscalingPolicies.list dataproc.autoscalingPolicies.list
projects.regions.autoscalingPolicies.update dataproc.autoscalingPolicies.update
projects.regions.autoscalingPolicies.delete dataproc.autoscalingPolicies.delete
projects.regions.autoscalingPolicies.getIamPolicy dataproc.autoscalingPolicies.getIamPolicy
projects.regions.autoscalingPolicies.setIamPolicy dataproc.autoscalingPolicies.setIamPolicy

Notes:

  1. The dataproc.autoscalingPolicies.use permission is required to enable an autoscaling policy on a cluster with a clusters.patch method request.

Node groups methods required permissions

Method Required permissions
projects.regions.nodeGroups.create dataproc.nodeGroups.create
projects.regions.nodeGroups.get dataproc.nodeGroups.get
projects.regions.nodeGroups.resize dataproc.nodeGroups.update

Dataproc roles

A Dataproc IAM role is a bundle of one or more permissions. You grant roles to users or groups to allow them to perform actions on the Dataproc resources in a project. For example, the Dataproc Viewer role contains get and list permissions, which allow a user to get and list Dataproc clusters, jobs, and operations in a project.
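
To make the "bundle of permissions" idea concrete, here is a small Python sketch that checks whether a role's permission list covers a given permission. The role entries are short excerpts from the role listings in this document, not the full roles. Treating a trailing .* as a glob pattern is an assumption made for illustration, and ROLE_PATTERNS and role_grants are hypothetical names.

```python
from fnmatch import fnmatchcase

# Excerpts only; see the role listings in this document for the full sets.
ROLE_PATTERNS = {
    "roles/dataproc.viewer": [
        "dataproc.clusters.get",
        "dataproc.clusters.list",
        "dataproc.jobs.get",
        "dataproc.jobs.list",
        "dataproc.operations.get",
        "dataproc.operations.list",
    ],
    "roles/dataproc.admin": [
        "dataproc.clusters.*",
        "dataproc.jobs.*",
        "dataproc.operations.*",
    ],
}


def role_grants(role, permission):
    """Return True if any pattern listed for the role matches the permission."""
    return any(fnmatchcase(permission, pattern) for pattern in ROLE_PATTERNS[role])


# The viewer excerpt allows reading clusters but not creating them.
print(role_grants("roles/dataproc.viewer", "dataproc.clusters.get"))
print(role_grants("roles/dataproc.viewer", "dataproc.clusters.create"))
print(role_grants("roles/dataproc.admin", "dataproc.clusters.create"))
```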

The following table lists the Dataproc IAM roles and the permissions associated with each role.

Role Permissions

(roles/dataproc.admin)

Full control of Dataproc resources.

compute.machineTypes.*

compute.networks.get

compute.networks.list

compute.projects.get

compute.regions.*

compute.zones.*

dataproc.autoscalingPolicies.*

dataproc.batches.analyze

dataproc.batches.cancel

dataproc.batches.create

dataproc.batches.delete

dataproc.batches.get

dataproc.batches.list

dataproc.batches.sparkApplicationRead

dataproc.clusters.*

dataproc.jobs.*

dataproc.nodeGroups.*

dataproc.operations.*

dataproc.sessionTemplates.*

dataproc.sessions.create

dataproc.sessions.delete

dataproc.sessions.get

dataproc.sessions.list

dataproc.sessions.sparkApplicationRead

dataproc.sessions.terminate

dataproc.workflowTemplates.*

dataprocrm.nodePools.*

dataprocrm.nodes.get

dataprocrm.nodes.heartbeat

dataprocrm.nodes.list

dataprocrm.nodes.update

dataprocrm.operations.get

dataprocrm.operations.list

dataprocrm.workloads.*

resourcemanager.projects.get

resourcemanager.projects.list

(roles/dataproc.editor)

Provides the permissions necessary for viewing the resources required to manage Dataproc, including machine types, networks, projects, and zones.

Lowest-level resources where you can grant this role:

  • Cluster

compute.machineTypes.*

compute.networks.get

compute.networks.list

compute.projects.get

compute.regions.*

compute.zones.*

dataproc.autoscalingPolicies.create

dataproc.autoscalingPolicies.delete

dataproc.autoscalingPolicies.get

dataproc.autoscalingPolicies.list

dataproc.autoscalingPolicies.update

dataproc.autoscalingPolicies.use

dataproc.batches.analyze

dataproc.batches.cancel

dataproc.batches.create

dataproc.batches.delete

dataproc.batches.get

dataproc.batches.list

dataproc.batches.sparkApplicationRead

dataproc.clusters.create

dataproc.clusters.delete

dataproc.clusters.get

dataproc.clusters.list

dataproc.clusters.start

dataproc.clusters.stop

dataproc.clusters.update

dataproc.clusters.use

dataproc.jobs.cancel

dataproc.jobs.create

dataproc.jobs.delete

dataproc.jobs.get

dataproc.jobs.list

dataproc.jobs.update

dataproc.nodeGroups.*

dataproc.operations.cancel

dataproc.operations.delete

dataproc.operations.get

dataproc.operations.list

dataproc.sessionTemplates.*

dataproc.sessions.create

dataproc.sessions.delete

dataproc.sessions.get

dataproc.sessions.list

dataproc.sessions.sparkApplicationRead

dataproc.sessions.terminate

dataproc.workflowTemplates.create

dataproc.workflowTemplates.delete

dataproc.workflowTemplates.get

dataproc.workflowTemplates.instantiate

dataproc.workflowTemplates.instantiateInline

dataproc.workflowTemplates.list

dataproc.workflowTemplates.update

dataprocrm.nodePools.*

dataprocrm.nodes.get

dataprocrm.nodes.heartbeat

dataprocrm.nodes.list

dataprocrm.nodes.update

dataprocrm.operations.get

dataprocrm.operations.list

dataprocrm.workloads.*

resourcemanager.projects.get

resourcemanager.projects.list

(roles/dataproc.hubAgent)

Allows management of Dataproc resources. Intended for service accounts running Dataproc Hub instances.

compute.instances.get

compute.instances.setMetadata

compute.instances.setTags

compute.zoneOperations.get

compute.zones.list

dataproc.autoscalingPolicies.get

dataproc.autoscalingPolicies.list

dataproc.autoscalingPolicies.use

dataproc.clusters.create

dataproc.clusters.delete

dataproc.clusters.get

dataproc.clusters.list

dataproc.clusters.update

dataproc.operations.cancel

dataproc.operations.delete

dataproc.operations.get

dataproc.operations.list

iam.serviceAccounts.actAs

iam.serviceAccounts.get

iam.serviceAccounts.list

logging.buckets.get

logging.buckets.list

logging.exclusions.get

logging.exclusions.list

logging.links.get

logging.links.list

logging.locations.*

logging.logEntries.create

logging.logEntries.list

logging.logEntries.route

logging.logMetrics.get

logging.logMetrics.list

logging.logServiceIndexes.list

logging.logServices.list

logging.logs.list

logging.operations.get

logging.operations.list

logging.queries.getShared

logging.queries.listShared

logging.queries.usePrivate

logging.sinks.get

logging.sinks.list

logging.usage.get

logging.views.get

logging.views.list

observability.scopes.get

resourcemanager.projects.get

resourcemanager.projects.list

storage.buckets.get

storage.objects.get

storage.objects.list

(roles/dataproc.viewer)

Provides read-only access to Dataproc resources.

Lowest-level resources where you can grant this role:

  • Cluster

compute.machineTypes.get

compute.regions.*

compute.zones.*

dataproc.autoscalingPolicies.get

dataproc.autoscalingPolicies.list

dataproc.batches.analyze

dataproc.batches.get

dataproc.batches.list

dataproc.batches.sparkApplicationRead

dataproc.clusters.get

dataproc.clusters.list

dataproc.jobs.get

dataproc.jobs.list

dataproc.nodeGroups.get

dataproc.operations.get

dataproc.operations.list

dataproc.sessionTemplates.get

dataproc.sessionTemplates.list

dataproc.sessions.get

dataproc.sessions.list

dataproc.sessions.sparkApplicationRead

dataproc.workflowTemplates.get

dataproc.workflowTemplates.list

resourcemanager.projects.get

resourcemanager.projects.list

(roles/dataproc.worker)

Provides worker access to Dataproc resources. Intended for service accounts.

cloudprofiler.profiles.create

cloudprofiler.profiles.update

dataproc.agents.*

dataproc.batches.sparkApplicationWrite

dataproc.sessions.sparkApplicationWrite

dataproc.tasks.*

dataprocrm.nodes.mintOAuthToken

logging.logEntries.create

logging.logEntries.route

monitoring.metricDescriptors.create

monitoring.metricDescriptors.get

monitoring.metricDescriptors.list

monitoring.monitoredResourceDescriptors.*

monitoring.timeSeries.create

storage.buckets.get

storage.folders.*

storage.managedFolders.create

storage.managedFolders.delete

storage.managedFolders.get

storage.managedFolders.list

storage.multipartUploads.*

storage.objects.*

Notes:

  • compute permissions are needed or recommended to create and view Dataproc clusters when using the Google Cloud console or the Google Cloud CLI.
  • To allow a user to upload files, grant the Storage Object Creator role. To allow a user to view job output, grant the Storage Object Viewer role.
  • A user must have monitoring.timeSeries.list permission in order to view graphs on the Google Cloud console→Dataproc→Cluster details Overview tab.
  • A user must have compute.instances.list permission in order to view instance status and the master instance SSH menu on the Google Cloud console→Dataproc→Cluster details VM Instances tab. For information on Compute Engine roles, see Compute Engine→Available IAM roles.
  • To create a cluster with a user-specified service account, the specified service account must have all permissions granted by the Dataproc Worker role, which include access to the Dataproc staging and temp buckets. Additional roles may be required depending on configured features. See Create a cluster with a custom VM service account for more information.

Project roles

You can also set permissions at the project level by using the IAM Project roles. The following table lists permissions associated with IAM Project roles:

Project Role Permissions
Project Viewer All project permissions for read-only actions that preserve state (get, list)
Project Editor All Project Viewer permissions plus all project permissions for actions that modify state (create, delete, update, use, cancel, stop, start)
Project Owner All Project Editor permissions plus permissions to manage access control for the project (get/set IamPolicy) and to set up project billing
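
The table above groups permissions by the kind of action they allow. The Python sketch below turns that grouping into a rough heuristic that maps a permission's final segment to the least-privileged project role the table associates with it. This is a simplification for illustration: the actual contents of each project role are defined by Google Cloud per permission, and the function name and verb sets are hypothetical names built from the table's examples.

```python
# Verb groups as listed in the table above.
VIEWER_VERBS = {"get", "list"}
EDITOR_VERBS = {"create", "delete", "update", "use", "cancel", "stop", "start"}
OWNER_VERBS = {"getIamPolicy", "setIamPolicy"}


def minimum_project_role(permission):
    """Guess the least-privileged project role from the permission's verb."""
    verb = permission.rsplit(".", 1)[-1]
    if verb in VIEWER_VERBS:
        return "Project Viewer"
    if verb in EDITOR_VERBS:
        return "Project Editor"
    if verb in OWNER_VERBS:
        return "Project Owner"
    # Verbs the table does not mention are left undecided on purpose.
    raise ValueError(f"verb not covered by the table: {verb}")


print(minimum_project_role("dataproc.clusters.list"))
print(minimum_project_role("dataproc.jobs.cancel"))
print(minimum_project_role("dataproc.clusters.setIamPolicy"))
```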

IAM roles and Dataproc operations summary

The following table lists Dataproc operations associated with project and Dataproc roles.

Operation                         Project Editor  Project Viewer  Dataproc Admin  Dataproc Editor  Dataproc Viewer
Get/Set Dataproc IAM permissions  No              No              Yes             No               No
Create cluster                    Yes             No              Yes             Yes              No
List clusters                     Yes             Yes             Yes             Yes              Yes
Get cluster details               Yes             Yes             Yes (1, 2)      Yes (1, 2)       Yes (1, 2)
Update cluster                    Yes             No              Yes             Yes              No
Delete cluster                    Yes             No              Yes             Yes              No
Start/Stop cluster                Yes             No              Yes             Yes              No
Submit job                        Yes             No              Yes (3)         Yes (3)          No
List jobs                         Yes             Yes             Yes             Yes              Yes
Get job details                   Yes             Yes             Yes (4)         Yes (4)          Yes (4)
Cancel job                        Yes             No              Yes             Yes              No
Delete job                        Yes             No              Yes             Yes              No
List operations                   Yes             Yes             Yes             Yes              Yes
Get operation details             Yes             Yes             Yes             Yes              Yes
Delete operation                  Yes             No              Yes             Yes              No

Notes:

  1. The performance graph is not available unless the user also has a role with the monitoring.timeSeries.list permission.
  2. The list of VMs in the cluster will not include status information or an SSH link for the master instance unless the user also has a role with the compute.instances.list permission.
  3. Jobs that upload files require the user to have the Storage Object Creator role or write access to the Dataproc staging bucket.
  4. Job output is not available unless the user also has the Storage Object Viewer role or has been granted read access to the staging bucket for the project.

Service accounts

When you call Dataproc APIs to perform actions in a project, such as creating VM instances, Dataproc performs the actions on your behalf by using a service account that has the permissions required to perform the actions. For more information, see Dataproc service accounts.

IAM management

You can get and set IAM policies using the Google Cloud console, the IAM API, or the Google Cloud CLI.
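
Whichever tool you use, changing an IAM policy is usually a read-modify-write cycle: get the current policy, add or change a binding, and set the policy back. The Python sketch below shows only the middle step on a plain dictionary shaped like an IAM policy (a list of bindings, each with a role and members). It does not call any API. The add_binding helper, the member address, and the etag value are hypothetical placeholders.

```python
import copy


def add_binding(policy, role, member):
    """Return a copy of the policy with the member granted the role."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Placeholder policy, as it might look after a "get IAM policy" call.
current = {
    "etag": "example-etag",
    "bindings": [
        {"role": "roles/dataproc.viewer", "members": ["user:alice@example.com"]},
    ],
}

new_policy = add_binding(current, "roles/dataproc.editor", "user:alice@example.com")
print(new_policy["bindings"])
```

The helper returns a copy and leaves the etag untouched, so the value read with the policy can be sent back unchanged when the policy is set, which is how IAM detects concurrent modifications.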
