Cloud Dataproc Permissions and IAM Roles

Overview

Google Cloud Identity and Access Management (IAM) allows you to control user and group access to your project's resources. This document focuses on the IAM permissions relevant to Cloud Dataproc and the IAM roles that grant those permissions.

Cloud Dataproc Permissions

Cloud Dataproc permissions allow users to perform specific actions on Cloud Dataproc clusters, jobs, and operations. For example, the dataproc.clusters.create permission allows a user to create Cloud Dataproc clusters in your project. You don't directly give users permissions; instead, you grant them roles, which have one or more permissions bundled within them.

The following tables list the permissions necessary to call Cloud Dataproc APIs (methods). The tables are organized according to the APIs associated with each Cloud Dataproc resource (clusters, jobs, and operations).

Clusters Permissions

Method                                      Required Permission(s)
projects.regions.clusters.create [1][2]     dataproc.clusters.create
projects.regions.clusters.get               dataproc.clusters.get
projects.regions.clusters.list              dataproc.clusters.list
projects.regions.clusters.patch [1][2]      dataproc.clusters.update
projects.regions.clusters.delete [1]        dataproc.clusters.delete
projects.regions.clusters.diagnose [1]      dataproc.clusters.use

Notes:

  1. The dataproc.operations.get permission is also required to get status updates from the gcloud command-line tool.
  2. The dataproc.clusters.get permission is also required to get the result of the operation from the gcloud command-line tool.

Jobs Permissions

Method                                Required Permission(s)
projects.regions.jobs.submit [1][2]   dataproc.jobs.create, dataproc.clusters.use
projects.regions.jobs.get             dataproc.jobs.get
projects.regions.jobs.list            dataproc.jobs.list
projects.regions.jobs.cancel [1]      dataproc.jobs.cancel
projects.regions.jobs.patch [1]       dataproc.jobs.update
projects.regions.jobs.delete [1]      dataproc.jobs.delete

Notes:

  1. The gcloud command-line tool additionally requires dataproc.jobs.get in order for the jobs submit, jobs wait, jobs update, jobs delete, and jobs kill commands to function properly.

  2. The gcloud command-line tool additionally requires dataproc.clusters.get permission to submit jobs. For an example of setting the permissions necessary for a user to run gcloud dataproc jobs submit on a specific cluster using Cloud Dataproc Granular IAM, see Submitting Jobs with Granular IAM.
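Combining the table and notes above, a caller using the gcloud command-line tool needs four permissions in total to submit a job. A minimal sketch in Python (the permission names come from this page; the helper function itself is illustrative, not part of any Google SDK):

```python
# Permissions required by the jobs.submit API method itself.
API_REQUIRED = {"dataproc.jobs.create", "dataproc.clusters.use"}

# Extra permissions gcloud needs on top of the API requirements.
GCLOUD_EXTRA = {"dataproc.jobs.get",       # note 1: gcloud polls job status
                "dataproc.clusters.get"}   # note 2: gcloud looks up the cluster

def gcloud_submit_permissions():
    """All permissions a caller needs to run `gcloud dataproc jobs submit`."""
    return API_REQUIRED | GCLOUD_EXTRA

print(sorted(gcloud_submit_permissions()))
```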

Operations Permissions

Method                               Required Permission(s)
projects.regions.operations.get      dataproc.operations.get
projects.regions.operations.list     dataproc.operations.list
projects.regions.operations.cancel   dataproc.operations.cancel
projects.regions.operations.delete   dataproc.operations.delete

Workflow Template Permissions

Method                                           Required Permission(s)
projects.regions.workflowTemplates.instantiate   dataproc.workflowTemplates.instantiate
projects.regions.workflowTemplates.create        dataproc.workflowTemplates.create
projects.regions.workflowTemplates.get           dataproc.workflowTemplates.get
projects.regions.workflowTemplates.list          dataproc.workflowTemplates.list
projects.regions.workflowTemplates.update        dataproc.workflowTemplates.update
projects.regions.workflowTemplates.delete        dataproc.workflowTemplates.delete

Notes:

  1. Workflow Template permissions are independent of Cluster and Job permissions: a user without cluster-create or job-submit permissions can still create and instantiate a Workflow Template.

  2. The gcloud command-line tool additionally requires dataproc.operations.get permission to poll for workflow completion.

  3. The dataproc.operations.cancel permission is required to cancel a running workflow.

Cloud Dataproc Roles

Identity and Access Management (IAM) Cloud Dataproc roles are a bundle of one or more permissions. You grant roles to users or groups to allow them to perform actions on the Cloud Dataproc resources in your project. For example, the Dataproc Viewer role contains the dataproc.*.get and dataproc.*.list permissions, which allow a user to get and list Cloud Dataproc clusters, jobs, and operations in a project.

The following table lists the Cloud Dataproc IAM roles and the permissions associated with each role:

Dataproc Editor
  dataproc.*.create
  dataproc.*.get
  dataproc.*.list
  dataproc.*.delete
  dataproc.*.update
  dataproc.clusters.use
  dataproc.jobs.cancel
  dataproc.workflowTemplates.instantiate
  compute.machineTypes.get
  compute.machineTypes.list
  compute.networks.get
  compute.networks.list
  compute.projects.get
  compute.regions.get
  compute.regions.list
  compute.zones.get
  compute.zones.list
  resourcemanager.projects.get
  resourcemanager.projects.list

Dataproc Viewer
  dataproc.*.get
  dataproc.*.list
  compute.machineTypes.get
  compute.regions.get
  compute.regions.list
  compute.zones.get
  resourcemanager.projects.get
  resourcemanager.projects.list

Dataproc Worker (for service accounts only)
  dataproc.agents.*
  dataproc.tasks.*
  logging.logEntries.create
  monitoring.metricDescriptors.create
  monitoring.metricDescriptors.get
  monitoring.metricDescriptors.list
  monitoring.monitoredResourceDescriptors.get
  monitoring.monitoredResourceDescriptors.list
  monitoring.timeSeries.create
  storage.buckets.get
  storage.objects.create
  storage.objects.get
  storage.objects.list
  storage.objects.update
  storage.objects.delete

Notes:

  • "*" signifies "clusters," "jobs," or "operations," except that the only permissions associated with dataproc.operations.* are get, list, and delete.
  • The compute permissions listed above are needed or recommended to create and view Cloud Dataproc clusters when using the Google Cloud Platform Console or the Cloud SDK gcloud command-line tool.
  • To allow a user to upload files, grant the Storage Object Creator role. To allow a user to view job output, grant the Storage Object Viewer role. Note that granting either of these Storage roles gives the user the ability to access any bucket in the project.
  • A user must have monitoring.timeSeries.list permission in order to view graphs on the Google Cloud Platform Console→Dataproc→Cluster details Overview tab.
  • A user must have compute.instances.list permission in order to view instance status and the master instance SSH menu on the Google Cloud Platform Console→Dataproc→Cluster details VM Instances tab. For information on Google Compute Engine roles, see Compute Engine→Available IAM roles.
  • To create a cluster with a user-specified service account, the specified service account must have all permissions granted by the Dataproc Worker role. Additional roles may be required depending on configured features. See Service Accounts for a list of additional roles.
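The wildcard shorthand in the role table above can be expanded mechanically. A small illustrative helper (the function name is hypothetical) that encodes the rule from the first note, where dataproc.operations supports only get, list, and delete:

```python
# Resources covered by the "*" in the role table, in table order.
RESOURCES = ("clusters", "jobs", "operations")
# Verbs that exist for dataproc.operations (per the note above).
OPERATIONS_VERBS = {"get", "list", "delete"}

def expand_wildcard(shorthand):
    """Expand e.g. 'dataproc.*.create' into concrete permission names."""
    prefix, star, verb = shorthand.split(".")
    assert prefix == "dataproc" and star == "*"
    return [f"dataproc.{res}.{verb}" for res in RESOURCES
            if res != "operations" or verb in OPERATIONS_VERBS]

print(expand_wildcard("dataproc.*.get"))     # all three resources
print(expand_wildcard("dataproc.*.create"))  # operations.create does not exist
```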

Project Roles

You can also set permissions at the project level by using the IAM Project roles. Here is a summary of the permissions associated with IAM Project roles:

Project Role     Permissions
Project Viewer   All project permissions for read-only actions that preserve state (get, list)
Project Editor   All Project Viewer permissions plus all project permissions for actions that modify state (create, delete, update, use, cancel)
Project Owner    All Project Editor permissions plus permissions to manage access control for the project (get/set IamPolicy) and to set up project billing

IAM Roles and Cloud Dataproc Operations Summary

The following table summarizes the Cloud Dataproc operations available based on the role granted to the user, with caveats noted.

Operation               Project Editor   Project Viewer   Cloud Dataproc Editor   Cloud Dataproc Viewer
Create cluster          Yes              No               Yes                     No
List clusters           Yes              Yes              Yes                     Yes
Get cluster details     Yes              Yes              Yes [1][2]              Yes [1][2]
Update cluster          Yes              No               Yes                     No
Delete cluster          Yes              No               Yes                     No
Submit job              Yes              No               Yes [3]                 No
List jobs               Yes              Yes              Yes                     Yes
Get job details         Yes              Yes              Yes [4]                 Yes [4]
Cancel job              Yes              No               Yes                     No
Delete job              Yes              No               Yes                     No
List operations         Yes              Yes              Yes                     Yes
Get operation details   Yes              Yes              Yes                     Yes
Delete operation        Yes              No               Yes                     No

Notes:

  1. The performance graph is not available unless the user also has a role with the monitoring.timeSeries.list permission.
  2. The list of VMs in the cluster will not include status information or an SSH link for the master instance unless the user also has a role with the compute.instances.list permission.
  3. Jobs that include files to be uploaded cannot be submitted unless the user also has the Storage Object Creator role or has been granted write access to the staging bucket for the project.
  4. Job output is not available unless the user also has the Storage Object Viewer role or has been granted read access to the staging bucket for the project.

IAM management

You can get and set IAM policies using the Google Cloud Platform Console, the IAM API, or the gcloud command-line tool.

Cloud Dataproc Granular IAM

Cloud Dataproc Granular IAM is a feature that allows you to grant permissions at the cluster level.

Example: You can grant one user a Viewer role, which allows the user to view a cluster within a project, and grant another user an Editor role, which allows that user to view the cluster and also use, update, and delete it. See Cluster Actions Enabled by Granular IAM to understand the specific actions and commands enabled by each Cloud Dataproc Granular IAM role.

Cloud Dataproc Granular IAM Roles and Permissions

Cloud Dataproc Granular IAM can grant the following roles, with the listed permissions, on a cluster.

Role     Permissions
Viewer   dataproc.clusters.get
         dataproc.clusters.list
Editor   dataproc.clusters.get
         dataproc.clusters.list
         dataproc.clusters.delete
         dataproc.clusters.update
         dataproc.clusters.use
Owner    dataproc.clusters.get
         dataproc.clusters.list
         dataproc.clusters.delete
         dataproc.clusters.update
         dataproc.clusters.use
         dataproc.clusters.setIamPolicy
         dataproc.clusters.getIamPolicy
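These three granular roles are strictly nested: each role's permission set contains the previous one's. Encoding the table as Python sets makes the hierarchy explicit and checkable:

```python
# Permission sets copied from the granular IAM role table above.
VIEWER = {"dataproc.clusters.get", "dataproc.clusters.list"}
EDITOR = VIEWER | {"dataproc.clusters.delete",
                   "dataproc.clusters.update",
                   "dataproc.clusters.use"}
OWNER = EDITOR | {"dataproc.clusters.setIamPolicy",
                  "dataproc.clusters.getIamPolicy"}

# Each role is a proper subset of the next.
assert VIEWER < EDITOR < OWNER
```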

Using Dataproc Granular IAM

This section explains how to use Cloud Dataproc Granular IAM to assign roles to users on an existing cluster. See Granting, Changing, and Revoking Access to Project Members for more general information on updating and removing Cloud IAM roles.

Console

Using Granular Cloud Dataproc IAM from the Google Cloud Platform Console will be supported in a future Cloud Dataproc release.

gcloud Command

  1. Get the cluster's IAM policy, and write it to a JSON file:
    gcloud beta dataproc clusters get-iam-policy cluster-name \
      --format=json > iam.json
    
  2. The contents of the JSON file will look similar to the following:
    {
      "bindings": [
        {
          "role": "roles/editor",
          "members": [
            "user:mike@example.com",
            "group:admins@example.com",
            "domain:google.com",
            "serviceAccount:my-other-app@appspot.gserviceaccount.com"
          ]
        }
      ],
      "etag": "string"
    }
    
    
  3. Using a text editor, add a new binding object to the bindings array that defines users and the cluster access role for those users. For example, to grant the Viewer role (roles/viewer) to the user sean@example.com, you would add the second binding object shown below. Note: make sure to return the etag value you received from gcloud beta dataproc clusters get-iam-policy (see the etag documentation).
    {
      "bindings": [
        {
          "role": "roles/editor",
          "members": [
            "user:mike@example.com",
            "group:admins@example.com",
            "domain:google.com",
            "serviceAccount:my-other-app@appspot.gserviceaccount.com"
          ]
        },
        {
          "role": "roles/viewer",
          "members": [
            "user:sean@example.com"
          ]
        }
      ],
      "etag": "value-from-get-iam-policy"
    }
    
    
  4. Update the cluster's policy with the new bindings array by running the following command:
    gcloud beta dataproc clusters set-iam-policy cluster-name \
      --format=json iam.json
    
  5. The command outputs the updated policy:
    {
      "bindings": [
        {
          "role": "roles/editor",
          "members": [
            "user:mike@example.com",
            "group:admins@example.com",
            "domain:google.com",
            "serviceAccount:my-other-app@appspot.gserviceaccount.com"
          ]
        },
        {
          "role": "roles/viewer",
          "members": [
            "user:sean@example.com"
          ]
        }
      ],
      "etag": "string"
    }
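Rather than hand-editing iam.json in step 3, the new binding can be added programmatically. A minimal sketch, assuming the policy has already been fetched with get-iam-policy (the add_binding helper is illustrative, not part of any Google SDK):

```python
import json

def add_binding(policy, role, member):
    """Add `member` to `role` in an IAM policy dict, preserving the etag.

    Reuses an existing binding for the role if one is present; otherwise
    appends a new binding. The etag returned by get-iam-policy is left
    untouched so set-iam-policy can detect concurrent modifications.
    """
    for binding in policy.setdefault("bindings", []):
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy

# Same change as step 3: grant roles/viewer to sean@example.com.
policy = {
    "bindings": [
        {"role": "roles/editor", "members": ["user:mike@example.com"]},
    ],
    "etag": "value-from-get-iam-policy",
}
add_binding(policy, "roles/viewer", "user:sean@example.com")
print(json.dumps(policy, indent=2))  # write this back out as iam.json
```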
    

API

  1. Issue a clusters.getIamPolicy request to get the IAM policy for a cluster:

    HTTP Request

    GET https://dataproc.googleapis.com/v1beta2/projects/projectName/regions/region/clusters/clusterName:getIamPolicy
    
  2. The JSON response will look similar to the following:
    {
      "bindings": [
        {
          "role": "roles/editor",
          "members": [
            "user:mike@example.com",
            "group:admins@example.com",
            "domain:google.com",
            "serviceAccount:my-other-app@appspot.gserviceaccount.com"
          ]
        }
      ],
      "etag": "string"
    }
    
  3. Using a text editor, construct a JSON policy object that encloses the bindings array you just received from the Cloud Dataproc service. Make sure to return the "etag" value you received in the getIamPolicy response (see the etag documentation). Then add a new binding object to the bindings array that defines users and the cluster access role for those users. For example, to grant the Viewer role (roles/viewer) to the user sean@example.com, you would add the second binding object shown below:
    {
      "policy": {
        "version": "",
        "bindings": [
          {
            "role": "roles/editor",
            "members": [
              "user:mike@example.com",
              "group:admins@example.com",
              "domain:google.com",
              "serviceAccount:my-other-app@appspot.gserviceaccount.com"
            ]
          },
          {
            "role": "roles/viewer",
            "members": [
              "user:sean@example.com"
            ]
          }
        ],
        "etag": "value-from-getIamPolicy"
      }
    }
    
  4. Set the updated policy on the cluster by issuing a clusters.setIamPolicy request:

    HTTP Request

    POST https://dataproc.googleapis.com/v1beta2/projects/projectName/regions/region/clusters/clusterName:setIamPolicy
    

    Request body

    {
      "policy": {
        "version": "",
        "bindings": [
          {
            "role": "roles/editor",
            "members": [
              "user:mike@example.com",
              "group:admins@example.com",
              "domain:google.com",
              "serviceAccount:my-other-app@appspot.gserviceaccount.com"
            ]
          },
          {
            "role": "roles/viewer",
            "members": [
              "user:sean@example.com"
            ]
          }
        ],
        "etag": "value-from-getIamPolicy"
      }
    }
    
  5. The contents of the JSON response will look similar to the following:

    Response

    {
      "bindings": [
        {
          "role": "roles/editor",
          "members": [
            "user:mike@example.com",
            "group:admins@example.com",
            "domain:google.com",
            "serviceAccount:my-other-app@appspot.gserviceaccount.com"
          ]
        },
        {
          "role": "roles/viewer",
          "members": [
            "user:sean@example.com"
          ]
        }
      ],
      "etag": "string"
    }
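Note that, unlike the gcloud flow, the setIamPolicy request body wraps the edited policy in a top-level "policy" field. A small illustrative helper (the function name is hypothetical) that builds the request URL and body from the steps above; sending the request and authentication are omitted:

```python
def build_set_iam_policy_request(project, region, cluster, policy):
    """Return the (url, body) pair for a clusters.setIamPolicy call."""
    url = (f"https://dataproc.googleapis.com/v1beta2/projects/{project}"
           f"/regions/{region}/clusters/{cluster}:setIamPolicy")
    # setIamPolicy expects the policy (with its etag) under a "policy" key.
    return url, {"policy": policy}

url, body = build_set_iam_policy_request(
    "my-project", "global", "my-cluster",
    {"bindings": [{"role": "roles/viewer",
                   "members": ["user:sean@example.com"]}],
     "etag": "value-from-getIamPolicy"})
print(url)
```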
    

Cluster Actions Enabled by Granular IAM

The table below shows the specific gcloud dataproc clusters commands enabled on a cluster by each Granular IAM role.

IAM Role   Commands
Viewer     gcloud dataproc clusters describe cluster-name
Editor     gcloud dataproc clusters describe cluster-name
           gcloud dataproc clusters delete cluster-name
           gcloud dataproc clusters diagnose cluster-name
           gcloud dataproc clusters update cluster-name
Owner      gcloud dataproc clusters describe cluster-name
           gcloud dataproc clusters delete cluster-name
           gcloud dataproc clusters diagnose cluster-name
           gcloud dataproc clusters update cluster-name
           gcloud beta dataproc clusters get-iam-policy cluster-name
           gcloud beta dataproc clusters set-iam-policy cluster-name

Submitting Jobs with Granular IAM

To allow a member (user, group, or service account) to submit jobs to a specific cluster using Cloud Dataproc Granular IAM, you must grant the member an Editor role on the cluster and also set additional permissions at the project level. Follow these steps to allow a member to submit jobs on a specific Cloud Dataproc cluster:

  1. Create a Google Cloud Storage bucket that your cluster can use to connect to Cloud Storage.
  2. Add the member to the bucket-level policy, selecting the Storage Object Viewer role for the member (see roles/storage.objectViewer), which includes the following permissions:
    1. storage.objects.get
    2. storage.objects.list
  3. When you create the cluster, pass the name of the bucket you just created to your cluster using the --bucket parameter (see gcloud dataproc clusters create --bucket).
  4. After the cluster is created, set a policy on the cluster that grants the member an Editor or Owner role (see Using Dataproc Granular IAM).
  5. Create a Cloud IAM custom role with the following permissions:
    1. dataproc.jobs.create
    2. dataproc.jobs.get
  6. Select or Add the member on the Cloud Platform Console IAM page, then select the custom role to apply it to the member.
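The custom role from step 5 can also be expressed as a request body in the shape accepted by the IAM API's projects.roles.create method. A sketch, where the role ID and title are illustrative placeholders, not names from this page:

```python
# Hypothetical custom role definition for the IAM API's
# projects.roles.create method. Only the two permissions listed in
# step 5 are included; roleId and title are made-up examples.
custom_role = {
    "roleId": "dataprocJobSubmitter",
    "role": {
        "title": "Dataproc Job Submitter",
        "description": "Can create and get Dataproc jobs.",
        "includedPermissions": [
            "dataproc.jobs.create",
            "dataproc.jobs.get",
        ],
        "stage": "GA",
    },
}
print(custom_role["role"]["includedPermissions"])
```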
