Cloud Dataproc principals and roles

When you use the Cloud Dataproc service to create clusters and run jobs on them, the service sets up the Cloud Dataproc permissions and IAM roles in your project that it needs to access and use Google Cloud Platform (GCP) resources for these tasks. However, if you do cross-project work, for example accessing data in another project, you must set up the roles and permissions needed to access the cross-project resources yourself.

To help you do cross-project work successfully, this document lists the different principals that use the Cloud Dataproc service and the roles and associated permissions necessary for those principals to access and use GCP resources.

Cloud Dataproc API User (End User identity)

Example: username@example.com

This is the end user who calls the Cloud Dataproc service. The end user is usually an individual, but it can also be a service account if Cloud Dataproc is invoked through an API client or from another Google Cloud Platform service such as Compute Engine, Cloud Functions, or Cloud Composer.

Related roles and permissions:

Cloud Dataproc Service Agent (Control Plane identity)

Example: service-project-number@dataproc-accounts.iam.gserviceaccount.com

Cloud Dataproc creates this service account with the Dataproc Service Agent role in a Cloud Dataproc user's GCP project. This service account cannot be replaced by a user-specified service account when you create a cluster. You do not need to configure this service account unless you are creating a cluster that uses a shared VPC network in another project.
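The example address above follows a fixed pattern built from the project number. As a minimal sketch, assuming that pattern holds for your project, the address can be derived as follows. The function name and the sample project number are hypothetical and used only for illustration.

```python
def dataproc_service_agent_email(project_number: str) -> str:
    """Build the Dataproc Service Agent address from a project number.

    Follows the pattern shown in the example above:
    service-<project-number>@dataproc-accounts.iam.gserviceaccount.com
    """
    # Project numbers are numeric; reject anything else early.
    if not project_number.isdigit():
        raise ValueError(f"expected a numeric project number, got {project_number!r}")
    return f"service-{project_number}@dataproc-accounts.iam.gserviceaccount.com"


# Hypothetical project number, for illustration only.
agent_email = dataproc_service_agent_email("123456789012")
```

This address is the one you would reference when granting the service agent access in another project, for example the host project of a shared VPC network.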

This service account is used to perform a broad set of system operations, including:

  • Performing get and list operations to confirm the configuration of resources such as images, firewalls, Cloud Dataproc initialization actions, and Cloud Storage buckets
  • Auto-creating the Cloud Dataproc staging bucket if the user does not specify one
  • Writing cluster configuration metadata to the staging bucket
  • Creating Compute Engine resources, including VM instances, instance groups, and instance templates

Related error: "The service account does not have read or list access to the resource."

Related roles and permissions:

  • Role: Dataproc Service Agent

Cloud Dataproc VM Service Account (Data Plane identity)

Example: project-number-compute@developer.gserviceaccount.com

Cloud Dataproc VMs run as this service account, and user jobs are granted its permissions. If you do not specify a user-managed service account when you create a cluster, the default Compute Engine service account (shown in the example above) is used.
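The fallback rule described above can be sketched in a few lines. This is an illustration of the selection logic only, not of how the service implements it. The function name, the sample project number, and the sample user-managed address are hypothetical.

```python
from typing import Optional


def effective_vm_service_account(
    project_number: str, user_managed: Optional[str] = None
) -> str:
    """Return the service account that cluster VMs would run as.

    A user-managed service account wins if one is specified; otherwise
    the default Compute Engine service account for the project is used.
    """
    if user_managed:
        return user_managed
    # Default Compute Engine service account pattern, as in the example above.
    return f"{project_number}-compute@developer.gserviceaccount.com"


# Hypothetical values, for illustration only.
default_sa = effective_vm_service_account("123456789012")
custom_sa = effective_vm_service_account(
    "123456789012", "my-jobs@my-project.iam.gserviceaccount.com"
)
```

Knowing which account is in effect matters because jobs inherit its permissions, so it is the account that needs access to any cross-project data.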

The VM service account must have permissions to:

  • read from and write to the Cloud Dataproc staging bucket

The VM service account may also need permissions, according to job requirements, to:

  • read from and write to Cloud Storage, BigQuery, Stackdriver Logging, and other Google Cloud Platform resources
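For cross-project work, granting such access typically means adding the VM service account as a member of a role in the other project's IAM policy. The following is a minimal sketch, assuming the policy is handled as a plain dictionary in the usual `bindings` shape (a role plus a list of members). The helper name, the sample account, and the choice of `roles/storage.objectViewer` are illustrative assumptions, and the sketch does not call any Google Cloud API.

```python
from copy import deepcopy


def add_member_to_role(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM-style policy with `member` bound to `role`.

    The input policy is left unmodified. Adding a member that is already
    bound to the role is a no-op.
    """
    updated = deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    # No binding for this role yet, so create one.
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical VM service account and role, for illustration only.
vm_sa = "serviceAccount:123456789012-compute@developer.gserviceaccount.com"
other_project_policy = {"bindings": []}
new_policy = add_member_to_role(
    other_project_policy, "roles/storage.objectViewer", vm_sa
)
```

In practice the updated policy would then be written back to the other project or resource with the IAM tooling you already use. Which role is appropriate depends on what the job reads or writes.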

Related roles and permissions:
