Dataproc service accounts

This page describes service accounts and VM access scopes and how they are used with Dataproc.

What are service accounts?

A service account is a special account that can be used by services and applications running on a Compute Engine virtual machine (VM) instance to interact with other Google Cloud APIs. Applications can use service account credentials to authorize themselves to a set of APIs and perform actions on the VM within the permissions granted to the service account.

Dataproc service accounts

The following service accounts are granted permissions required to perform Dataproc actions in the project where your cluster is located.

  • Dataproc Service Agent service account: Dataproc creates this service account with the Dataproc Service Agent role in a Dataproc user's Google Cloud project. This service account cannot be replaced by a user-specified service account when you create a cluster. This service agent account is used to perform Dataproc control plane operations, such as the creation, update, and deletion of cluster VMs (see Dataproc Service Agent (Control Plane identity)).

    By default, Dataproc uses service-[project-number]@dataproc-accounts.iam.gserviceaccount.com as the service agent account. If that service account doesn't exist, Dataproc uses the Google APIs service agent account, [project-number]@cloudservices.gserviceaccount.com, for control plane operations.
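The address patterns and the fallback rule described above can be sketched as follows. This is only an illustration: the function names and the project number 123456789012 are hypothetical, and the `dataproc_agent_exists` flag stands in for a lookup that Dataproc performs internally.

```python
def dataproc_service_agent_email(project_number: int) -> str:
    """Address pattern of the Dataproc Service Agent service account."""
    return f"service-{project_number}@dataproc-accounts.iam.gserviceaccount.com"


def google_apis_service_agent_email(project_number: int) -> str:
    """Address pattern of the Google APIs service agent account."""
    return f"{project_number}@cloudservices.gserviceaccount.com"


def control_plane_identity(project_number: int, dataproc_agent_exists: bool) -> str:
    """Pick the control plane identity using the fallback rule from the text.

    `dataproc_agent_exists` is an assumed input for illustration only.
    """
    if dataproc_agent_exists:
        return dataproc_service_agent_email(project_number)
    return google_apis_service_agent_email(project_number)


# Hypothetical project number used only for illustration.
print(control_plane_identity(123456789012, dataproc_agent_exists=True))
print(control_plane_identity(123456789012, dataproc_agent_exists=False))
```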

Shared VPC networks: If the cluster uses a Shared VPC network, a Shared VPC Admin must grant both of the above service accounts the Network User role for the Shared VPC host project.

Dataproc VM access scopes

VM access scopes are used to grant or limit VM instances' access to APIs. They work together with the VM service account to determine API access. For example, if cluster VMs are granted only the https://www.googleapis.com/auth/devstorage.full_control scope, applications running on cluster VMs can call Cloud Storage APIs, but they cannot make requests to BigQuery, even if the VM service account they run as is granted a BigQuery role with broad permissions.
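The interaction between scopes and IAM can be pictured as a logical AND: a request succeeds only if the VM's scopes cover the API and the service account's roles allow the action. The sketch below is a deliberately simplified model, not how Google Cloud evaluates requests internally. The function name, the `iam_allows` flag, and the sets of accepted scopes per API are assumptions made for illustration; the Cloud Storage scope is the full-control URL from the default scope list.

```python
CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"
STORAGE_FULL = "https://www.googleapis.com/auth/devstorage.full_control"
BIGQUERY = "https://www.googleapis.com/auth/bigquery"


def request_allowed(vm_scopes, accepted_scopes, iam_allows):
    """Simplified model: both the scope check and the IAM check must pass.

    vm_scopes:       scopes granted to the cluster VMs
    accepted_scopes: scopes the target API accepts (assumed for illustration)
    iam_allows:      whether the VM service account's roles permit the action
    """
    scope_ok = any(scope in vm_scopes for scope in accepted_scopes)
    return scope_ok and iam_allows


# Cluster VMs hold only the Cloud Storage full-control scope.
vm_scopes = {STORAGE_FULL}

# The service account is assumed to hold broad roles for both services.
storage_ok = request_allowed(vm_scopes, {STORAGE_FULL, CLOUD_PLATFORM}, iam_allows=True)
bigquery_ok = request_allowed(vm_scopes, {BIGQUERY, CLOUD_PLATFORM}, iam_allows=True)

print(storage_ok)   # True: scope and IAM both permit the call
print(bigquery_ok)  # False: IAM permits it, but no VM scope covers BigQuery
```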

Default Dataproc VM scopes. If scopes are not specified when a cluster is created (see gcloud dataproc clusters create --scopes), Dataproc VMs have the following default set of scopes:

https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/bigtable.admin.table
https://www.googleapis.com/auth/bigtable.data
https://www.googleapis.com/auth/cloud.useraccounts.readonly
https://www.googleapis.com/auth/devstorage.full_control
https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write

If you specify scopes when creating a cluster, cluster VMs will have the scopes you specify and the following minimum set of required scopes (even if you don't specify them):

https://www.googleapis.com/auth/cloud.useraccounts.readonly
https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write
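The two rules above amount to "use the defaults if nothing is specified, otherwise take the union of the specified scopes and the required minimum". A minimal sketch of that rule, with the scope lists copied from this page; the function name is hypothetical, and treating an empty list as "not specified" is an assumption.

```python
PREFIX = "https://www.googleapis.com/auth/"

# Default scopes applied when no scopes are specified at cluster creation.
DEFAULT_SCOPES = [
    PREFIX + "bigquery",
    PREFIX + "bigtable.admin.table",
    PREFIX + "bigtable.data",
    PREFIX + "cloud.useraccounts.readonly",
    PREFIX + "devstorage.full_control",
    PREFIX + "devstorage.read_write",
    PREFIX + "logging.write",
]

# Minimum scopes that are always present, even if not specified.
REQUIRED_SCOPES = [
    PREFIX + "cloud.useraccounts.readonly",
    PREFIX + "devstorage.read_write",
    PREFIX + "logging.write",
]


def effective_scopes(requested=None):
    """Return the sorted scopes cluster VMs end up with under the rules above."""
    if not requested:  # assumption: an empty list counts as "not specified"
        return sorted(DEFAULT_SCOPES)
    return sorted(set(requested) | set(REQUIRED_SCOPES))


# Requesting only the BigQuery scope still yields the three required scopes.
for scope in effective_scopes([PREFIX + "bigquery"]):
    print(scope)
```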

Creating a cluster with a user-managed VM service account

You can specify a VM service account when you create a cluster. Dataproc does not support specifying or changing the VM service account after the cluster is created.

Why specify a user-managed VM service account? Service accounts have IAM roles granted to them. Specifying a user-managed VM service account when creating a Dataproc cluster allows you to create clusters with fine-grained access to and control of project resources. Using different user-managed VM service accounts with different Dataproc clusters allows you to set up clusters with different access to Cloud resources.

Before creating the cluster, create the service account within the project in which the cluster will be created. Grant the service account the Dataproc Worker role and any additional roles that will be needed by your jobs, for example, to allow reading and writing data from and to Google Cloud resources, such as BigQuery.
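As a rough sketch of these preparation steps, the snippet below assembles the two gcloud invocations (create the service account, then grant it the Dataproc Worker role) and prints them for review instead of running them. The project ID `my-project`, the account name `my-cluster-sa`, and the helper name are hypothetical, and `roles/dataproc.worker` is assumed to be the role ID behind "Dataproc Worker"; check the commands against your gcloud version before using them.

```python
import shlex


def service_account_setup_commands(project_id: str, account_name: str):
    """Build (but do not run) gcloud commands for the preparation steps."""
    email = f"{account_name}@{project_id}.iam.gserviceaccount.com"
    create = [
        "gcloud", "iam", "service-accounts", "create", account_name,
        f"--project={project_id}",
    ]
    grant_worker = [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{email}",
        "--role=roles/dataproc.worker",  # assumed role ID for "Dataproc Worker"
    ]
    return [create, grant_worker]


# Hypothetical project and account names, printed for review only.
commands = service_account_setup_commands("my-project", "my-cluster-sa")
for argv in commands:
    print(shlex.join(argv))
```

Any additional roles your jobs need (for example, a BigQuery role) would be granted with further policy bindings of the same form.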

gcloud Command

Use the gcloud dataproc clusters create command to create a new cluster with a user-specified VM service account and VM access scopes.

gcloud dataproc clusters create cluster-name \
    --region=region \
    --service-account=service-account-name@project-id.iam.gserviceaccount.com \
    --scopes=scope[,...]

REST API

Set the serviceAccount and serviceAccountScopes in the GceClusterConfig object as part of the clusters.create API request.
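A request body carrying these two fields might look like the sketch below, which only builds and prints the JSON and does not send it. The nesting under `config.gceClusterConfig` follows the GceClusterConfig object named above; the helper name and all values (project, cluster, account, scope) are hypothetical placeholders, and a real request would need further cluster settings plus authentication.

```python
import json


def build_create_cluster_body(project_id, cluster_name, service_account, scopes):
    """Assemble a minimal clusters.create body with a VM service account and scopes."""
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {
                "serviceAccount": service_account,
                "serviceAccountScopes": list(scopes),
            }
        },
    }


# Hypothetical values for illustration only.
body = build_create_cluster_body(
    project_id="my-project",
    cluster_name="my-cluster",
    service_account="my-cluster-sa@my-project.iam.gserviceaccount.com",
    scopes=["https://www.googleapis.com/auth/bigquery"],
)
print(json.dumps(body, indent=2))
```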

Console

Setting a user-managed Dataproc VM service account in the Cloud Console is not currently supported. You can, however, set the "cloud-platform" scope on the VMs in your cluster by clicking "Allow API access to all Google Cloud services in the same project" in the Project access section of the Manage security panel on the Dataproc Create a cluster page in the Cloud Console.
