Service accounts

This page describes service accounts and how they can be used with Dataproc.

What are service accounts?

A service account is a special account that can be used by services and applications running on your Compute Engine instance to interact with other Google Cloud APIs. Applications can use service account credentials to authorize themselves to a set of APIs and perform actions within the permissions granted to the service account and virtual machine instance.

When you create a Compute Engine virtual machine, you can configure it to use a specific service account. If you do not specify a service account, the virtual machine uses a default service account. For more information, see the Compute Engine service account documentation.
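To see which service account and access scopes a running VM was created with, you can ask the Compute Engine metadata server from inside that VM (a Dataproc cluster node is one such VM). The following is a minimal sketch, assuming curl is installed on the VM. The ON_VM switch is a hypothetical convenience so the script does nothing when run off a VM. In the metadata path, "default" refers to whichever account is attached to the VM, not necessarily the Compute Engine default service account.

```shell
#!/usr/bin/env bash
# Sketch: show the service account attached to this VM and its access scopes.
# ON_VM is a hypothetical switch; set ON_VM=1 when running on a Compute Engine VM.
set -euo pipefail

METADATA_BASE="http://metadata.google.internal/computeMetadata/v1"

# URL of one attribute ("email" or "scopes") of the VM's attached service account.
sa_attribute_url() {
  printf '%s/instance/service-accounts/default/%s\n' "${METADATA_BASE}" "$1"
}

# The metadata server only answers requests that carry the Metadata-Flavor header.
get_sa_attribute() {
  curl --silent --fail -H "Metadata-Flavor: Google" "$(sa_attribute_url "$1")"
}

# Off a VM the metadata host does not resolve, so only query when asked to.
if [[ "${ON_VM:-0}" == "1" ]]; then
  echo "Service account: $(get_sa_attribute email)"
  echo "Access scopes:"
  get_sa_attribute scopes
fi
```

Applications on the VM that use Google client libraries pick up these same credentials automatically, which is why the attached account's roles and the VM's scopes together bound what they can do.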

Service accounts in Dataproc

Dataproc clusters are built on top of Compute Engine virtual machines. Specifying a user-managed service account when creating a Dataproc cluster allows you to use that service account for Dataproc virtual machines in that cluster. If a service account is not specified, Dataproc virtual machines will use the default Google-managed Compute Engine service account [project-number]

Why specify a service account?

Service accounts have IAM roles granted to them. Specifying a user-managed service account when you create a Dataproc cluster lets you create and use clusters with fine-grained access to, and control over, Cloud resources. By associating different user-managed service accounts with different Dataproc clusters, you can give each cluster different access to Cloud resources.

Service account requirements and limitations

  • Service accounts can only be set when a cluster is created.
  • You must create the service account before you create the Dataproc cluster that will use it.
  • Once set, the service account used for a cluster cannot be changed.
  • Make sure that service accounts have appropriate IAM roles for your needs.
  • Service accounts used with Dataproc must have the Dataproc Worker role (roles/dataproc.worker), or all the permissions that the Dataproc Worker role grants.
  • Service accounts must reside within the project the cluster will be created in.
  • Compute Engine virtual machines used in Dataproc clusters still need the appropriate access scopes. Access scopes are limited to the service they apply to. For example, if a Dataproc cluster has been granted only the Cloud Storage scope, it cannot use that scope to make requests to BigQuery.
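The requirements to create the account first, in the cluster's project, and to grant it the Dataproc Worker role can be sketched with two gcloud commands. This is a minimal sketch, assuming the hypothetical PROJECT_ID and SA_NAME values shown. By default it only prints the commands (set DRY_RUN=0 to run them), and any additional roles your jobs need, for example on the Cloud Storage buckets they read, still have to be granted separately.

```shell
#!/usr/bin/env bash
# Sketch: create a user-managed service account for Dataproc and grant it the
# Dataproc Worker role. PROJECT_ID and SA_NAME are hypothetical placeholders.
set -euo pipefail

PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-dataproc-cluster-sa}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Print commands instead of running them unless DRY_RUN=0 is set,
# so the IAM changes can be reviewed first.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# The account must exist in the cluster's project before the cluster is created.
create_service_account() {
  run gcloud iam service-accounts create "${SA_NAME}" \
    --project="${PROJECT_ID}" \
    --display-name="Dataproc cluster service account"
}

# Dataproc requires the Dataproc Worker role (or equivalent permissions).
grant_worker_role() {
  run gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/dataproc.worker"
}

create_service_account
grant_worker_role
```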

Default and minimum scopes

If service account scopes are not specified, Dataproc uses the following default set of scopes:
If custom scopes are specified, Dataproc uses the combination of the user-specified scopes and the following minimum set of Dataproc-required scopes:

Using service accounts

You can specify a user-managed service account when you create a new Dataproc cluster via a Dataproc API clusters.create request or using the Cloud SDK gcloud command-line tool.

gcloud Command

Use the gcloud dataproc clusters create command to create a new cluster with a user-managed service account and access scopes.
gcloud dataproc clusters create cluster-name \
    --region=region \
    --service-account=service-account-name@project-id.iam.gserviceaccount.com \
    --scopes=scope[,...]
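A filled-in version of the command, followed by a check of the resulting cluster configuration, might look like the sketch below. The project, region, cluster name, and service account email are hypothetical, and the single cloud-platform scope is only an example choice (with that scope, access is governed by the service account's IAM roles). Like the other sketches here, it prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: create a cluster with a user-managed service account, then read the
# service account and scopes back from the cluster configuration.
# All names below are hypothetical placeholders.
set -euo pipefail

PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-example-cluster}"
SA_EMAIL="${SA_EMAIL:-dataproc-cluster-sa@${PROJECT_ID}.iam.gserviceaccount.com}"
# Example scope choice; adjust to the APIs your jobs call.
SCOPES="https://www.googleapis.com/auth/cloud-platform"

# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

create_cluster() {
  run gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --project="${PROJECT_ID}" \
    --region="${REGION}" \
    --service-account="${SA_EMAIL}" \
    --scopes="${SCOPES}"
}

# The service account cannot be changed later, so confirm it after creation.
show_cluster_identity() {
  run gcloud dataproc clusters describe "${CLUSTER_NAME}" \
    --project="${PROJECT_ID}" \
    --region="${REGION}" \
    --format="value(config.gceClusterConfig.serviceAccount,config.gceClusterConfig.serviceAccountScopes)"
}

create_cluster
show_cluster_identity
```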


REST API

You can set the serviceAccount and serviceAccountScopes fields in the GceClusterConfig object as part of a clusters.create API request.
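A minimal sketch of such a request with curl follows, assuming the v1 endpoint https://dataproc.googleapis.com/v1/projects/PROJECT_ID/regions/REGION/clusters and an OAuth access token obtained with gcloud auth print-access-token. The body sets only the fields relevant to this page and leaves the rest of the cluster configuration to defaults; all names are hypothetical. It prints the request unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: clusters.create REST request that sets serviceAccount and
# serviceAccountScopes in GceClusterConfig. All values are hypothetical.
set -euo pipefail

PROJECT_ID="${PROJECT_ID:-my-project}"
REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-example-cluster}"
SA_EMAIL="${SA_EMAIL:-dataproc-cluster-sa@${PROJECT_ID}.iam.gserviceaccount.com}"

ENDPOINT="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Prints the JSON request body; other cluster settings use service defaults.
build_request_body() {
  cat <<EOF
{
  "projectId": "${PROJECT_ID}",
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "gceClusterConfig": {
      "serviceAccount": "${SA_EMAIL}",
      "serviceAccountScopes": [
        "https://www.googleapis.com/auth/cloud-platform"
      ]
    }
  }
}
EOF
}

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo "POST ${ENDPOINT}"
  build_request_body
else
  # --data @- makes curl read the request body from stdin.
  build_request_body | curl --silent --show-error --fail \
    -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    --data @- \
    "${ENDPOINT}"
fi
```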


Console

Dataproc support for setting user-managed service accounts in the Cloud Console will be added in a future release.

What's next