Dataproc Service Account Based Secure Multi-tenancy

Dataproc Service Account based Secure Multi-tenancy (referred to below as "secure multi-tenancy") lets multiple users share a cluster. Each user in a set of users is mapped to a service account when the cluster is created. With secure multi-tenancy, users can submit interactive workloads to the cluster with isolated user identities.

When a user submits a job to the cluster, the job:

  • runs as a specific OS user with a specific Kerberos principal

  • accesses Google Cloud resources using the mapped service account credentials
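From the user's side, submitting a job looks like submitting any other Dataproc job. The sketch below assumes a mapped user (for example, bob@my-company.com) who has authenticated gcloud with their own account. The helper name submit_sparkpi, the region value, and the use of the SparkPi example jar that ships on Dataproc images are illustrative assumptions. The submission does not name a service account; according to the behavior described above, the cluster runs the job as the user's OS user and Kerberos principal and uses the mapped service account for Google Cloud access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit the SparkPi example as the currently
# authenticated gcloud user (for example, bob@my-company.com).
# Assumes gcloud is installed, the user has run `gcloud auth login`,
# and the Spark examples jar is at its usual path on Dataproc images.
submit_sparkpi() {
  local cluster="$1"   # cluster name, e.g. my-cluster
  local region="$2"    # cluster region, e.g. us-central1
  gcloud dataproc jobs submit spark \
    --cluster="$cluster" \
    --region="$region" \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
}

# Usage (run as a mapped user):
# submit_sparkpi my-cluster us-central1
```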

Considerations and Limitations

When you create a cluster with secure multi-tenancy enabled:

  • The cluster is available only to users with mapped service accounts; unmapped users cannot run jobs on the cluster.

  • The Dataproc Component Gateway is not enabled.

  • Direct SSH access to the cluster and Compute Engine features, such as the ability to run startup scripts on cluster VMs, are blocked. Also, jobs cannot run with sudo privileges.

  • Kerberos is enabled and configured on the cluster for secure intra-cluster communication.

  • Dataproc Workflows are not supported.

Creating a secure multi-tenancy cluster

To create a Dataproc secure multi-tenancy cluster, use the --secure-multi-tenancy-user-mapping flag to specify a list of user-to-service-account mappings.

Example:

The following command creates a cluster with user bob@my-company.com mapped to service account service-account-for-bob@iam.gserviceaccount.com and user alice@my-company.com mapped to service account service-account-for-alice@iam.gserviceaccount.com. Replace REGION with the cluster's region, and append any other cluster-creation flags you need.

gcloud dataproc clusters create my-cluster \
    --secure-multi-tenancy-user-mapping="bob@my-company.com:service-account-for-bob@iam.gserviceaccount.com,alice@my-company.com:service-account-for-alice@iam.gserviceaccount.com" \
    --scopes=https://www.googleapis.com/auth/iam \
    --service-account=cluster-service-account@iam.gserviceaccount.com \
    --region=REGION
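As the mapping list grows, assembling the comma-separated user:service-account string by hand becomes error-prone. The following is a minimal bash sketch, assuming the user:service-account,user:service-account format used in the command above; the helper name build_user_mapping is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join user/service-account pairs into the
# "user:sa,user:sa" value passed to --secure-multi-tenancy-user-mapping.
build_user_mapping() {
  if (( $# % 2 != 0 )); then
    echo "expected an even number of arguments (user, service account, ...)" >&2
    return 1
  fi
  local entries=()
  while (( $# >= 2 )); do
    entries+=("$1:$2")
    shift 2
  done
  # Join entries with commas, preserving argument order.
  local IFS=,
  echo "${entries[*]}"
}

MAPPING="$(build_user_mapping \
  bob@my-company.com service-account-for-bob@iam.gserviceaccount.com \
  alice@my-company.com service-account-for-alice@iam.gserviceaccount.com)"
echo "$MAPPING"
# Then pass it to the create command:
#   gcloud dataproc clusters create my-cluster \
#     --secure-multi-tenancy-user-mapping="$MAPPING" ...
```

Taking pairs as positional arguments keeps the helper compatible with older bash versions that lack associative arrays, and it preserves the order in which users are listed.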

Alternatively, you can store the list of user-to-service-account mappings in a local or Cloud Storage YAML or JSON file. Use the --identity-config-file flag to specify the file location.

Sample identity config file:

user_service_account_mapping:
  bob@my-company.com: service-account-for-bob@iam.gserviceaccount.com
  alice@my-company.com: service-account-for-alice@iam.gserviceaccount.com
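The identity config file can also be generated from the same user/service-account pairs. This is a minimal bash sketch, assuming the YAML layout shown above. The helper name write_identity_config, the output filename, and the gs://BUCKET destination in the comments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a YAML identity config file in the layout
# shown above. Arguments: output file, then user/service-account pairs.
write_identity_config() {
  local out_file="$1"
  shift
  if (( $# % 2 != 0 )); then
    echo "expected user/service-account pairs" >&2
    return 1
  fi
  {
    echo "user_service_account_mapping:"
    while (( $# >= 2 )); do
      printf '  %s: %s\n' "$1" "$2"
      shift 2
    done
  } > "$out_file"
}

# Example: write the sample file, then optionally copy it to Cloud Storage.
# write_identity_config identity-config.yaml \
#   bob@my-company.com service-account-for-bob@iam.gserviceaccount.com \
#   alice@my-company.com service-account-for-alice@iam.gserviceaccount.com
# gcloud storage cp identity-config.yaml gs://BUCKET/path/to/identity-config.yaml
```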

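Sample command to create the cluster using the --identity-config-file flag. The flag value can be a local file path (for example, /path/to/identity-config-file) or a Cloud Storage URI; replace the path and REGION with your own values, and append any other cluster-creation flags you need:

gcloud dataproc clusters create my-cluster \
    --identity-config-file=gs://BUCKET/path/to/identity-config-file \
    --scopes=https://www.googleapis.com/auth/iam \
    --service-account=cluster-service-account@iam.gserviceaccount.com \
    --region=REGION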
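After the cluster is created, you can inspect the mapping that was recorded on it. This sketch assumes the mapping is exposed under config.securityConfig.identityConfig in the Dataproc v1 Cluster resource (check the field path against the API reference); the helper name show_identity_config is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the identity configuration stored on a cluster.
# Assumes the user-to-service-account mapping appears under
# config.securityConfig.identityConfig in the cluster description.
show_identity_config() {
  local cluster="$1"   # cluster name, e.g. my-cluster
  local region="$2"    # cluster region, e.g. us-central1
  gcloud dataproc clusters describe "$cluster" \
    --region="$region" \
    --format="yaml(config.securityConfig.identityConfig)"
}

# Usage:
# show_identity_config my-cluster us-central1
```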

Notes:

  • As shown in the commands above, the cluster --scopes must include at least https://www.googleapis.com/auth/iam, which the cluster service account needs in order to impersonate the mapped user service accounts.

  • The cluster service account must have permissions to impersonate the service accounts mapped to the users (see Managing service account impersonation).

  • Recommendation: Use a different cluster service account for each cluster, so that each cluster service account can impersonate only a limited, intended group of mapped user service accounts.
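To satisfy the notes above, the cluster service account needs an IAM binding on each mapped user service account. The sketch below only prints the gcloud commands so you can review them before running them. It assumes that the Service Account Token Creator role (roles/iam.serviceAccountTokenCreator) is the grant you want, and the helper name print_impersonation_grants is hypothetical. Passing only the user service accounts mapped on one cluster keeps each cluster service account's reach limited, in line with the recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one IAM binding command per mapped
# user service account, granting the cluster service account permission to
# impersonate it. Assumption: roles/iam.serviceAccountTokenCreator is the role.
print_impersonation_grants() {
  local cluster_sa="$1"
  shift
  local user_sa
  for user_sa in "$@"; do
    printf 'gcloud iam service-accounts add-iam-policy-binding %s --member=serviceAccount:%s --role=roles/iam.serviceAccountTokenCreator\n' \
      "$user_sa" "$cluster_sa"
  done
}

# Usage: review the output, then run it (for example, by piping it to bash).
# print_impersonation_grants cluster-service-account@iam.gserviceaccount.com \
#   service-account-for-bob@iam.gserviceaccount.com \
#   service-account-for-alice@iam.gserviceaccount.com
```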