This page describes how to grant Cloud Data Fusion the Service Account User role on the Dataproc service account. Cloud Data Fusion needs this role to provision and run pipelines on Dataproc clusters.
The virtual machines in a cluster may use either a user-managed service account or the default Compute Engine service account. In both cases, you must grant Cloud Data Fusion the Service Account User role on that service account. Otherwise, Cloud Data Fusion cannot provision a Dataproc cluster, and the following error appears when you run a data pipeline:
PROVISION task failed in REQUESTING_CREATE state for program run [pipeline-name] due to Dataproc operation failure: INVALID_ARGUMENT: User not authorized to act as service account '[service-account-name]'
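If you send pipeline logs to your own alerting, a small pattern match can separate this permission failure from other provisioning errors. The following Python sketch assumes the log line contains the message text exactly as shown above. The function and type names are hypothetical, and real log entries may wrap the message in additional fields.

```python
import re
from typing import NamedTuple, Optional

# Matches the "not authorized to act as service account" provisioning error.
# Assumption: the wording matches the documented message verbatim.
_ACT_AS_ERROR = re.compile(
    r"PROVISION task failed in REQUESTING_CREATE state for program run "
    r"(?P<run>\S+) due to Dataproc operation failure: INVALID_ARGUMENT: "
    r"User not authorized to act as service account '(?P<service_account>[^']+)'"
)


class ActAsFailure(NamedTuple):
    run: str              # program run identifier from the message
    service_account: str  # service account Cloud Data Fusion tried to act as


def parse_act_as_failure(log_line: str) -> Optional[ActAsFailure]:
    """Return the run and service account if this is the act-as error, else None."""
    match = _ACT_AS_ERROR.search(log_line)
    if match is None:
        return None
    return ActAsFailure(match.group("run"), match.group("service_account"))
```

The service account that this returns is the account on which Cloud Data Fusion is missing the Service Account User role.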
Copy the Cloud Data Fusion service account
- In the Google Cloud Console, go to the Identity and Access Management page.
- From the project selector at the top of the page, choose the project, folder, or organization to which the Cloud Data Fusion instance belongs.
- Find and copy the Cloud Data Fusion service account. Use the following format:
Grant the Service Account User role
- In the Cloud Console, go to the Service Accounts page.
- Click Select a project, choose a project where the service account you want to use for the Dataproc cluster is located, and then click Open.
- Select the checkbox next to the Dataproc service account.
When Cloud Data Fusion provisions a Dataproc cluster, you can specify which user-managed service account the Dataproc virtual machines in that cluster use. If you don't specify a service account, the cluster uses the default Compute Engine service account, which is in the format of
- If the info panel is not already visible, click Show info panel. The panel displays a list of members and roles that have been granted on the service account.
- Click Add Member.
- In the New members field, paste the Cloud Data Fusion service account that you previously copied.
- Select the Service Account User role.
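The Console steps above add one binding to the IAM policy of the Dataproc service account. The following Python sketch shows that change on the policy's JSON form. It assumes you read the policy with `gcloud iam service-accounts get-iam-policy SA_EMAIL --format=json` and write the result back with `gcloud iam service-accounts set-iam-policy`. The function name is hypothetical. The role ID `roles/iam.serviceAccountUser` and the `serviceAccount:` member prefix follow the standard IAM policy format.

```python
import copy
from typing import Any, Dict

SERVICE_ACCOUNT_USER_ROLE = "roles/iam.serviceAccountUser"


def grant_service_account_user(policy: Dict[str, Any], data_fusion_sa: str) -> Dict[str, Any]:
    """Return a copy of `policy` that grants `data_fusion_sa` the Service Account User role.

    `policy` is the IAM policy of the Dataproc service account as JSON.
    The input is not modified, and fields such as `etag` are carried over
    so that set-iam-policy can detect concurrent edits.
    """
    member = f"serviceAccount:{data_fusion_sa}"
    new_policy = copy.deepcopy(policy)
    bindings = new_policy.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the role, if there is one.
        if binding.get("role") == SERVICE_ACCOUNT_USER_ROLE and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:  # keep the change idempotent
                members.append(member)
            return new_policy
    # No suitable binding exists yet, so add a new one.
    bindings.append({"role": SERVICE_ACCOUNT_USER_ROLE, "members": [member]})
    return new_policy
```

Conditional bindings are deliberately left alone. Adding the member to one would grant the role only when its condition holds.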