Cloud Data Fusion service accounts

Cloud Data Fusion sets up service accounts to access resources in the following projects:

  1. Tenant project: Cloud Data Fusion creates and owns this project. When a customer creates a Cloud Data Fusion instance, the instance runs in this project.
  2. Customer project: The customer creates and owns this project. By default, Cloud Data Fusion creates an ephemeral Dataproc cluster in this project to run the customer's pipelines.

Service account table

Service account Description
cloud-control2-datafusion@system.gserviceaccount.com Attached to Cloud Data Fusion instances in the tenant project. This service account is responsible for the control plane and deploys Cloud Data Fusion across both the tenant and customer projects.
tenant-project-number-compute@developer.gserviceaccount.com Used by the GKE cluster in the tenant project to access resources such as Cloud Storage and Cloud SQL.
cloud-datafusion-management-sa@tenant-project-id.iam.gserviceaccount.com Currently, this service account is not created or used. It is returned as an API field for use in VPC Network Peering. See Set up VPC Network Peering.
service-customer-project-number@gcp-sa-datafusion.iam.gserviceaccount.com Used in the tenant project to access customer project resources, for example, to access resources during Preview and from Wrangler, and to create the Dataproc cluster in the customer project.
customer-project-number-compute@developer.gserviceaccount.com By default, used by Dataproc cluster VMs to access resources during a pipeline run (see Service accounts in Dataproc). Cloud Data Fusion Enterprise edition customers can run pipelines as a different service account by creating a profile in the Cloud Data Fusion UI (System Admin > Configuration tab) and specifying the custom service account in that profile.
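To make the placeholders in the service account table concrete, the following is a minimal Python sketch that expands them for a given pair of projects. It is illustrative only: the class `DataFusionProjects`, the function `expected_service_accounts`, and the dictionary keys are hypothetical names, not part of any Google Cloud API. The sketch assumes standard Google Cloud naming conventions, namely that Compute Engine default service accounts are named after the project number and that the Cloud Data Fusion service agent is named after the customer project number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFusionProjects:
    """Hypothetical container for the tenant and customer project identifiers."""

    tenant_project_id: str
    tenant_project_number: str
    customer_project_number: str


def expected_service_accounts(p: DataFusionProjects) -> dict[str, str]:
    """Map each service account's purpose to its expected email address.

    Assumption: Compute Engine default accounts and the Cloud Data Fusion
    service agent are named after project numbers, per Google Cloud conventions.
    """
    return {
        # Control plane account attached to instances in the tenant project.
        "control_plane": "cloud-control2-datafusion@system.gserviceaccount.com",
        # Default compute account used by the tenant project's GKE cluster.
        "tenant_gke": f"{p.tenant_project_number}-compute@developer.gserviceaccount.com",
        # Not created or used today; only returned as an API field for peering.
        "management": f"cloud-datafusion-management-sa@{p.tenant_project_id}.iam.gserviceaccount.com",
        # Service agent that accesses customer project resources.
        "service_agent": f"service-{p.customer_project_number}@gcp-sa-datafusion.iam.gserviceaccount.com",
        # Default identity of Dataproc cluster VMs during pipeline runs.
        "dataproc_vms": f"{p.customer_project_number}-compute@developer.gserviceaccount.com",
    }


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    projects = DataFusionProjects(
        tenant_project_id="example-tenant-project",
        tenant_project_number="111111111111",
        customer_project_number="222222222222",
    )
    for purpose, email in expected_service_accounts(projects).items():
        print(f"{purpose}: {email}")
```

If an Enterprise edition profile specifies a custom service account, pipeline runs use that account instead of the `dataproc_vms` entry shown here.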

What's next