Cloud Data Fusion service accounts

Cloud Data Fusion sets up service accounts to access resources in the following projects:

  1. Tenant project: Cloud Data Fusion creates and owns this project. When a customer creates a Cloud Data Fusion instance, it runs in this project.
  2. Customer project: The customer creates and owns this project. By default, Cloud Data Fusion creates an ephemeral Dataproc cluster in this project to run the customer's pipelines.

Service account types

Service account
    Description

tenant-project-id-compute@developer.gserviceaccount.com
    Used by the GKE cluster in the tenant project to access resources, such as Cloud Storage and Cloud SQL.

cloud-datafusion-management-sa@tenant-project-id.iam.gserviceaccount.com
    (Deprecated.) This is output as an API field for use in peering (see Set up VPC network peering).

service-customer-project-number@gcp-sa-datafusion.iam.gserviceaccount.com
    Used in the tenant project to access customer project resources, for example, to access resources during Preview, from Wrangler, and to create the Dataproc cluster in the customer's project.

customer-project-number-compute@developer.gserviceaccount.com
    By default, used by Cloud Data Fusion to create Dataproc clusters and execute pipelines in Dataproc clusters (see Service accounts in Dataproc). Cloud Data Fusion Enterprise edition customers can run pipelines from a different service account by creating a profile from the Cloud Data Fusion UI: System Admin > Configuration tab and adding the custom service account. A custom service account must be granted the Service Account User role.
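
The two customer-project addresses listed above follow fixed patterns built from the customer project number, so they can be derived rather than looked up. The following is a minimal sketch, assuming the patterns shown above are used exactly as written. The function names and the sample project number are hypothetical and are not part of any Google Cloud library.

```python
# Illustrative helpers only: the function names are hypothetical, and the
# address patterns are copied from the service account list above.

def datafusion_service_account(customer_project_number: str) -> str:
    """Account used in the tenant project to access customer project resources."""
    return (
        f"service-{customer_project_number}"
        "@gcp-sa-datafusion.iam.gserviceaccount.com"
    )


def default_compute_service_account(customer_project_number: str) -> str:
    """Account used by default to create Dataproc clusters and run pipelines."""
    return f"{customer_project_number}-compute@developer.gserviceaccount.com"


if __name__ == "__main__":
    project_number = "123456789012"  # hypothetical customer project number
    print(datafusion_service_account(project_number))
    print(default_compute_service_account(project_number))
```

Deriving the addresses this way can be handy when scripting IAM checks across several projects, but the values should still be confirmed against the IAM page of the project before granting any roles.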

What's next