This page describes how service accounts are used in Cloud Data Fusion. For
more information, see Use service accounts.
Tenant and customer projects
Cloud Data Fusion sets up service accounts to access resources in the
following projects:
Tenant project
Cloud Data Fusion creates a tenant project to hold the resources and
services it needs to manage pipelines on your behalf, for example, running
pipelines on Dataproc clusters that reside in your customer
project. The tenant project isn't exposed to you, but when you create a
private instance, you might need the tenant project name to set up VPC
peering.
For more information, see the Service Infrastructure documentation about
tenant projects.
Customer project
You create and own this project. By default, Cloud Data Fusion creates an
ephemeral Dataproc cluster in this project to run your
pipelines.
The following diagram shows a Cloud Data Fusion instance running in a
tenant project and a pipeline running on a Dataproc cluster in a
customer project.
Service accounts in Cloud Data Fusion
A service account gives Cloud Data Fusion an identity that it uses to
access your resources.
When you enable the Cloud Data Fusion API and create a
Cloud Data Fusion instance, a service account is added to your project to
access resources like Service Networking,
Dataproc, Cloud Storage, BigQuery, Spanner,
and Bigtable. This service account is called the
Cloud Data Fusion API Service Agent.
Roles are automatically granted to this service agent.
A service account is identified by its email address, which is unique to the
account.
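As a minimal sketch, the following Python builds the email addresses of the service accounts described on this page from a project number, and checks whether an IAM policy (represented as a dictionary shaped like a getIamPolicy response) grants a role to one of them. It assumes the standard email formats service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com for the Cloud Data Fusion API Service Agent and PROJECT_NUMBER-compute@developer.gserviceaccount.com for the default Compute Engine service account; the function names and the project number are hypothetical.

```python
"""Sketch: build service account emails and check an IAM policy binding."""

# Assumed standard email formats for these Google-managed service accounts.
SERVICE_AGENT_FMT = "service-{n}@gcp-sa-datafusion.iam.gserviceaccount.com"
COMPUTE_DEFAULT_FMT = "{n}-compute@developer.gserviceaccount.com"


def service_agent_email(project_number: int) -> str:
    """Email of the Cloud Data Fusion API Service Agent for a project."""
    return SERVICE_AGENT_FMT.format(n=project_number)


def compute_default_email(project_number: int) -> str:
    """Email of the default Compute Engine service account for a project."""
    return COMPUTE_DEFAULT_FMT.format(n=project_number)


def has_role(policy: dict, role: str, email: str) -> bool:
    """Return True if any binding in `policy` grants `role` to the account."""
    member = f"serviceAccount:{email}"
    return any(
        b.get("role") == role and member in b.get("members", [])
        for b in policy.get("bindings", [])
    )


if __name__ == "__main__":
    agent = service_agent_email(123456789012)  # hypothetical project number
    # Hypothetical policy dictionary, shaped like a getIamPolicy response.
    policy = {
        "bindings": [
            {"role": "roles/datafusion.serviceAgent",
             "members": [f"serviceAccount:{agent}"]},
        ]
    }
    print(agent)
    print(has_role(policy, "roles/datafusion.serviceAgent", agent))
```

Because the role check only inspects a plain dictionary, it can run against a policy fetched by any client library or saved from the console without extra dependencies.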
The following types of service accounts are used in Cloud Data Fusion. For
more information, see Types of service accounts.
Cloud Data Fusion API Service Agent: the service agent that
Cloud Data Fusion creates to gain access to customer resources so
that it can act on the customer's behalf. It's used in the tenant
project to access resources in the customer project. For example,
pipeline previews run in memory in the tenant project instead of in a
Dataproc cluster, so they reach resources in the customer project through
this service agent.
By default, the Cloud Data Fusion API Service Agent Identity and Access
Management (IAM) role (roles/datafusion.serviceAgent) is assigned to this
service account. The role includes additional permissions to ensure an
optimal user experience. To enhance security, you can create a custom role
with the
minimum permissions
required for a task, and assign it to this service account.
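The custom-role approach can be sketched in Python as two pure functions: one builds a request body shaped like the IAM projects.roles.create request (a roleId plus a role with title, includedPermissions, and stage), and one grants a role to a member in a policy dictionary using the read-modify-write pattern that setIamPolicy expects. The permission list below is a hypothetical placeholder, not the documented minimum set for any task; the role ID, project ID, and project number are also illustrative.

```python
"""Sketch: define a custom IAM role and bind it to the service agent."""
import copy

# Hypothetical permissions for illustration only. Take the real minimum set
# for your task from the Cloud Data Fusion documentation.
EXAMPLE_PERMISSIONS = [
    "dataproc.clusters.create",
    "dataproc.clusters.delete",
    "dataproc.clusters.get",
]


def custom_role_request(role_id: str, title: str, permissions: list[str]) -> dict:
    """Request body shaped like an IAM projects.roles.create call."""
    return {
        "roleId": role_id,
        "role": {
            "title": title,
            "includedPermissions": sorted(set(permissions)),  # dedupe, stable order
            "stage": "GA",
        },
    }


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` is granted `role`.

    Pass the policy from getIamPolicy (keeping its etag) and send the result
    back with setIamPolicy. Conditional bindings are left untouched.
    """
    new = copy.deepcopy(policy)
    bindings = new.setdefault("bindings", [])
    for b in bindings:
        if b.get("role") == role and "condition" not in b:
            members = b.setdefault("members", [])
            if member not in members:
                members.append(member)
            return new
    bindings.append({"role": role, "members": [member]})
    return new


if __name__ == "__main__":
    project_id, project_number = "my-project", 123456789012  # hypothetical
    req = custom_role_request("dataFusionMinimal", "Data Fusion minimal",
                              EXAMPLE_PERMISSIONS)
    role_name = f"projects/{project_id}/roles/{req['roleId']}"
    agent = ("serviceAccount:service-"
             f"{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com")
    policy = add_binding({"bindings": [], "etag": "hypothetical-etag"},
                         role_name, agent)
    print(policy["bindings"])
```

Returning a modified copy instead of mutating the input keeps the fetched policy available for comparison if the setIamPolicy call fails on an etag conflict and has to be retried.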
Default Compute Engine service account: the service account that
Cloud Data Fusion uses to deploy jobs that access other
Google Cloud resources. By default, it's attached to the
Dataproc cluster VMs so that Cloud Data Fusion can
access Dataproc resources during a pipeline run. In the
Cloud Data Fusion
Enterprise edition,
you can run pipelines with a user-managed service account
by creating a compute profile in the Cloud Data Fusion
console (System Admin > Configuration tab) and adding the custom service
account. In versions 6.2.3 and later, you can choose a custom service
account to attach to the Dataproc cluster when you create a
Cloud Data Fusion instance. For more information, see
Service accounts in Dataproc.
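When you attach a user-managed service account to the Dataproc cluster, some IAM grants usually need to be in place first. The following Python sketch lists two that are commonly required, assuming the Cloud Data Fusion instance and the custom service account are in the same project: the Service Account User role (roles/iam.serviceAccountUser) for the Cloud Data Fusion API Service Agent on the custom account, and the Dataproc Worker role (roles/dataproc.worker) for the custom account on the project. Roles for the data your pipelines read and write aren't included, and the function, type, and parameter names are hypothetical.

```python
"""Sketch: IAM grants commonly needed for a user-managed Dataproc account."""
from typing import NamedTuple


class Grant(NamedTuple):
    resource: str  # resource whose IAM policy receives the binding
    role: str
    member: str


def grants_for_custom_dataproc_sa(project_id: str, project_number: int,
                                  dataproc_sa: str) -> list[Grant]:
    """List grants for running pipelines as `dataproc_sa` (same project assumed)."""
    agent = f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"
    return [
        # Lets the Cloud Data Fusion service agent act as the custom account
        # when it creates the Dataproc cluster.
        Grant(f"projects/{project_id}/serviceAccounts/{dataproc_sa}",
              "roles/iam.serviceAccountUser", f"serviceAccount:{agent}"),
        # Lets the cluster VMs, which run as the custom account, do Dataproc work.
        Grant(f"projects/{project_id}", "roles/dataproc.worker",
              f"serviceAccount:{dataproc_sa}"),
    ]


if __name__ == "__main__":
    for g in grants_for_custom_dataproc_sa(
            "my-project", 123456789012,  # hypothetical project
            "dataproc-runner@my-project.iam.gserviceaccount.com"):
        print(g.resource, g.role, g.member)
```

Listing grants as data keeps the sketch independent of any client library; each entry can be applied with the console, gcloud, or an IAM API call of your choice.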
What's next
Learn about controlling access to data.
Give Service Account User permissions.
See Cloud Data Fusion pricing.