The Cloud Data Fusion web UI supports the same authentication mechanisms as the Google Cloud console, with access controlled through Identity and Access Management (IAM).
You can create a private Cloud Data Fusion instance, which can be peered with your VPC network. Private Cloud Data Fusion instances have a private IP address and are not exposed to the public internet. For additional security, you can use VPC Service Controls to establish a security perimeter around a private Cloud Data Fusion instance.
For more information, see the Cloud Data Fusion networking overview.
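As a rough illustration of what a private instance definition might look like, the following minimal sketch builds a JSON request body in Python. The field names (`type`, `privateInstance`, `networkConfig.network`, `networkConfig.ipAllocation`) are assumed to match the v1 Instance REST resource and should be confirmed against the API reference. The helper name, network name, and IP range are hypothetical placeholders.

```python
import json


def private_instance_body(network: str, ip_allocation: str,
                          instance_type: str = "BASIC") -> dict:
    """Build a request body for a private instance.

    Field names are an assumption based on the v1 Instance resource;
    verify them before sending this to the API.
    """
    return {
        "type": instance_type,
        # Assumed flag that requests a private IP instance.
        "privateInstance": True,
        "networkConfig": {
            # VPC network to peer with (hypothetical value at the call site).
            "network": network,
            # IP range reserved for the instance (hypothetical value).
            "ipAllocation": ip_allocation,
        },
    }


if __name__ == "__main__":
    body = private_instance_body("example-vpc", "10.89.48.0/22")
    print(json.dumps(body, indent=2))
```

The sketch only assembles the payload. Sending it, and setting up the VPC peering itself, is left to whichever client or tool you already use.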
Pipeline execution on pre-created private IP Dataproc clusters
You can use a private Cloud Data Fusion instance with the remote Hadoop provisioner. The Dataproc cluster must be on the VPC network peered with Cloud Data Fusion. The remote Hadoop provisioner is configured with the private IP address of the master node of the Dataproc cluster.
Managing access to the Cloud Data Fusion instance: Cloud Data Fusion only supports managing access at an instance level. If you have access to an instance, you have access to all pipelines and metadata in that instance.
Pipeline access to your data: You give a pipeline access to data by granting access to the service account that the pipeline runs as, which can be a custom service account that you specify.
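Both kinds of access come down to IAM role bindings. The following minimal sketch shows how a member could be added to a role in an IAM policy document represented as a Python dictionary. The `add_binding` helper is hypothetical, the member address is a placeholder, and `roles/datafusion.viewer` is used only as an example role. Conditional bindings are ignored for simplicity.

```python
def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` with `member` added to the binding for `role`.

    The input policy is not modified. Conditional bindings are not handled.
    """
    # Copy each binding and its member list so the caller's policy is untouched.
    bindings = [
        dict(b, members=list(b.get("members", [])))
        for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for this role, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


if __name__ == "__main__":
    policy = {"bindings": []}
    updated = add_binding(policy, "roles/datafusion.viewer",
                          "user:analyst@example.com")
    print(updated)
```

Because access is managed at the instance level, a binding like this grants access to every pipeline and all metadata in the instance, not to an individual pipeline.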
End user access to the Cloud Data Fusion resources
Cloud Data Fusion resources are created in Google-owned tenant projects. Cloud Data Fusion does not provide access to underlying Cloud Data Fusion VM instances and resources in tenant projects.
For pipeline execution, you control ingress and egress by setting the appropriate firewall rules on the customer VPC network in which the pipeline runs.
For more information, see Firewall rules.
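To make the idea of an ingress rule concrete, the following minimal sketch assembles a firewall rule definition as a Python dictionary, using field names from the Compute Engine firewall resource (`direction`, `sourceRanges`, `allowed`). The helper name, rule name, network path, source range, and port are hypothetical. Which ranges and ports a given deployment needs is an assumption to check against the firewall documentation.

```python
import ipaddress


def ingress_rule(name: str, network: str, source_range: str,
                 ports: list) -> dict:
    """Build an ingress firewall rule body that allows TCP on `ports`.

    Raises ValueError if `source_range` is not a valid CIDR block.
    """
    # strict=True rejects ranges with host bits set, such as 10.0.0.1/24.
    cidr = ipaddress.ip_network(source_range, strict=True)
    return {
        "name": name,
        "network": network,
        "direction": "INGRESS",
        "sourceRanges": [str(cidr)],
        "allowed": [
            {"IPProtocol": "tcp", "ports": [str(p) for p in ports]},
        ],
    }


if __name__ == "__main__":
    rule = ingress_rule(
        "allow-example-ingress",                        # hypothetical rule name
        "projects/example-project/global/networks/example-vpc",
        "10.89.48.0/22",                                # hypothetical range
        [22],                                           # hypothetical port
    )
    print(rule)
```

Validating the range up front means a malformed CIDR block is caught before any rule is submitted.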
You can store passwords, keys, and other sensitive data securely in Cloud Key Management Service (Cloud KMS). At runtime, Cloud Data Fusion calls Cloud KMS to retrieve the keys.
By default, data is encrypted at rest using Google-managed encryption keys, and in transit using TLS v1.2. You can use customer-managed encryption keys (CMEK) to control the encryption of data written by Cloud Data Fusion pipelines, including Dataproc cluster metadata and Cloud Storage, BigQuery, and Pub/Sub data sources and sinks.
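A customer-managed key is identified by its Cloud KMS resource name. The following minimal sketch formats and validates such a name in Python. The `kms_key_name` helper and all values passed to it are hypothetical, and where the resulting name is supplied (for example, in instance or plugin settings) is not shown here.

```python
import re

# Shape of a Cloud KMS CryptoKey resource name.
_KEY_NAME = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Return the CryptoKey resource name built from its four components.

    Raises ValueError if any component is empty or contains a slash.
    """
    name = (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )
    if not _KEY_NAME.match(name):
        raise ValueError(f"invalid key name components: {name!r}")
    return name


if __name__ == "__main__":
    # All four values are placeholders.
    print(kms_key_name("example-project", "us-central1",
                       "example-ring", "example-key"))
```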
Cloud Data Fusion pipelines execute in Dataproc clusters in the customer project, and can be configured to run using a customer-specified (custom) service account. A custom service account must be granted the Service Account User role.
Cloud Data Fusion services are created in Google-managed tenant projects that users can't access. Cloud Data Fusion pipelines execute on Dataproc clusters inside customer projects. Customers can access these clusters during their lifetime.
Cloud Data Fusion audit logs are available from Logging.
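As one way to narrow Logging results to these entries, the following minimal sketch builds a Logging query filter string in Python. It assumes that audit entries carry `datafusion.googleapis.com` in `protoPayload.serviceName` and that audit log names contain `cloudaudit.googleapis.com`. The helper name and project ID are hypothetical.

```python
def audit_log_filter(project_id: str,
                     service: str = "datafusion.googleapis.com") -> str:
    """Build a Logging filter for audit log entries of one service.

    The ':' operator is a substring match, so the filter covers every
    audit log stream whose name contains cloudaudit.googleapis.com.
    """
    return (
        f'logName:"projects/{project_id}/logs/cloudaudit.googleapis.com" '
        f'AND protoPayload.serviceName="{service}"'
    )


if __name__ == "__main__":
    # "example-project" is a placeholder project ID.
    print(audit_log_filter("example-project"))
```

The resulting string can be pasted into the Logs Explorer query box or passed to a Logging client of your choice.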