If you plan to create a Cloud Data Fusion instance with a private IP address, you can provide additional security by first establishing a security perimeter for the instance using VPC Service Controls (VPC-SC). The VPC-SC security perimeter around the private Cloud Data Fusion instance and other Google Cloud resources helps mitigate the risk of data exfiltration. For example, with VPC Service Controls, if a Cloud Data Fusion pipeline reads data from a supported resource, such as a BigQuery dataset, located within the perimeter, then tries to write the output to a resource outside the perimeter, the pipeline will fail.
Cloud Data Fusion resources are exposed on two API surfaces:
- datafusion.googleapis.com: the control plane API surface, which allows you to perform instance-level operations, such as the creation and deletion of instances.
- datafusion.googleusercontent.com: the data plane API surface (the Cloud Data Fusion Web UI in the Google Cloud console), which executes on a Cloud Data Fusion instance to create and execute data pipelines.
You set up VPC Service Controls with Cloud Data Fusion by restricting connectivity to both of these API surfaces.
Cloud Data Fusion pipelines are executed on Dataproc clusters. To protect a Dataproc cluster with a service perimeter, follow the instructions for setting up private connectivity to allow the cluster to function inside the perimeter.
Don't use plugins that use Google Cloud APIs that are not supported by VPC Service Controls. If you use unsupported plugins, Cloud Data Fusion will block the API calls, resulting in pipeline preview and execution failure.
To use Cloud Data Fusion within a VPC Service Controls service perimeter, add or configure several DNS entries to point the following domains to the restricted VIP (Virtual IP address):
Establish the VPC Service Controls security perimeter before creating your Cloud Data Fusion private instance. Perimeter protection for instances created prior to setting up VPC Service Controls is not supported.
Currently, the Cloud Data Fusion data plane UI does not support specifying access levels using identity-based access.
Restricting Cloud Data Fusion API surfaces
Restricting the control plane surface
See Setting up private connectivity to Google APIs and services to restrict connectivity to the datafusion.googleapis.com control plane API.
Restricting the data plane surface
To set up private connectivity to the API data plane,
configure DNS by completing the following steps for both the
Create a new private zone using Cloud DNS:
- Zone type: Check private
- Zone name: datafusiongoogleusercontentcom
- DNS name: datafusion.googleusercontent.com
- Network: Select the private IP network you chose when you created your Cloud Data Fusion instance.
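The zone settings above can also be written down as a Cloud DNS API request body. The following is a minimal sketch, not the documented procedure: it only builds the JSON body for a `ManagedZone` resource and does not call the API. The function name `build_private_zone`, the description string, and the placeholder values `my-project` and `my-vpc` are assumptions for illustration; replace them with your own project ID and the private IP network used by your instance.

```python
import json


def build_private_zone(project_id: str, network_name: str) -> dict:
    """Build a Cloud DNS ManagedZone body for the private data plane zone.

    project_id and network_name are hypothetical placeholders supplied
    by the caller; nothing here contacts Google Cloud.
    """
    # Cloud DNS expects the fully qualified URL of the VPC network.
    network_url = (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{project_id}/global/networks/{network_name}"
    )
    return {
        # Zone name and DNS name as listed in the steps above.
        "name": "datafusiongoogleusercontentcom",
        # DNS names in the API are fully qualified, hence the trailing dot.
        "dnsName": "datafusion.googleusercontent.com.",
        "description": "Private zone for the Cloud Data Fusion data plane",
        # "Zone type: private" corresponds to visibility = private.
        "visibility": "private",
        "privateVisibilityConfig": {
            "networks": [{"networkUrl": network_url}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_private_zone("my-project", "my-vpc"), indent=2))
```

Keeping the body as plain data makes it easy to review the zone name, DNS name, and network binding before submitting it through whichever client or tool you use.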
From the Cloud DNS page, click your datafusiongoogleusercontentcom DNS zone name to open the Zone details page. Two records are listed: an NS record and an SOA record. Use ADD RECORD SET to add the following two record sets to your datafusiongoogleusercontentcom DNS zone.
Add a CNAME record: In the Create record set dialog, fill in the following fields to map the DNS name *.datafusion.googleusercontent.com. to the canonical name datafusion.googleusercontent.com:
- DNS name: "*.datafusion.googleusercontent.com"
- Canonical name: "datafusion.googleusercontent.com"
Add an A record: In a new Create record set dialog, fill in the following fields to map the DNS name datafusion.googleusercontent.com. to IP addresses:
- DNS name: ".datafusion.googleusercontent.com"
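The CNAME and A record sets described above can be sketched as Cloud DNS `ResourceRecordSet` bodies. This is an illustrative sketch under stated assumptions: the text above does not list the IP addresses, so the code assumes the restricted VIP range 199.36.153.4/30 published for restricted.googleapis.com, and the TTL of 300 seconds and the function name `build_record_sets` are arbitrary choices, not values from this guide. Verify the range against the current VPC Service Controls documentation before using it.

```python
import ipaddress

# Fully qualified DNS name of the private zone (note the trailing dot).
ZONE_DNS_NAME = "datafusion.googleusercontent.com."

# ASSUMPTION: restricted VIP range for restricted.googleapis.com.
# It is not stated in the steps above; confirm it before relying on it.
RESTRICTED_VIP_RANGE = "199.36.153.4/30"


def build_record_sets(ttl: int = 300) -> list:
    """Return the CNAME and A record sets as ResourceRecordSet bodies.

    ttl is a hypothetical default chosen for illustration.
    """
    # Iterating a network yields every address in the block,
    # including the first and last one.
    vip_addresses = [
        str(ip) for ip in ipaddress.ip_network(RESTRICTED_VIP_RANGE)
    ]
    cname_record = {
        "name": f"*.{ZONE_DNS_NAME}",
        "type": "CNAME",
        "ttl": ttl,
        # The wildcard resolves through the zone apex.
        "rrdatas": [ZONE_DNS_NAME],
    }
    a_record = {
        "name": ZONE_DNS_NAME,
        "type": "A",
        "ttl": ttl,
        # The zone apex resolves to the restricted VIP addresses.
        "rrdatas": vip_addresses,
    }
    return [cname_record, a_record]


if __name__ == "__main__":
    for record in build_record_sets():
        print(record["type"], record["name"], record["rrdatas"])
```

Because the wildcard CNAME points at the zone apex, only the A record needs to change if the target addresses ever differ from the assumed range.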
The datafusiongoogleusercontentcom Zone details page shows the following record sets:
Follow the above steps to create a private DNS zone and add a record set for the