This page describes how to create a Cloud Data Fusion instance with a private IP address. You create your private Cloud Data Fusion instance in a VPC network or a shared VPC network.
Creating a Cloud Data Fusion instance with a private IP address provides the following benefits:
Connections to the Cloud Data Fusion instance are established over a private VPC network in your Google Cloud project. Traffic over this network does not go through the public internet.
The instance can connect to your on-premises resources, such as relational databases, when you connect your on-premises network to the Google Cloud private VPC network using Cloud VPN or Cloud Interconnect. You can then access those resources securely over the private network without opening up access to Google Cloud.
Set up your VPC network
If you haven't already done so, create a VPC network or a shared VPC network.
This section shows you how to enable Private Google Access and allocate an IP range, both required for setting up your VPC network.
Enable Private Google Access
The region in which you create your private Cloud Data Fusion instance must have a subnet with Private Google Access enabled.
To enable Private Google Access for your subnet, follow these steps:
Go to the VPC networks page in the Google Cloud console.
In the Region column, find the region where you'd like to create your private Cloud Data Fusion instance. Click the subnet for that region.
Click Edit.
Under Private Google access, select On.
Click Save.
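The console steps above can also be scripted. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated against the project that owns the subnet. The function name `enable_private_google_access` is a hypothetical helper introduced for illustration, not part of the original procedure.

```shell
# Enable Private Google Access on one subnet.
# Usage: enable_private_google_access SUBNET_NAME REGION
enable_private_google_access() {
  gcloud compute networks subnets update "$1" \
    --region="$2" \
    --enable-private-ip-google-access
}
```

For example, `enable_private_google_access my-subnet us-west1` would update a subnet named my-subnet in us-west1; both values are placeholders.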
Allocate an IP range
These steps show you how to allocate an IP range for your Cloud Data Fusion instance. Note that your pipeline runs on a Dataproc cluster, which uses an IP range different from the one allocated to your Cloud Data Fusion instance.
Follow these steps only if you're using a shared VPC network. If you're not using a shared VPC network, Cloud Data Fusion allocates an IP range automatically when you create your instance, so you can skip this section and create a private instance.
To allocate an IP range for your Cloud Data Fusion instance, follow these steps:
Go to the VPC networks page in the Cloud console.
Under Name, click the VPC network in which you want to create a private Cloud Data Fusion instance.
On the VPC network details page, click the Private service connection tab. If prompted, enable the Service Networking API by clicking Enable API.
Click Allocate IP range.
Give your IP range a name.
Under IP range, select Automatic.
Specify a prefix size of 22.
Click Allocate.
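As a rough command-line counterpart to the console steps above, the sketch below reserves a /22 range for private services access. It assumes the gcloud CLI is configured for the project that owns the VPC network (for a shared VPC network, the host project). The function name `allocate_ip_range` is a hypothetical helper. Because no explicit address is passed, the range itself is chosen automatically, which is assumed to match the Automatic option in the console.

```shell
# Reserve an automatically chosen /22 internal range on a VPC network.
# Usage: allocate_ip_range RANGE_NAME NETWORK
allocate_ip_range() {
  gcloud compute addresses create "$1" \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=22 \
    --network="$2"
}
```

For example, `allocate_ip_range datafusion-range my-network`, where both names are placeholders.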
Create a private instance
Create a private Cloud Data Fusion instance either in a VPC network or a shared VPC network. To create your instance in a VPC network, use either the Cloud console or cURL. To create your instance in a shared VPC network, use cURL.
Create a private instance in a VPC network
If you use the Cloud console to create your private instance, Cloud Data Fusion will automatically allocate the /22 IP address range for you. If you'd rather provide an explicit IP allocation of your choice, use the cURL command instead.
Console
If the API is enabled, the Cloud Data Fusion section in the Cloud console shows an Instances page where you can manage your Cloud Data Fusion instances. When no instances exist, the page has a link to create an instance, along with some useful links to documentation and samples.
Go to the Create instance page in the Cloud console.
Enter an Instance name.
Enter a Description for your instance.
Select the Region in which to create the instance. This must be the region for which you enabled Private Google Access.
Specify the Cloud Data Fusion Version you prefer.
Select the Cloud Data Fusion Edition you'd like.
In Cloud Data Fusion version 6.2.3 and higher, specify the Dataproc service account to use for running your Cloud Data Fusion pipeline in Dataproc. The UI pre-selects the default Compute Engine account. Regardless of the version, make sure that the service account has appropriate Identity and Access Management roles for your needs. For more information, see Granting service account user permission.
Click Advanced Options. Under Private IP, select Enable Private IP.
Under Associated networking, select a network in which to create your private instance.
Click Create. It takes up to 30 minutes for the instance creation process to complete.
cURL
For your convenience, you can export the following variables, or you can directly substitute these values into the commands below.
export PROJECT=PROJECT_ID
export LOCATION=REGION
export DATA_FUSION_API_NAME=datafusion.googleapis.com
To create a Cloud Data Fusion instance with the REST API, submit the following create API request.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "https://$DATA_FUSION_API_NAME/v1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=instance_id" -X POST -d '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "network", "ipAllocation": "ip_range"}}'
Parameter | Description |
---|---|
instance_id | Provide an ID for your instance. |
network | The name of the VPC network in which you want to create your private instance. |
ip_range | The IP range you allocated. (You can find your IP range on the VPC network details page in the Cloud console, in the Private service connection tab, under Internal IP range.) |
Create a private instance in a shared VPC network
For your convenience, you can export the following variables. Alternatively, you can directly substitute these values in the commands below.
export PROJECT=PROJECT_ID
export LOCATION=REGION
export DATA_FUSION_API_NAME=datafusion.googleapis.com
To create a Cloud Data Fusion instance with the REST API, submit the following create API request.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "https://$DATA_FUSION_API_NAME/v1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=instance_id" -X POST -d '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "projects/shared_vpc_host_project_id/global/networks/network", "ipAllocation": "ip_range"}}'
Parameter | Description |
---|---|
instance_id | Provide an ID for your instance. |
shared_vpc_host_project_id | The ID of the project that's hosting the shared VPC network. |
network | The name of the VPC network in which you want to create your private instance. |
ip_range | The IP range you allocated. (You can find your IP range on the VPC network details page in the Cloud console, in the Private service connection tab, under Internal IP range.) |
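The two create requests above differ only in how the network is written. The sketch below builds the request body for either case so that values are substituted in one place. The function name `create_body` and the example values are hypothetical, and the body fields are copied from the requests above rather than from an API reference.

```shell
# Print the JSON body for the create request.
# With a third argument, the network is written as a shared VPC path.
# Usage: create_body NETWORK IP_RANGE [SHARED_VPC_HOST_PROJECT_ID]
create_body() (
  network="$1"
  if [ -n "${3:-}" ]; then
    network="projects/$3/global/networks/$1"
  fi
  printf '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "%s", "ipAllocation": "%s"}}\n' "$network" "$2"
)
```

The output can be passed to curl with `-d "$(create_body my-network 10.124.0.0/22 host-project)"`, where the network name, range, and host project ID are placeholders.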
Set up VPC Network Peering
Cloud Data Fusion uses VPC Network Peering to establish network connectivity to your VPC network. This allows Cloud Data Fusion to access resources on your network through private IP addresses.
This section shows you how to create a peering configuration between your network and the Cloud Data Fusion tenant project network.
Connecting to an external source
To connect to a resource in an external network (an on-premises network or another VPC network), the external network and Cloud Data Fusion instance need to be connected via the same VPC network.
The following describes how to connect an external network to the Cloud Data Fusion VPC network using Cloud VPN tunnels with BGP routing or Cloud Interconnect attachments:
- Ensure your VPC network is connected to the external network using a Cloud VPN tunnel or a VLAN attachment for Dedicated Interconnect or Partner Interconnect.
- Ensure the BGP sessions on the Cloud Routers managing your Cloud VPN tunnels or Cloud Interconnect attachments (VLANs) have received specific prefixes (destinations) from your external network.
  Default routes (destination 0.0.0.0/0) cannot be imported into the Cloud Data Fusion VPC network because that network has its own local default route. Local routes for a destination are always used, even though the Cloud Data Fusion peering is configured to import custom routes from your VPC network.
- Identify the peering connections produced by the private services connection. Depending on the service, the private services connection might create one or more of the following peering connections, but not necessarily all of them:
  - datafusion-googleapis-com
  - servicenetworking-googleapis-com
- Update all of the peering connections to enable Export custom routes.
- Identify the allocated range used by the private services connection.
- Create a Cloud Router custom route advertisement for the allocated range on the Cloud Routers managing BGP sessions for your Cloud VPN tunnels or Cloud Interconnect attachments (VLANs).
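The last few steps in the list above can be sketched with gcloud as follows. This assumes the gcloud CLI is configured for the project that owns your VPC network. The function names are hypothetical helpers. Note that `--set-advertisement-ranges` replaces any custom ranges already configured on the router, so treat this as a starting point rather than a drop-in command.

```shell
# Turn on "Export custom routes" for one peering connection.
# Usage: export_custom_routes PEERING_NAME NETWORK
export_custom_routes() {
  gcloud compute networks peerings update "$1" \
    --network="$2" \
    --export-custom-routes
}

# Advertise the allocated range from a Cloud Router, alongside all subnets.
# Usage: advertise_allocated_range ROUTER REGION ALLOCATED_RANGE
advertise_allocated_range() {
  gcloud compute routers update "$1" \
    --region="$2" \
    --advertisement-mode=custom \
    --set-advertisement-groups=all_subnets \
    --set-advertisement-ranges="$3"
}
```

For example, run `export_custom_routes datafusion-googleapis-com my-network` for each peering connection that exists, then `advertise_allocated_range my-router us-west1 10.124.0.0/22`. The network, router, region, and range shown here are placeholders.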
Find your tenant project ID
To create a peering configuration, you need your tenant project ID.
Go to the Cloud Data Fusion Instances page in the Cloud console.
Under Instance Name, select your instance.
On the Instance details page, copy your instance's Service Account value. The tenant project ID is the portion between the "at" symbol (@) and the following period (.). For example, if the service account value is cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com, then the tenant project ID is r8170c9b5e7699803-tp.
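The extraction rule above (the text between the @ and the next period) can be applied with plain shell parameter expansion. This is a small sketch; the function name `tenant_project_id` is a hypothetical helper.

```shell
# Print the tenant project ID embedded in a service account value.
# Usage: tenant_project_id SERVICE_ACCOUNT_EMAIL
tenant_project_id() (
  domain="${1#*@}"               # drop everything up to and including "@"
  printf '%s\n' "${domain%%.*}"  # keep the text before the first "."
)
```

Calling it with the example service account value above prints r8170c9b5e7699803-tp.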
Create a peering connection
Go to the VPC network peering page in the Cloud console.
Click Create peering connection.
Click Continue.
Enter a Name for your peering connection.
Under Your VPC network, select the network in which you created your Cloud Data Fusion instance.
Under Peered VPC network, select In another project.
Under Project ID, enter the tenant project ID you found earlier on this page.
Under VPC network name, enter instance_region-instance_id.
- instance_region is the region in which you created your Cloud Data Fusion instance.
- instance_id is the ID of your Cloud Data Fusion instance.
Click Exchange custom routes. Select Export custom routes. This allows for exchanging any custom routes defined in your VPC network with the tenant VPC network.
Click Create.
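The same peering connection can likely be created from the command line. The sketch below assumes the gcloud CLI is configured for the project that owns your VPC network and mirrors the console fields above, including the instance_region-instance_id network name. The function name `create_tenant_peering` is a hypothetical helper.

```shell
# Create a peering connection from your VPC network to the tenant project network.
# Usage: create_tenant_peering PEERING_NAME NETWORK TENANT_PROJECT_ID INSTANCE_REGION INSTANCE_ID
create_tenant_peering() {
  gcloud compute networks peerings create "$1" \
    --network="$2" \
    --peer-project="$3" \
    --peer-network="$4-$5" \
    --export-custom-routes
}
```

For example, `create_tenant_peering df-peering my-network r8170c9b5e7699803-tp us-west1 my-instance`, where every value except the tenant project ID format is a placeholder.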
Set up IAM permissions
Follow these steps only if you're using a shared VPC network. If you're not using a shared VPC network, skip this section and go to Create a firewall rule.
If you create your Cloud Data Fusion instance in a shared VPC network, grant the Compute Network User role on the shared VPC host project to the following service accounts:
- Cloud Data Fusion service account: service-project-number@gcp-sa-datafusion.iam.gserviceaccount.com
- Dataproc service account: service-project-number@dataproc-accounts.iam.gserviceaccount.com
project-number is the number of the Google Cloud project to which your Cloud Data Fusion instance belongs.
Follow these steps to grant access to the required service accounts.
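As one possible way to grant the role from the command line, the sketch below binds the Compute Network User role (assumed here to be `roles/compute.networkUser`) at the project level on the host project for both service accounts. The function name `grant_network_user` is a hypothetical helper, and a project-level binding is an assumption; your organization may prefer a narrower, subnet-level grant.

```shell
# Grant Compute Network User on the host project to both service accounts.
# Usage: grant_network_user HOST_PROJECT_ID PROJECT_NUMBER
grant_network_user() (
  for sa in \
    "service-$2@gcp-sa-datafusion.iam.gserviceaccount.com" \
    "service-$2@dataproc-accounts.iam.gserviceaccount.com"
  do
    # Stop at the first failed binding instead of continuing silently.
    gcloud projects add-iam-policy-binding "$1" \
      --member="serviceAccount:$sa" \
      --role="roles/compute.networkUser" || exit 1
  done
)
```

For example, `grant_network_user host-project 123456789`, where the host project ID and project number are placeholders.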
Create a firewall rule
Create a firewall rule on your VPC network that allows incoming SSH connections from the IP range you specified when you created your private Cloud Data Fusion instance.
You can create the firewall rule by using the Cloud console or by using gcloud. To use gcloud, run the following command:
gcloud compute firewall-rules create name-allow-ssh --allow=tcp:22 --source-ranges=ip_range --network=network --project=project
Parameter | Description |
---|---|
name | Name of the firewall rule to create. |
ip_range | The IP range you allocated. (You can find your IP range on the VPC network details page in the Cloud console, in the Private service connection tab, under Internal IP range.) |
network | The name of the VPC network in which you created your private instance. The rule is attached to this network. |
project | The ID of the project that's hosting the VPC network. |
You can now use your private Cloud Data Fusion instance.
What's next
- Learn about other key Cloud Data Fusion concepts and features.
- See Cloud Data Fusion pricing.