Creating a private instance

This page describes how to create a Cloud Data Fusion instance with a private IP address. You create your private Cloud Data Fusion instance in a VPC network or a shared VPC network.

Creating a Cloud Data Fusion instance with a private IP address provides the following benefits:

  • Connections to the Cloud Data Fusion instance are established over a private VPC network in your Google Cloud project. Traffic over this network does not go through the public internet.

  • The instance can connect to your on-premises resources, such as relational databases, when you connect your on-premises network to your Google Cloud VPC network using Cloud VPN or Cloud Interconnect. You then access those resources securely over the private network without opening up access to Google Cloud.

Set up your VPC network

If you haven't already done so, create a VPC network or a shared VPC network.

This section shows you how to enable Private Google Access and allocate an IP range, both required for setting up your VPC network.

Enable Private Google Access

The region in which you create your private Cloud Data Fusion instance must have a subnet with Private Google Access enabled.

To enable Private Google Access for your subnet, follow these steps:

  1. Go to the VPC networks page in the Google Cloud Console.

    Open the VPC networks page

  2. In the Region column, find the region where you'd like to create your private Cloud Data Fusion instance. Click the subnet for that region.

  3. Click Edit.

  4. Under Private Google access, select On.

  5. Click Save.
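If you prefer the command line, the same setting can likely be applied with gcloud. The following is a minimal sketch rather than part of the steps above. It assumes the gcloud CLI is installed and authenticated; the function name enable_private_google_access and its two arguments (subnet name and region) are illustrative.

```shell
# Sketch: enable Private Google Access on one subnet with gcloud.
# Arguments (both supplied by you): $1 = subnet name, $2 = region.
enable_private_google_access() {
  local subnet="$1" region="$2"
  gcloud compute networks subnets update "$subnet" \
    --region="$region" \
    --enable-private-ip-google-access
}

# Example call (placeholder values, not run here):
#   enable_private_google_access my-subnet us-central1
```

If the subnet lives in a project other than your gcloud default, you would also need to pass a --project flag.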


Allocate an IP range

Follow these steps only if you're using a shared VPC network. If you're not using a shared VPC network, Cloud Data Fusion allocates an IP range automatically when you create your instance, so you can skip this section and go to Create a private instance.

To allocate an IP range for your Cloud Data Fusion instance, follow these steps:

  1. Go to the VPC networks page in the Cloud Console.

    Open the VPC networks page

  2. Under Name, click the VPC network in which you want to create a private Cloud Data Fusion instance.

  3. On the VPC network details page, click the Private service connection tab. If prompted, enable the Service Networking API by clicking Enable API.


  4. Click Allocate IP range.

    1. Give your IP range a name.

    2. Under IP range, select Automatic.

    3. Specify a prefix size of 22.

    4. Click Allocate.
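The console steps above can probably also be done with gcloud by reserving a global internal range for VPC peering. This is a sketch under assumptions: the gcloud CLI is installed and authenticated, the function name allocate_peering_range is hypothetical, and the project argument is taken to be the shared VPC host project. Omitting an explicit address corresponds to choosing Automatic in the console.

```shell
# Sketch: reserve a /22 range for private service connections.
# Arguments: $1 = range name, $2 = VPC network name, $3 = host project ID.
allocate_peering_range() {
  local name="$1" network="$2" project="$3"
  gcloud compute addresses create "$name" \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=22 \
    --network="$network" \
    --project="$project"
}

# Example call (placeholder values, not run here):
#   allocate_peering_range datafusion-range my-shared-vpc my-host-project
```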


Create a private instance

Create a private Cloud Data Fusion instance either in a VPC network or a shared VPC network. To create your instance in a VPC network, use either the Cloud Console or cURL. To create your instance in a shared VPC network, use cURL.

Create a private instance in a VPC network

If you use the Cloud Console to create your private instance, Cloud Data Fusion will automatically allocate the /22 IP address range for you. If you'd rather provide an explicit IP allocation of your choice, use the cURL command instead.

Console

If the API is enabled, the Cloud Data Fusion section in the Cloud Console shows an Instances page where you can manage your Cloud Data Fusion instances. When no instances exist, the page has a link to create an instance, along with some useful links to documentation and samples.

  1. Go to the Create instance page in the Cloud Console.

    Open the Create instance page

  2. Enter an Instance name.

  3. Select the Region in which to create the instance. This must be the region for which you enabled Private Google Access.

  4. Select the Cloud Data Fusion Edition you'd like.

  5. Click Advanced Options. Under Private IP, select Enable Private IP.

  6. Under Associated networking, select a network in which to create your private instance.

  7. Click Create. It takes up to 30 minutes for the instance creation process to complete.

cURL

For your convenience, you can export the following variables, or you can directly substitute these values into the commands below.

export PROJECT=PROJECT_ID
export LOCATION=REGION
export DATA_FUSION_API_NAME=datafusion.googleapis.com

To create a Cloud Data Fusion instance with the REST API, submit the following create API request.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://$DATA_FUSION_API_NAME/v1beta1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=instance_id" \
  -d '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "network", "ipAllocation": "ip_range"}}'

Parameter      Description
instance_id    An ID for your instance.
network        The name of the VPC network in which you want to create your private instance.
ip_range       The IP range you allocated. (You can find your IP range on the VPC network details page in the Cloud Console, in the Private service connection tab, under Internal IP range.)
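Because instance creation can take up to 30 minutes, it can help to check the instance's state from the command line. The sketch below assumes that a GET request on the instance resource returns JSON containing a "state" field (for example CREATING or RUNNING). The helper names instance_url and instance_state are hypothetical, and the sed-based parsing is a simplification, not a robust JSON parser.

```shell
# Sketch: read the state of a Cloud Data Fusion instance over REST.
# Arguments for both helpers: $1 = project ID, $2 = region, $3 = instance ID.
instance_url() {
  printf 'https://%s/v1beta1/projects/%s/locations/%s/instances/%s\n' \
    "${DATA_FUSION_API_NAME:-datafusion.googleapis.com}" "$1" "$2" "$3"
}

instance_state() {
  # Fetch the instance resource and print the first "state" value found.
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(instance_url "$1" "$2" "$3")" |
    sed -n 's/.*"state": *"\([A-Z_]*\)".*/\1/p' |
    head -n 1
}

# Example call (placeholder values, not run here):
#   instance_state "$PROJECT" "$LOCATION" my-instance
```

Taking only the first match is a deliberate shortcut; if the response contains nested objects with their own state fields, a real JSON parser would be the safer choice.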

Create a private instance in a shared VPC network

For your convenience, you can export the following variables. Alternatively, you can directly substitute these values in the commands below.

export PROJECT=PROJECT_ID
export LOCATION=REGION
export DATA_FUSION_API_NAME=datafusion.googleapis.com

To create a Cloud Data Fusion instance with the REST API, submit the following create API request.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://$DATA_FUSION_API_NAME/v1beta1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=instance_id" \
  -d '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "projects/shared_vpc_host_project_id/global/networks/network", "ipAllocation": "ip_range"}}'

Parameter                     Description
instance_id                   An ID for your instance.
shared_vpc_host_project_id    The ID of the project that's hosting the shared VPC network.
network                       The name of the VPC network in which you want to create your private instance.
ip_range                      The IP range you allocated. (You can find your IP range on the VPC network details page in the Cloud Console, in the Private service connection tab, under Internal IP range.)
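Substituting three placeholders by hand inside a quoted JSON string is easy to get wrong, so one option is to generate the request body with a small helper. This is only a sketch: the function name private_instance_body is hypothetical, the field names are copied from the request above, and the arguments are inserted without JSON escaping, so it assumes they contain no quotes or backslashes.

```shell
# Sketch: build the request body for a private instance in a shared VPC.
# Arguments: $1 = shared VPC host project ID, $2 = network name, $3 = allocated IP range.
private_instance_body() {
  local host_project="$1" network="$2" ip_range="$3"
  printf '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "projects/%s/global/networks/%s", "ipAllocation": "%s"}}\n' \
    "$host_project" "$network" "$ip_range"
}

# Example use with the create request (placeholder values, not run here):
#   curl ... -X POST -d "$(private_instance_body my-host-project my-shared-vpc 10.0.0.0/22)"
```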

Set up VPC Network Peering

Cloud Data Fusion uses VPC Network Peering to establish network connectivity to your VPC network. This allows Cloud Data Fusion to access resources on your network through private IP addresses.

This section shows you how to create a peering configuration between your network and the Cloud Data Fusion tenant project network.

Find your tenant project ID

To create a peering configuration, you need your tenant project ID.

  1. Go to the Cloud Data Fusion Instances page in the Cloud Console.

    Open the Instances page

  2. Under Instance Name, select your instance.

  3. On the Instance details page, copy your instance's Service Account value. The tenant project ID is the portion between the "at" symbol (@) and the following period (.). For example, if the service account value is
    cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com
    then the tenant project ID is r8170c9b5e7699803-tp.
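The extraction rule in step 3 can be expressed with shell parameter expansion. The function name tenant_project_id is illustrative; the sample value is the one from the example above.

```shell
# Sketch: derive the tenant project ID from the instance's service account.
# Argument: $1 = service account value copied from the Instance details page.
tenant_project_id() {
  local domain="${1#*@}"          # drop everything up to and including the first "@"
  printf '%s\n' "${domain%%.*}"   # keep everything before the first "."
}

tenant_project_id "cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com"
# prints: r8170c9b5e7699803-tp
```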


Create a peering connection

  1. Go to the VPC network peering page in the Cloud Console.

    Open the VPC network peering page

  2. Click Create connection.

  3. Click Continue.

  4. Enter a Name for your peering connection.

  5. Under Your VPC network, select the network in which you created your Cloud Data Fusion instance.

  6. Under Peered VPC network, select In another project.

  7. Under Project ID, enter the tenant project ID you found previously in this tutorial.

  8. Under VPC network name, enter instance_region-instance_id.

    • instance_region is the region in which you created your Cloud Data Fusion instance.
    • instance_id is the ID of your Cloud Data Fusion instance.

  9. Click Exchange custom routes. Select Export custom routes. This shares any custom routes defined in your VPC network with the tenant VPC network.

  10. Click Create.
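The same peering can likely be created with gcloud. The following sketch assumes the gcloud CLI is installed and authenticated and that your default project is the one that owns your VPC network; the function name create_data_fusion_peering and its argument order are hypothetical. The peer network name follows the instance_region-instance_id pattern from step 8.

```shell
# Sketch: peer your VPC network with the Cloud Data Fusion tenant network.
# Arguments: $1 = peering name, $2 = your VPC network, $3 = tenant project ID,
#            $4 = instance region, $5 = instance ID.
create_data_fusion_peering() {
  local name="$1" network="$2" tenant_project="$3" region="$4" instance_id="$5"
  gcloud compute networks peerings create "$name" \
    --network="$network" \
    --peer-project="$tenant_project" \
    --peer-network="${region}-${instance_id}" \
    --export-custom-routes
}

# Example call (placeholder values, not run here):
#   create_data_fusion_peering df-peering my-vpc r8170c9b5e7699803-tp us-central1 my-instance
```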


Set up IAM permissions

Follow these steps only if you're using a shared VPC network. If you're not using a shared VPC network, skip this section and go to Create a firewall rule.

If you create your Cloud Data Fusion instance in a shared VPC network, you need to grant the Compute Network User role on the shared VPC host project to the following service accounts:

  • Cloud Data Fusion service account: service-project-number@gcp-sa-datafusion.iam.gserviceaccount.com
  • Dataproc service account: service-project-number@dataproc-accounts.iam.gserviceaccount.com

project-number is the project number of the Google Cloud project to which your Cloud Data Fusion instance belongs.

Follow these steps to grant access to the required service accounts.
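One way to grant the role from the command line is sketched below. It assumes the gcloud CLI is installed and authenticated, that you have permission to change IAM policy on the host project, and that roles/compute.networkUser is the role ID for Compute Network User. The function name grant_network_user is hypothetical.

```shell
# Sketch: grant Compute Network User on the host project to both service accounts.
# Arguments: $1 = shared VPC host project ID,
#            $2 = project number of the project that contains the instance.
grant_network_user() {
  local host_project="$1" project_number="$2" sa
  for sa in \
    "service-${project_number}@gcp-sa-datafusion.iam.gserviceaccount.com" \
    "service-${project_number}@dataproc-accounts.iam.gserviceaccount.com"
  do
    gcloud projects add-iam-policy-binding "$host_project" \
      --member="serviceAccount:${sa}" \
      --role="roles/compute.networkUser"
  done
}

# Example call (placeholder values, not run here):
#   grant_network_user my-host-project 123456789012
```

This grants the role at the project level for simplicity; granting it on individual subnets is a narrower alternative.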

Create a firewall rule

Create a firewall rule on your VPC network that allows incoming SSH connections from the IP range you specified when you created your private Cloud Data Fusion instance.

You can create the firewall rule by using the Cloud Console or by using gcloud. To use gcloud, run the following command:

gcloud compute firewall-rules create name-allow-ssh --allow=tcp:22 --source-ranges=ip_range --network=network --project=project

Parameter    Description
name         Name of the firewall rule to create.
ip_range     The IP range you allocated. (You can find your IP range on the VPC network details page in the Cloud Console, in the Private service connection tab, under Internal IP range.)
network      The network to which this rule is attached, that is, the VPC network in which you created your private instance.
project      The ID of the project that's hosting the VPC network.


You can now use your private Cloud Data Fusion instance.

What's next