Creating a private instance


This page describes how to create a Cloud Data Fusion instance with an internal IP address. You create the instance in a VPC network or a shared VPC network.

A private Cloud Data Fusion instance has the following benefits:

  • Connections to the instance are established over a private VPC network in your Google Cloud project. Traffic over the network doesn't go through the public internet.

  • The instance can connect to your on-premises resources, such as relational databases, because your on-premises network connects to the Google Cloud private VPC network through Cloud VPN or Cloud Interconnect. You can access those resources securely over the private network without opening up access to Google Cloud.

Set up the VPC network

If you haven't already done so, create a VPC network or a shared VPC network.

To set up your VPC network, you must enable Private Google Access and allocate an IP range.

Enable Private Google Access

The region where you create your private Cloud Data Fusion instance must have a subnet with Private Google Access enabled.

To enable Private Google Access for the subnet, see Private Google Access configuration.
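As a rough command-line sketch of the same setting, the function below wraps the `gcloud compute networks subnets update` command with its `--enable-private-ip-google-access` flag. The function name and its three arguments are hypothetical helpers for illustration, not part of the original instructions.

```shell
# Enable Private Google Access on an existing subnet.
# Hypothetical helper. Usage:
#   enable_private_google_access SUBNET_NAME REGION PROJECT_ID
enable_private_google_access() {
  gcloud compute networks subnets update "$1" \
    --region="$2" \
    --project="$3" \
    --enable-private-ip-google-access
}
```

The subnet must be in the same region where you plan to create the private instance.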

Allocate an IP range

VPC network

If you're not using a shared VPC network, Cloud Data Fusion allocates an IP range by default when you create an instance.

Shared VPC network

To use a shared VPC network, you must allocate an IP range for your Cloud Data Fusion instance. Your pipeline runs on a Dataproc cluster that uses a different IP range than the one allocated to the Cloud Data Fusion instance.

To allocate an IP range for your Cloud Data Fusion instance, follow these steps:

  1. In the Google Cloud console, go to the VPC networks page.

    Go to VPC networks

  2. In the Name column, click the VPC network in which you want to create a private Cloud Data Fusion instance.

    The VPC network details page opens.

  3. Click Private service connection. If prompted, enable the Service Networking API by clicking Enable API.


  4. Click Allocate IP range.

    1. Give your IP range a name.

    2. For IP range, click Automatic.

    3. Specify a prefix size of 22.

    4. Click Allocate.
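The console steps above can probably be approximated from the command line. The sketch below assumes that an allocated range for a private service connection is a global internal address with purpose `VPC_PEERING`, and that omitting an explicit address lets Google Cloud pick the range automatically. The function name and argument order are hypothetical.

```shell
# Allocate an automatically chosen /22 range for private services access.
# Hypothetical helper. Usage:
#   allocate_ip_range RANGE_NAME NETWORK_NAME HOST_PROJECT_ID
allocate_ip_range() {
  gcloud compute addresses create "$1" \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=22 \
    --network="$2" \
    --project="$3"
}
```

For a shared VPC network, HOST_PROJECT_ID is assumed to be the project that hosts the network.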


Create a private instance

Create the private Cloud Data Fusion instance in either a VPC network or a shared VPC network.

VPC network

To create the instance in a VPC network, use either the Google Cloud console or cURL.

If you use the Google Cloud console to create your private instance, Cloud Data Fusion allocates the /22 IP address range by default. To choose a different IP range, you must use the cURL command.

Console

  1. Go to the Create Data Fusion instance page.

    Go to Create Data Fusion instance

  2. Enter an instance name and description for your instance.

  3. Select the Region in which to create the instance. The region must have Private Google Access enabled.

  4. Select a Cloud Data Fusion Version and Edition.

  5. In Cloud Data Fusion versions 6.2.3 and later, specify the Dataproc service account to use for running your Cloud Data Fusion pipeline in Dataproc. The default Compute Engine account is pre-selected.

  6. Expand the Advanced Options menu and click Enable Private IP.

  7. In the Network field, choose a network in which to create the instance.

  8. Click Create. It takes up to 30 minutes for the instance creation process to complete.

cURL

For your convenience, you can export the following variables, or you can directly substitute these values into the following commands:

export PROJECT=PROJECT_ID
export LOCATION=REGION
export DATA_FUSION_API_NAME=datafusion.googleapis.com

To create the instance, call its create() method:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "https://$DATA_FUSION_API_NAME/v1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=INSTANCE_ID" -X POST -d '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "NETWORK_NAME", "ipAllocation": "IP_RANGE"}}'

Replace the following:

  • INSTANCE_ID: The ID string of your instance.
  • NETWORK_NAME: The name of the VPC network in which you want to create your private instance.
  • IP_RANGE: The IP range that you allocated. To find the IP range in the Google Cloud console, go to the VPC network details page > Private service connection > Internal IP range.
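Hand-editing the JSON body inside single quotes is easy to get wrong. One possible way to reduce quoting mistakes is to build the body from arguments, as in this sketch. The `instance_body` function is a hypothetical helper, and the field names are taken from the command above.

```shell
# Print the JSON request body for a private instance.
# Hypothetical helper. Usage:
#   instance_body NETWORK IP_RANGE
instance_body() {
  printf '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "%s", "ipAllocation": "%s"}}\n' "$1" "$2"
}

# Example use with the create call shown above:
#   curl ... -X POST -d "$(instance_body NETWORK_NAME IP_RANGE)"
```

The helper does no escaping, so it assumes that neither argument contains a double quote or a backslash.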

Shared VPC network

To create your instance in a shared VPC network, use cURL, not the Google Cloud console.

cURL

For your convenience, you can export the following variables. Alternatively, you can directly substitute these values in the following commands:

export PROJECT=PROJECT_ID
export LOCATION=REGION
export DATA_FUSION_API_NAME=datafusion.googleapis.com

To create the instance, call its create() method:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" "https://$DATA_FUSION_API_NAME/v1/projects/$PROJECT/locations/$LOCATION/instances?instanceId=INSTANCE_ID" -X POST -d '{"description": "Private CDF instance created through REST.", "type": "ENTERPRISE", "privateInstance": true, "networkConfig": {"network": "projects/SHARED_VPC_HOST_PROJECT_ID/global/networks/NETWORK_NAME", "ipAllocation": "IP_RANGE"}}'

Replace the following:

  • INSTANCE_ID: The ID string of your instance.
  • SHARED_VPC_HOST_PROJECT_ID: The ID of the project that's hosting the shared VPC network.
  • NETWORK_NAME: The name of the VPC network in which you want to create the private instance.
  • IP_RANGE: The IP range that you allocated. To find the IP range in the Google Cloud console, go to the VPC network details page > Private service connection > Internal IP range.
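Because instance creation can take a while, it may help to check on the instance after the create call returns. The sketch below assumes that a GET request on the instance resource, under the same base URL as the create call, returns the instance as JSON, including a `state` field. The `get_instance` function is a hypothetical helper that relies on the PROJECT, LOCATION, and DATA_FUSION_API_NAME variables exported earlier.

```shell
# Fetch the instance resource as JSON.
# Hypothetical helper. Usage:
#   get_instance INSTANCE_ID
get_instance() {
  curl -s \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://$DATA_FUSION_API_NAME/v1/projects/$PROJECT/locations/$LOCATION/instances/$1"
}
```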

Set up VPC Network Peering

Cloud Data Fusion uses VPC Network Peering to establish network connectivity to your VPC or shared VPC network. This allows Cloud Data Fusion to access resources on your network through internal IP addresses.

This section describes how to create a peering configuration between your network and the Cloud Data Fusion tenant project network.

Connect to an external source

To connect to a resource in an external network (an on-premises network or another VPC network), the external network and Cloud Data Fusion instance must be connected through the same VPC network.

The following describes how to connect an external network to the Cloud Data Fusion VPC network using Cloud VPN tunnels with BGP routing or VLAN attachments:

  • Ensure your VPC network is connected to the external network using a Cloud VPN tunnel or a VLAN attachment for Dedicated Interconnect or Partner Interconnect.
  • Ensure the BGP sessions on the Cloud Routers managing your Cloud VPN tunnels or VLAN attachments have received specific prefixes (destinations) from your external network.

    Default routes (destination 0.0.0.0/0) cannot be imported into the Cloud Data Fusion VPC network because that network has its own local default route. Local routes for a destination are always used, even though the Cloud Data Fusion peering is configured to import custom routes from your VPC network.

  • Identify the peering connections produced by the private services connection. Depending on the service, the private services connection might create one or more of the following peering connections, but not necessarily all of them:
    • datafusion-googleapis-com
    • servicenetworking-googleapis-com
  • Update all of the peering connections to enable Export custom routes.
  • Identify the allocated range used by the private services connection.
  • Create a Cloud Router custom route advertisement for the allocated range on the Cloud Routers managing BGP sessions for your Cloud VPN tunnels or VLAN attachments.
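The last few steps might look like the following on the command line. This is a sketch under stated assumptions: the function names and argument order are hypothetical, and the router is assumed to keep advertising all subnets alongside the allocated range.

```shell
# Enable "Export custom routes" on one peering connection.
# Hypothetical helper. Usage:
#   export_custom_routes PEERING_NAME NETWORK_NAME PROJECT_ID
export_custom_routes() {
  gcloud compute networks peerings update "$1" \
    --network="$2" \
    --project="$3" \
    --export-custom-routes
}

# Advertise the allocated range from a Cloud Router.
# Note: --set-advertisement-ranges replaces any custom ranges
# that the router already advertises.
# Hypothetical helper. Usage:
#   advertise_allocated_range ROUTER_NAME REGION IP_RANGE PROJECT_ID
advertise_allocated_range() {
  gcloud compute routers update "$1" \
    --region="$2" \
    --project="$4" \
    --advertisement-mode=custom \
    --set-advertisement-groups=all_subnets \
    --set-advertisement-ranges="$3"
}
```

Run `export_custom_routes` once for each peering connection that the private services connection created. To see the peering names, `gcloud compute networks peerings list --network=NETWORK_NAME` should list them.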

Get the tenant project ID

To create a peering configuration, you need the tenant project ID.

  1. Go to the Cloud Data Fusion Instances page.

    Go to Instances

  2. In the Instance Name column, select the instance.

  3. On the Instance details page, copy your instance's Service Account value. The tenant project ID is the portion between the "at" symbol (@) and the following period (.). For example, if the service account value is
    cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com,
    the tenant project ID is r8170c9b5e7699803-tp.
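If you script this step, the extraction rule can be expressed with shell parameter expansion. The `tenant_project_id` function below is a hypothetical helper that implements only the rule stated above: take the text after the @ symbol, up to the next period.

```shell
# Print the tenant project ID embedded in a service account address.
# Hypothetical helper. Usage:
#   tenant_project_id SERVICE_ACCOUNT
tenant_project_id() {
  domain="${1#*@}"               # drop everything through the @ symbol
  printf '%s\n' "${domain%%.*}"  # drop everything from the first period
}

tenant_project_id "cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com"
# prints r8170c9b5e7699803-tp
```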


Create a peering connection

  1. Go to the VPC network peering page.

    Go to VPC network peering

  2. Click Create peering connection.

  3. Click Continue.

  4. Enter a Name for your peering connection.

  5. For Your VPC network, select the network in which you created your Cloud Data Fusion instance.

  6. For Peered VPC network, select In another project.

  7. For Project ID, enter the tenant project ID that you found in Get the tenant project ID.

  8. For VPC network name, enter INSTANCE_REGION-INSTANCE_ID.

    • INSTANCE_REGION is the region in which you created your Cloud Data Fusion instance.
    • INSTANCE_ID is the ID of your Cloud Data Fusion instance.
  9. Click Exchange custom routes, and then select Export custom routes. This exports any custom routes defined in your VPC network to the tenant VPC network.

  10. Click Create.
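The console steps above correspond roughly to a single `gcloud compute networks peerings create` call. In this sketch the function name and argument order are hypothetical. The peer network name follows the INSTANCE_REGION-INSTANCE_ID pattern from step 8.

```shell
# Create a peering connection from your VPC network to the tenant network.
# Hypothetical helper. Usage:
#   create_tenant_peering PEERING_NAME NETWORK_NAME TENANT_PROJECT_ID \
#     INSTANCE_REGION INSTANCE_ID
create_tenant_peering() {
  gcloud compute networks peerings create "$1" \
    --network="$2" \
    --peer-project="$3" \
    --peer-network="${4}-${5}" \
    --export-custom-routes
}
```

The command is assumed to run against the project that owns the VPC network. For a shared VPC network, that would be the host project.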


Set up IAM permissions

VPC network

Skip this step and go to Create a firewall rule.

Shared VPC network

If you create your Cloud Data Fusion instance in a shared VPC network, you must grant the Compute Network User role to the following service accounts. To give them access to all subnets, grant the role on the shared VPC host project.

For finer-grained control, grant the Compute Network User role on a specific subnet instead, and grant the Network Viewer role on the host project.

  • Cloud Data Fusion service account: service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com
  • Dataproc service account: service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com

PROJECT_NUMBER is the Google Cloud project number to which your Cloud Data Fusion instance belongs.

For more information, see Granting access to the required service accounts.
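A project-level grant for both service accounts could be scripted as follows. This is a minimal sketch that assumes the Compute Network User role corresponds to the role ID `roles/compute.networkUser`. The function name and arguments are hypothetical.

```shell
# Grant Compute Network User on the host project to both service accounts.
# Hypothetical helper. Usage:
#   grant_network_user HOST_PROJECT_ID PROJECT_NUMBER
grant_network_user() {
  for sa in \
    "service-$2@gcp-sa-datafusion.iam.gserviceaccount.com" \
    "service-$2@dataproc-accounts.iam.gserviceaccount.com"
  do
    gcloud projects add-iam-policy-binding "$1" \
      --member="serviceAccount:$sa" \
      --role="roles/compute.networkUser"
  done
}
```

This grants access to every subnet in the host project. It does not cover the subnet-level alternative described above.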

Create a firewall rule

Create a firewall rule on your VPC network that allows incoming SSH connections (TCP port 22) from the IP range you specified when you created your private Cloud Data Fusion instance.

You can create the firewall rule by using the Google Cloud console or the gcloud CLI.

Console

See Creating firewall rules.

gcloud

Run the following command:

gcloud compute firewall-rules create FIREWALL_NAME-allow-ssh --allow=tcp:22 --source-ranges=IP_RANGE --network=NETWORK_NAME --project=PROJECT_ID

Replace the following:

  • FIREWALL_NAME: The name of the firewall rule to create.
  • IP_RANGE: The IP range you allocated.
  • NETWORK_NAME: The name of the network to which the firewall rule is attached. It's the name of the VPC network in which you created the private instance.
  • PROJECT_ID: The ID of the project that's hosting the VPC network.

You can now use your private Cloud Data Fusion instance.

What's next