Create a private instance with Private Service Connect

This page describes how to configure Private Service Connect in Cloud Data Fusion.

About Private Service Connect in Cloud Data Fusion

Cloud Data Fusion instances might need to connect to resources located on-premises, on Google Cloud, or on other cloud providers. When you use Cloud Data Fusion with internal IP addresses, connections to external resources are established over a Virtual Private Cloud (VPC) network in your Google Cloud project. Traffic over the network doesn't go through the public internet. When Cloud Data Fusion accesses your VPC network through VPC peering, it is subject to limitations that become apparent in large-scale networks.

With Private Service Connect interfaces, Cloud Data Fusion connects to your VPC without using VPC peering. A Private Service Connect interface is a type of Private Service Connect that lets Cloud Data Fusion initiate private and secure connections to consumer VPC networks. It offers the flexibility and ease of access of VPC peering, together with the explicit authorization and consumer-side control that Private Service Connect provides.

The following diagram shows how a Private Service Connect interface is deployed in Cloud Data Fusion:

Figure 1. Deployment of Private Service Connect interface

Description of Figure 1:

  • The virtual machines (VMs) running Cloud Data Fusion are hosted in a Google-owned tenant project. To access resources in the customer VPC, Cloud Data Fusion VMs use the IP address that the Private Service Connect network interface assigns from the customer's subnet. This subnet is added to the network attachment used by Cloud Data Fusion.

  • IP packets originating from the Private Service Connect interface are treated similarly to those from a VM in the same subnet. This configuration enables Cloud Data Fusion to directly access resources in the customer VPC or a peer VPC without the need for a proxy.

  • Internet resources become accessible when Cloud NAT is enabled in the customer VPC, while on-premises resources are reachable through an interconnect.

  • To manage ingress or egress through the Private Service Connect interface, you can implement firewall rules.

Key benefits

The following are the key benefits of using Cloud Data Fusion with Private Service Connect:

  • Better control of IP space. You control the IP addresses that Cloud Data Fusion uses to connect to your network. You choose the subnets from which the IP addresses are allocated to Cloud Data Fusion. All the traffic from Cloud Data Fusion has a source IP address from your configured subnet.

    Private Service Connect eliminates the need for reserved IP addresses from a customer VPC. By contrast, VPC peering requires a /22 CIDR block (1024 IP addresses) per Cloud Data Fusion instance.

  • Improved security and isolation. By configuring a network attachment, you control which services can access your network.

  • Simplified Cloud Data Fusion instance setup. You create a network attachment only once per customer VPC, and you don't need proxy VMs to connect to resources on the internet, in peer VPCs, or on-premises.
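
The address counts quoted on this page follow from the prefix length: a /N IPv4 block contains 2^(32-N) addresses. The following is a minimal shell sketch of that arithmetic. The function name cidr_size is illustrative only and is not part of any Google Cloud tool.

```shell
# Number of IPv4 addresses in a CIDR block with the given prefix length.
# cidr_size is a hypothetical helper name used only for this illustration.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

cidr_size 22   # /22 block that VPC peering requires per instance: prints 1024
cidr_size 25   # /25 block reserved for tenant project resources: prints 128
```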

Key concepts

Network attachment

A network attachment is a regional resource that authorizes Cloud Data Fusion to establish private network connections to resources in your VPC. For more information, see About network attachments.

Shared VPC

The following is a use case for Private Service Connect interfaces with Shared VPC:

  • The network or infrastructure team owns the subnets in a host project. They let the application teams use these subnets from their service project.

  • The application teams own the network attachments in a service project. The network attachment defines which Cloud Data Fusion tenant projects can connect to the subnets linked to the network attachment.

You can create a network attachment in a service project. The subnets used in a network attachment can only be in the host project.

The following diagram illustrates this use case:

Figure 2. Use case for Private Service Connect interfaces with Shared VPC

Description of Figure 2:

  • The network attachment is present in the service project. The network attachment uses a subnet that belongs to a Shared VPC in the host project.

  • The Cloud Data Fusion instance is present in the service project, and it uses the network attachment from the service project for establishing private connectivity.

  • The Cloud Data Fusion instance is assigned IP addresses from the subnet in the Shared VPC.
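
To make the host and service project split concrete, the following sketch builds a network attachment request body in which the subnet path points at the host project. It assumes the same body fields and subnetwork path format as the REST example later on this page. The helper name shared_vpc_attachment_body and all argument values are hypothetical placeholders.

```shell
# Build a network attachment request body whose subnet lives in the
# Shared VPC host project. shared_vpc_attachment_body is a hypothetical
# helper, not a gcloud command; all argument values are placeholders.
#   $1 attachment name, $2 host project ID, $3 region, $4 subnet name
shared_vpc_attachment_body() {
  printf '{"name": "%s", "connectionPreference": "ACCEPT_MANUAL", "subnetworks": ["projects/%s/regions/%s/subnetworks/%s"]}\n' \
    "$1" "$2" "$3" "$4"
}

# The attachment itself would be created in the service project, while
# the subnet path names the host project.
shared_vpc_attachment_body my-attachment host-project us-central1 shared-subnet
```

In this arrangement, the body would be posted to the networkAttachments endpoint of the service project, so only the subnet reference crosses project boundaries.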

Before you begin

  • Private Service Connect is available only in Cloud Data Fusion version 6.10.0 and later.
  • You can enable Private Service Connect only when you create a new Cloud Data Fusion instance. You can't migrate existing instances to use Private Service Connect.

Pricing

Data ingress and egress through Private Service Connect are charged. For more information, see Private Service Connect pricing.

Required roles and permissions

To get the permissions that you need to create a Cloud Data Fusion instance and network attachment, ask your administrator to grant you the following IAM roles on your project:

To get the permissions that Cloud Data Fusion needs to validate the network configuration, ask your administrator to grant the following IAM roles to the Cloud Data Fusion Google-managed service account (of the format service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com):

  • For the VPC associated with the network attachment: Compute Network Viewer (roles/compute.networkViewer)

  • For Cloud Data Fusion to add its tenant project to the producer accept list of the network attachment:

    • compute.networkAttachments.get
    • compute.networkAttachments.update
    • compute.networkAttachments.list

    The most restrictive role with these permissions is the Compute Network Admin (roles/compute.networkAdmin) role. These permissions are part of the Cloud Data Fusion API Service Agent (roles/datafusion.serviceAgent) role, which is automatically granted to the Cloud Data Fusion Google-managed service account. Therefore, no action is required unless the service agent role grant has been explicitly removed.
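
    If the service agent grant has been removed, or the Compute Network Viewer role still needs to be granted, the service account address can be derived from the project number. The following sketch prints a role grant command for review instead of running it. The helper name service_agent_email and the project values are hypothetical, and the sketch assumes that the VPC belongs to the project named in PROJECT_ID.

```shell
# Derive the Cloud Data Fusion service agent email from a project number.
# service_agent_email is a hypothetical helper; the address format is the
# one stated above.
service_agent_email() {
  echo "service-$1@gcp-sa-datafusion.iam.gserviceaccount.com"
}

# Placeholder values; replace them with your own project ID and number.
PROJECT_ID=my-project
PROJECT_NUMBER=123456789012

# Print (rather than run) the role grant so that it can be reviewed first.
echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:$(service_agent_email "$PROJECT_NUMBER")" \
  --role="roles/compute.networkViewer"
```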

For more information about granting roles, see Manage access.

You might also be able to get the required permissions through custom roles or other predefined roles.

For more information about access control options in Cloud Data Fusion, see Access control with IAM.

Create a VPC or a Shared VPC network

Ensure that you have created a VPC network or a Shared VPC network.

Configure Private Service Connect

To configure Private Service Connect in Cloud Data Fusion, you must first create a network attachment and then create a Cloud Data Fusion instance with Private Service Connect.

Create a network attachment

The network attachment provides a set of subnetworks. To create a network attachment, follow these steps:

Console

  1. In the Google Cloud console, go to the Network attachments page:

    Go to Network attachments

  2. Click Create network attachment.

  3. In the Name field, enter a name for your network attachment.

  4. From the Network drop-down, select a VPC or a Shared VPC network.

  5. From the Region drop-down, select a Google Cloud region. This region must be the same as the Cloud Data Fusion instance.

  6. From the Subnetwork drop-down, select a subnetwork range.

  7. In Connection preference, select Accept connections for selected projects.

    Cloud Data Fusion automatically adds the Cloud Data Fusion tenant project to the Accepted projects list when you create the Cloud Data Fusion instance.

  8. Don't add Accepted projects or Rejected projects.

  9. Click Create network attachment.

gcloud

  1. Create one or more subnetworks. For example:

    gcloud compute networks subnets create subnet-1 --network=network-0 --range=10.10.1.0/24 --region=REGION
    

    The network attachment uses these subnetworks in the subsequent steps.

  2. Create a network attachment resource in the same region as the Cloud Data Fusion instance, with the connection-preference property set to ACCEPT_MANUAL:

    gcloud compute network-attachments create NAME \
        --region=REGION \
        --connection-preference=ACCEPT_MANUAL \
        --subnets=SUBNET
    

    Replace the following:

    • NAME: the name for your network attachment
    • REGION: the name of the Google Cloud region. This region must be the same as the Cloud Data Fusion instance
    • SUBNET: the name of the subnet

    The output of this command includes the network attachment URL, which has the following format:

    projects/PROJECT/regions/REGION/networkAttachments/NETWORK_ATTACHMENT_ID

    Make a note of this URL as Cloud Data Fusion needs it for connectivity.

REST API

  1. Create a subnet.

  2. Create a network attachment:

    alias authtoken="gcloud auth print-access-token"
    NETWORK_ATTACHMENT_NAME=NETWORK_ATTACHMENT_NAME
    REGION=REGION
    SUBNET=SUBNET
    PROJECT_ID=PROJECT_ID
    
    read -r -d '' BODY << EOM
    {
      "name": "$NETWORK_ATTACHMENT_NAME",
      "description": "Network attachment for private Cloud Data Fusion",
      "connectionPreference": "ACCEPT_MANUAL",
      "subnetworks": [
        "projects/$PROJECT_ID/regions/$REGION/subnetworks/$SUBNET"
      ]
    }
    EOM
    
    curl -H "Authorization: Bearer $(authtoken)" \
    -H "Content-Type: application/json" \
    -X POST -d "$BODY" \
    "https://compute.googleapis.com/compute/v1/projects/$PROJECT_ID/regions/$REGION/networkAttachments"
    

    Replace the following:

    • NETWORK_ATTACHMENT_NAME: the name for your network attachment
    • REGION: the name of the Google Cloud region. This region must be the same as the Cloud Data Fusion instance
    • SUBNET: the name of the subnet
    • PROJECT_ID: the ID of your project

Create a Cloud Data Fusion instance

Cloud Data Fusion uses a /25 CIDR block (128 IP addresses) for resources in the tenant project. This block is called the unreachable or reserved range. You can use the same IP addresses in your VPCs, but Cloud Data Fusion VMs can't connect to your resources that use addresses in this range.

In most cases, this isn't an issue, because by default the unreachable CIDR block lies in a non-RFC 1918 range (240.0.0.0/8). To control the unreachable range, see Advanced configurations.
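
To check whether one of your own addresses would fall inside the unreachable range, you can compare the network bits of the address and the range. The following shell sketch does this for IPv4. The helper names ip_to_int and in_cidr are hypothetical, and the sketch assumes a shell with 64-bit arithmetic.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IPv4 address $1 lies inside CIDR block $2 (for example, 240.0.0.0/8).
in_cidr() {
  prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
}

# Example: an address in a typical RFC 1918 subnet is outside the default range.
if in_cidr 10.10.1.5 240.0.0.0/8; then
  echo "10.10.1.5 is in the unreachable range"
else
  echo "10.10.1.5 is reachable"
fi
```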

To create a Cloud Data Fusion instance with Private Service Connect enabled, follow these steps:

Console

  1. In the Google Cloud console, go to the Cloud Data Fusion Instances page, and click Create instance.

    Create an instance

  2. In the Instance name field, enter a name for your new instance.

  3. In the Description field, enter a description for your instance.

  4. From the Region drop-down, select the Google Cloud region in which you want to create the instance.

  5. From the Version drop-down, select 6.10 or later.

  6. Select an Edition. For more information about pricing for different editions, see the Cloud Data Fusion pricing overview.

  7. Expand Advanced options and do the following:

    1. Select Enable private IP.

    2. Select Private Service Connect as the Connectivity type.

    3. In the Network attachment section, select the network attachment that you created in Create a network attachment.

  8. Click Create. It takes up to 30 minutes for the instance creation process to complete.

REST API

Run the following command:

alias authtoken="gcloud auth print-access-token"

EDITION=EDITION
PROJECT_ID=PROJECT_ID
REGION=REGION
CDF_ID=INSTANCE_ID
NETWORK_ATTACHMENT_ID=NETWORK_ATTACHMENT_ID

read -r -d '' BODY << EOM
{
  "description": "PSC enabled instance",
  "version": "6.10",
  "type": "$EDITION",
  "privateInstance": "true",
  "networkConfig": {
    "connectionType": "PRIVATE_SERVICE_CONNECT_INTERFACES",
    "privateServiceConnectConfig": {
      "networkAttachment": "projects/$PROJECT_ID/regions/$REGION/networkAttachments/$NETWORK_ATTACHMENT_ID"
    }
  }
}
EOM

curl -H "Authorization: Bearer $(authtoken)" \
-H "Content-Type: application/json" \
-X POST -d "$BODY" \
"https://datafusion.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/instances/?instanceId=$CDF_ID"

Replace the following:

  • EDITION: the Cloud Data Fusion edition: BASIC, DEVELOPER, or ENTERPRISE.
  • PROJECT_ID: the ID of your project.
  • REGION: the name of the Google Cloud region in which to create the instance. This region must be the same as the region of the network attachment.
  • INSTANCE_ID: the ID of your instance.
  • NETWORK_ATTACHMENT_ID: the ID of your network attachment.
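
Because instance creation can take up to 30 minutes, a script that creates an instance usually needs to wait before it continues. The following is a generic polling sketch. The helper name wait_for_state is hypothetical, and the state-reporting command is passed in as arguments. The gcloud command and the ACTIVE state name shown in the comment are assumptions, so verify them against your gcloud version before relying on them.

```shell
# Poll a state-reporting command until it prints the wanted state.
# wait_for_state is a hypothetical helper. Arguments:
#   $1 wanted state, $2 maximum attempts, $3 seconds between attempts,
#   remaining arguments: the command that prints the current state.
wait_for_state() {
  wanted=$1
  attempts=$2
  delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state=$("$@")
    if [ "$state" = "$wanted" ]; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  return 1
}

# Possible usage (assumed command and state name, not run here):
#   wait_for_state ACTIVE 60 30 \
#     gcloud beta data-fusion instances describe INSTANCE_ID \
#     --location=REGION --format='value(state)'
```

With 60 attempts spaced 30 seconds apart, the loop gives up after about 30 minutes, which matches the upper bound stated for instance creation.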

Advanced configurations

To share subnets, provide the same network attachment to multiple Cloud Data Fusion instances. To dedicate a subnet to a particular Cloud Data Fusion instance, create a separate network attachment and provide it only to that instance.

Recommended: To apply a uniform firewall policy to all of the Cloud Data Fusion instances, use the same network attachment.

To control the /25 CIDR block that Cloud Data Fusion can't reach, specify the unreachableCidrBlock property when you create the instance. For example:

alias authtoken="gcloud auth print-access-token"

EDITION=EDITION
PROJECT_ID=PROJECT_ID
REGION=REGION
CDF_ID=INSTANCE_ID
NETWORK_ATTACHMENT_ID=NETWORK_ATTACHMENT_ID
UNREACHABLE_RANGE=UNREACHABLE_RANGE

read -r -d '' BODY << EOM
{
  "description": "PSC enabled instance",
  "version": "6.10",
  "type": "$EDITION",
  "privateInstance": "true",
  "networkConfig": {
    "connectionType": "PRIVATE_SERVICE_CONNECT_INTERFACES",
    "unreachableCidrBlock": "$UNREACHABLE_RANGE",
    "privateServiceConnectConfig": {
      "networkAttachment": "projects/$PROJECT_ID/regions/$REGION/networkAttachments/$NETWORK_ATTACHMENT_ID"
    }
  }
}
EOM

curl -H "Authorization: Bearer $(authtoken)" \
-H "Content-Type: application/json" \
-X POST -d "$BODY" "https://datafusion.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/instances/?instanceId=$CDF_ID"

Replace the following:

  • EDITION: the Cloud Data Fusion edition: BASIC, DEVELOPER, or ENTERPRISE.
  • PROJECT_ID: the ID of your project.
  • REGION: the name of the Google Cloud region in which to create the instance. This region must be the same as the region of the network attachment.
  • INSTANCE_ID: the ID of your instance.
  • NETWORK_ATTACHMENT_ID: the ID of your network attachment.
  • UNREACHABLE_RANGE: the unreachable range, for example, 10.0.0.0/25.
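
Before you send the request, it can help to sanity-check the value of UNREACHABLE_RANGE locally. The following sketch accepts only a /25 block whose address is aligned on a /25 boundary, which means that the last octet is 0 or 128. The helper name valid_unreachable_range is hypothetical, and the alignment check is a general CIDR convention rather than a documented Cloud Data Fusion requirement.

```shell
# Succeed only if $1 looks like an aligned IPv4 /25 block, such as 10.0.0.0/25.
# valid_unreachable_range is a hypothetical helper used for illustration.
valid_unreachable_range() {
  case $1 in
    *.*.*.*/25) ;;        # four dotted parts and a /25 prefix
    *) return 1 ;;
  esac
  addr=${1%/*}            # strip the prefix length
  last_octet=${addr##*.}  # keep only the final octet
  [ "$last_octet" = "0" ] || [ "$last_octet" = "128" ]
}

if valid_unreachable_range "10.0.0.0/25"; then
  echo "range looks valid"
fi
```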

Security

Cloud Data Fusion to consumer security

Private Service Connect interfaces support egress firewall rules to control what Cloud Data Fusion can access within your VPC. For more information, see Limit producer-to-consumer ingress.

Consumer to Cloud Data Fusion security

Cloud Data Fusion VMs with a Private Service Connect interface block any traffic that originates from your VPC and isn't a response packet.