Private Service Connect with Dataproc Metastore

This page explains what Private Service Connect is and how to use it for networking as an alternative to VPC peering.

Dataproc Metastore service without VPC peering

Dataproc Metastore protects access to its metadata by exposing only private IP endpoints. It also uses VPC peering to restrict connectivity to VMs in the VPC network that you provide.

With VPC peering, Dataproc Metastore requires a peering connection and a reserved IP address block in each region for each VPC network.

Setting up VPC peering and reserving IP addresses can be difficult in crowded VPC networks. A VPC network might also lack the peering quota to accommodate additional peering requests. Either limitation can prevent you from creating new Dataproc Metastore services.

You can create a Dataproc Metastore service without VPC peering and address block reservations by using Private Service Connect to expose the Dataproc Metastore endpoint. Private Service Connect allows a private connection to Dataproc Metastore metadata across VPC networks.

With Private Service Connect, Dataproc Metastore requires a single address reservation in the subnetwork and a forwarding rule that targets the service attachment exposing the Dataproc Metastore endpoint. The address reservation and forwarding rule are created as part of the Dataproc Metastore service create call.

Create a Dataproc Metastore service with Private Service Connect

The following instructions demonstrate how to configure Private Service Connect during service creation.

Console

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore in the Google Cloud console

  2. At the top of the Dataproc Metastore page, click the Create button. The Create service page opens.

  3. Configure your service as desired.

  4. Under Network configuration, click Make services accessible in multiple VPC subnetworks.

  5. Select the Subnetworks. You can specify up to 5 subnetworks.

  6. Click Done.

  7. Click Submit.

Verify the service's network configuration:

  1. In the Google Cloud console, open the Dataproc Metastore page:

    Open Dataproc Metastore in the Google Cloud console

  2. On the Dataproc Metastore page, click the name of the service that you want to view. The Service detail page for that service opens.

  3. Under the Configuration tab, verify that the details show multiple VPC subnetwork URIs.

gcloud

  1. Run the following gcloud metastore services create command to create a service with Private Service Connect:

    gcloud metastore services create SERVICE \
       --location=LOCATION \
       --consumer-subnetworks="projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1,projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET2"
    

    or

    gcloud metastore services create SERVICE \
       --location=LOCATION \
       --network-config-from-file=NETWORK_CONFIG_FROM_FILE
    
  2. Verify that the creation was successful.
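The --consumer-subnetworks value in step 1 is a comma-separated list of full subnetwork resource paths. The following is a minimal sketch, not an official tool, of a shell helper that builds that value from short subnetwork names and rejects lists outside the 1 to 5 range. The function name consumer_subnetworks and the sample project, region, and subnetwork names are hypothetical.

```shell
# Build the value for --consumer-subnetworks from short subnetwork names.
# Arguments: PROJECT_ID LOCATION SUBNET [SUBNET ...] (1 to 5 subnetworks).
consumer_subnetworks() {
  if [ "$#" -lt 3 ] || [ "$#" -gt 7 ]; then
    echo "usage: consumer_subnetworks PROJECT_ID LOCATION SUBNET... (1 to 5 subnetworks)" >&2
    return 1
  fi
  local project="$1"
  local region="$2"
  shift 2

  local joined=""
  local name
  local path
  for name in "$@"; do
    path="projects/${project}/regions/${region}/subnetworks/${name}"
    # Separate entries with a bare comma; no spaces inside the list.
    if [ -n "$joined" ]; then
      joined="${joined},${path}"
    else
      joined="${path}"
    fi
  done
  printf '%s\n' "$joined"
}

# Example usage (requires the gcloud CLI; shown as a comment, not executed):
#   gcloud metastore services create SERVICE \
#      --location=us-central1 \
#      --consumer-subnetworks="$(consumer_subnetworks my-project us-central1 subnet-a subnet-b)"
```

For step 2, one option is to run gcloud metastore services describe SERVICE --location=LOCATION and check the reported state and network configuration.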

REST

Follow the API instructions to create a service by using the API Explorer.

In the create request parameters, use the field Network Config to configure Private Service Connect:

     "network_config": {
       "consumers": [
           {"subnetwork": "projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1"},
           {"subnetwork": "projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET2"}
       ]
     }
   

You can specify 1 to 5 subnetworks.
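If you script the request, the network_config fragment can be generated rather than written by hand. The sketch below assumes the same field names as the fragment above and enforces the 1 to 5 subnetwork limit. The function name network_config_json and the sample values are hypothetical, and the output is only the network_config portion of a create request, not a complete request body.

```shell
# Print a compact JSON object holding the network_config fragment.
# Arguments: PROJECT_ID LOCATION SUBNET [SUBNET ...] (1 to 5 subnetworks).
network_config_json() {
  if [ "$#" -lt 3 ] || [ "$#" -gt 7 ]; then
    echo "usage: network_config_json PROJECT_ID LOCATION SUBNET... (1 to 5 subnetworks)" >&2
    return 1
  fi
  local project="$1"
  local region="$2"
  shift 2

  local entries=""
  local name
  local entry
  for name in "$@"; do
    # One consumer object per subnetwork, using the full resource path.
    entry="$(printf '{"subnetwork":"projects/%s/regions/%s/subnetworks/%s"}' \
      "$project" "$region" "$name")"
    if [ -n "$entries" ]; then
      entries="${entries},${entry}"
    else
      entries="${entry}"
    fi
  done
  printf '{"network_config":{"consumers":[%s]}}\n' "$entries"
}

# Example usage (hypothetical values):
#   network_config_json my-project us-central1 subnet-a subnet-b
```

The helper does no JSON escaping, so it is only suitable for names that contain no quotes or backslashes, which holds for valid project, region, and subnetwork identifiers.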

Dataproc Metastore reserves addresses and creates forwarding rules in each of the specified subnetworks. Each subnetwork gets its own Thrift endpoint URI, which you can use to access the Dataproc Metastore metadata endpoint.

Attach a Dataproc cluster

You can attach a Dataproc cluster that uses a Dataproc Metastore service with Private Service Connect as its Hive metastore. To do so, use the service's endpoint URI and warehouse directory.

For more information on how to attach a Dataproc cluster, see Attach a Dataproc cluster using the ENDPOINT_URI and WAREHOUSE_DIR.
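As a rough illustration of how the two values fit together, the sketch below formats them as Dataproc cluster properties. It assumes the cluster is created with gcloud dataproc clusters create and that the metastore is wired in through the Hive properties hive.metastore.uris and hive.metastore.warehouse.dir. The function name hive_metastore_properties, the sample Thrift URI, and the sample bucket are hypothetical; the linked page remains the authoritative procedure.

```shell
# Format ENDPOINT_URI and WAREHOUSE_DIR as a --properties value for a
# Dataproc cluster. The "hive:" prefix targets the cluster's Hive configuration.
hive_metastore_properties() {
  if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: hive_metastore_properties ENDPOINT_URI WAREHOUSE_DIR" >&2
    return 1
  fi
  printf 'hive:hive.metastore.uris=%s,hive:hive.metastore.warehouse.dir=%s\n' "$1" "$2"
}

# Example usage (requires the gcloud CLI; shown as a comment, not executed):
#   gcloud dataproc clusters create CLUSTER \
#      --region=LOCATION \
#      --properties="$(hive_metastore_properties thrift://10.0.0.5:9083 gs://my-bucket/hive-warehouse)"
```

Use the endpoint URI of a subnetwork that the cluster can reach, because each consumer subnetwork has its own endpoint.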

Private Service Connect caveats for Dataproc Metastore

  • Dataproc Metastore service endpoints that use Private Service Connect only support access from subnetworks in the same region as the service.
  • Reverse connectivity is not possible. As a result, Kerberos configuration isn't supported with a Private Service Connect setup.
  • Dataproc Metastore services that use the gRPC endpoint protocol don't support network configuration.
  • You can't dynamically add or remove subnets from a Dataproc Metastore service. You must recreate a service if you'd like to add or remove subnets.
  • You can't update a Dataproc Metastore service from a Private Service Connect setup to a peering setup, or vice versa.
  • Auxiliary versions are not supported for Private Service Connect configuration.

What's next